Question: "featureCounts" of "subRead" does not work with ensembl.gft
0
gravatar for alakatos
3.2 years ago by
alakatos0
alakatos0 wrote:

Hello,

I am analyzing an SE (50) RNAseq dataset (Illumina) for the first time. I used STAR for read alignment. I generated genome indices by STAR using GRCh38_r85.all.fa and Homo_sapiens.GRCh38.85.gtf downloaded from Ensembl (e.g. wget ftp://ftp.ensembl.org/pub/release-85/gtf/homo_sapiens/Homo_sapiens.GRCh38.85.gtf.gz).

$ head -6 Homo_sapiens.GRCh38.85.gtf 
#!genome-build GRCh38.p7
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.22
#!genebuild-last-updated 2016-06
1   havana  gene    11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; etc

The alignment went smoothly and the result looked OK.

 Started job on |   Sep 20 09:49:47
 Started mapping on |   Sep 20 09:50:27
 Finished on |  Sep 20 10:09:30
 Mapping speed, Million of reads per hour | 88.83
 Number of input reads |    28202438
 Average input read length |    51
 UNIQUE READS:
 Uniquely mapped reads number | 23282219
 Uniquely mapped reads % |  82.55%
 Average mapped length |    50.75
 Number of splices: Total | 1764086
 Number of splices: Annotated (sjdb) |  1747703
 Number of splices: GT/AG | 1745945
 Number of splices: GC/AG | 13045  etc

Then, I wanted to quantify the reads per gene by "featureCounts"

subread-1.5.0-p3-Linux-x86_64/bin/featureCounts -a Homo_sapiens.GRCh38.85.gtf -g gene_id -o counts.txt Aligned.sortedByCoord.out.bam

I did not get any error message but the output file contains only 0.

$ head -10 counts.txt
Geneid  Chr Start   End Strand  Length  Aligned.sortedByCoord.out.bam
ENSG00000223972 1;1;1;1 11869;12613;12975;13221 12227;12721;13052;14409 +;+;+;+ 1735    0
ENSG00000227232 1;1;1;1;1;1;1;1;1;1;1   14404;15005;15796;16607;16858;17233;17606;17915;18268;24738;29534 14501;15038;15947;16765;17055;17368;17742;18061;18366;24891;29570 -;-;-;-;-;-;-;-;-;-;-   1351    0
ENSG00000278267 1   17369   17436   -   68  0
ENSG00000243485 1;1;1   29554;30267;30976   30039;30667;31109   +;+;+   1021    0
ENSG00000237613 1;1;1   34554;35245;35721   35174;35481;36081   -;-;-   1219    0
ENSG00000268020 1   52473   53312   +   840 0
ENSG00000240361 1   62948   63887   +   940 0
ENSG00000186092 1   69091   70008   +   918 0

 $ head -10 counts.txt.summary 
Status  Aligned.sortedByCoord.out.bam
Assigned    0
Unassigned_Ambiguity    0
Unassigned_MultiMapping 0
Unassigned_NoFeatures   0
Unassigned_Unmapped 0
Unassigned_MappingQuality   0
Unassigned_FragmentLength   0
Unassigned_Chimera  0
Unassigned_Secondary

I am not sure what caused this "hiccup". Could the GTF file be an issue? It worked for STAR.

Your advice and help are highly appreciated.

Thanks, Anita

rna-seq subread ensembl gtf • 1.8k views
ADD COMMENTlink modified 3.2 years ago by GZ1995350 • written 3.2 years ago by alakatos0
0
gravatar for GZ1995
3.2 years ago by
GZ1995350
GZ1995350 wrote:

https://support.bioconductor.org/p/64636/

ADD COMMENTlink written 3.2 years ago by GZ1995350

Thank you for your help. Anita

ADD REPLYlink written 3.2 years ago by alakatos0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 880 users visited in the last hour