"featureCounts" of "subRead" does not work with ensembl.gft
1
0
Entering edit mode
7.6 years ago
alakatos • 0

Hello,

I am analyzing an SE (50) RNAseq dataset (Illumina) for the first time. I used STAR for read alignment. I generated genome indices by STAR using GRCh38_r85.all.fa and Homo_sapiens.GRCh38.85.gtf downloaded from Ensembl (e.g. wget ftp://ftp.ensembl.org/pub/release-85/gtf/homo_sapiens/Homo_sapiens.GRCh38.85.gtf.gz).

$ head -6 Homo_sapiens.GRCh38.85.gtf 
#!genome-build GRCh38.p7
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.22
#!genebuild-last-updated 2016-06
1   havana  gene    11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; etc

The alignment went smoothly and the result looked OK.

 Started job on |   Sep 20 09:49:47
 Started mapping on |   Sep 20 09:50:27
 Finished on |  Sep 20 10:09:30
 Mapping speed, Million of reads per hour | 88.83
 Number of input reads |    28202438
 Average input read length |    51
 UNIQUE READS:
 Uniquely mapped reads number | 23282219
 Uniquely mapped reads % |  82.55%
 Average mapped length |    50.75
 Number of splices: Total | 1764086
 Number of splices: Annotated (sjdb) |  1747703
 Number of splices: GT/AG | 1745945
 Number of splices: GC/AG | 13045  etc

Then, I wanted to quantify the reads per gene by "featureCounts"

subread-1.5.0-p3-Linux-x86_64/bin/featureCounts -a Homo_sapiens.GRCh38.85.gtf -g gene_id -o counts.txt Aligned.sortedByCoord.out.bam

I did not get any error message but the output file contains only 0.

$ head -10 counts.txt
Geneid  Chr Start   End Strand  Length  Aligned.sortedByCoord.out.bam
ENSG00000223972 1;1;1;1 11869;12613;12975;13221 12227;12721;13052;14409 +;+;+;+ 1735    0
ENSG00000227232 1;1;1;1;1;1;1;1;1;1;1   14404;15005;15796;16607;16858;17233;17606;17915;18268;24738;29534 14501;15038;15947;16765;17055;17368;17742;18061;18366;24891;29570 -;-;-;-;-;-;-;-;-;-;-   1351    0
ENSG00000278267 1   17369   17436   -   68  0
ENSG00000243485 1;1;1   29554;30267;30976   30039;30667;31109   +;+;+   1021    0
ENSG00000237613 1;1;1   34554;35245;35721   35174;35481;36081   -;-;-   1219    0
ENSG00000268020 1   52473   53312   +   840 0
ENSG00000240361 1   62948   63887   +   940 0
ENSG00000186092 1   69091   70008   +   918 0

 $ head -10 counts.txt.summary 
Status  Aligned.sortedByCoord.out.bam
Assigned    0
Unassigned_Ambiguity    0
Unassigned_MultiMapping 0
Unassigned_NoFeatures   0
Unassigned_Unmapped 0
Unassigned_MappingQuality   0
Unassigned_FragmentLength   0
Unassigned_Chimera  0
Unassigned_Secondary

I am not sure what caused this "hiccup". Could the GTF file be an issue? It worked for STAR.

Your advice and help are highly appreciated.

Thanks, Anita

RNA-Seq subRead ensembl gtf • 3.3k views
ADD COMMENT
0
Entering edit mode
ADD COMMENT
0
Entering edit mode

Thank you for your help. Anita

ADD REPLY

Login before adding your answer.

Traffic: 2794 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6