feature count at gene and transcript level
0
2
Entering edit mode
3.0 years ago
Prakash ★ 2.2k

Hi all,

I have aligned my RNAseq data to the reference genome and extracted raw count using featureCounts tool. I used two different GTF file from NCBI, one having only Refseq id as gene_id and other having gene symbol as gene_id.

Refseq as gene_id

chr11 mm10_refGene exon 100883834 100885169 0.000000 + . gene_id "NM_001164062"; transcript_id "NM_001164062";

Gene symbol as gene_id

chr11 stdin exon 100883834 100885169 . + . gene_id "Stat5a"; transcript_id "NM_001164062"; exon_number "18"; exon_id "NM_001164062.18"; gene_name "Stat5a";

Surprisingly when I looked into the raw count for above feature that I have highlighted is showing different count from featuteCounts output for these two different GTF file It would be really great if someone could help me understanding these different results or If I am missing something here.

gene_id Count

NM_001164062     0

NM_011488  11

Stat5a  14046


Here is the code that I used to extract the count

featureCounts --verbose -p -B -T 20 -a mm10_refseq.gtf -o Emp Emp_R1.sorted.bam


and IGV browser snapshot of the bam coverage at the highlighted features.

RNA-Seq • 1.5k views
0
Entering edit mode

There are probably many differences between your GTF files in addition to the one line you posted. Based on the formatting, they don't look like they are actually from NCBI.

Based on that screenshot, there might be something wrong with your GTF file. The exons should be connected like they are in the RefSeq track.

0
Entering edit mode

Actually both the gtf file were downloaded from UCSC table browser, First one is NCBI refseq track from table UCSC Refseq (refgene) downloaded as GTF file and second one downloaded as "all fields from selected table" and further converted to GTF format using genePredToGtf

I checked several features and they are having same number of count from both the gtf file but only few features I guess is behaving differently.

0
Entering edit mode

I am not sure how the table browser works in this particular case, but it's possible that it does not convert between formats properly. The initial table may not have all the necessary fields. I would use proper GTF files.

Also, I would modify the initial question to specify how you generated the GTF files. Although some of the contents may be from NCBI, the files are not.

0
Entering edit mode

What I understood that there is some issue with NCBI REFseq GTF file downloaded from UCSC table browser.

The second GTF file having gene symbol as gene_id i generated using following line of code

rsync -a -P rsync://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/refGene.txt.gz ./
gzip -d refGene.txt.gz
cut -f 2- refGene.txt > refGene.input
genePredToGtf file refGene.input mm10refGene.gtf