shall intronic reads be considered in RNAseq count?
2
2
Entering edit mode
9.5 years ago
bcn_yaar ▴ 20

Hi,

I am counting gene expression from a RNA-seq data. Shall I consider the reads that are mapped to introns as well or just the exons only? What is general trend?

Thanks

rna-seq • 4.4k views
ADD COMMENT
0
Entering edit mode

I really don't know what to believe. For example, with single exon rRNA gene (no intron) (mt DNA) FBgn0013686, while multiBamCov from bedtools found as unusually high as 958085 reads, .... the HtSeq count (default option; Ensembl gtf file) counted only 1453 reads. These happened to few other genes as well!

Also, as example, for the gene FBgn0261504 (drosophila melanogaster), with TopHat version 1.2.0 mapping I got no read count (using bedtools multiBamCov) while with latest topHat2 version 2.0.13 I got unusually high as 561450. How such discrepancies possible, while I kept mapping parameters same!!

tophat --solexa-quals -m 2 -g 1 --butterfly-search -p 10 --GTF genes.gtf --no-novel-juncs --output-dir Result $BOWTIE_INDEXES/genome MT.fastq
tophat2 --solexa-quals -m 2 -g 1 -p 10 --GTF genes.gtf --no-novel-juncs --output-dir Result $BOWTIE_INDEXES/genome MT.fastq
ADD REPLY
0
Entering edit mode

htseq-count ignores multimappers, multiBamCov doesn't. There are MANY rRNA copies in the genome.

Regarding tophat, you're using a different aligner between tophat1 and tophat2.

ADD REPLY
4
Entering edit mode
9.5 years ago

Generally, no, you don't count intronic reads. You're interested in measuring mRNA, not pre-mRNA or mRNA + random intronic repeat region showing expression. There's also the genomic contamination issue (though really, that should add little to the overall counts).

Now if you're specifically interested in looking at pre-mRNA ->mRNA processing, then the case would be different.

ADD COMMENT
0
Entering edit mode

if I use the transcript annotation from Ensembl database, then the start and end span of the gene/transcript anyway covers the locations of introns. Now, usually in RNAseq we have exon reads..... as introns are spliced out. That was the motivation to use transcript/gene annotation file. But it is also quite common to see intronic read. So, if I want to use "multiBamCov" from bedtools, should I remove first intronic reads from my BAM files? Or, it is not a good approach to use Transcript/gene annotation file for counting, rather I should use Exon annotation file? - Thanks.

ADD REPLY
1
Entering edit mode

Why not make your life easier and (1) just download the GTF file from Ensembl and then (2) use featureCounts to get your counts?

ADD REPLY
0
Entering edit mode

OK, thanks, I will try this one as well.

I really don't know what to believe. For example, with single exon rRNA gene (no intron) (mt DNA) FBgn0013686, while multiBamCov" from bedtools found as unusually high as 958085 reads, .... the HtSeq count (default option; Ensembl gtf file) counted only 1453 reads. These happened to few other genes as well!!

ADD REPLY
0
Entering edit mode
9.5 years ago

I count reads in exons only as introns are spliced out. This is the default in htseq-count by the way.

ADD COMMENT
0
Entering edit mode

if I use the transcript annotation from Ensembl database, then the start and end span of the gene/transcript anyway covers the locations of introns. Now, usually in RNAseq we have exon reads..... as introns are spliced out. That was the motivation to use transcript/gene annotation file. But it is also quite common to see intronic read. So, if I want to use "multiBamCov" from bedtools, should I remove first intronic reads from my BAM files? Or, it is not a good approach to use Transcript/gene annotation file for counting, rather I should use Exon annotation file? - Thanks.

ADD REPLY

Login before adding your answer.

Traffic: 1967 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6