Question

Average Percentage Of RNA-Seq Reads Coming From Known Annotation (Ensembl, RefSeq, ...)

3

11.1 years ago

Nicolas Rosewick 11k

Hi,

I have a general question about RNA-Seq data: what percentage of reads typically comes from known annotation?

For example, take 2x50 bp strand-specific human data. After alignment (TopHat, STAR, ...), what percentage of reads map, and what percentage come from known annotation (for example, Ensembl genes)?

Thanks a lot,

N.

read annotation rnaseq • 5.4k views

ADD COMMENT • link updated 11.1 years ago by Mikael Huss 4.8k • written 11.1 years ago by Nicolas Rosewick 11k

score 2 · Answer 1 · 2013-03-19

2

11.1 years ago

Malachi Griffith 19k

Here are some examples of RNA-seq libraries generated and sequenced in various ways. The same human sample was used for all 8 approaches: starting from either total RNA or polyA RNA, using either the Nugen Ovation V2 or the Encore kit for cDNA synthesis, and finally subjecting the library to exome capture or not. (Yes, I know this is an unusual thing to do with an RNA-seq library ;)) All sequence data are paired 2x100 bp reads. The Encore libraries are strand-specific and the Ovation libraries are not. Alignments were done by Tophat v2 with Bowtie v2.

The plot shows the proportion of reads that map to known coding regions, known UTR regions, intronic regions, etc. To get the total fraction of reads mapping to known transcript annotations, add the UTR and coding components. The annotations are from Ensembl.

[Figure: RNA-seq read alignments broken down by gene compartments]
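
To make the "add the UTR and coding components" step concrete, here is a minimal sketch in Python. The compartment names and the read counts are made up for illustration; they are not values from the plot, and the helper `annotated_fraction` is hypothetical rather than part of any tool mentioned in this thread.

```python
def annotated_fraction(counts, annotated=("coding", "utr")):
    """Fraction of aligned reads in compartments counted as known transcript annotation."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned reads")
    return sum(counts.get(name, 0) for name in annotated) / total


# Hypothetical per-compartment read counts for one library (illustrative only).
example_counts = {"coding": 550, "utr": 250, "intronic": 150, "intergenic": 50}

fraction = annotated_fraction(example_counts)
print(f"{fraction:.1%} of aligned reads fall in known transcript annotations")
```

In practice the per-compartment counts would come from whatever read-distribution tool you run on your alignments; only the final summation is shown here.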

ADD COMMENT • link 11.1 years ago by Malachi Griffith 19k

0

What did you use to visualize this? Excel? The colors are chosen very well.

ADD REPLY • link 11.1 years ago by Ying W ★ 4.2k

0

Yeah, it was. Not a fan but in this case it did a decent job I guess.

ADD REPLY • link 11.1 years ago by Malachi Griffith 19k

score 0 · Answer 2 · 2013-03-18

0

11.1 years ago

Ashutosh Pandey 12k

I don't have a specific number, and I don't think anyone can give you one. If you tell us the organism and tissue type you are working with, people may be able to give you an idea. The percentage of reads mapping to known annotation depends on how well your organism is annotated and on the tissue you are studying. I have analysed mouse hippocampus data, and although we think of the mouse genome as well annotated, I found interesting novel transcripts being expressed. Sometimes library preparation goes wrong, and you may see a lot of sequences that do not map at all.

ADD COMMENT • link 11.1 years ago by Ashutosh Pandey 12k

0

It's human, and it's T-cells. It's total RNA with rRNA depletion using Ribo-Zero.

ADD REPLY • link 11.1 years ago by Nicolas Rosewick 11k

score 0 · Answer 3 · 2013-03-19

We have used about 80% mapped and, of that, 80% "mRNA fraction" (coding, UTR) as a rough rule of thumb for poly-A selected mRNA in human and mouse tissue and cell lines. The vast majority of experiments fall within 70-90% for both of those metrics. That seems to fit Malachi's graph pretty well, although the graph is much more comprehensive than this answer :-) The fraction would be expected to be much lower for rRNA depletion because you will observe a lot of unannotated RNA species.
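
As a rough illustration of that rule of thumb, the sketch below computes both metrics from read counts and flags whether each falls in the 70-90% range. The function name `rnaseq_qc` and the read counts are hypothetical, and the range is only the informal guideline quoted above for poly-A selected libraries, not a formal QC threshold. Note that 80% mapped with an 80% mRNA fraction works out to roughly 64% of all sequenced reads.

```python
def rnaseq_qc(total_reads, mapped_reads, mrna_reads, low=0.70, high=0.90):
    """Compare mapping rate and mRNA fraction against a rough 70-90% guideline."""
    mapped_frac = mapped_reads / total_reads   # mapped reads out of all reads
    mrna_frac = mrna_reads / mapped_reads      # coding + UTR reads out of mapped reads
    return {
        "mapped_fraction": mapped_frac,
        "mrna_fraction": mrna_frac,
        "mrna_of_total": mapped_frac * mrna_frac,
        "mapped_in_range": low <= mapped_frac <= high,
        "mrna_in_range": low <= mrna_frac <= high,
    }


# Made-up counts chosen to match the 80% / 80% rule of thumb.
report = rnaseq_qc(total_reads=1_000_000, mapped_reads=800_000, mrna_reads=640_000)
for name, value in report.items():
    print(name, value)
```

For an rRNA-depleted total RNA library, the mRNA fraction would be expected to fall below this range, so the flag should be read as a prompt to look closer rather than as a failure.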