Question: Average Percentage Of RNA-Seq Reads Coming From Known Annotation (Ensembl, RefSeq, ...)
7.3 years ago by
Belgium, Brussels
Nicolas Rosewick (8.8k) wrote:

Hi,

I have a general question about RNA-Seq data: what percentage of reads typically comes from known annotation?

For example, take 2x50 bp strand-specific human data. After alignment (TopHat, STAR, ...), what percentage of reads maps, and what percentage of reads comes from known annotation (for example, Ensembl genes)?

Thanks a lot,

N.

rnaseq read annotation • 4.3k views
7.3 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith (18k) wrote:

Here are some examples of RNA-seq libraries generated and sequenced in various ways. The same human sample was used for all 8 approaches: starting from either total RNA or polyA RNA, using either the Nugen Ovation V2 or the Encore kit for cDNA synthesis, and finally either subjecting the library to an exome capture or not. Yes, I know this is an unusual thing to do with an RNA-seq library. ;) All sequence data are paired 2x100 bp reads. The Encore libraries are strand-specific and the Ovation libraries are not. Alignments were done with TopHat v2 using Bowtie v2.

The plot shows the proportion of reads that map to known coding regions, known UTR regions, intronic regions, etc. So for total reads mapping to known transcript annotations you would add the UTR and coding components. The annotations are from Ensembl.

[Figure: RNA-seq read alignments broken down by gene compartments]
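
As a rough illustration of the "add the UTR and coding components" step, here is a minimal Python sketch. The compartment names (`coding`, `utr`, `intronic`, `intergenic`) and the read counts are made up for the example; they are not read off the plot, and the function names are hypothetical.

```python
from typing import Dict, Iterable


def compartment_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-compartment read counts into fractions of all aligned reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned reads to summarise")
    return {name: n / total for name, n in counts.items()}


def known_transcript_fraction(
    counts: Dict[str, int],
    transcript_keys: Iterable[str] = ("coding", "utr"),
) -> float:
    """Sum the fractions of the compartments that count as known transcript."""
    fractions = compartment_fractions(counts)
    return sum(fractions.get(key, 0.0) for key in transcript_keys)


# Made-up counts for illustration only; NOT taken from the plot.
example_counts = {"coding": 600, "utr": 200, "intronic": 150, "intergenic": 50}
example_fraction = known_transcript_fraction(example_counts)
print(f"known transcript fraction: {example_fraction:.2f}")
```

The denominator here is aligned reads only, so unmapped reads would have to be accounted for separately.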


What did you use to visualize this? Excel? The colors are chosen very well.

Reply written 7.3 years ago by Ying W (4.0k)

Yeah, it was Excel. Not a fan, but in this case it did a decent job, I guess.

Reply modified 7.3 years ago • written 7.3 years ago by Malachi Griffith (18k)
7.3 years ago by
Philadelphia
Ashutosh Pandey (12k) wrote:

I don't have a specific number, and I don't think anyone can give you one. If you tell us the organism and tissue type you are working with, people may be able to give you an idea. The percentage of reads mapping to known annotation depends on how well your organism is annotated and on the tissue you are studying. I have analysed mouse hippocampus data, and although we think of the mouse genome as well annotated, I still found interesting novel transcripts being expressed. Sometimes the library preparation can go wrong, and you may see a lot of sequences not mapping at all.


It's human T-cells, and it's total RNA with rRNA depletion using Ribo-Zero.

Reply modified 7.3 years ago • written 7.3 years ago by Nicolas Rosewick (8.8k)
7.3 years ago by
Stockholm
Mikael Huss (4.7k) wrote:

We have used about 80% mapped and, of that, 80% "mRNA fraction" (coding, UTR) as a rough rule of thumb for poly-A selected mRNA in human and mouse tissue and cell lines. The vast majority of experiments fall into the 70-90% range for both of those metrics. That seems to fit Malachi's graph pretty well, although the graph is much more comprehensive than this answer :-) The fraction would be expected to be much lower for rRNA depletion because you will observe a lot of unannotated RNA species.
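
To make the arithmetic behind this rule of thumb explicit: because the "mRNA fraction" is taken out of the mapped reads, the share of all sequenced reads landing in annotated coding/UTR regions is the product of the two numbers. A small Python sketch follows; the function name is hypothetical. The 0.70 and 0.90 cases simply combine the ends of the 70-90% ranges above and are bounds, not observed experiments.

```python
def annotated_share_of_all_reads(mapped_fraction: float,
                                 mrna_fraction_of_mapped: float) -> float:
    """Share of ALL reads that are both mapped and in coding/UTR regions."""
    for name, value in (("mapped_fraction", mapped_fraction),
                        ("mrna_fraction_of_mapped", mrna_fraction_of_mapped)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return mapped_fraction * mrna_fraction_of_mapped


# Rule of thumb from the answer: about 80% mapped, about 80% of those mRNA.
rule_of_thumb = annotated_share_of_all_reads(0.80, 0.80)  # about 0.64

# Combining the ends of the 70-90% ranges gives rough lower and upper bounds.
low_end = annotated_share_of_all_reads(0.70, 0.70)   # about 0.49
high_end = annotated_share_of_all_reads(0.90, 0.90)  # about 0.81

print(f"rule of thumb: {rule_of_thumb:.0%} of all reads")
print(f"rough range:   {low_end:.0%} to {high_end:.0%}")
```

So under these assumptions, roughly two thirds of all reads from a typical poly-A library would come from known coding/UTR annotation.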
