Question: Could high duplication be from ribosomal RNA in RNA-seq samples?
20 months ago, mmrcksn wrote:


I have some paired-end RNA-seq data. My samples were pretty low input (~1 ng total RNA) from an isolated cell type. For library prep, we did a poly-A capture to select mRNA.

The FastQC reports show pretty bad duplication (in some samples only 2% of reads would remain after deduplication).

I ran this command to look at some of the most dominant sequences in my FASTQ files:

grep -A 1 '@K00179' <sample.fastq>  | head -1000000 | grep -v '^@' | grep -v '^-' | sort | uniq -w 30 -c | sort -n -r | head -100 >> domseqs.100
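
For readers who prefer a script, here is a minimal Python sketch of the same idea. It assumes an uncompressed FASTQ with the strict four-lines-per-record layout; the function name `top_prefixes`, its parameters, and the default read cap are hypothetical choices for illustration, not part of any library. It picks the sequence line by its position in each record instead of by header pattern, and it groups reads by their first 30 bases, as `uniq -w 30` does, but reports only that prefix and not a full read.

```python
from collections import Counter
from itertools import islice


def top_prefixes(fastq_lines, prefix_len=30, max_reads=500_000, top_n=100):
    """Return the most common read prefixes as (prefix, count) pairs.

    fastq_lines: any iterable of text lines, e.g. an open file handle.
    Assumes 4 lines per record: header, sequence, '+', quality.
    max_reads is an arbitrary cap on how many reads are sampled.
    """
    counts = Counter()
    # The sequence is the 2nd line of every 4-line record.
    seq_lines = islice(fastq_lines, 1, None, 4)
    for seq in islice(seq_lines, max_reads):
        counts[seq.strip()[:prefix_len]] += 1
    return counts.most_common(top_n)


# Example use (the file name is a placeholder):
# with open("sample.fastq") as handle:
#     for prefix, n in top_prefixes(handle):
#         print(n, prefix, sep="\t")
```

The top prefixes can then be pasted into BLAST the same way as the output of the shell pipeline.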

and found many sequences that, when I searched with BLAST, match with stuff like this:

"Mus musculus clone contig 6 chromocenter region genomic sequence"

These chromocenter sequences also seem to match with rRNA, as further in the results there are things like: "Mus musculus 45S pre-ribosomal RNA (Rn45s), ribosomal RNA", "Mus musculus 28S ribosomal RNA (Rn28s1), ribosomal RNA"

Is it possible that even with polyA capture, rRNA slipped in? What exactly does it mean that I have a bunch of these "contig # chromocenter region genomic sequence" in my RNA-seq data?


Yes. Poly-A capture only enriches for polyadenylated sequences: it doesn't get rid of all rRNAs and other non-poly-A RNAs. As you probably know, more than 90% of the transcriptome is composed of rRNAs, so if you end up with 20% rRNAs or so after poly-A enrichment or ribodepletion, it's already quite an improvement.
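
To put rough numbers on that, here is a small sketch using the illustrative figures above (about 90% rRNA before enrichment, about 20% after). The function name is hypothetical and the percentages are examples, not measurements from this dataset.

```python
def rrna_depletion_fold(frac_before, frac_after):
    """Fold reduction in the rRNA : non-rRNA ratio between two rRNA fractions."""
    ratio_before = frac_before / (1.0 - frac_before)  # e.g. 0.90 -> 9 : 1
    ratio_after = frac_after / (1.0 - frac_after)     # e.g. 0.20 -> 1 : 4
    return ratio_before / ratio_after


fold = rrna_depletion_fold(0.90, 0.20)
print(f"rRNA : non-rRNA ratio reduced about {fold:.0f}-fold")
```

Going from 9:1 to 1:4 is roughly a 36-fold drop in rRNA relative to everything else, so a library that is still 20% rRNA has been depleted substantially.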

written 20 months ago by Carlo Yague

Thanks for your reply! I am pretty new to all this so I just wanted to make sure.

written 20 months ago by mmrcksn