Question

Could high duplication be from ribosomal RNA in RNA-seq samples?

Asked 7.8 years ago by mmrcksn

Hello,

I have some paired-end RNA-seq data. My samples had very little starting material (~1 ng of total RNA) from an isolated cell type. For library prep, we did a poly-A capture to select mRNA.

The FastQC reports show very high duplication levels (for some samples, only 2% of reads would remain after deduplication).
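For reference, the "percent remaining after deduplication" idea can be approximated with a short shell function that divides the number of distinct read sequences by the total number of reads. This is a minimal sketch, not FastQC's algorithm: the function name `percent_remaining`, its default of 100,000 reads, and treating only exact sequence matches as duplicates are assumptions, and FastQC applies its own sampling and truncation rules, so its figure will differ somewhat.

```bash
# Rough estimate of the percentage of reads that would remain after exact
# deduplication: distinct sequences / total sequences * 100, computed on the
# first MAX_READS reads of an uncompressed 4-line-per-record FASTQ.
# Hypothetical helper; FastQC's own number uses different rules.
# Usage: percent_remaining READS_FASTQ [MAX_READS]
percent_remaining() {
    local fastq="$1" max_reads="${2:-100000}"
    awk -v max="$max_reads" '
        # Stop after MAX_READS records (exit still runs the END block).
        NR > 4 * max { exit }
        # Line 2 of every 4-line FASTQ record is the read sequence.
        NR % 4 == 2 {
            total++
            if (!($0 in seen)) { seen[$0] = 1; distinct++ }
        }
        END { if (total > 0) printf "%.1f\n", 100 * distinct / total }
    ' "$fastq"
}
```

Run it on each mate file separately (e.g. `percent_remaining sample_R1.fastq`); a value near the 2% FastQC reports would mean a few sequences account for most reads.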

I used this command to look at some of the most abundant sequences in my FASTQ files:

grep -A 1 '^@K00179' sample.fastq | head -n 1000000 | grep -v '^@' | grep -v '^--$' | sort | uniq -w 30 -c | sort -nr | head -n 100 >> domseqs.100

Many of these sequences, when searched with BLAST, return hits like this:

"Mus musculus clone contig 6 chromocenter region genomic sequence"

These chromocenter hits also seem to match rRNA: further down the BLAST results are entries such as "Mus musculus 45S pre-ribosomal RNA (Rn45s), ribosomal RNA" and "Mus musculus 28S ribosomal RNA (Rn28s1), ribosomal RNA".
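The counting step of the grep pipeline above can also be written so that it does not depend on the `@K00179` instrument prefix in the read headers, by taking line 2 of every 4-line FASTQ record directly. This is a minimal sketch, not the poster's method: the function name `top_prefixes` and its defaults (first 1,000,000 reads, 30-nt prefixes, top 100) are assumptions, and it expects an uncompressed FASTQ with exactly four lines per record.

```bash
# Count the most frequent read prefixes in a FASTQ file.
# Hypothetical helper; assumes 4 lines per record, sequence on line 2.
# Usage: top_prefixes READS_FASTQ [N_READS] [PREFIX_LEN] [TOP]
top_prefixes() {
    local fastq="$1" n_reads="${2:-1000000}" prefix_len="${3:-30}" top="${4:-100}"
    awk -v max="$n_reads" -v len="$prefix_len" '
        NR > 4 * max { exit }
        NR % 4 == 2 { print substr($0, 1, len) }
    ' "$fastq" \
        | LC_ALL=C sort \
        | uniq -c \
        | LC_ALL=C sort -k1,1nr -k2,2 \
        | head -n "$top"
}
```

For example, `top_prefixes sample.fastq > domseqs.100` writes the 100 most frequent 30-nt prefixes with their counts. Unlike `uniq -w 30`, which prints the first full read of each group, this prints only the prefix, so a larger `PREFIX_LEN` may give more specific BLAST hits.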

Is it possible that rRNA slipped through even with poly-A capture? And what exactly does it mean that I have a bunch of these "contig # chromocenter region genomic sequence" hits in my RNA-seq data?

Tags: RNA-Seq, rRNA, sequencing, duplication

Yes. Poly-A capture only enriches for polyadenylated RNAs; it does not get rid of all rRNAs and other non-poly-A RNAs. As you probably know, more than 90% of total RNA is rRNA, so if you end up with 20% rRNA or so after poly-A enrichment or ribodepletion, that is already quite an improvement. Because those reads come from a handful of extremely abundant transcripts, they also show up as heavily duplicated sequences.
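To put a rough number on how much rRNA made it through, one option is to count reads that share an exact k-mer (on either strand) with an rRNA reference, such as the mouse Rn45s sequence mentioned in the question, saved as a FASTA file. This is a hypothetical sketch: the function name `rrna_fraction`, the default k of 25, and the exact k-mer criterion are assumptions, and a dedicated rRNA filter or an alignment against rRNA references would give a more reliable estimate.

```bash
# Rough rRNA estimate: percentage of reads sharing at least one exact k-mer
# (either strand) with a reference rRNA FASTA. Hypothetical helper.
# Usage: rrna_fraction REF_FASTA READS_FASTQ [K]
rrna_fraction() {
    local ref="$1" fastq="$2" k="${3:-25}"
    awk -v k="$k" -v ref="$ref" '
        function revcomp(s,   i, c, r) {
            r = ""
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                r = r ((c in comp) ? comp[c] : "N")
            }
            return r
        }
        # Store every k-mer of a sequence and of its reverse complement.
        function index_seq(s,   rc, i) {
            rc = revcomp(s)
            for (i = 1; i <= length(s) - k + 1; i++) {
                kmers[substr(s, i, k)] = 1
                kmers[substr(rc, i, k)] = 1
            }
        }
        BEGIN {
            comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C"
            # Join the lines of each FASTA record, then index its k-mers.
            while ((getline line < ref) > 0) {
                if (line ~ /^>/) { index_seq(seq); seq = "" }
                else seq = seq toupper(line)
            }
            index_seq(seq)
            close(ref)
        }
        # Line 2 of every 4-line FASTQ record is the read sequence.
        NR % 4 == 2 {
            total++
            s = toupper($0)
            for (i = 1; i <= length(s) - k + 1; i++)
                if (substr(s, i, k) in kmers) { hits++; break }
        }
        END { if (total > 0) printf "%.1f\n", 100 * hits / total }
    ' "$fastq"
}
```

For example, `rrna_fraction Rn45s.fa subset.fastq` (both file names hypothetical) prints the percentage of reads with a hit. On a large file it is faster to test a subset first, e.g. `head -n 400000 sample_R1.fastq > subset.fastq` for the first 100,000 reads.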

Reply by Carlo Yague, 16 months ago

Thanks for your reply! I am pretty new to all this, so I just wanted to make sure.

Reply by mmrcksn, 7.8 years ago