Could high duplication be from ribosomal RNA in RNA-seq samples?
7.8 years ago
mmrcksn ▴ 50

Hello,

I have some paired-end RNA-seq data. My samples were pretty low-input (~1 ng) total RNA from an isolated cell type. For library prep, we did a poly-A capture to select mRNA.

The FastQC reports show pretty bad duplication (in some samples only 2% of reads would remain after deduplication).

I ran this command to look at some of the most dominant sequences in my FASTQ files:

grep -A 1 '@K00179' sample.fastq | head -1000000 | grep -v '^@' | grep -v '^-' | sort | uniq -w 30 -c | sort -n -r | head -100 >> domseqs.100

and found many sequences that, when I searched with BLAST, match with stuff like this:

"Mus musculus clone contig 6 chromocenter region genomic sequence"

These chromocenter sequences also seem to match with rRNA, as further in the results there are things like: "Mus musculus 45S pre-ribosomal RNA (Rn45s), ribosomal RNA", "Mus musculus 28S ribosomal RNA (Rn28s1), ribosomal RNA"

Is it possible that even with polyA capture, rRNA slipped in? What exactly does it mean that I have a bunch of these "contig # chromocenter region genomic sequence" in my RNA-seq data?
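
For anyone repeating this check: the command above depends on the instrument ID in the read headers (@K00179) and on the -w option of GNU uniq. Below is a minimal sketch of a more portable variant, assuming an uncompressed FASTQ file with exactly four lines per record. The function name top_prefixes and its default values are hypothetical and not from the original post. Unlike uniq -w, it prints only the shared prefix rather than one full read per group.

```shell
# Hypothetical helper: tally the most common read prefixes in a FASTQ file.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
# Usage: top_prefixes FILE [N_READS] [PREFIX_LEN] [TOP_N]
# Output: one line per prefix, formatted as count<TAB>prefix, highest count first.
top_prefixes() {
    fastq=$1
    n_reads=${2:-250000}    # how many reads to sample from the top of the file
    prefix_len=${3:-30}     # compare reads on their first PREFIX_LEN bases
    top_n=${4:-100}         # how many of the most frequent prefixes to report

    # Line 2 of every 4-line record is the sequence; the rest is header,
    # separator and quality, so no header pattern is needed.
    awk 'NR % 4 == 2' "$fastq" |
        head -n "$n_reads" |
        cut -c "1-$prefix_len" |
        sort |
        uniq -c |
        sort -k1,1nr |
        head -n "$top_n" |
        awk '{ print $1 "\t" $2 }'
}
```

For example, top_prefixes sample.fastq 250000 30 100 > domseqs.100 would write the 100 most frequent 30-base prefixes among the first 250000 reads. The top entries can then be searched with BLAST as before.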

RNA-Seq rRNA sequencing duplication • 2.6k views

Yes. Poly-A capture only enriches for polyadenylated transcripts; it does not get rid of all rRNA and other non-poly-A RNAs. As you probably know, rRNA typically makes up 80-90% or more of total RNA, so if you end up with 20% rRNA or so after poly-A enrichment or ribodepletion, it's already quite an improvement.


Thanks for your reply! I am pretty new to all this so I just wanted to make sure.
