Question: Dealing With Sequence Over-Representation In MicroRNA RNA-Seq Data
7.4 years ago by
Sudeep1.6k wrote:

Hi all,

I am working on a set of microRNA RNA-seq data. One strange problem we noticed while checking data quality with FastQC is that a large portion of the reads in every sample (roughly 40% to 60%, or around 2-4 million reads per sample) are duplicates of a single read. FastQC tags this sequence as a possible PCR primer. We BLASTed the sequence against miRBase (after removing the adapter) but couldn't find a matching microRNA. My colleagues suggest that this could be biological, but I am not convinced. So my questions are: assuming that FastQC's tagging of this read as a PCR primer is a false positive, is it possible that one microRNA is dominant in all the sequenced samples? And how can we confirm whether it is biological or a problem that arose during sequencing?
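For anyone who wants to reproduce the kind of tally described above outside FastQC, here is a minimal sketch in Python. It assumes plain, uncompressed 4-line FASTQ records and counts exact duplicates only; the function names and the `sample.fastq` file name are made up for illustration, and this is not how FastQC itself computes its over-represented sequences table.

```python
from collections import Counter


def read_sequences(lines):
    """Yield the sequence line (2nd line of every 4-line record) from FASTQ text."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def top_sequences(lines, n=5):
    """Return (sequence, count, fraction_of_reads) for the n most frequent reads."""
    counts = Counter(read_sequences(lines))
    total = sum(counts.values())
    # most_common() is empty when there are no reads, so no division by zero occurs
    return [(seq, c, c / total) for seq, c in counts.most_common(n)]


if __name__ == "__main__":
    # "sample.fastq" is a hypothetical file name; point this at your own file.
    with open("sample.fastq") as handle:
        for seq, count, frac in top_sequences(handle, n=3):
            print(f"{seq}\t{count}\t{frac:.1%}")
```

If a single sequence accounts for about half of the reads in every sample, it will be the first entry returned, and the sequence itself can then be searched against miRBase or NCBI.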

Thank you


We contacted the folks who sequenced our samples (done externally) about the problem I mentioned. After some checking (I don't know the details yet), they informed us that it was an error in the library preparation/sequencing step and agreed to re-sequence our samples. So, thank you all for taking an interest.

rna-seq fastqc • 2.4k views
modified 7.4 years ago • written 7.4 years ago by Sudeep1.6k

Also, my own 2 cents: a life scientist is usually like Fox Mulder from The X-Files, whose motto was "I want to believe." As a bioinformatician, I feel I am Dana Scully, who is always skeptical.

modified 7.4 years ago • written 7.4 years ago by Istvan Albert ♦♦ 84k

I just have to see this read.

written 7.4 years ago by Jeremy Leipzig19k

That, right here: make a new question, put your read there, and here is a title for it: "All my data looks alike. Help me decide: is it a new insight or just a bad run?"

written 7.4 years ago by Istvan Albert ♦♦ 84k
7.4 years ago by
Australia, ACT, Canberra, ANU
Sebastian Kurscheid300 wrote:

As a first step, I would suggest also running a BLAST search against the NCBI nucleotide database to identify any other potential source of the sequence. I think there are several possible sources of "contamination" during the preparation of a small RNA-Seq library (I have personally seen fragments of rRNA that were amplified during the first PCR amplification of small RNAs that had been size-fractionated prior to amplification). The fact that this one sequence is so highly abundant points to a PCR artifact.

A further question is: does FastQC identify the primer sequence? It should do so, as it uses a list of oligos as a reference, e.g. to name the different Illumina adapters and primers.
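To make the idea of checking a read against a reference list of oligos concrete, here is a rough sketch in Python. It is a simplified exact-substring check and not FastQC's actual matching logic; the function name, the `min_overlap` parameter, and the oligo sequence in the usage note are placeholders invented for illustration, so substitute the real adapter and primer sequences from your library prep kit.

```python
def match_known_oligos(read, oligos, min_overlap=12):
    """Return names of oligos sharing an exact substring of >= min_overlap bases with read.

    `oligos` maps a label to a sequence. Exact matching only: no mismatches,
    no reverse complements.
    """
    hits = []
    for name, oligo in oligos.items():
        # Never ask for a window longer than the oligo itself
        k = min(min_overlap, len(oligo))
        # Slide a window of length k along the oligo and look for it in the read
        if any(oligo[i:i + k] in read for i in range(len(oligo) - k + 1)):
            hits.append(name)
    return hits
```

For example, with a placeholder list such as `{"placeholder_adapter": "AAACCCGGGTTTAC"}`, an untrimmed read that still carries part of that adapter would be flagged, while the same read after adapter trimming would return no hits.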


written 7.4 years ago by Sebastian Kurscheid300

Actually, we did BLAST against the NCBI nucleotide database, but the results were pretty inconclusive: a lot of hits with very high e-values. And yes, without the adapters trimmed, FastQC identified the primer sequence, but when the adapters were trimmed it did not.

written 7.4 years ago by Sudeep1.6k
7.4 years ago by
Philadelphia, PA
Jeremy Leipzig19k wrote:

Don't forget to Google this sequence. I had an experience where MegaBLAST failed to identify a tRNA sequence with a post-processed end whereas Google found it mentioned in a paper.

written 7.4 years ago by Jeremy Leipzig19k

Never thought of that, thank you.

written 7.4 years ago by Sudeep1.6k