Question

Dealing With Sequence Overrepresentation In MicroRNA RNA-Seq Data

0

11.2 years ago

Sudeep ★ 1.7k

Hi all,

I am working on a set of microRNA RNA-seq data. While checking the data quality with FastQC, we noticed a strange problem: in every sample, a large portion of the reads (roughly 40% to 60%, which comes to around 2-4 million reads) are duplicates of a single sequence. FastQC flags this sequence as a possible PCR primer. We BLASTed the sequence against miRBase (after removing the adapter) but couldn't find a matching microRNA. My colleagues suggest that this could be biological, but I am not convinced. So my questions are: assuming that FastQC's flagging of this read as a PCR primer is a false positive, could one microRNA really be dominant in all the sequenced samples? And how can we confirm whether it is biological or a problem that arose during sequencing?

Thank you

UPDATE:

We contacted the folks who sequenced our samples (done externally) about the problem I mentioned. After some checking (I don't know the details yet), they informed us that it was an error in the library preparation/sequencing step and agreed to re-sequence our samples. So, thank you all for taking an interest.
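The duplication level described in the question can also be checked independently of FastQC. The following is a minimal sketch, assuming a standard 4-line-per-record FASTQ file; the function name `top_sequences` and the input path are hypothetical and not part of the original post.

```python
import gzip
import sys
from collections import Counter
from itertools import islice


def top_sequences(fastq_lines, n=5):
    """Return (total_reads, [(sequence, count, fraction), ...]) for the n most common reads."""
    # In a 4-line FASTQ record the sequence is the second line of each record.
    counts = Counter(line.strip() for line in islice(fastq_lines, 1, None, 4))
    total = sum(counts.values())
    return total, [(seq, c, c / total) for seq, c in counts.most_common(n)]


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # hypothetical input, e.g. a per-sample FASTQ file
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        total, top = top_sequences(handle)
    print(f"total reads: {total}")
    for seq, count, frac in top:
        print(f"{seq}\t{count}\t{frac:.1%}")
```

If one sequence accounts for close to half of the reads in every sample, as reported here, it will appear at the top of this list, and the exact sequence can then be copied into BLAST or a web search.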

rna-seq fastqc • 3.4k views

11.2 years ago by Sudeep ★ 1.7k

1

Also, my own 2 cents: a life scientist is usually like Fox Mulder from the X-Files, whose motto was "I want to believe". As a bioinformatician, I feel I am Dana Scully, who is always skeptical.

11.2 years ago by Istvan Albert 100k

0

I just have to see this read.

11.2 years ago by Jeremy Leipzig 22k

0

Do that right here: make a new question, put your read there, and here is a title for it: "All my data looks alike. Help me decide: is it a new insight or just a bad run?"

11.2 years ago by Istvan Albert 100k

score 1 · Answer 1 · 2013-02-07

As a first step, I would suggest also running a BLAST search against the NCBI nucleotide database to identify any other potential source of the sequence. I think there are several possible sources of "contamination" during the preparation of a small RNA-Seq library (I have personally seen rRNA fragments that were amplified during the first PCR amplification of small RNAs that had been size-fractionated prior to amplification). The fact that this one sequence is so highly abundant indicates that it is a PCR artifact.

A further question: does FastQC identify the primer sequence? It should, as it uses a list of oligos as a reference to, e.g., name the different Illumina adaptors and primers.

HTH
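The idea of comparing the read against a reference list of oligos can be tried by hand. Below is a rough sketch, assuming a tab-delimited "name, sequence" list with `#` comment lines; the function names, the `min_length` threshold, and the simple containment test are all illustrative assumptions and are not claimed to reproduce FastQC's own matching logic.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N; case-insensitive)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def parse_contaminants(lines):
    """Parse 'name<TAB>sequence' lines, skipping blank lines and '#' comments."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Tolerate repeated tabs between the name and the sequence.
        fields = [f for f in line.split("\t") if f]
        if len(fields) >= 2:
            table[fields[0].strip()] = fields[-1].strip().upper()
    return table


def find_contaminant_hits(read, contaminants, min_length=12):
    """Return (name, strand) pairs where the shorter of the read and the
    oligo is fully contained in the longer one, on either strand."""
    read = read.upper()
    hits = []
    for name, seq in contaminants.items():
        for strand, candidate in (("+", seq), ("-", reverse_complement(seq))):
            shorter, longer = sorted((read, candidate), key=len)
            if len(shorter) >= min_length and shorter in longer:
                hits.append((name, strand))
    return hits
```

An exact containment test like this will miss matches with mismatches or partial overlaps, so a negative result here does not rule out a primer or adapter origin; a BLAST search remains the more sensitive check.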

score 1 · Answer 2 · 2013-02-07

1

11.2 years ago

Jeremy Leipzig 22k

Don't forget to Google this sequence. I had an experience where MegaBLAST failed to identify a tRNA sequence with a post-processed end whereas Google found it mentioned in a paper.