Question: low mapping rate, finding possible source of contamination
2.5 years ago, vaslanzadeh wrote:

Hello, I have received small RNA-seq data from a pathogenic bacterium for bioinformatic analysis. After trimming the adaptors with trim_galore, I mapped the FASTQ file (almost 1 million reads) to the reference genome using bowtie and got an overall mapping rate of 26% (uniquely mapped + multi-mapped). The mapping rate does not change much even if I allow two mismatches. As a negative control, I also mapped the reads to the mouse genome, which again gives a 25% mapping rate, similar to what I get when I align to the bacterial genome. Most of the mapped reads map to rRNAs and tRNAs, which is why mapping to the bacterial genome and to the mouse genome gives similar results. Now I do not know what the 75% unmapped reads are. It is possible that there was contamination during library preparation, etc. How can I find the source of contamination? Is there a way to BLAST the unmapped reads to find out which genome/strain they probably come from?
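
To make the BLAST idea concrete, here is a minimal sketch of one way to prepare the unmapped reads for it. It assumes the unmapped reads have already been written to a plain 4-line-per-record FASTQ file (bowtie's `--un` option can do this); the function name and the file names in the usage comment are hypothetical. BLASTing every read would be slow, so the sketch draws a small random sample and writes it as FASTA.

```python
import random


def sample_fastq_to_fasta(fastq_path, fasta_path, n_reads=200, seed=0):
    """Write a random sample of reads from a FASTQ file as FASTA.

    Assumes plain 4-line FASTQ records (no wrapped sequence lines).
    Uses reservoir sampling, so the file is read once and only
    n_reads records are held in memory. Returns the number written.
    """
    rng = random.Random(seed)
    sample = []
    with open(fastq_path) as handle:
        index = 0
        while True:
            header = handle.readline()
            if not header.strip():  # end of file or trailing blank line
                break
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            record = (header[1:].strip(), seq)
            if len(sample) < n_reads:
                sample.append(record)
            else:
                # Replace an existing entry with probability n_reads / (index + 1)
                j = rng.randint(0, index)
                if j < n_reads:
                    sample[j] = record
            index += 1
    with open(fasta_path, "w") as out:
        for name, seq in sample:
            out.write(f">{name}\n{seq}\n")
    return len(sample)


# Hypothetical usage:
# sample_fastq_to_fasta("unmapped.fastq", "unmapped_sample.fasta", n_reads=200)
```

The resulting FASTA file could then be searched with blastn against a broad nucleotide database. For reads as short as small RNA-seq reads, the `blastn-short` task in BLAST+ is probably the more appropriate setting, and the top-hit organisms across the sample give a rough picture of where the reads come from.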


Tags: blast, rna-seq, alignment, genome

There are many possible causes, and contamination is just one of them; incomplete adapter trimming is another. FastQC is sometimes helpful in this kind of situation (bowtie cannot map low-quality reads, for example), sometimes a different aligner helps, and sometimes BLASTing the reads against candidate contaminant organisms is useful. However, "After trimming the adaptors with trim_galore" is not informative by itself: you need to give the exact command used, the results it reported, and perhaps the read-length distribution after trimming.
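
As an illustration of the last point, the read-length distribution after trimming can be tabulated with a few lines of Python. This is only a sketch: it assumes a plain or gzip-compressed FASTQ file with 4 lines per record, and the function names and the file name in the usage comment are made up for the example.

```python
import gzip
from collections import Counter


def read_length_distribution(fastq_path):
    """Count reads per length in a FASTQ file (plain or .gz).

    Assumes 4-line FASTQ records, i.e. no wrapped sequence lines.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    lengths = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                lengths[len(line.strip())] += 1
    return lengths


def format_histogram(lengths, width=50):
    """Return one text line per read length: length, count, scaled bar."""
    max_count = max(lengths.values(), default=1)
    lines = []
    for length in sorted(lengths):
        count = lengths[length]
        bar = "#" * round(width * count / max_count)
        lines.append(f"{length:>4} {count:>8} {bar}")
    return lines


# Hypothetical usage:
# for row in format_histogram(read_length_distribution("sample_trimmed.fq.gz")):
#     print(row)
```

If most reads are still at the full sequencing length after trimming, adapters were probably not removed; if a large fraction are only a few bases long, they are likely adapter dimers or degradation products that no aligner will place reliably.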

Written 2.5 years ago by Brian Bushnell