Question: low mapping rate, finding possible source of contamination
23 months ago, vaslanzadeh wrote:

Hello, I have received small RNA-seq data from a pathogenic bacterium for bioinformatic analysis. After trimming the adaptors with trim_galore, the FASTQ file (almost 1 million reads) was mapped to the reference genome using bowtie, giving an overall mapping rate of 26% (uniquely + multiply mapped reads). The mapping rate does not change much even if I allow two mismatches. As a negative control, the reads were also mapped to the mouse genome, which again gives a 25% mapping rate, similar to what I get when I align to the bacterial genome. Most of the mapped reads map to rRNAs and tRNAs, which is why mapping to the bacterial genome and to the mouse genome gives similar results. Now, I do not know what the 75% unmapped reads are. It is possible that there was contamination during library preparation, etc. How can I find the source of contamination? Is there a way to BLAST the unmapped reads to find out which genome/strain they are probably coming from?

Thanks

blast rna-seq alignment genome • 849 views

There are many possibilities, with contamination being just one; another is incomplete adapter trimming. FastQC is sometimes helpful in this kind of situation (bowtie cannot map low-quality reads, for example); sometimes using a different aligner helps, and sometimes BLASTing for contaminant organisms is useful. But, for example, "After trimming the adaptors with trim_galore" is not informative: you need to describe the command used, the results, and perhaps the read-length distribution afterward.

written 23 months ago by Brian Bushnell
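
Two of the checks mentioned in this thread, the read-length distribution after trimming and BLASTing the unmapped reads, can be roughly sketched in Python. This is only a minimal illustration, not something posted by either participant. It assumes the unmapped reads were already written to a FASTQ file (bowtie's `--un` option can do this), that records are plain 4-line FASTQ, and that the file name `unmapped.fastq` is a hypothetical placeholder. Collapsing identical reads first keeps the BLAST query small: in small RNA libraries a handful of sequences (adapter dimers, rRNA fragments, a contaminant) often account for most reads.

```python
import gzip
import io
from collections import Counter


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_fastq_sequences(handle):
    """Yield the sequence line of each record.

    Assumes strict 4-line FASTQ records (no wrapped sequence lines).
    """
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def length_distribution(seqs):
    """Count how many reads have each length."""
    return Counter(len(s) for s in seqs)


def top_sequences_as_fasta(seqs, n=20):
    """Collapse identical reads and return the n most abundant as FASTA text."""
    counts = Counter(seqs)
    lines = []
    for rank, (seq, count) in enumerate(counts.most_common(n), start=1):
        lines.append(f">seq{rank}_count{count}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Tiny in-memory example; for real data, replace with e.g.
    #   handle = open_fastq("unmapped.fastq")   # hypothetical file name
    handle = io.StringIO(
        "@r1\nACGT\n+\nIIII\n"
        "@r2\nACGT\n+\nIIII\n"
        "@r3\nACGTAC\n+\nIIIIII\n"
    )
    seqs = list(read_fastq_sequences(handle))

    # Length distribution: a spike at very short lengths suggests adapter dimers.
    for length, count in sorted(length_distribution(seqs).items()):
        print(length, count)

    # Most abundant sequences, ready to paste into BLAST.
    print(top_sequences_as_fasta(seqs, n=2), end="")
```

The FASTA output can be pasted into the NCBI web BLAST form or saved and searched with a local `blastn`; for reads this short, `-task blastn-short` is probably the more suitable setting. If the top sequences turn out to be adapter or primer sequence rather than another organism, that points to incomplete trimming rather than contamination.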