Potential Bacteria Contamination - RNA-Seq
3.6 years ago
VHahaut ★ 1.2k


I have received a series of human poly-A RNA-seq samples (single-end, 75 bp) that show suspicious mapping statistics. The samples were mapped with STAR, and roughly 30-50% of the reads are reported as "unmapped: reads too short". Previous samples processed with the same method had only 5-10% of reads in that category.
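To compare that "too short" percentage across old and new samples, the values can be pulled out of each STAR `Log.final.out` file. Below is a minimal sketch, assuming the usual `key | value` layout of that file; the example text and its numbers are made up for illustration. As far as I know, STAR's "too short" category means the best alignment covered too small a fraction of the read, not that the read itself was short, which is consistent with reads coming from an organism that is not in the index.

```python
def parse_star_log(text):
    """Return {metric name: value string} from the text of a STAR Log.final.out file."""
    metrics = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" carry no value
        key, value = line.split("|", 1)
        metrics[key.strip()] = value.strip()
    return metrics


def percent(metrics, key):
    """Convert a value such as '35.20%' to the float 35.2."""
    return float(metrics[key].rstrip("%"))


# Made-up excerpt in the Log.final.out layout (illustrative numbers only).
example = """\
                          Number of input reads |\t1000000
                        Uniquely mapped reads % |\t55.10%
                 % of reads unmapped: too short |\t35.20%
"""

m = parse_star_log(example)
too_short = percent(m, "% of reads unmapped: too short")
print(f"unmapped (too short): {too_short}%")
```

In practice the text would be read from each sample's `Log.final.out` and the percentages tabulated side by side.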

Despite the sharp drop in uniquely mapping reads, the sequencing itself worked well (many genes detected, reads mapping to exons, splicing visible, ...).

After careful inspection of the reads, I am starting to suspect bacterial contamination because:

  1. Many of the BLASTed reads are a perfect match to E. coli or other prokaryotes.
  2. These are not ribosomal reads (evaluated with BBDuk).
  3. They do not appear to contain the primers / adapter sequences used in the library preparation.
  4. If I map these reads to a hybrid E. coli 16S - h38 genome, I get 10-100 times more reads mapping to the E. coli sequence in the new samples than in the old ones.
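The comparison in point 4 can be turned into a per-sample fraction. The sketch below assumes the hybrid alignment is a sorted, indexed BAM and that per-contig counts come from `samtools idxstats` (tab-separated: contig name, length, mapped, unmapped). The contig name `ecoli_16S` and all counts are hypothetical. Note that these counts are alignment records, so multi-mapping reads may be counted more than once.

```python
def contaminant_fraction(idxstats_text, contaminant_contigs):
    """Fraction of mapped records assigned to the given contaminant contigs.

    idxstats_text: output of `samtools idxstats` (tab-separated columns:
    contig name, contig length, mapped records, unmapped records).
    """
    contaminant = 0
    total = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # the '*' row holds reads with no reference assigned
        total += int(mapped)
        if name in contaminant_contigs:
            contaminant += int(mapped)
    return contaminant / total if total else 0.0


# Hypothetical idxstats output; "ecoli_16S" is an assumed contig name.
example = (
    "chr1\t248956422\t9000\t0\n"
    "chr2\t242193529\t500\t0\n"
    "ecoli_16S\t1542\t500\t0\n"
    "*\t0\t0\t4000\n"
)

frac = contaminant_fraction(example, {"ecoli_16S"})
print(f"fraction of mapped records on E. coli contigs: {frac:.3f}")
```

Running this on old and new samples gives numbers that can be compared directly instead of raw read counts.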

I would like to estimate the proportion of reads coming from prokaryotes (E. coli?) in these samples. As I am not familiar with the metagenomics field, I was wondering if someone could recommend a procedure to do so.

I am also open to other suggestions regarding the possible issues with these samples.

Thank you in advance!

RNA-Seq metagenomics • 1.8k views

Try FastQ Screen. Index the E. coli genome and add it to the configuration file. FastQ Screen prints out the contamination levels. Please increase the number of reads to be analyzed, since by default only a subset of the reads is screened.
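A rough sketch of that setup, written as a small Python helper that builds the configuration text and the command line without running anything. It assumes Bowtie2 indexes have already been built; every path and database name below is hypothetical. The `DATABASE` lines, `--conf`, `--aligner` and `--subset` follow the FastQ Screen documentation as I understand it, where `--subset 0` should screen all reads instead of a sample.

```python
def fastq_screen_config(databases, bowtie2_path=None):
    """Build FastQ Screen configuration text.

    databases: {database name: Bowtie2 index basename} (hypothetical paths).
    """
    lines = []
    if bowtie2_path:
        lines.append(f"BOWTIE2\t{bowtie2_path}")
    for name, index_basename in databases.items():
        lines.append(f"DATABASE\t{name}\t{index_basename}")
    return "\n".join(lines) + "\n"


def fastq_screen_command(fastq, conf_path, subset=0):
    """Return the fastq_screen call as an argument list (not executed here)."""
    return [
        "fastq_screen",
        "--aligner", "bowtie2",
        "--conf", conf_path,
        "--subset", str(subset),  # 0 = use all reads
        fastq,
    ]


# Hypothetical index locations and file names.
conf_text = fastq_screen_config(
    {"Human": "/refs/bowtie2/GRCh38", "Ecoli": "/refs/bowtie2/ecoli_k12"}
)
cmd = fastq_screen_command("sample_01.fastq.gz", "fastq_screen.conf")
print(conf_text)
print(" ".join(cmd))
```

The configuration text would be saved to `fastq_screen.conf` and the printed command run in the shell (or passed to `subprocess.run`).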

3.6 years ago
GenoMax 139k

Use bbsplit.sh from the BBMap suite. It is meant for aligning data to multiple genomes at the same time and binning the reads by genome. See this page. You can decide what to do with reads that multi-map within and across genomes via the ambiguous= and ambiguous2= options. Include the refstats= option to get detailed per-reference stats.
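For illustration, here is one way such a call could be assembled, again as a Python helper that only builds the argument list. The file names are hypothetical, and the choice of `ambiguous=best` with `ambiguous2=toss` is an assumption for this sketch, not a recommendation from the post above; check the bbsplit.sh help text for the values that suit your data.

```python
def bbsplit_command(reads, references, out_prefix,
                    ambiguous="best", ambiguous2="toss"):
    """Return a bbsplit.sh call as an argument list (not executed here).

    references: FASTA files, one per genome to bin against.
    The '%' in basename is replaced by bbsplit with each reference name.
    """
    return [
        "bbsplit.sh",
        f"in={reads}",
        "ref=" + ",".join(references),
        f"basename={out_prefix}_%.fq.gz",       # one output file per reference
        f"outu={out_prefix}_unmatched.fq.gz",   # reads matching no reference
        f"ambiguous={ambiguous}",               # multi-mapping within a genome
        f"ambiguous2={ambiguous2}",             # multi-mapping across genomes
        f"refstats={out_prefix}_refstats.txt",  # per-reference summary
    ]


# Hypothetical input files.
cmd = bbsplit_command(
    "sample_01.fastq.gz",
    ["human.fa", "ecoli.fa"],
    "sample_01",
)
print(" ".join(cmd))
```

The refstats file then gives the share of reads assigned to each reference, which is the proportion asked for in the question.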

Samples could have been contaminated depending on how and where they were collected and processed (i.e., the contamination was either present in the original sample or introduced in later steps). If the contamination levels are more or less similar across samples, you could still go ahead with the analysis, as you already discovered, especially if the samples are not easily replaceable.
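Whether the levels are "more or less similar" can be checked with a quick comparison once a contaminant fraction is available per sample. The sketch below flags samples that sit far from the median; the sample names, the fractions and the 0.10 tolerance are all made-up assumptions, and the right cutoff depends on the experiment.

```python
from statistics import median


def flag_uneven_contamination(fractions, tolerance=0.10):
    """Return names of samples whose contaminant fraction differs from the
    median across samples by more than `tolerance` (absolute fraction)."""
    mid = median(fractions.values())
    return sorted(s for s, f in fractions.items() if abs(f - mid) > tolerance)


# Hypothetical per-sample contaminant fractions (illustrative only).
fractions = {"new_1": 0.35, "new_2": 0.40, "new_3": 0.38, "new_4": 0.05}
outliers = flag_uneven_contamination(fractions)
print(outliers)
```

Samples flagged this way are the ones where uneven contamination is most likely to distort comparisons between conditions.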

3.3 years ago

The DecontaMiner tool investigates the origin of unmapped reads and assigns them to taxa.

DecontaMiner, a tool to unravel the presence of contaminating sequences among the unmapped reads. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2684-x


