Question: Potential Bacteria Contamination - RNA-Seq
14 days ago, VHahaut (1.1k) wrote:


I have received a series of human poly-A RNA-seq samples (single-end, 75 bp) with suspicious mapping statistics. They were mapped with STAR, and roughly 30-50% of the reads are reported as "% of reads unmapped: too short". Previous samples processed with the same method had only 5-10%.
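To compare this value across many samples, it can be pulled out of each STAR `Log.final.out` file. The following is a minimal Python sketch, assuming the usual `key |<tab>value` layout of that file; the function names are hypothetical and not part of STAR.

```python
def parse_star_log(text):
    """Return {metric name: value string} from the text of a STAR Log.final.out."""
    metrics = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNMAPPED READS:" carry no value
        key, value = line.split("|", 1)
        metrics[key.strip()] = value.strip()
    return metrics


def percent_too_short(text):
    """Percentage of reads STAR reports as 'unmapped: too short'."""
    value = parse_star_log(text)["% of reads unmapped: too short"]
    return float(value.rstrip("%"))


# Usage (the path is a placeholder):
# with open("sampleA/Log.final.out") as handle:
#     print(percent_too_short(handle.read()))
```

Running this over old and new samples gives one number per sample, which makes the 5-10% versus 30-50% difference easy to tabulate.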

Despite the sharp drop in uniquely mapping reads, the sequencing itself worked well (many genes detected, reads mapping to exons, splicing visible, ...).

After careful inspection of the reads, I am starting to suspect bacterial contamination because:

  1. Many of the BLASTed reads are a perfect match to E. coli or other prokaryotes.
  2. These are not ribosomal reads (checked with BBDuk).
  3. They do not appear to contain the primer / adapter sequences used in the library preparation.
  4. If I map these reads to a hybrid E. coli 16S - h38 genome, I get 10-100 times more reads mapping to the E. coli sequence in these new samples than in the old ones.
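The comparison in point 4 can be turned into a per-sample fraction. Below is a minimal Python sketch, assuming the alignments against the hybrid reference are in an indexed BAM file and that `samtools idxstats` was run on it (four tab-separated columns: reference name, length, mapped count, unmapped count). The function name and the contig name `ecoli_16S` are hypothetical.

```python
def contaminant_fraction(idxstats_text, contaminant_refs):
    """Fraction of mapped alignments that fall on the contaminant sequences.

    idxstats_text    : output of `samtools idxstats`, as a string
    contaminant_refs : set of reference names treated as contaminant
    """
    mapped_total = 0
    mapped_contaminant = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # the "*" row only holds reads with no reference assigned
        mapped_total += int(mapped)
        if name in contaminant_refs:
            mapped_contaminant += int(mapped)
    if mapped_total == 0:
        return 0.0
    return mapped_contaminant / mapped_total


# Usage (contig name is a placeholder for whatever the hybrid FASTA uses):
# fraction = contaminant_fraction(open("sample.idxstats").read(), {"ecoli_16S"})
```

Note that this is a fraction of mapped alignments only: reads that STAR left unmapped are not counted, and multi-mapping reads may be counted more than once, so the value is best used to compare samples rather than as an absolute contamination rate.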

I would like to estimate the proportion of reads coming from prokaryotes (E. coli?) in these samples. As I am not familiar with the metagenomics field, I was wondering if someone could recommend a procedure to do so.

I am also open to other suggestions regarding the possible issues with these samples.

Thank you in advance!

rna-seq metagenomics • 91 views

Try FastQ Screen. Index the E. coli genome and add it to the configuration file. FastQ Screen prints out the contamination levels. Please increase the number of reads to be analyzed, since by default only a subset of each file is screened.
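As a rough illustration of those steps, here is a minimal Python sketch that writes a FastQ Screen configuration and assembles the command line. It assumes Bowtie2 indexes have already been built and that `fastq_screen` and `bowtie2` are on the PATH; the index paths, file names, and helper names are placeholders, not part of FastQ Screen.

```python
def fastq_screen_config(databases, threads=4):
    """Build the text of a FastQ Screen config file.

    databases : dict mapping a label (e.g. "Ecoli") to a Bowtie2 index basename
    """
    lines = [f"THREADS\t{threads}"]
    for name, index_basename in databases.items():
        lines.append(f"DATABASE\t{name}\t{index_basename}")
    return "\n".join(lines) + "\n"


def fastq_screen_command(fastq, conf_path, subset=0):
    """Argument list for fastq_screen; --subset 0 screens every read."""
    return [
        "fastq_screen",
        "--aligner", "bowtie2",
        "--conf", conf_path,
        "--subset", str(subset),
        fastq,
    ]


# Usage (all paths are placeholders):
# import subprocess
# with open("screen.conf", "w") as handle:
#     handle.write(fastq_screen_config({"Human": "/idx/human", "Ecoli": "/idx/ecoli"}))
# subprocess.run(fastq_screen_command("sample.fastq.gz", "screen.conf"), check=True)
```

Screening all reads with `--subset 0` is slower than the default subsample but gives a percentage based on the whole library.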

written 14 days ago by cpad0112 (13k)
14 days ago, genomax (87k), United States, wrote:

Use bbsplit.sh from the BBMap suite. It is meant for aligning data to multiple genomes (and binning the reads) at the same time. See this page. You can decide what to do with reads that multi-map within and across genomes via the ambiguous= and ambiguous2= options. Include the refstats= option to get detailed per-reference stats.
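A minimal Python sketch of such a run follows, assuming the BBMap tool meant here is BBSplit (`bbsplit.sh`), which is the one that takes `ambiguous2=`. The FASTA and FASTQ names, the output pattern, and the helper name are placeholders; the chosen `ambiguous=best` and `ambiguous2=toss` values are just one possible policy.

```python
def bbsplit_command(reads, references, out_pattern="split_%.fq.gz",
                    refstats="refstats.txt",
                    ambiguous="best", ambiguous2="toss"):
    """Argument list for a single-end BBSplit run.

    reads       : input FASTQ file
    references  : list of reference FASTA files (one bin per file)
    out_pattern : output name; BBSplit replaces % with each reference name
    ambiguous   : handling of reads multi-mapping within one reference
    ambiguous2  : handling of reads mapping to more than one reference
    """
    return [
        "bbsplit.sh",
        f"in={reads}",
        "ref=" + ",".join(references),
        f"basename={out_pattern}",
        f"ambiguous={ambiguous}",
        f"ambiguous2={ambiguous2}",
        f"refstats={refstats}",
    ]


# Usage (file names are placeholders):
# import subprocess
# subprocess.run(bbsplit_command("sample.fq.gz", ["human.fa", "ecoli.fa"]), check=True)
```

The file named by `refstats=` then lists, per reference, how many reads were assigned to it, which gives the proportion of prokaryotic reads directly.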

Samples could have been contaminated depending on how and where they were collected and processed (i.e., the contamination was either present in the sample or introduced in later steps). If the contamination levels are more or less similar across samples, you could still do the analysis, as you already discovered, especially if the samples are not easily replaceable.
