2.4 years ago
sararselitsky wrote:

I've assessed multiple human RNA-seq data sets (some FFPE, some mRNA-seq, some clinical, some from cell culture) whose libraries were prepared and sequenced at different facilities.

I've found that every single sample has reads mapping to the ampicillin resistance gene. I've BLASTed the reads that have mapped and found that they map perfectly to that gene and do not map to the human genome. Is there any technical reason why this would occur? Is this a common spike-in like phiX?


Reagent or other contamination is a known issue. One paper comes to mind (there may be others); this is a sensitive topic for commercial as well as academic interests.

genomax wrote:

Thank you! The paper is helpful!

If this were from bacterial contamination, would you expect to see bacterial sequences in a polyA-enriched prep? Don't bacteria only add a polyA tail to mark RNA for degradation?

sararselitsky wrote:

Indeed, residual DNA in commercially available polymerases and other enzymes may be a cause of irritation, especially in diagnostics based on exponential amplification reactions like PCR, where a no-template control is indispensable. In a sequencing experiment, by contrast, a few unmapped reads may not be something to worry about.

piet wrote:

Can you post the blast hit(s)? I'm guessing you're simply seeing bacteria either present in your samples or introduced via contamination.

pld wrote:

Here is a read found in multiple samples:

GTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATAT

The first BLAST hits are all for cloning vectors.

I am seeing general bacteria and cloning vectors as the hits for the reads that map to the ampicillin resistance gene. The FASTA I'm mapping to also includes the neomycin and puromycin resistance genes, but those two are only present when expected. Is the ampicillin resistance gene more common in bacterial populations than the other two?
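For anyone who wants to repeat the comparison described above (ampicillin versus neomycin and puromycin resistance genes), here is a minimal Python sketch that tallies mapped reads per marker gene from SAM-format text. The reference names `ampR`, `neoR`, and `puroR` and the function name `count_marker_reads` are assumptions for illustration only; substitute whatever sequence names appear in your own FASTA. It only uses the standard SAM columns (FLAG in column 2, RNAME in column 3) and is not the pipeline used in the original analysis.

```python
from collections import Counter

# Hypothetical reference names; replace with the names in your own FASTA.
MARKERS = frozenset({"ampR", "neoR", "puroR"})


def count_marker_reads(sam_lines, markers=MARKERS):
    """Count primary, mapped alignments per marker reference in SAM text."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname = fields[2]
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary.
        # Skipping these counts each mapped read segment once.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        if rname in markers:
            counts[rname] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle for an uncompressed SAM file works directly. Running it per sample and comparing the three counts side by side would show whether the ampicillin resistance signal really is present everywhere while the other two markers appear only where expected.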

sararselitsky wrote:
