Hello Friends,
So recently our lab had whole-genome sequencing done on a batch of different strains of a bacterium. We got the short-read sequences back from the sequencing facility and had a bit of a surprise. After selecting a random 10k base pairs from each sample and BLASTing them, we found that half of the samples were more similar to a different species of bacteria than to the species we had expected to get back.
So I'm thinking, maybe during our DNA extraction process we screwed up by accidentally contaminating half the samples with one individual of the new surprising bacteria species, and then that surprise species got sequenced instead of the one we hoped for. I'm thinking that a good way to check this would be to see if any of these surprising samples are identical in terms of sequence to each other. If so, that would suggest that a single contaminant made its way across several sample tubes we sent to the NGS facility.
What's a good way to check whether all the samples of the suspicious bacterial species are completely identical to each other? Each sample is a pair of FASTQ files (forward and reverse) of paired-end 150 bp reads. Would I have to assemble each genome and then align them against each other in a pairwise fashion using BWA-MEM or Bowtie2, or is there a faster way?
I would appreciate any input you have. Thank you.
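One assembly-free way to get a first impression is to compare the k-mer content of the raw reads between samples. Below is a minimal sketch of that idea, not a validated pipeline. The function names, the default `k=21`, and the `max_reads` subsampling parameter are all assumptions made for illustration. It also assumes standard 4-line FASTQ records.

```python
import gzip
from itertools import islice

# Translation table for reverse-complementing DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_sequences(path, max_reads=None):
    """Yield read sequences from a FASTQ file (assumes 4 lines per record)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        seq_lines = islice(handle, 1, None, 4)
        if max_reads is not None:
            seq_lines = islice(seq_lines, max_reads)
        for line in seq_lines:
            yield line.strip().upper()


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(path, k=21, max_reads=None):
    """Collect the set of canonical k-mers seen in a FASTQ file."""
    kmers = set()
    for seq in read_sequences(path, max_reads):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                kmers.add(canonical(kmer))
    return kmers


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

You would call `jaccard(kmer_set("sampleA_R1.fastq.gz"), kmer_set("sampleB_R1.fastq.gz"))` for each pair of samples (the file names here are hypothetical). Two caveats: sequencing errors create sample-specific k-mers, so even reads from the same genome will not score exactly 1.0, and holding every k-mer of a full run in memory is expensive, so subsampling with `max_reads` or using a dedicated sketching tool such as Mash is probably more practical. A high score only suggests the samples are very close; confirming that they are truly identical would still need variant calling against a common reference.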
You could use BBsplit.sh from the BBMap suite, with a selection of the genomes you expect, to quickly bin your reads. That would give you an idea of the problem at hand, plus cleaned data that you can use for downstream steps if you wish.
Hi, thanks for the suggestion, that looks interesting. In this case, would I use a couple of my genomes as the references and see whether the reads bin "heavily" towards one reference versus the other? Thanks.
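To put a number on how "heavily" the reads bin, you could count the reads that ended up in each per-reference output file and turn the counts into fractions. This is a rough sketch under a few assumptions: the binning step has already written one FASTQ file per reference, the files use standard 4-line records, and the function names and the labels in the dictionary are hypothetical.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming exactly 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def bin_fractions(bin_files):
    """Map each reference label to the fraction of binned reads assigned to it.

    bin_files: dict of {label: path to that reference's FASTQ output}.
    Reads that matched no reference are not included unless their file
    is passed in as well.
    """
    counts = {label: count_fastq_reads(path) for label, path in bin_files.items()}
    total = sum(counts.values())
    return {label: (n / total if total else 0.0) for label, n in counts.items()}
```

For one sample you might call `bin_fractions({"expected": "sample1_expected.fq", "surprise": "sample1_surprise.fq"})` (hypothetical file names). A sample whose reads fall almost entirely into the surprise bin was probably swapped or overgrown, while a mixed split points towards partial contamination.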