Our lab recently had a batch of different strains of one bacterial species whole-genome sequenced. We got the short reads back from the sequencing facility and had a bit of a surprise. After selecting a random 10 kbp of sequence from each sample and BLASTing it, half of the samples turned out to be more similar to a different bacterial species than to the one we expected.
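To make the spot check above concrete, here is a minimal Python sketch of how a random subsample of reads could be drawn from a FASTQ file and written out as FASTA for BLAST. It is only an illustration under assumptions, not necessarily the exact procedure we used: the function names (`read_fastq`, `reservoir_sample`, `to_fasta`) are made up for this post, the FASTQ is assumed to be uncompressed with four lines per record, and roughly 67 reads of 150 bp would add up to about 10 kbp.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield (read_id, sequence) from an uncompressed 4-line-per-record FASTQ.

    `handle` must be an iterator over lines (an open file works).
    """
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            break
        # Line 1 is '@id ...', line 2 is the sequence; quality lines are ignored.
        yield record[0].strip()[1:], record[1].strip()


def reservoir_sample(records, n_reads, seed=0):
    """Uniformly sample up to n_reads records in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < n_reads:
            sample.append(rec)
        else:
            # Keep the new record with probability n_reads / (i + 1).
            j = rng.randint(0, i)
            if j < n_reads:
                sample[j] = rec
    return sample


def to_fasta(sample):
    """Format sampled reads as FASTA text suitable as a BLAST query."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in sample)
```

Usage would be something like `to_fasta(reservoir_sample(read_fastq(open("sample_R1.fastq")), 67))`, where `sample_R1.fastq` is a placeholder file name.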
My working theory is that we contaminated half of the samples during DNA extraction with a single isolate of this unexpected species, and that the contaminant got sequenced instead of the strain we were after. A good way to test this, I think, would be to check whether the unexpected samples are identical to each other in sequence. If they are, that would suggest a single contaminant made its way into several of the sample tubes we sent to the NGS facility.
What's a good way to check whether all of the samples identified as the unexpected species are completely identical to each other? Each sample consists of paired-end 150 bp reads in forward and reverse FASTQ files. Would I have to assemble each genome and then align the assemblies against each other pairwise using BWA-MEM or Bowtie2, or is there a faster way?
I would appreciate any input you have. Thank you!