Hi everyone,
I am working on an amplicon sequencing project where we sequence bacterial populations.
While the approach works great for identifying the various mutations occurring in different populations, it cannot provide any information about the co-evolution of variants within a single gene/genome, because it is impossible to say whether two mutations are located on the same individual chromosome.
One way to collect at least some info on whether SNPs evolve/travel together would be to check whether they are present on a single read. This would be very useful for e.g. mutational hot-spots.
Do you have any idea how I could identify SNPs that are supported by single reads (within a bam file)? Ideally by using a vcf file to zoom into the interesting locations?
I got as far as extracting all reads from a bam that map to the coordinates specified in a vcf. However, the result is still pretty messy, and figuring out by hand whether any reads span two or more SNP locations would take about a week of manual work.
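To make the question concrete, here is a rough sketch of the per-read check I have in mind. It is only an illustration under a few assumptions: the reads are plain SAM text lines (e.g. the output of `samtools view`), the SNPs of interest are given as a hand-built dictionary of `(chrom, 1-based pos) -> alt base` taken from the vcf, and the names `bases_at` and `linked_snps` are placeholders I made up, not part of any existing tool.

```python
import re
from collections import Counter
from itertools import combinations

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def bases_at(pos, cigar, seq, wanted):
    """Return {ref_pos: read_base} for the 1-based reference positions in
    `wanted` that fall inside an aligned (M/=/X) block of this read."""
    ref, qry, found = pos, 0, {}
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # consumes reference and read
            for p in wanted:
                if ref <= p < ref + length:
                    found[p] = seq[qry + (p - ref)]
            ref += length
            qry += length
        elif op in "IS":           # insertion / soft clip: read only
            qry += length
        elif op in "DN":           # deletion / skip: reference only
            ref += length
        # H and P consume neither reference nor read
    return found

def linked_snps(sam_lines, snps):
    """Count, for every pair of SNPs, the reads that carry both alt alleles.
    `snps` maps (chrom, pos) -> alt base (assumed single-base substitutions)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):   # skip SAM header lines
            continue
        f = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar, seq = int(f[1]), f[2], int(f[3]), f[5], f[9]
        # skip unmapped, secondary and supplementary alignments
        if flag & 0x904 or cigar == "*" or seq == "*":
            continue
        wanted = [p for (c, p) in snps if c == chrom]
        seen = bases_at(pos, cigar, seq, wanted)
        carried = sorted(p for p, b in seen.items()
                         if b.upper() == snps[(chrom, p)].upper())
        for a, b in combinations(carried, 2):
            counts[(chrom, a, b)] += 1
    return counts
```

Something like `linked_snps(open("reads.sam"), {("chr1", 3): "G", ("chr1", 8): "T"})` would then give the number of reads supporting each SNP pair (the file name and positions here are made up). Note that this treats the two mates of a read pair as separate reads and ignores indel variants.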
Any suggestions are appreciated! Thank you!
Great, thank you! I did some tweaking and ran tests on a small MiSeq sample with known SNPs, and the tool does exactly what it's supposed to!