Identify neighboring SNPs located on individual reads
5.1 years ago

Hi everyone,

I am working on an amplicon sequencing project where we sequence bacterial populations.

While the approach works well for identifying the mutations that occur in different populations, it gives no information about the co-evolution of variants within a single gene/genome, because it is impossible to say whether two mutations are located on the same individual chromosome.

One way to collect at least some information on whether SNPs evolve/travel together would be to check whether they are present on the same read. This would be very useful for, e.g., mutational hot-spots.

Do you have any idea how I could identify SNPs that are supported by the same read within a BAM file, ideally using a VCF file to zoom in on the interesting locations?

I have got as far as extracting all reads from the BAM that map to the coordinates specified in the VCF. However, the result is still pretty messy, and figuring out by hand which reads span two or more SNP locations would be a week of work.

Any suggestions are appreciated! Thank you!
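The read-level check described above can be sketched in plain Python. This is only a minimal illustration, not a replacement for a phasing tool: it assumes each read has already been pulled out of the BAM as a `(start, cigar, seq)` tuple with a 0-based leftmost reference position (so a VCF `POS` needs 1 subtracted), and the names `base_at` and `haplotype_counts` as well as the example reads are made up for this sketch.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def base_at(start, cigar, seq, ref_pos):
    """Return the read base aligned to 0-based ref_pos,
    '-' if the read has a deletion there, or None if not covered."""
    ref, qry = start, 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # consumes reference and read
            if ref <= ref_pos < ref + length:
                return seq[qry + (ref_pos - ref)]
            ref += length
            qry += length
        elif op in "DN":           # consumes reference only
            if ref <= ref_pos < ref + length:
                return "-"
            ref += length
        elif op in "IS":           # consumes read only
            qry += length
        # H and P consume neither reference nor read
    return None

def haplotype_counts(reads, positions):
    """Count allele combinations over reads that span every position."""
    counts = Counter()
    for start, cigar, seq in reads:
        alleles = tuple(base_at(start, cigar, seq, p) for p in positions)
        if None not in alleles:    # skip reads missing any SNP site
            counts[alleles] += 1
    return counts

# Made-up reads; SNP sites at 0-based positions 102 and 107.
reads = [
    (100, "10M", "ACGTACGTAC"),
    (100, "10M", "ACTTACGAAC"),
    (98, "12M", "GGACTTACGAAC"),
    (100, "4M2D4M", "ACTTACGA"),
    (105, "5M", "CGTAC"),          # does not cover 102, so it is skipped
]
counts = haplotype_counts(reads, [102, 107])
for alleles, n in counts.most_common():
    print(alleles, n)
```

Each key in `counts` is the combination of bases one read carries at the listed sites, so combinations seen on many reads point to SNPs that travel together. Base qualities, mapping quality and paired-end mates are ignored here.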

SNP alignment sequencing • 855 views
5.1 years ago

The word you are looking for is phasing. I may be overlooking something specific to bacterial genetics, but WhatsHap works very well for human applications.


Great, thank you! I did some tweaking and ran tests on a small MiSeq sample with known SNPs, and the tool does exactly what it's supposed to!
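For reading the result of a phasing run, a small Python sketch along these lines may help. It assumes the phased VCF follows the usual convention of a `|` separator in `GT` plus a `PS` phase-set tag in the FORMAT column; the function name `phase_sets` and the example records are made up for illustration.

```python
from collections import defaultdict

def phase_sets(vcf_lines, sample_index=0):
    """Group phased variants by (chromosome, phase-set ID)."""
    groups = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue                       # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        keys = fields[8].split(":")        # FORMAT column
        values = fields[9 + sample_index].split(":")
        sample = dict(zip(keys, values))
        gt = sample.get("GT", "")
        ps = sample.get("PS")
        # Keep only genotypes marked as phased that carry a phase-set ID.
        if "|" in gt and ps not in (None, "."):
            groups[(chrom, ps)].append((pos, ref, alt, gt))
    return dict(groups)

# Made-up records: two phased SNPs in one phase set, one unphased SNP.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t103\t.\tG\tT\t.\tPASS\t.\tGT:PS\t0|1:103",
    "chr1\t108\t.\tT\tA\t.\tPASS\t.\tGT:PS\t0|1:103",
    "chr1\t500\t.\tC\tG\t.\tPASS\t.\tGT\t0/1",
]
sets = phase_sets(vcf)
for key, variants in sets.items():
    print(key, variants)
```

Variants that share a phase set and have the same allele on the same side of the `|` were placed on the same haplotype by the phasing tool.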

5.1 years ago

Call variants from the BAM with GATK HaplotypeCaller; physically phased variants (those supported by the same reads) are reported in the FORMAT/PGT and FORMAT/PID fields.
