Question

How to call variants (SNPs, indels, SVs) on a BAC contig aligned to a reference with BWA-MEM?

10.9 years ago

William ★ 5.3k

I aligned a BAC contig (assembled from Sanger sequences) to a reference genome using BWA-MEM. The resulting alignments are very similar to the best end-to-end alignment I got when aligning the BAC contig to the reference with BLAT.

The nice thing is that the output is a BAM file (convenient for visualization and parsing) and that BWA-MEM also reports inversions and translocations of parts of the BAC contig relative to the reference.

But how do I now interpret the BWA-MEM alignments as the SNPs, indels and SVs that the BAC contig has relative to the reference?

SNPs and indels are fairly easy to see in the data. But because the BAC contig is reported as multiple separate alignments, it is hard to see what is going on in terms of SVs.
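One way to make the split alignments readable is to order the pieces by their position along the contig and label each junction between neighbouring pieces. The sketch below is a minimal, naive classifier, not a method from BWA-MEM or any SV caller; the `Segment` class, the `classify_junctions` function and the 50 bp `min_sv` threshold are hypothetical choices for illustration. It assumes each piece (primary plus supplementary records) has already been reduced to 0-based, half-open coordinates, with contig coordinates given on the forward strand of the contig.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One local alignment of the contig (hypothetical representation).

    Coordinates are 0-based, half-open; query_start/query_end refer to the
    forward contig sequence regardless of the alignment strand.
    """
    query_start: int
    query_end: int
    chrom: str
    ref_start: int
    ref_end: int
    strand: str  # "+" or "-"


def classify_junctions(segments: List[Segment], min_sv: int = 50) -> List[Tuple[str, int]]:
    """Label each junction between pieces that are adjacent along the contig.

    Returns (label, size) pairs; size is 0 where it is not meaningful.
    """
    ordered = sorted(segments, key=lambda s: s.query_start)
    calls = []
    for a, b in zip(ordered, ordered[1:]):
        query_gap = b.query_start - a.query_end
        if a.chrom != b.chrom:
            calls.append(("translocation", 0))
            continue
        if a.strand != b.strand:
            calls.append(("inversion", 0))
            continue
        # On the minus strand, moving forward along the contig moves
        # backwards along the reference, so the gap is measured the other way.
        if a.strand == "+":
            ref_gap = b.ref_start - a.ref_end
        else:
            ref_gap = a.ref_start - b.ref_end
        if ref_gap < -min_sv:
            # The next piece jumps back in the reference: repeated sequence.
            calls.append(("duplication_or_rearrangement", -ref_gap))
        elif ref_gap - query_gap >= min_sv:
            # Reference has extra sequence between the pieces.
            calls.append(("deletion", ref_gap - query_gap))
        elif query_gap - ref_gap >= min_sv:
            # Contig has extra sequence between the pieces.
            calls.append(("insertion", query_gap - ref_gap))
        else:
            calls.append(("collinear", 0))
    return calls
```

An inversion flanked by collinear sequence usually shows up as two `inversion` junctions (into and out of the inverted piece), and anything labelled `duplication_or_rearrangement` still needs manual inspection.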

I want to use the variants found in the BAC sequence to estimate the false negative (FN) and false positive (FP) rates for the same strain when it is sequenced with short reads and variant-called from that data.
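Once both call sets exist, the comparison itself is set arithmetic restricted to the region the BAC actually covers, since short-read calls outside the BAC alignment cannot be judged. A minimal sketch, assuming variants are exact-match tuples `(chrom, pos, ref, alt)` and that positions and region bounds use the same coordinate convention; `compare_calls` and `in_regions` are hypothetical names. It reports a false negative rate FN / (TP + FN) and a false discovery rate FP / (TP + FP), because a true FP *rate* would also need a count of true negatives.

```python
from typing import Dict, Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), matched exactly
Interval = Tuple[str, int, int]      # (chrom, start, end), 0-based half-open


def in_regions(v: Variant, regions: List[Interval]) -> bool:
    """True if the variant position falls inside any BAC-covered interval."""
    chrom, pos = v[0], v[1]
    return any(c == chrom and s <= pos < e for c, s, e in regions)


def compare_calls(truth: Iterable[Variant], calls: Iterable[Variant],
                  regions: List[Interval]) -> Dict[str, float]:
    """Compare short-read calls against BAC-derived 'truth' inside the BAC regions."""
    truth_set: Set[Variant] = {v for v in truth if in_regions(v, regions)}
    call_set: Set[Variant] = {v for v in calls if in_regions(v, regions)}
    tp = len(truth_set & call_set)
    fp = len(call_set - truth_set)  # called by short reads, absent from BAC
    fn = len(truth_set - call_set)  # present in BAC, missed by short reads
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "fn_rate": fn / (tp + fn) if tp + fn else 0.0,
        "fdr": fp / (tp + fp) if tp + fp else 0.0,
    }
```

Exact matching is sensitive to how indels are written, so both sets should be normalized the same way (for example left-aligned with `bcftools norm`) before comparing.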

Tags: snp, indel, sv

I think that because the MEM method is quite new, few tools handle this type of alignment representation yet.

Reply by Istvan Albert (100k), 10.9 years ago

The output is a valid BAM file, so samtools may be able to call or extract the SNPs and indels. Otherwise I could parse them myself from the CIGAR strings and the FASTA using Picard.
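Parsing SNPs and indels from a CIGAR string comes down to walking the operations while advancing a reference cursor and a query cursor. A minimal sketch, assuming `ref` is the reference chromosome sequence already loaded from the FASTA, `ref_start` is the 0-based alignment start (BAM `POS` minus 1), and `query` is the record's stored `SEQ` (which includes soft-clipped but not hard-clipped bases); `cigar_variants` is a hypothetical helper, not a samtools or Picard API. Indels are reported VCF-style, anchored on the preceding reference base, with 0-based positions.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_variants(cigar: str, ref: str, ref_start: int, query: str) -> List[Tuple[int, str, str]]:
    """Return (ref_pos, ref_allele, alt_allele) tuples for one alignment record."""
    variants = []
    r, q = ref_start, 0  # reference and query cursors
    for length_str, op in CIGAR_RE.findall(cigar):
        n = int(length_str)
        if op in "M=X":
            # Aligned bases: compare position by position to find mismatches.
            for i in range(n):
                rb, qb = ref[r + i].upper(), query[q + i].upper()
                if rb != qb:
                    variants.append((r + i, rb, qb))
            r += n
            q += n
        elif op == "I":
            # Inserted contig bases, anchored on the previous reference base.
            anchor = ref[r - 1].upper()
            variants.append((r - 1, anchor, anchor + query[q:q + n].upper()))
            q += n
        elif op == "D":
            # Deleted reference bases, anchored on the previous reference base.
            anchor = ref[r - 1].upper()
            variants.append((r - 1, anchor + ref[r:r + n].upper(), anchor))
            r += n
        elif op == "N":
            r += n  # skipped reference, not reported as a variant
        elif op == "S":
            q += n  # soft-clipped bases are present in SEQ but unaligned
        # H and P consume neither sequence
    return variants
```

Run per record (primary and supplementary), this gives the small variants inside each aligned piece; the breakpoints between pieces still have to be handled separately.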

Also, the hard-clip information at the start and end of each separate alignment gives me the query start and end positions of each alignment. Extracting the SNPs and indels is probably the easy part, but I am not sure yet how to handle the potential SVs.
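The clip lengths translate into contig coordinates as follows: the leading clips give the start, the sum of all query-consuming operations plus hard clips gives the full contig length, and the trailing clips give the end. For a reverse-strand record the CIGAR runs along the reverse complement of the contig, so the interval has to be flipped back. A minimal sketch; `query_interval` is a hypothetical helper, and it counts both hard (`H`) and soft (`S`) clips on the assumption that primary and supplementary records may use either.

```python
import re
from typing import Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(cigar: str, is_reverse: bool) -> Tuple[int, int, int]:
    """Return (start, end, contig_length) of the aligned part of the contig.

    Coordinates are 0-based, half-open, on the forward strand of the contig.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Contig length: everything that consumes query, plus hard clips
    # (hard-clipped bases are missing from SEQ but belong to the contig).
    total = sum(n for n, op in ops if op in "MIS=XH")
    lead = 0
    for n, op in ops:
        if op in "HS":
            lead += n
        else:
            break
    trail = 0
    for n, op in reversed(ops):
        if op in "HS":
            trail += n
        else:
            break
    start, end = lead, total - trail
    if is_reverse:
        # CIGAR describes the reverse-complemented contig; flip to forward coordinates.
        start, end = total - end, total - start
    return start, end, total
```

Sorting these intervals for all records of the contig shows which part of the contig each alignment explains and where the pieces leave gaps or overlap.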

Reply by William ★ 5.3k, 10.9 years ago