From multiple VCF files to multiple sequence alignment?
8.0 years ago
Peter vH ▴ 130

Hi there

I have multiple VCF files generated from variant calling on sequenced bacteria (M. tuberculosis). I would like to create a multiple sequence alignment file (as a step towards computing a phylogeny of the samples) by combining the reference genome with the VCFs. Before I put time and effort into creating a script to do this, is there an existing solution? I see that workflows such as SNPhylo compute an alignment with MUSCLE before doing tree construction - I'm trying to avoid that step.

Thanks, Peter

alignment VCF bacterial • 5.2k views

Please check this post. The comment by natasha provides a good solution.


I'm not quite sure how. The tools suggested in those threads (vcf-consensus in one, FastaAlternateReferenceMaker in the other) each produce a single FASTA output from a single VCF input, and they don't deal with the gaps that arise when aligning sequences that carry insertions and deletions.
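
To illustrate the point about gaps, here is a minimal Python sketch. It assumes the variants have already been parsed into (pos, ref, alt) tuples per sample; the function names and the toy data are hypothetical and do not come from any of the tools mentioned. SNPs are substituted in place and deleted bases are written as `-`, so every row keeps the reference length. Insertions are simply skipped, because representing them would require adding a gap column to every other sequence.

```python
def reference_anchored_row(reference, variants):
    """Return a sequence the same length as `reference` for one sample.

    `variants` is an iterable of (pos, ref, alt) tuples with 1-based
    positions, as in a VCF record. Indels are assumed to be left-anchored,
    i.e. REF and ALT share their first base.
    """
    row = list(reference)
    for pos, ref, alt in variants:
        start = pos - 1
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        if len(ref) == len(alt):
            # SNP or MNP: substitute in place, length unchanged.
            row[start:start + len(ref)] = alt
        elif len(alt) < len(ref) and ref.startswith(alt):
            # Deletion: keep the anchor base(s), gap out the deleted bases.
            for i in range(start + len(alt), start + len(ref)):
                row[i] = "-"
        # Insertions are skipped: they would add columns to the alignment.
    return "".join(row)


def build_alignment(reference, samples):
    """Map each sample name to its equal-length row, plus the reference."""
    alignment = {"reference": reference}
    for name, variants in samples.items():
        alignment[name] = reference_anchored_row(reference, variants)
    return alignment


# Toy data for illustration only.
reference = "ACGTACGTAC"
samples = {
    "sample1": [(2, "C", "T")],                    # SNP
    "sample2": [(4, "TAC", "T")],                  # 2-bp deletion
    "sample3": [(6, "C", "CGG"), (9, "A", "G")],   # insertion (skipped) + SNP
}
alignment = build_alignment(reference, samples)
for name, seq in alignment.items():
    print(f">{name}\n{seq}")
```

Because every row has the same length as the reference, the output can be read as an alignment without running an aligner, at the cost of discarding insertions relative to the reference.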


I have not used FastaAlternateReferenceMaker, but I iterated vcf-consensus -s <sample_name> to generate a FASTA file for each sample and then did the alignment. The newer version can also use IUPAC codes, so that heterozygous genotypes can be encoded. Gaps are usually ignored in alignment, so they should not matter, but I don't know exactly how indels and rearrangements are handled by vcf-consensus.
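
As a rough illustration of what encoding a heterozygous genotype with an IUPAC code means, here is a small Python sketch. It is not the vcf-consensus implementation; the function names are hypothetical, it only handles diploid single-base genotypes, and writing missing calls as `N` is an assumption made for this example.

```python
# Standard IUPAC ambiguity codes for pairs of nucleotides.
IUPAC_PAIRS = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def iupac_code(allele1, allele2):
    """Return one character representing two single-base alleles."""
    a, b = allele1.upper(), allele2.upper()
    if a == b:
        return a  # homozygous: the base itself
    return IUPAC_PAIRS[frozenset((a, b))]


def genotype_to_base(ref, alts, gt):
    """Translate a diploid SNP genotype string such as '0/1' into one base.

    `ref` is the REF allele, `alts` the list of ALT alleles, and `gt` the
    GT field of the sample. Missing calls ('./.') are written as 'N'
    (an assumption made for this sketch).
    """
    alleles = [ref] + list(alts)
    indices = gt.replace("|", "/").split("/")
    if "." in indices:
        return "N"
    first, second = (alleles[int(i)] for i in indices)
    return iupac_code(first, second)


print(genotype_to_base("A", ["G"], "0/1"))  # heterozygous A/G
print(genotype_to_base("A", ["G"], "1/1"))  # homozygous alternate
```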


So you'd do iterative vcf-consensus followed by MUSCLE? SNPhylo seems to do something like that. I'll experiment and compare it with the script I've written.


Yes, although in my case it was a chloroplast genome, and the results were good. An advantage there was that no heterozygous calls had to be handled, since heteroplasmy was not detected.
