Question: Generating maximum likelihood trees from multi-sample VCF files
23 months ago by rc16955 wrote:

Hi all,

I have whole-genome sequencing data for 100 individuals of my study species and would like to construct a maximum likelihood tree from them. So far I've called SNPs using bcftools and have all samples in a single multi-sample VCF file. Are you aware of any software that can take a multi-sample VCF file as input and use it to build a maximum likelihood tree?

Previously, I have used single-sample VCF files to generate a separate FASTA file for each sample using vcf-consensus, then aligned these with MAFFT and built trees from the alignments with RAxML. The problem with this approach is that it loses information about heterozygosity: vcf-consensus simply always uses the ALT allele, and even if it did use IUPAC ambiguity codes for heterozygous sites, I don't think that RAxML can handle these.
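To make the heterozygosity point concrete, here is a minimal sketch of what IUPAC encoding of diploid genotypes from a multi-sample VCF could look like. It is not taken from any existing tool: the function name `vcf_to_iupac` and the example records are hypothetical. It assumes an uncompressed, tab-delimited VCF with a `GT` field in every record, keeps SNP sites only, and writes `N` for missing calls. How a particular tree-building program interprets ambiguity codes (as heterozygotes or simply as uncertainty) is something to check in that program's documentation.

```python
# Map a set of observed bases to its IUPAC code (two-base ambiguities only).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def vcf_to_iupac(lines):
    """Return {sample: sequence} with one IUPAC character per SNP site.

    Hypothetical helper. Assumes each record carries a GT field;
    non-SNP records (indels, symbolic alleles, sites with no ALT) are skipped.
    """
    samples = []
    seqs = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            seqs = {s: [] for s in samples}
            continue
        alleles = [fields[3]] + fields[4].split(",")  # REF first, then ALTs
        if any(len(a) != 1 or a not in "ACGT" for a in alleles):
            continue  # keep single-base substitutions only
        gt_index = fields[8].split(":").index("GT")
        for sample, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_index].replace("|", "/").split("/")
            if "." in gt:
                seqs[sample].append("N")  # missing genotype
            else:
                bases = frozenset(alleles[int(i)] for i in gt)
                seqs[sample].append(IUPAC.get(bases, "N"))
    return {s: "".join(chars) for s, chars in seqs.items()}


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3",
    "chr1\t10\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:12\t0/1:9\t1/1:15",
    "chr1\t25\t.\tC\tT\t50\tPASS\t.\tGT\t0|1\t./.\t1/1",
    "chr1\t40\t.\tAT\tA\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1",  # indel, skipped
]

for name, seq in vcf_to_iupac(example).items():
    print(f">{name}\n{seq}")  # FASTA-style output, one record per sample
```

With these made-up records, the indel at position 40 is dropped and the three samples come out as `AY`, `RN` and `GT`: heterozygous calls become `R` or `Y`, and the missing call becomes `N`. Because every sample has the same number of columns, the result is already an alignment of variable sites and would not need a separate alignment step.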

For reference, there are about 170,000 SNP sites in a genome of about 41 Mb. Within each sample, roughly a third of these sites are heterozygous.

Sorry in advance for any gaps in understanding revealed by this question, and sorry if this has been answered before (I did find a few similar questions but sadly these didn't have answers).

Thanks in advance!
