Question: Generating maximum likelihood trees from multi-sample VCF files
Asked 4 months ago by rc16955:

Hi all,

I have whole-genome sequencing data for 100 individuals of my study species and would like to construct a maximum likelihood tree from them. So far I've called SNPs using bcftools and have all samples in a single multi-sample VCF file. Are you aware of any software that can take a multi-sample VCF file as input and use it to build a maximum likelihood tree?

Previously, I have used single-sample VCF files to generate a separate FASTA file for each sample using vcf-consensus, then aligned these with MAFFT and built trees from the alignments with RAxML. The problem with this approach is that it loses information about heterozygosity: vcf-consensus always uses the ALT allele, and even if it did use IUPAC ambiguity codes for heterozygous sites, I don't think RAxML can handle these.
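To make concrete what I mean by encoding heterozygous sites with IUPAC ambiguity codes, here is a minimal sketch in plain Python. It is an illustration rather than a tested pipeline: the function name `vcf_to_iupac` and the toy records are made up for this example, it keeps only biallelic or multiallelic single-base SNPs, and it writes missing genotypes as N. Whether a given tree-building program interprets these codes as heterozygosity rather than as uncertainty is exactly the part I am unsure about.

```python
# Map an unordered set of called bases to its IUPAC nucleotide code.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def vcf_to_iupac(lines):
    """Return {sample: string of SNP-site characters} from VCF text lines.

    Hypothetical helper: one character per retained SNP site, with
    heterozygous genotypes written as IUPAC ambiguity codes.
    """
    samples, seqs = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            seqs = {s: [] for s in samples}
            continue
        alleles = [fields[3]] + fields[4].split(",")  # REF, then ALT(s)
        # Keep only sites where every allele is a single A/C/G/T base.
        if any(len(a) != 1 or a not in "ACGT" for a in alleles):
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index].replace("|", "/").split("/")
            if "." in gt:
                seqs[sample].append("N")  # missing genotype
            else:
                bases = frozenset(alleles[int(i)] for i in gt)
                seqs[sample].append(IUPAC.get(bases, "N"))
    return {s: "".join(chars) for s, chars in seqs.items()}


# Toy input: three samples, two SNP sites (invented for illustration).
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3",
    "chr1\t10\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1",
    "chr1\t20\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:0\t1|1:9",
]
result = vcf_to_iupac(demo)
for name, seq in result.items():
    print(f">{name}\n{seq}")  # FASTA-style output, one record per sample
```

The output is a SNP-only alignment with one sequence per sample, so no separate alignment step would be needed because every sample shares the same site coordinates.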

For reference, there are about 170,000 SNP sites in a genome of about 41 Mb. Within each sample, generally about a third of the sites are heterozygous.

Sorry in advance for any gaps in understanding revealed by this question, and sorry if this has been answered before (I did find a few similar questions but sadly these didn't have answers).

Thanks in advance!
