How to create a phylogenetic tree from 30 VCF files
2.0 years ago
lgspeight • 0

Hello,

I have 30 VCF files from 30 individuals, created from RNA-Seq alignments to the mitochondrial genome of the hard clam (Mercenaria mercenaria). I am trying to identify SNPs and define haplogroups to look at the maternal relationships among all 30 individuals.

My questions are:

Do I merge all my VCF files together first (bcftools merge)? If so, how do I know which sample had which SNPs at which location?

Do I create haplotypes for each VCF/individual first, and then combine the samples into one big dataset?

I have been using the pegas package in R, but there are so many commands that I don't really know where to start.

In the end I want a phylogenetic tree that shows the relationships among my samples/individuals based on the mitochondrial SNPs. What program or package should I use? What does the structure of my data need to look like to create a tree?

I have access to R, an HPC system (but installing new software can be a pain), and MEGA. I am also working on a Mac.

SNPs VCF Phylogenetic Haplogroups trees • 1.6k views

Hi

Once you have merged the VCF files using BCFtools, you can simply use the online program VCF2PopTree (https://github.com/sansubs/vcf2pop), published in PeerJ (https://peerj.com/articles/8213/). It is straightforward and simple to use: select a few options and click submit.

2.0 years ago
liorglic ★ 1.4k

To answer some of your questions:

Do I merge all my VCF files together first (bcftools merge)? If so, how do I know which sample had which SNPs at which location?

Yes, use bcftools merge. You don't need to know anything - bcftools will take care of everything for you. Sample names from each VCF will be kept in the merged VCF, so each sample gets its own genotype column. You may want to filter your merged VCF to get rid of low-quality variants or ones with a low minor allele frequency (MAF).
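To make the "which sample had which SNP" part concrete, here is a minimal Python sketch that reads a merged VCF and lists the samples carrying a non-reference allele at each site. The VCF text and the sample names (clam01 to clam03) are made up for illustration, and the function name carriers_by_site is hypothetical. Two things worth checking on real data: bcftools merge generally expects bgzip-compressed, indexed inputs, and by default a sample with no record at a site gets a missing genotype (".") rather than the reference allele, unless you ask for --missing-to-ref.

```python
# Toy merged VCF (hypothetical data): after FORMAT there is one column per sample.
MERGED_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tclam01\tclam02\tclam03
MT\t152\t.\tA\tG\t60\tPASS\t.\tGT\t1\t0\t1
MT\t731\t.\tC\tT\t55\tPASS\t.\tGT\t0\t.\t1
"""


def carriers_by_site(vcf_text):
    """Map (chrom, pos, ref, alt) -> names of samples with a non-reference allele."""
    samples = []
    result = {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names come from the header line
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        carriers = []
        for name, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            # "0" is the reference allele and "." is missing; anything else is an ALT
            if any(a not in ("0", ".") for a in alleles):
                carriers.append(name)
        result[(chrom, int(pos), ref, alt)] = carriers
    return result


for site, names in carriers_by_site(MERGED_VCF).items():
    print(site, names)
```

On the toy data, clam02 is missing at position 731, so it is not counted as a carrier there. Whether missing should be treated as reference is a decision about your data, not something the merge settles for you.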

Do I create haplotypes for each VCF/individual first, and then combine the samples into one big dataset?

I'm not sure I understand why you need the haplotypes. This is not usually part of a phylogenetic analysis, AFAIK. If you need it for something else, then this is a separate analysis.

I am not familiar with the R pegas package; however, there are several ways to infer a phylogeny from a VCF. For example, the software IQ-TREE can do it - see the relevant section here and this useful tool. Another option is the SNPhylo pipeline, and there are more sophisticated ways as well.
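On the question of what the data needs to look like: tree builders such as IQ-TREE and MEGA read an alignment, i.e. one sequence of equal length per sample. A minimal Python sketch of turning a merged VCF into a SNP-only FASTA alignment is below. The toy VCF, the sample names and the function name vcf_to_fasta are all hypothetical. It assumes haploid mitochondrial calls (only the first allele of each genotype is used, so heteroplasmy is ignored), skips indels, and writes N for missing genotypes. This is an illustration of the data structure, not what any of the tools mentioned above do internally.

```python
# Toy merged VCF (hypothetical data); the third record is an indel and is skipped.
TOY_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tclam01\tclam02\tclam03
MT\t152\t.\tA\tG\t60\tPASS\t.\tGT\t1\t0\t1
MT\t731\t.\tC\tT\t55\tPASS\t.\tGT\t0\t.\t1
MT\t900\t.\tG\tGA\t50\tPASS\t.\tGT\t1\t0\t0
"""


def vcf_to_fasta(vcf_text, missing_char="N"):
    """Build a SNP-only FASTA alignment: one sequence per sample, one column per SNP."""
    samples = []
    seqs = {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            seqs = {name: [] for name in samples}
            continue
        alleles = [fields[3]] + fields[4].split(",")  # index 0 = REF, 1.. = ALTs
        # Keep only simple SNPs; indels and symbolic alleles would break the alignment
        if not all(a in ("A", "C", "G", "T") for a in alleles):
            continue
        gt_index = fields[8].split(":").index("GT")
        for name, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_index]
            first = gt.replace("|", "/").split("/")[0]  # assumption: haploid call
            seqs[name].append(alleles[int(first)] if first.isdigit() else missing_char)
    return "".join(f">{name}\n{''.join(seqs[name])}\n" for name in samples)


print(vcf_to_fasta(TOY_VCF), end="")
```

An alignment like this contains only variable sites, so it may be worth checking whether your tree program offers a correction for that (IQ-TREE has ascertainment bias models, for example).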

You don't need to know anything - bcftools will take care of everything for you.

Until you run into downstream operations that are ALT allele specific and your samples' ALT alleles are not a perfect overlap.

Use vt to decompose and normalize variants before using bcftools merge.
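To illustrate what "decompose" means here, below is a simplified Python sketch that splits one multiallelic record (two different ALT alleles at the same position) into one biallelic record per ALT allele. This is not vt's actual algorithm: it assumes haploid genotypes with GT as the only FORMAT field, leaves INFO untouched, and does no normalization (left-alignment of indels). The record and the function name decompose are hypothetical.

```python
def decompose(record):
    """Split a multiallelic VCF data line into one biallelic line per ALT allele.

    Simplifying assumptions: haploid genotypes, and FORMAT contains only GT.
    """
    fields = record.split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return [record]  # already biallelic
    out = []
    for k, alt in enumerate(alts, start=1):
        new = fields[:]
        new[4] = alt
        recoded = []
        for gt in fields[9:]:
            if gt == "0":
                recoded.append("0")  # reference stays reference
            elif gt == str(k):
                recoded.append("1")  # this ALT becomes allele 1
            else:
                recoded.append(".")  # a different ALT or missing: not representable here
        new[9:] = recoded
        out.append("\t".join(new))
    return out


# Hypothetical record: sample 1 carries G, sample 2 carries T, sample 3 is reference.
multi = "MT\t152\t.\tA\tG,T\t60\tPASS\t.\tGT\t1\t2\t0"
for line in decompose(multi):
    print(line)
```

After the split, a step that only looks at one ALT allele sees a consistent biallelic record, which is the situation the advice above is trying to set up.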
