Phylogenetic tree construction from mitochondrial SNPs
9.2 years ago
vasilislenis ▴ 150

Hello everybody,

I am trying to build a phylogenetic tree of 9 sheep breeds based on the SNPs in their mitochondrial genomes.

The steps that I follow are:

  1. Call SNPs for all the individuals with three different tools: samtools, FreeBayes, and GATK.
  2. Take the intersection of the three SNP call sets for each individual.
  3. Merge everything into one VCF file in order to generate a pedigree (PED) file with all the genotypes (I used vcftools for the merging and PGDSpider for the VCF-to-PED conversion).
  4. Generate a distance matrix with PLINK.
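For readers following along, step 2 can be sketched in a few lines of Python. This is only an illustration of the idea, not the commands used above: the function names `read_snps` and `intersect_callers` and the toy records are made up, and it assumes plain tab-separated VCF records where a SNP is identified by chromosome, position, REF and ALT.

```python
def read_snps(vcf_lines):
    """Collect single-base substitutions as (chrom, pos, ref, alt) tuples."""
    snps = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            # keep only single-base REF/ALT pairs (no indels, no '*' or '.')
            if len(ref) == 1 and alt in ("A", "C", "G", "T"):
                snps.add((chrom, pos, ref, alt))
    return snps


def intersect_callers(*call_sets):
    """Keep only the SNPs reported by every caller."""
    return set.intersection(*call_sets)


# Toy records for one individual (hypothetical values).
samtools_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "MT\t150\t.\tC\tT\t50\tPASS\t.",
    "MT\t300\t.\tA\tG\t60\tPASS\t.",
]
freebayes_vcf = [
    "MT\t150\t.\tC\tT\t45\tPASS\t.",
    "MT\t300\t.\tA\tG\t40\tPASS\t.",
    "MT\t900\t.\tG\tA\t30\tPASS\t.",
]
gatk_vcf = [
    "MT\t150\t.\tC\tT\t70\tPASS\t.",
    "MT\t900\t.\tG\tA\t35\tPASS\t.",
]

shared = intersect_callers(
    read_snps(samtools_vcf), read_snps(freebayes_vcf), read_snps(gatk_vcf)
)
print(sorted(shared))
```

Note that a strict intersection like this discards any site that one caller missed, which is one way missing genotypes can appear once the per-individual files are merged.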

At that point I get the following error from PLINK:

ERROR: No nonmissing markers for individuals

Unfortunately I cannot understand this error, so I cannot find a way to fix it.

I would appreciate your help with this, or any other suggestions about the steps that I follow.

Thank you very much in advance,

Vasilis

SNP phylogenetic-tree mitochondrial-genome • 2.8k views
9.2 years ago
Brice Sarver ★ 3.8k

As the error suggests, you cannot have any missing data for any individuals when you try to make a distance matrix in PLINK. When you do variant calling, try outputting all sites. This way, every site is reported even if it is not variant with respect to the reference.
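To see which individuals are affected, it can help to count the called genotypes per individual in the PED file before running PLINK. The sketch below is a rough check, not part of PLINK: the function name `nonmissing_counts` and the toy lines are hypothetical, and it assumes the standard PED layout (six leading columns, then two alleles per marker) with `0` as the missing-allele code.

```python
def nonmissing_counts(ped_lines, missing="0"):
    """Return {individual ID: number of markers with both alleles called}."""
    counts = {}
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        individual_id = fields[1]  # columns: FID IID PAT MAT SEX PHENO
        alleles = fields[6:]
        # alleles come in pairs, one pair per marker
        pairs = zip(alleles[0::2], alleles[1::2])
        counts[individual_id] = sum(
            1 for a, b in pairs if a != missing and b != missing
        )
    return counts


# Toy PED lines (hypothetical individuals, two markers each).
ped_lines = [
    "F1 sheep1 0 0 0 -9 A A C T",
    "F2 sheep2 0 0 0 -9 0 0 0 0",
]

counts = nonmissing_counts(ped_lines)
no_data = [iid for iid, n in counts.items() if n == 0]
print(counts)
print("Individuals with no called genotypes:", no_data)
```

Any individual listed in `no_data` has nothing for a distance calculation to work with.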

Alternatively (and this is my recommendation), get a sequence for each individual (in GATK, use FastaAlternateReferenceMaker or another tool) and use model-based phylogenetics: fit a model, estimate a tree under that model, and get your nodal support values. Once you have a FASTA for each of the 9 breeds, you can use more sophisticated approaches than just making a distance tree.
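Conceptually, building a per-individual sequence just means substituting each called SNP into the reference. The Python sketch below illustrates that idea only; it is not how FastaAlternateReferenceMaker is implemented, the function names `apply_snps` and `to_fasta` are made up, and it assumes SNPs only (no indels) with 1-based positions.

```python
def apply_snps(reference, snps):
    """Return the reference with each (pos, ref, alt) SNP substituted in.

    Positions are 1-based, as in VCF. Raises ValueError if the REF allele
    does not match the reference base at that position.
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        base = reference[pos - 1]
        if base.upper() != ref.upper():
            raise ValueError(
                f"REF mismatch at {pos}: expected {ref}, found {base}"
            )
        seq[pos - 1] = alt
    return "".join(seq)


def to_fasta(name, sequence):
    """Format one sequence as a minimal single-line FASTA record."""
    return f">{name}\n{sequence}\n"


# Toy reference and SNP calls (hypothetical values).
reference = "ACGTACGT"
sheep1_snps = [(2, "C", "T"), (8, "T", "A")]

sheep1_seq = apply_snps(reference, sheep1_snps)
print(to_fasta("sheep1", sheep1_seq), end="")
```

Because only substitutions are applied, every individual's sequence has the same length as the reference, so the resulting FASTA records are already aligned and can go straight into a tree-building program.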

Hope this helps.


Thank you very much! I see your point about using the reference as a backbone and generating the sequences from the variants.

But since I am quite new to phylogenetics, could you give me a bit of guidance on model-based phylogenetics?

I mean, if you have the time to guide me through the steps that I should follow, and/or to recommend some tools, that would be GREAT!

Thank you very much once again.

9.2 years ago

As brice.sarver said, one option is to use GATK FastaAlternateReferenceMaker. The other option is to use vcftools to remove SNPs that are missing in any of the samples; it has an option for this, --max-missing. Then use PLINK.
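For illustration, here is roughly what that filtering amounts to, written as a small Python sketch rather than the vcftools implementation. The function name `drop_sites_with_missing` and the toy records are hypothetical; it assumes a multi-sample VCF where GT is the first FORMAT field and a missing call contains `.` in the genotype.

```python
def drop_sites_with_missing(vcf_lines):
    """Keep header lines and only those sites genotyped in every sample."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # sample columns start at index 9; GT is the first ':'-separated field
        genotypes = [sample.split(":")[0] for sample in fields[9:]]
        if any("." in gt for gt in genotypes):
            continue  # at least one sample has no call at this site
        kept.append(line)
    return kept


# Toy multi-sample VCF (hypothetical values, two samples).
vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "MT\t150\t.\tC\tT\t50\tPASS\t.\tGT\t1\t0",
    "MT\t300\t.\tA\tG\t60\tPASS\t.\tGT\t.\t1",
    "MT\t900\t.\tG\tA\t35\tPASS\t.\tGT:DP\t1:12\t./.:0",
]

filtered = drop_sites_with_missing(vcf_lines)
print("\n".join(filtered))
```

Only the site at position 150 survives here, since the other two have a missing call in one of the samples.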
