Question: Phylogenetic tree construction from mitochondrial SNPs
5.6 years ago by
United Kingdom
vasilislenis130 wrote:

Hello everybody,

I am trying to build a phylogenetic tree of 9 sheep breeds based on the SNPs in their mitochondrial genomes.

The steps that I follow are:

  1. Calling the SNPs of all the individuals with three different tools: samtools, freebayes and GATK.
  2. Taking the intersection of the three SNP call sets for each individual.
  3. Merging them all into one VCF file in order to generate a pedigree (PED) file with all the genotypes (I used vcftools for the merging and PGDSpider for the VCF to PED conversion).
  4. Distance matrix generation with PLINK.
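Step 2 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three inputs are plain-text VCF records for one individual, a variant is identified by chromosome, position, REF and ALT, and the names `parse_vcf_lines` and `intersect_calls` are hypothetical helpers, not part of samtools, freebayes, GATK or vcftools.

```python
from typing import Iterable, Set, Tuple

# A variant is identified by (chrom, pos, ref, alt); this key is an assumption.
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from VCF text lines, skipping header lines."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records list several ALT alleles separated by commas.
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants


def intersect_calls(*call_sets: Set[Variant]) -> Set[Variant]:
    """Keep only variants reported by every caller."""
    return set.intersection(*call_sets)


# Toy records for one individual (made-up positions, for illustration only).
samtools_calls = parse_vcf_lines(["#CHROM\tPOS\tID\tREF\tALT",
                                  "MT\t100\t.\tA\tG",
                                  "MT\t250\t.\tC\tT"])
freebayes_calls = parse_vcf_lines(["MT\t100\t.\tA\tG",
                                   "MT\t300\t.\tG\tA"])
gatk_calls = parse_vcf_lines(["MT\t100\t.\tA\tG",
                              "MT\t250\t.\tC\tT"])

consensus = intersect_calls(samtools_calls, freebayes_calls, gatk_calls)
print(sorted(consensus))
```

Note that a site dropped by this intersection simply disappears from that individual's file, so it shows up as missing once the per-individual files are merged.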

At that point I get the following error from PLINK:

ERROR: No nonmissing markers for individuals

Unfortunately I cannot understand this error, so I cannot find a way to fix it.

I would appreciate your help with it, or any other suggestions about the steps that I follow.


Thank you very much in advance,




Modified 5.6 years ago by geek_y • written 5.6 years ago by vasilislenis
5.6 years ago by
Brice Sarver3.5k
United States
Brice Sarver3.5k wrote:

As the error suggests, PLINK cannot build a distance matrix when individuals have missing data. When you do variant calling, try outputting all sites. This way, every site is reported even if it is not variant with respect to the reference.
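To see why missing genotypes are a problem for a distance matrix, here is a minimal Python sketch. It is not PLINK's actual algorithm: it assumes haploid mitochondrial alleles, uses "0" as the missing code (PLINK's default in PED files), and the helper `pairwise_distance` and the breed names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

MISSING = "0"  # missing-allele code, as used by default in PED files


def pairwise_distance(a: List[str], b: List[str]) -> Optional[float]:
    """Proportion of differing alleles over sites called in both samples.

    Returns None when the two samples share no non-missing site, because
    the distance is then undefined.
    """
    shared = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if not shared:
        return None
    return sum(x != y for x, y in shared) / len(shared)


# Toy genotypes at three sites (made-up data for illustration).
genotypes: Dict[str, List[str]] = {
    "breed_A": ["A", "0", "T"],
    "breed_B": ["G", "0", "T"],
    "breed_C": ["0", "C", "0"],
}

distances: Dict[Tuple[str, str], Optional[float]] = {
    (s1, s2): pairwise_distance(genotypes[s1], genotypes[s2])
    for s1, s2 in combinations(genotypes, 2)
}

for pair, dist in distances.items():
    print(pair, "undefined (no shared markers)" if dist is None else dist)
```

In this toy data breed_C shares no called site with the other two, so its distances cannot be computed. Reporting all sites during variant calling avoids that situation.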

Alternatively (and this is my recommendation), get a sequence for each individual (in GATK, use FastaAlternateReferenceMaker or another tool) and use model-based phylogenetics: fit a model, estimate a tree under that model, and get your nodal support values. Once you have a FASTA for each of the 9 breeds, you can use more sophisticated approaches than just making a distance tree.
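Conceptually, getting a sequence for each individual means substituting that individual's SNP alleles into the reference. The Python sketch below illustrates the idea only; it is not the GATK tool, it handles single-base substitutions but no indels, and the reference string, positions and the helpers `apply_snps` and `to_fasta` are made up for the example.

```python
from typing import Dict, Tuple


def apply_snps(reference: str, snps: Dict[int, Tuple[str, str]]) -> str:
    """Return the reference with SNP alleles substituted in.

    `snps` maps a 1-based position to (ref_allele, alt_allele).
    """
    seq = list(reference)
    for pos, (ref, alt) in snps.items():
        if seq[pos - 1] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def to_fasta(sequences: Dict[str, str]) -> str:
    """Format named sequences as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sequences.items())


# Toy reference and SNP calls (illustrative values only).
reference = "ACGTACGTAC"
calls = {
    "breed_A": {2: ("C", "T")},
    "breed_B": {2: ("C", "T"), 9: ("A", "G")},
}

sequences = {name: apply_snps(reference, snps) for name, snps in calls.items()}
fasta_text = to_fasta(sequences)
print(fasta_text)
```

Because only substitutions are applied, every sequence keeps the reference length, so the resulting FASTA is already an alignment that a tree-building program can read.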

Hope this helps.


Thank you very much! I see your point about using the reference as a backbone and generating the sequences from the variants.

But, since I am quite new to the phylogenetics field, could you give me a little support with the model-based phylogenetics?

I mean, if you have the time to guide me through the steps that I should follow, and/or to recommend some tools, that would be GREAT!

Thank you very much once more.


Reply written 5.6 years ago by vasilislenis
5.6 years ago by
geek_y11k wrote:

As brice.sarver said, one option is to use GATK's FastaAlternateReferenceMaker. The other option is to use vcftools to remove SNPs that are missing in any of the samples. It has an option for this, --max-missing (--max-missing 1 keeps only sites with no missing genotypes). Then use PLINK.
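The effect of removing every site with a missing genotype can be sketched in Python as follows. This only mirrors the idea on a small in-memory genotype table; it is not how vcftools is implemented, and the function name `drop_sites_with_missing`, the "." missing code and the data are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def drop_sites_with_missing(
    genotypes: Dict[str, List[str]], missing: str = "."
) -> Tuple[Dict[str, List[str]], List[int]]:
    """Keep only sites (columns) where every sample has a called genotype.

    Returns the filtered genotypes and the 0-based indices of kept sites.
    """
    n_sites = len(next(iter(genotypes.values())))
    keep = [
        i for i in range(n_sites)
        if all(calls[i] != missing for calls in genotypes.values())
    ]
    filtered = {
        sample: [calls[i] for i in keep]
        for sample, calls in genotypes.items()
    }
    return filtered, keep


# Toy genotype table: one list of alleles per sample, one entry per site.
genotypes = {
    "breed_A": ["A", ".", "T", "G"],
    "breed_B": ["G", "C", "T", "."],
    "breed_C": ["A", "C", "C", "G"],
}

filtered, kept_sites = drop_sites_with_missing(genotypes)
print(kept_sites)
print(filtered)
```

The trade-off is that informative sites are discarded whenever a single sample lacks a call, which can matter with only a few SNPs per mitochondrial genome.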
