Question: Do I need to reconstruct haplotypes from SNP data to calculate nucleotide diversity?
2.5 years ago, suzyhocking wrote:


I'm new to working with SNP data and quite confused about how best to analyse what I have. I work with a haploid species.

My SNP files are in text format:


chr1 5 A -

chr1 12 T G

and so on, for about 400,000 SNPs and 20 samples. I use this format because I rely on custom scripts that do extra quality control and call the likely base at each position based on read depth; I have no option of doing this another way. SNPs are filtered to keep only those with <10% missingness in the dataset.
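To make the format concrete, here is a minimal Python sketch of how such lines might be read and checked against the missingness threshold. It assumes, and this is only an assumption, that the columns are chromosome, position, and then one base call per sample, with `-` marking a missing call. The function names are hypothetical and are not part of the custom scripts mentioned above.

```python
from typing import List, Tuple

MISSING = "-"  # assumption: '-' marks a missing call for a sample


def parse_snp_line(line: str) -> Tuple[str, int, List[str]]:
    """Split one whitespace-delimited line into (chromosome, position, calls).

    Assumed layout: chromosome, position, then one base call per sample.
    """
    fields = line.split()
    return fields[0], int(fields[1]), fields[2:]


def missingness(calls: List[str]) -> float:
    """Fraction of samples with a missing call at this site."""
    return sum(call == MISSING for call in calls) / len(calls)


def passes_filter(calls: List[str], max_missing: float = 0.10) -> bool:
    """Keep a site only if strictly fewer than max_missing of calls are missing."""
    return missingness(calls) < max_missing
```

If the real files instead hold one sample per file, the same parser could be applied per file and the calls merged by chromosome and position before filtering.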

I want to work out genome-wide nucleotide diversity based on these SNPs. My questions are:

1) For nucleotide diversity (pi): do I need to reconstruct whole-genome haplotypes for each sample by substituting the appropriate reference base with the alternative 'SNP base' for each sample?

2) If so, any suggestions on how to do this? I've found tools that work with VCF files but not with the text files I have.

3) Otherwise, can I calculate pi based only on SNP data? This doesn't seem like a valid method to me.

4) I can't seem to find a programme to calculate pi/theta that works with text files. I can happily reformat them within a text format, but I can't convert them to VCF.
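For reference on questions 3 and 4, here is a minimal Python sketch of the standard per-site estimator of pi for haploid calls: the proportion of sample pairs that differ at a site, summed over variable sites and divided by the number of sites that could have been called. Invariant sites contribute zero to the sum, so they only enter through the denominator. The names `site_pi`, `genome_pi` and `callable_sites` are hypothetical, `-` is assumed to mean a missing call, and the count of callable sites is assumed to be known from the read-depth filtering; none of this comes from an existing tool.

```python
from collections import Counter
from typing import Iterable, List

MISSING = "-"  # assumption: '-' marks a missing call for a sample


def site_pi(calls: List[str]) -> float:
    """Proportion of pairwise differences among non-missing haploid calls."""
    observed = [call for call in calls if call != MISSING]
    n = len(observed)
    if n < 2:
        return 0.0  # no pair of samples to compare at this site
    counts = Counter(observed)
    # Number of ordered pairs of samples that carry the same base.
    identical_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - identical_pairs / (n * (n - 1))


def genome_pi(sites: Iterable[List[str]], callable_sites: int) -> float:
    """Sum per-site pi over variable sites and divide by all callable sites.

    callable_sites must include invariant positions that passed the same
    filters; dividing by the number of SNPs instead would inflate pi.
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    return sum(site_pi(calls) for calls in sites) / callable_sites
```

Under these assumptions, four samples calling A, A, G, G at one site give a per-site value of 2/3, and if that were the only variable site among 100 callable sites the genome-wide estimate would be about 0.0067.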

Any clarification or advice would be very welcome! Thanks
