Question: tools for calculating LD for NGS genomic data and generating LD decay plot
 
 
 

Dear all in BioStar,

I have benefited a lot from your kind help and guidance. Thanks a lot. I would now like to ask for a bit more of it.

I have NGS population genomic data (haplotypic data) in VCF format. I used VCFtools (option --hap-r2) to calculate LD between pairs of SNPs, but it is rather slow: usually about one week for one chromosome.

(1) I would like to hear your opinion on choosing the right tool for this LD (pairwise r-squared) calculation. Are there other tools or more efficient approaches? (2) Your advice or experience on generating an LD decay plot would also be appreciated.

Thanks a lot in advance for your help.

Best,


My objective is to estimate the decay of LD by resampling a starting point 10,000 times on a chromosome (here chromosome 1) for all the individuals in a VCF file (mydata.vcf.gz). My data are haplotypic (phased). The pairwise haplotypic r2 needs to be calculated for every pair of SNPs within 25 kb of this starting point.
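For readers following along, the calculation described above can be sketched in plain Python. This is only an illustration under stated assumptions, not what VCFtools does internally: SNPs are assumed biallelic and coded 0/1 per phased haplotype, "within 25 kb of the starting point" is read as the 25 kb downstream of it, and the names `hap_r2` and `sample_window_r2` are made up for this example.

```python
import bisect
import itertools
import random


def hap_r2(hap_a, hap_b):
    """Haplotypic r^2 between two biallelic SNPs.

    hap_a and hap_b hold one 0/1 allele per phased haplotype,
    in the same haplotype order.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two equal-length, non-empty allele vectors")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                            # one SNP is monomorphic
        return float("nan")
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    return d * d / denom


def sample_window_r2(positions, haplotypes, window_bp=25_000,
                     start=None, rng=random):
    """Return (distance_bp, r2) for all SNP pairs in one sampled window.

    positions  : SNP coordinates, sorted ascending
    haplotypes : haplotypes[i] is the 0/1 allele vector of SNP i
    start      : window start; drawn at random from positions if None
    Assumption: the window covers [start, start + window_bp].
    """
    if start is None:
        start = rng.choice(positions)
    lo = bisect.bisect_left(positions, start)
    hi = bisect.bisect_right(positions, start + window_bp)
    return [
        (positions[j] - positions[i], hap_r2(haplotypes[i], haplotypes[j]))
        for i, j in itertools.combinations(range(lo, hi), 2)
    ]
```

Calling `sample_window_r2` 10,000 times and pooling the returned (distance, r2) pairs would give the raw points for a decay plot; because each call only touches SNPs inside one window, the cost no longer depends on the length of the chromosome.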

 

Since you expect LD to decay within the 25-kb window, you don't really need to calculate r2 for, say, two distant SNPs at opposite ends of the chromosome. How about splitting the chromosome up into small chunks?

written 16 months ago by Haibao Tang
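One way to picture the suggestion in this comment: only enumerate SNP pairs that are close enough to matter. The generator below is a rough sketch, assuming the SNP positions are already sorted; `pairs_within` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
def pairs_within(positions, max_dist_bp=25_000):
    """Yield index pairs (i, j), i < j, whose SNPs lie at most
    max_dist_bp apart. Assumes positions is sorted ascending."""
    n = len(positions)
    for i in range(n):
        j = i + 1
        # Stop scanning as soon as the partner SNP is too far away;
        # sorted input guarantees all later SNPs are farther still.
        while j < n and positions[j] - positions[i] <= max_dist_bp:
            yield i, j
            j += 1


# Four SNPs: only neighbours within 25 kb are paired.
print(list(pairs_within([0, 10_000, 30_000, 100_000])))
```

The work is then proportional to the number of nearby pairs rather than to all pairs on the chromosome. If I remember the manual correctly, VCFtools also documents an `--ld-window-bp` option that caps the physical distance between SNPs tested with `--hap-r2`, which achieves the same restriction without custom code.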

2 answers

 
 
 

For LD calculations you may use PLINK; see the LD calculations section here.

To generate an LD decay plot, you can use the extended haplotype homozygosity (EHH) approach; see the EHH calculator here. The manuscript is available here.

 

Dear Khader, thanks a lot for your kind advice. I will try PLINK and test whether it is faster than VCFtools.

The EHH-based LD decay plot you pointed to is not what I want. I want to estimate the decay of LD over a 25-kb genomic interval by resampling a starting point 10,000 times on a chromosome (here chromosome 1), and then fit a non-linear regression of r-squared on genomic distance. Do you have any further advice?

written 23 months ago by Jianfengmao
 

I'm not sure you will see a dramatic change in speed; please report back here on your experience with PLINK. I don't have experience with other methods for decay calculation, but I know there are methods based on a Bayesian approach (for example, see http://www.ncbi.nlm.nih.gov/pubmed/16826521 and http://www.ncbi.nlm.nih.gov/pubmed/17563311).

written 23 months ago by Khader Shameer
 
 
 
 

Hi There,

I don't really have experience calculating LD or plotting it, but perhaps this post on R-bloggers can help you. The author posted some R code for estimating the decay of LD according to Hill and Weir:

http://www.r-bloggers.com/estimate-decay-of-linkage-disequilibrium-with-distance/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29

Hopefully it helps. Regards,

J.Rodrigo
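For anyone who prefers not to use R, here is a rough Python sketch of the same idea: the expected r2 under drift and recombination as it is commonly quoted from Hill and Weir (1988), plus a crude least-squares fit of the decay parameter. Assumptions to flag: `rho_per_bp` stands for the population recombination parameter per base pair (4Ne times the per-bp recombination rate), `n` is the sample size (taken here as the number of haplotypes), the fit is a simple log-spaced grid search rather than a proper non-linear optimizer, and the function names are invented for this example.

```python
import math


def hill_weir_r2(dist_bp, rho_per_bp, n):
    """Expected r^2 at a given distance, Hill and Weir style.

    c = rho_per_bp * dist_bp is the scaled recombination
    parameter between the two SNPs; n is the sample size.
    """
    c = rho_per_bp * dist_bp
    base = (10 + c) / ((2 + c) * (11 + c))
    correction = 1 + ((3 + c) * (12 + 12 * c + c * c)) / (n * (2 + c) * (11 + c))
    return base * correction


def fit_rho(pairs, n, lo=1e-7, hi=1e-1, steps=400):
    """Grid-search least-squares estimate of rho_per_bp.

    pairs : iterable of (distance_bp, observed_r2); NaN r2 values are skipped
    lo/hi : bounds of the log-spaced search grid (arbitrary defaults)
    """
    data = [(d, r) for d, r in pairs if not math.isnan(r)]
    best_rho, best_sse = None, float("inf")
    for k in range(steps + 1):
        rho = lo * (hi / lo) ** (k / steps)
        sse = sum((r - hill_weir_r2(d, rho, n)) ** 2 for d, r in data)
        if sse < best_sse:
            best_rho, best_sse = rho, sse
    return best_rho
```

The fitted curve `hill_weir_r2(d, fit_rho(pairs, n), n)` can then be drawn over the scatter of observed r2 against distance to get the decay plot. With real data a dedicated optimizer would be a better choice than the grid search, which is used here only to keep the example dependency-free.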

 