Question: linkage disequilibrium analysis
tarek.mohamed wrote (21 days ago):

Hi All,

Using the 1000 Genomes database, I have downloaded genotype data for 99 individuals for a couple of thousand SNPs distributed across different chromosomes. I have this data in one VCF file. I want to perform linkage disequilibrium analysis between all of these SNPs, and I need the r2 and D' values as well.

What tool do you recommend for such an analysis?

Thanks, Tarek

Kevin Blighe wrote (21 days ago):

Short way (quick)

If you have a VCF already, you can just use VCFtools in order to do a very simple linkage disequilibrium (LD) analysis: http://vcftools.sourceforge.net/documentation.html#ld
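As a rough sketch of what that could look like: VCFtools has a --hap-r2 option that reports r2, D and D' from phased haplotypes, and an --interchrom-hap-r2 option for pairs of sites on different chromosomes. The file names below are placeholders and the 1 Mb window is an arbitrary choice for illustration. The haplotype-based options assume phased genotypes, so check that this holds for your VCF.

```shell
# Placeholder names; adjust to your own files.
vcf="My.vcf"
out="MyLD"

# Within-chromosome LD from phased haplotypes (r2, D, D'), limited to
# site pairs at most 1 Mb apart (an arbitrary window for illustration).
within_cmd="vcftools --vcf $vcf --hap-r2 --ld-window-bp 1000000 --out $out"

# LD between sites that lie on different chromosomes.
between_cmd="vcftools --vcf $vcf --interchrom-hap-r2 --out $out"

# Only run when both the tool and the input file are present;
# otherwise just show the commands.
if command -v vcftools >/dev/null 2>&1 && [ -f "$vcf" ]; then
    $within_cmd
    $between_cmd
else
    echo "vcftools or $vcf not found; would run:"
    echo "  $within_cmd"
    echo "  $between_cmd"
fi
```

For unphased data, --geno-r2 and --interchrom-geno-r2 are the genotype-based counterparts.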


Long way (more flexible and comprehensive)

Another, more roundabout approach would be to get your data from VCF into PLINK format, where you can do a more comprehensive analysis. You could follow my tutorial (Produce PCA bi-plot for 1000 Genomes Phase III in VCF format), which covers downloading all of the 1000 Genomes Phase III data in VCF format and then converting them into PLINK format.

Here is further information for conducting LD analysis in PLINK:

If you follow my tutorial, you'll have the entire 1000 Genomes Phase III samples in PLINK, and from there you can easily filter in/out your samples of interest. See here for details: https://www.cog-genomics.org/plink/1.9/filter

To use a dataset correctly in PLINK, you should create a custom FAM file that matches your dataset and then specify it when performing LD analysis with --fam MyCustom.fam. A FAM file contains 6 columns:

  1. Family ID (FID)
  2. Individual ID (IID)
  3. Paternal ID (PID)
  4. Maternal ID (MID)
  5. Gender (1, male; 2, female; 0, unknown)
  6. Phenotype/Disease status (1, control; 2, case/disease; 0 or -9, missing)

The file ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_g1k.ped, which you'll get if you follow my tutorial, already contains this information, so use that and filter out what you don't need.
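One way to do that filtering is sketched below with awk. It uses a tiny made-up stand-in for the sample information file, and it assumes the file is tab-separated with a header row and that its first six columns are in FAM order; verify both against the real file before relying on it. The names sample_info.ped and keep.list are hypothetical.

```shell
# Tiny stand-in for the sample information file (tab-separated, header row).
# IDs and values are made up; the column order is an assumption to verify.
printf 'Family ID\tIndividual ID\tPaternal ID\tMaternal ID\tGender\tPhenotype\tPopulation\n' > sample_info.ped
printf 'FAM1\tSAMPLE_A\t0\t0\t1\t0\tPOP1\n' >> sample_info.ped
printf 'FAM2\tSAMPLE_B\t0\t0\t2\t0\tPOP2\n' >> sample_info.ped

# Sample IDs to keep, one per line.
printf 'SAMPLE_B\n' > keep.list

# Pass 1 stores the wanted IDs; pass 2 skips the header and prints the
# first six columns for matching samples.
awk -F'\t' '
    NR == FNR { keep[$1] = 1; next }
    FNR > 1 && ($2 in keep) { print $1, $2, $3, $4, $5, $6 }
' keep.list sample_info.ped > MyCustom.fam

cat MyCustom.fam
```

Rows come out in the order of the sample information file, so reorder them if your genotype data list the samples differently.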

Also, when reading your data from VCF/BCF into PLINK, it is critical that you specify a sample order file so that PLINK reads the samples in the order that you want and in the order that matches the samples as listed in your custom FAM. A sample command is:

plink --noweb --bcf My.bcf --keep-allele-order --indiv-sort file SampleSort.list --vcf-idspace-to _ --const-fid --allow-extra-chr 0 --split-x b37 no-fail --make-bed --out PlinkDataForLD

The file mentioned in this command after the --indiv-sort file command-line parameter, SampleSort.list, contains 2 columns, like this:

0  NA0165
0  NA0169
et cetera
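If you do not have such a list yet, one option is to build it from the VCF header, as in this sketch. The VCF here is a minimal made-up stand-in (tiny.vcf, SAMPLE_A and SAMPLE_B are hypothetical), and the output follows the order of the samples in the VCF; reorder the lines if your FAM file uses a different order.

```shell
# Minimal stand-in VCF with two made-up samples.
printf '##fileformat=VCFv4.2\n' > tiny.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B\n' >> tiny.vcf
printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n' >> tiny.vcf

# On the #CHROM line, sample IDs start at column 10. Print each with a
# family ID of 0, which is what --const-fid assigns by default.
awk -F'\t' '
    /^#CHROM/ { for (i = 10; i <= NF; i++) print "0", $i; exit }
' tiny.vcf > SampleSort.list

cat SampleSort.list
```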

Then, to do LD analysis in PLINK:

plink --bfile PlinkDataForLD --r2 --ld-window-kb 1000 --ld-window 100000 --ld-window-r2 0 --fam MyCustom.fam

Hi Kevin,

Thanks for your reply. Actually, I tried vcftools, but I got negative values for r2, which of course does not make sense! I am going to try the long way approach, and I will let you know the updates.

Thanks, T

Reply written 21 days ago by tarek.mohamed

Hi Tarek,

Okay, on reflection, you may not require the complex part of creating the custom FAM, considering that all of your samples will be 'healthy' 1000 Genomes samples. The LD analysis will just look at all samples in the dataset and not use information on phenotype, gender, etc.

In that case, you possibly just need to do this:

plink --noweb --vcf My.vcf --keep-allele-order --vcf-idspace-to _ --const-fid --allow-extra-chr 0 --split-x b37 no-fail --make-bed --out PlinkDataForLD

plink --bfile PlinkDataForLD --r2 --ld-window-kb 1000 --ld-window 100000 --ld-window-r2 0
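Since the question asks for D' as well as r2, and for SNPs on different chromosomes, one possible variant is sketched here. It assumes PLINK 1.9, where --r2 accepts a dprime modifier (adds a D' column to the table) and an inter-chr modifier (reports pairs regardless of chromosome or distance). The output prefix PlinkDataForLD_allpairs is a made-up name.

```shell
# Binary fileset prefix produced by --make-bed (PlinkDataForLD.bed/.bim/.fam).
prefix="PlinkDataForLD"

# All SNP pairs, within and between chromosomes, with r2 and D'.
# --ld-window-r2 0 keeps every pair instead of only those above the
# default r2 reporting threshold.
ld_cmd="plink --bfile $prefix --r2 dprime inter-chr --ld-window-r2 0 --out ${prefix}_allpairs"

# Only run when both the tool and the fileset are present;
# otherwise just show the command.
if command -v plink >/dev/null 2>&1 && [ -f "${prefix}.bed" ]; then
    $ld_cmd
else
    echo "plink or ${prefix}.bed not found; would run:"
    echo "  $ld_cmd"
fi
```

With a couple of thousand SNPs, reporting every pair gives a table with millions of rows, so raise the --ld-window-r2 threshold if you only need the stronger associations.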

PLINK is a very good and comprehensive analysis tool, though.

Respond here if you need help or want me to look at anything.

Kevin

Reply written 21 days ago by Kevin Blighe

Sorry, if I have bed/bim/fam files from 16S rRNA data and a phenotypes file like the one in the screenshot, does that mean that I should correlate these files with the phenotypes file as OTUs?

Yes, this is an OTU; I should first convert that to PLINK format.

Reply written 21 days ago by Fereshteh (modified 20 days ago)