Linkage Disequilibrium Between Structural Variants (Sequence Data)
1
2
Entering edit mode
11.5 years ago
Abdel ▴ 410

Is there software available to calculate LD between structural variants from sequence data (preferably insertions and/or deletions)? Preferably for variants stored in .vcf format, but suggestions for other formats are also welcome!

If not, how would one go about computing those? (references to literature are welcome as well!)

My eventual goal is actually to prune the SV dataset for LD, so if you have anything useful to say about that, that's welcome as well. Many thanks!

ld next-gen • 3.8k views
ADD COMMENT
3
Entering edit mode

I'm assuming you are talking about small indels (<10bp or so)? You might take a look at this paper (and related ones). http://genome.cshlp.org/content/21/6/830.full

ADD REPLY
0
Entering edit mode

Yes, I do mean indels! Thanks for the paper!

ADD REPLY
0
Entering edit mode

I read the paper, and they talk about their results, but unfortunately there is no description of how they calculated LD... :-(

ADD REPLY
2
Entering edit mode
10.9 years ago
Ryan D ★ 3.4k

The answer is yes. We will download data from 1000 genomes, keep the SVs, and then do LD calculations. In this example I will do this for all SVs in 100kb of a known gene:

Download a VCF of the SVs, convert to plink format with VCFtools, then to bed/bim/fam format :

tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr22.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 22:38853527-39888783 > genotypes.vcf 
vcftools --vcf ~/genotypes.vcf --plink-tped --out plinkformat
plink --tfile plinkformat --make-bed --out tmp

Now pull out the SVs--here they are labeled with esv in the bim file:

cat tmp.bim | grep esv | cut -f2 > sv.txt

That gives me the following SVs

cat sv.txt
esv2672287
esv2660594
esv2665859
esv2666691
esv2658587
esv2670964
esv2676044
esv2673637
esv2676660
esv2667824

Extract these SVs from the plink file and keep the new bed bim and fam with SVs only in a new file:

plink --tfile plinkformat --extract tmp --make-bed --out sv

Now just run a check for LD and set your r2 threshold in the last argument

plink --bfile sv --r2  --ld-window-kb 1000 --ld-window 99999 --ld-window-r2 0.5

This gives the following result:

 CHR_A         BP_A        SNP_A  CHR_B         BP_B        SNP_B           R2
    22     39294101   esv2660594     22     39294104   esv2665859     0.986685
    22     39358280   esv2666691     22     39360536   esv2658587     0.892472
    22     39358280   esv2666691     22     39374101   esv2670964      0.81834
    22     39360536   esv2658587     22     39374101   esv2670964     0.789573
ADD COMMENT

Login before adding your answer.

Traffic: 4034 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6