vcf variant selection
1
0
Entering edit mode
9 months ago

I have VCF files that I want to convert into .bed files with plink to use for a proxy search. One issue I am having is that each variant ID must be unique. In these VCFs, the multi-allelic variants are split into bi-allelic records. Here is an example:

tabix gnomad.genomes.v3.1.2.hgdp_tgp.chr6.vcf.bgz chr6:29440751-29440751 | cut -f 1-5
chr6    29440751    rs2074464   A   C
chr6    29440751    rs2074464   A   G
chr6    29440751    rs2074464   A   T

I know that with bcftools you can simply keep the first occurrence of a variant with -d, but this is problematic for LD calculations. I would like the record that gets preserved to be the one with the highest allele frequency of all the records with that ID, not simply the first occurrence of the variant. This way I will have a better chance of getting high r2 values when I calculate LD between this multi-allelic variant and another variant. Is this possible?

bcftools ld plink vcf • 533 views

I think it can be done with a little work, but first an easier option - can you just collapse the biallelic records back into multiallelic sites, so that each ID is unique? Does your downstream software support multiallelic sites?

You can collapse sites this way with bcftools, vcflib, or other tools; it's pretty standard.

If not, somebody may cook up a more complex solution.
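
For what it's worth, here is a minimal sketch of that more complex route in plain Python, not a tested pipeline. It assumes the VCF is position-sorted, that split records of one multi-allelic site share the same CHROM and POS, and that each record's INFO column carries a single-valued `AF=` key (check your header, as this is an assumption about the file). The names `info_af` and `keep_highest_af` are made up for illustration.

```python
def info_af(info):
    """Return the AF value from a VCF INFO string, or 0.0 if missing or unparsable."""
    for field in info.split(";"):
        if field.startswith("AF="):  # exact key, so AF_popmax etc. are ignored
            try:
                return float(field[3:].split(",")[0])
            except ValueError:
                return 0.0
    return 0.0


def keep_highest_af(lines):
    """Stream VCF lines; among records sharing CHROM, POS and ID keep the highest-AF one.

    Header lines pass through unchanged. Records with a missing ID (".") are
    never merged. Ties keep the first record seen.
    """
    current_pos = None
    best = {}  # ID -> (af, line) for the position being processed
    for n, line in enumerate(lines):
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            yield line
            continue
        cols = line.split("\t", 8)  # genotype columns stay unsplit in cols[8]
        pos = (cols[0], cols[1])
        if pos != current_pos:
            # New position: flush the winners of the previous one.
            for _, kept in best.values():
                yield kept
            best = {}
            current_pos = pos
        key = cols[2] if cols[2] != "." else n  # line index keeps no-ID records apart
        af = info_af(cols[7])
        if key not in best or af > best[key][0]:
            best[key] = (af, line)
    for _, kept in best.values():
        yield kept
```

Because it only holds one position in memory at a time, it could sit in a pipe between a decompressing step and plink, reading lines from stdin and printing what `keep_highest_af` yields. Note that the same ID appearing at two different positions would still come out twice, so it is worth checking for duplicates afterwards.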

9 months ago
jena ▴ 290

Oh wait - do you have this problem with plink specifically? Because plink up to 1.9 can only handle biallelic sites and drops any multiallelic site or even sites with repeated indices IIRC.

But plink 2.0 handles both cases, or at most you may need to collapse to multiallelic sites, which plink 2 definitely handles.
