How to remove duplicate SNP IDs from VCF files?
3.2 years ago

Hello, I have a VCF file containing ~900K SNPs, some of which are tri-allelic. I used bcftools to split the tri-allelic SNPs into biallelic records, and then tried to remove the duplicated SNP IDs that the split generated. I used bcftools for this step as well, but the command does not remove the duplicated SNP IDs. I used the following command to split the tri-allelic SNPs:

bcftools norm -m-any --output output.vcf input.vcf

To remove the duplicate SNPs, I used the following command:

bcftools norm --remove-duplicates --output output.vcf input.vcf

Can anyone suggest how to resolve this issue?

SNP genome sequence • 2.0k views

I believe you have to tell bcftools which record type to remove, as the argument to --rm-dup. Can you please try this:

bcftools norm --rm-dup snps --output output.vcf input.vcf

I just ran a test here and it worked. I'm using bcftools v1.10.2.


I have tried this with both snps and all, but it still gives the same output.
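As far as I know, bcftools decides what counts as a duplicate from the position and alleles of a record, not from the ID column, so records produced by splitting a tri-allelic site can keep sharing one ID. If the goal is only to have unique IDs while keeping every allele, one option is to rewrite the ID column. The following is a minimal sketch with plain awk, not a bcftools feature; the function name `uniquify_vcf_ids`, the file names, and the `CHROM_POS_REF_ALT` naming scheme are assumptions made for illustration.

```shell
# Rewrite the ID column (column 3) as CHROM_POS_REF_ALT so that records
# split from one multi-allelic site end up with distinct IDs.
# Header lines (starting with '#') are printed unchanged.
uniquify_vcf_ids() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         { $3 = $1 "_" $2 "_" $4 "_" $5; print }' "$1"
}

# Tiny hypothetical VCF: rs1 appears twice, as it would after splitting
# a tri-allelic SNP into two biallelic records.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tC\t.\t.\t.\n1\t100\trs1\tA\tG\t.\t.\t.\n1\t200\trs2\tT\tC\t.\t.\t.\n' > split_example.vcf

uniquify_vcf_ids split_example.vcf > split_example.uniq_ids.vcf
```

This discards the original rsIDs, so it only fits workflows that do not need them downstream. bcftools annotate also has a --set-id option that takes a format string such as '%CHROM\_%POS\_%REF\_%FIRST_ALT', which should give a similar result without leaving bcftools.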

3.2 years ago

You can handle these in different ways:

Kevin


I saw this post and used the --remove-duplicates option, but it is not working at all.
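If the aim really is to keep a single record per SNP ID, a fallback that does not depend on bcftools at all is to filter on the ID column directly. This is only a sketch under a few assumptions: the VCF is uncompressed and tab-delimited, keeping the first record for each ID is acceptable, and records with a missing ID (".") should all be kept. The function name `dedup_vcf_by_id` and the file names are made up for the example.

```shell
# Keep only the first record seen for each ID in column 3.
# Header lines and records with a missing ID (".") pass through unchanged.
dedup_vcf_by_id() {
    awk -F'\t' '/^#/ { print; next }
                $3 == "." || !seen[$3]++' "$1"
}

# Tiny hypothetical VCF: rs1 is duplicated after a split, and two
# records have no ID at all.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tC\t.\t.\t.\n1\t100\trs1\tA\tG\t.\t.\t.\n1\t200\t.\tT\tC\t.\t.\t.\n1\t300\t.\tG\tA\t.\t.\t.\n' > dup_example.vcf

dedup_vcf_by_id dup_example.vcf > dup_example.dedup.vcf
```

Note that this throws away the second allele of every split site, so it effectively undoes part of the split. For a bgzipped file, the input would need to be decompressed first (for example with bcftools view) and recompressed afterwards.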
