Question: How to remove duplicate SNP rows in a VCF file using bcftools norm
evelyn wrote (12 months ago):

Hello,

I am trying to remove duplicate SNP rows from a multi-sample VCF file. The SNPs are at different positions, but many of them appear in several duplicate rows. I tried using

bcftools norm -d in.vcf -o out.vcf

but it does not work. Is there any other way to remove duplicates from a VCF file without changing the file format? Thank you!
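A quick first check is whether the duplicates are byte-identical rows or rows that only share CHROM and POS but differ elsewhere (QUAL, INFO or sample columns). Tools that drop exact copies will leave the second kind untouched. The sketch below assumes an uncompressed, tab-delimited VCF. The function names are hypothetical helpers, not part of bcftools.

```shell
#!/bin/sh
# Count distinct data lines that occur more than once verbatim.
# Header lines (starting with '#') are excluded.
count_exact_dups() {
    grep -v '^#' "$1" | sort | uniq -d | wc -l
}

# List CHROM/POS pairs that appear on more than one data line,
# whether or not the rest of those rows is identical.
list_dup_positions() {
    grep -v '^#' "$1" | cut -f1,2 | sort | uniq -d
}
```

For example, `count_exact_dups in.vcf` and `list_dup_positions in.vcf`. If the second command lists positions that the first does not account for, the "duplicates" differ in some column. In that case neither `uniq` nor an exact-match deduplication will collapse them.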

finswimmer (Germany) wrote (12 months ago):

Shouldn't it be:

bcftools norm -D in.vcf -o out.vcf

(uppercase D)?


@finswimmer, thank you! I have tried -D as well, but the output is identical to the input file; no duplicates are removed.

(reply by evelyn, 12 months ago)

Hm, do you have an example of your VCF file?

This works for me (in.vcf contains the same record twice):

##fileformat=VCFv4.2
##contig=<ID=chr1,length=249250621>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1
chr1    977330  rs2799066   T   C   225 PASS    .   GT  0/1
chr1    977330  rs2799066   T   C   225 PASS    .   GT  0/1
$ bcftools norm -D in.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=249250621>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_normVersion=1.10.1+htslib-1.10.2
##bcftools_normCommand=norm -D 1.vcf; Date=Fri Feb  7 21:22:54 2020
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1
chr1    977330  rs2799066   T   C   225 PASS    .   GT  0/1
Lines   total/split/realigned/skipped:  2/0/0/0
(reply by finswimmer, 12 months ago)
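Two details may explain why this example works while the original attempt did not. Both are hedged, because the actual input file was never posted.

First, `-d` takes a TYPE argument (such as snps, indels, both, all or exact). In `bcftools norm -d in.vcf -o out.vcf`, the file name is consumed as that argument.

Second, the duplicate records in the example above sit next to each other. As far as I understand, norm only compares records that share a position as they stream past, so sorting the file first is a sensible precaution.

The following is a minimal sketch under these assumptions. It assumes a bcftools release that accepts `-d exact` (older releases document the same behaviour as `-d none`) and an uncompressed input. `dedup_vcf` is a hypothetical wrapper, not a bcftools command.

```shell
#!/bin/sh
# Hypothetical wrapper: sort a VCF by position, then keep only the first
# of any records that share CHROM, POS, REF and ALT.
# Assumes a bcftools release that accepts "-d exact"
# (older releases document the same behaviour as "-d none").
dedup_vcf() {
    in=$1                  # input VCF (uncompressed)
    out=$2                 # output VCF
    tmp="$out.sorted.vcf"  # intermediate, position-sorted copy
    bcftools sort -O v -o "$tmp" "$in" &&
        bcftools norm -d exact -O v -o "$out" "$tmp" &&
        rm -f "$tmp"
}
```

Usage: `dedup_vcf in.vcf out.vcf`. Note that "exact" treats records as duplicates when CHROM, POS, REF and ALT match, even if QUAL or genotypes differ. The first instance is the one that is kept.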

Thank you! I will check again.

(reply by evelyn, 12 months ago)
brianaloredana wrote (1 day ago):

That's strange for me too! Neither bcftools norm nor bcftools concat removed the duplicates from my VCF file.

So I used another solution:

grep '^#' myfile.vcf > dedup.vcf          ## copy the header lines (they start
                                          ## with "#") into a new file
grep -v '^#' myfile.vcf |                 ## take only the data lines,
    sort -k1,1 -k2,2n |                   ## sort them by CHROM, then
                                          ## numerically by POS,
    uniq >> dedup.vcf                     ## drop identical adjacent lines and
                                          ## append the rest after the header

I checked whether the resulting file still works with bcftools, and it does.
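The following sketch is a swapped-in alternative to the sort/uniq pipeline, not something proposed in the thread. It uses awk to print every header line and only the first occurrence of each data line. Because it never re-sorts, the existing record order is kept, so a file that was already position-sorted stays sorted.

Like uniq, it removes only byte-identical rows. It also holds every distinct line in memory. `dedup_lines` is a hypothetical name.

```shell
#!/bin/sh
# Print header lines (starting with '#') unchanged, then print each data
# line only the first time it is seen; later verbatim copies are dropped.
# Input order is preserved.
dedup_lines() {
    awk '/^#/ { print; next } !seen[$0]++' "$1"
}
```

Usage: `dedup_lines myfile.vcf > dedup.vcf`.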
