Question: Editing vcf file using bcftools
janhuang.cn wrote:

I want to use bcftools (Version: 1.0 (using htslib 1.0)) to edit a VCF file and then export an updated VCF file or a BED file (BED is preferred).

There are several things that I want to do, and I found some relevant options at https://samtools.github.io/bcftools/bcftools.html, but I don't know how to put them together. In particular, I do not even know how to load the original VCF file.

I also found a previous post (Extract subset of samples from multigenome vcf file) on a similar topic, but I still do not understand the command there:

bcftools view -Oz -S sample.txt $file > /get/inthis/dir/output_"${i##*/}"_.vcf.gz

1) Subset the samples based on a txt file. This txt file lists the samples I want to keep in the VCF.

-S, --samples-file FILE

2) Keep only the SNPs

-v, --types snps

3) Keep only SNPs with MAF > 0.05

I did not find a relevant option for this.

4) Remove duplicate SNPs

-d, --rm-dup snps

or

-c, --collapse snps
written 3.2 years ago by janhuang.cn
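
On the two open points in the question: bcftools has no separate "load" step, since the input VCF is simply given as the last argument, and for step 3 one option that may fit is `--min-af` in `bcftools view`. The following is a minimal sketch, assuming a reasonably recent bcftools in which `--min-af FLOAT:minor` is available; the function name and the file names in the usage line are hypothetical. Note that it keeps sites with MAF greater than or equal to the threshold.

```shell
# Hypothetical helper: keep SNP records whose minor-allele frequency
# is at least a given threshold (default 0.05).
# Usage: keep_common_snps INPUT.vcf.gz OUTPUT.vcf.gz [MIN_MAF]
keep_common_snps() {
    local in_vcf=$1 out_vcf=$2 min_maf=${3:-0.05}
    # --types snps      keep SNP records only
    # --min-af X:minor  keep sites whose least frequent allele has frequency >= X
    # -Oz               write bgzip-compressed VCF, so a .vcf.gz name is accurate
    # The input file is the final positional argument.
    bcftools view --types snps --min-af "${min_maf}:minor" -Oz -o "$out_vcf" "$in_vcf"
}

# Example call (file names are placeholders):
# keep_common_snps ALL.genotypes.vcf.gz common_snps.vcf.gz 0.05
```

Wrapping the command in a function is only a convenience; the single `bcftools view` line can equally be run on its own.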

So what's the problem? Are you getting an error? bcftools filter is the command you'll need to filter by MAF, assuming it's one of your INFO fields. Any particular reason you're using such an old version of the tools? The current version is 1.5.

You won't be able to output in BED format with bcftools, you'll need to use something like BEDOPS' vcf2bed tool to make that conversion.

written 3.2 years ago by jared.andrews07
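
Besides a dedicated converter such as vcf2bed, a SNP-only VCF can be turned into a simple BED file with awk. This is a minimal sketch, not a replacement for a full converter: it assumes uncompressed VCF text on standard input, writes four columns (chromosome, start, end, ID), and derives the end coordinate from the length of REF. The function name and output file name are hypothetical. BED starts are 0-based while VCF positions are 1-based, hence the `POS - 1`.

```shell
# Hypothetical helper: convert VCF text on stdin to 4-column BED on stdout.
vcf_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }
         { print $1, $2 - 1, $2 - 1 + length($4), $3 }'
}

# Example call (assumes bcftools is installed; file names are placeholders):
# bcftools view EUR.genotypes.vcf.gz | vcf_to_bed > EUR.snps.bed
```

Header lines (those starting with `#`) are skipped, so the output of `bcftools view` can be piped in directly.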

I used this command to subset the European samples from the all-sample VCF (ALL.genotypes.vcf.gz) and export a VCF of the European samples (EUR.genotypes.vcf.gz).

bcftools view --samples-file EUR.txt --force-samples --types snps --exclude "MAF[0]<0.05" --output-file EUR.genotypes.vcf.gz ALL.genotypes.vcf.gz

But I also want to remove duplicate SNPs, so I used the command below. However, bcftools did not return anything.

bcftools norm --remove-duplicates snps --output rmvdup_EUR.genotypes.vcf.gz EUR.genotypes.vcf.gz
modified 3.2 years ago • written 3.2 years ago by janhuang.cn

So to clarify, after the first command, you still have output, but lose it after the second? What is snps doing in that command? The --remove-duplicates parameter doesn't require you to specify the type of record if I remember correctly.

You could also use the vcfuniq command from vcflib to do this.

modified 3.2 years ago • written 3.2 years ago by jared.andrews07
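
Before removing duplicates, it can help to see which sites are actually duplicated. The sketch below is one way to check, under the assumption that a "duplicate" means two records sharing the same CHROM, POS, REF and ALT; it reads uncompressed VCF text on standard input, and the function name is hypothetical.

```shell
# Hypothetical helper: print each CHROM/POS/REF/ALT combination that
# occurs more than once in VCF text on stdin (printed once per combination).
list_duplicate_sites() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }
         { key = $1 OFS $2 OFS $4 OFS $5
           if (++seen[key] == 2) print key }'
}

# Example call (assumes bcftools is installed):
# bcftools view EUR.genotypes.vcf.gz | list_duplicate_sites | head
```

An empty result suggests there is nothing for the duplicate-removal step to do under this definition.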

Thank you. You are right, I should not put snps after --remove-duplicates. The command below works.

bcftools norm --remove-duplicates --output rmvdup_EUR.genotypes.vcf.gz EUR.genotypes.vcf.gz

I now use bcftools view to export a VCF file for the European population, then use bcftools norm to remove duplicates and export a second VCF file. But can I use bcftools view and bcftools norm in the same command? I do not actually need the first VCF file.

written 3.2 years ago by janhuang.cn

Yes, you can do both commands in one line with UNIX piping:

bcftools view --samples-file EUR.txt --force-samples --types snps --exclude "MAF[0]<0.05" ALL.genotypes.vcf.gz | bcftools norm --remove-duplicates --output rmvdup_EUR.genotypes.vcf.gz -

That should work and remove the need for the intermediate file. Glad you got it working.

modified 3.2 years ago • written 3.2 years ago by jared.andrews07
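
Two caveats about the piped one-liner may be worth checking. The bcftools documentation notes that `--include`/`--exclude` filtering in `bcftools view` is performed before samples are removed, so the MAF threshold may be evaluated on all samples rather than on the European subset. Also, without `-Oz` the output is plain-text VCF even when the file name ends in `.vcf.gz`. The following is a minimal sketch that spells out the order of operations, assuming a reasonably recent bcftools; the function name is hypothetical, the file names are those used in the thread, and `--remove-duplicates` is the flag reported to work above.

```shell
# Hypothetical helper. Arguments: all-sample VCF, sample list, output VCF.
# Step 1: keep the listed samples and SNP records (-Ou streams uncompressed BCF).
# Step 2: apply the MAF threshold to the already-subset samples.
# Step 3: drop duplicate records and write a bgzip-compressed VCF.
# Step 4: build a tabix index for the compressed result.
subset_filter_dedup() {
    local in_vcf=$1 samples=$2 out_vcf=$3
    bcftools view --samples-file "$samples" --force-samples --types snps -Ou "$in_vcf" |
        bcftools view --min-af 0.05:minor -Ou - |
        bcftools norm --remove-duplicates -Oz -o "$out_vcf" - &&
        bcftools index --tbi "$out_vcf"
}

# Example call:
# subset_filter_dedup ALL.genotypes.vcf.gz EUR.txt rmvdup_EUR.genotypes.vcf.gz
```

Splitting the subsetting and the frequency filter into separate `bcftools view` calls costs little when piping BCF, and it makes explicit which set of samples the MAF is computed on.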

I see. Thank you so much!

written 3.2 years ago by janhuang.cn