Question: bcftools filtering error
bsmith030465 wrote (8 months ago):

Hi,

Apologies for the newbie question! I was trying to get some summary numbers from my vcf file. I wanted:

  1. Summary numbers for all variants that PASS and are INDELS (per sample)
  2. Summary numbers for all variants that PASS and are SNVs (per sample)

I tried the command:

bcftools query -i 'FILTER=="PASS" && TYPE=="INDEL"' vcf_file.vcf > pass.indel.vcf

but that seems to give an error (usage document pops up).

EDIT (to include error(?)/console output):

About:   Extracts fields from VCF/BCF file and prints them in user-defined format
Usage:   bcftools query [options] <A.vcf.gz> [<B.vcf.gz> [...]]

Options:
-e, --exclude <expr>              exclude sites for which the expression is true (see man page for details)
-f, --format <string>             see man page for details
-H, --print-header                print header
-i, --include <expr>              select sites for which the expression is true (see man page for details)
-l, --list-samples                print the list of samples and exit
-o, --output-file <file>          output file name [stdout]
-r, --regions <region>            restrict to comma-separated list of regions
-R, --regions-file <file>         restrict to regions listed in a file
-s, --samples <list>              list of samples to include
-S, --samples-file <file>         file of samples to include
-t, --targets <region>            similar to -r but streams rather than index-jumps
-T, --targets-file <file>         similar to -R but streams rather than index-jumps
-u, --allow-undef-tags            print "." for undefined tags
-v, --vcf-list <file>             process multiple VCFs listed in the file

Examples:
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n' file.vcf.gz

So, my questions:

  1. What is the mistake in this query?

  2. Would piping it to 'bcftools stats' (instead of writing to a file) give the sample-wise count (for indels), or is there an easier way to get that info?

  3. How should I rework the query to include all SNVs (instead of the indels)?

Many thanks!


What is the error? Can you add it to the question?

written 8 months ago by Medhat

It's probably that bcftools query expects a format option (-f). query is a formatter, after all.

written 8 months ago by RamRS
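
A minimal sketch of the fix suggested in the reply above, assuming the missing -f format string is what triggers the usage message. The function name pass_indel_table and the choice of output columns are illustrative and not from the thread; the filter uses the lowercase TYPE="indel" form shown in the bcftools expression documentation.

```shell
# Print one tab-separated row per PASS indel site, with each sample's genotype.
# pass_indel_table is a hypothetical helper name; $1 is the input VCF/BCF.
pass_indel_table() {
    bcftools query \
        -i 'FILTER="PASS" && TYPE="indel"' \
        -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n' \
        "$1"
}

# Usage (input file name as in the question; the output name is made up):
#   pass_indel_table vcf_file.vcf > pass.indel.tsv
```

Note that query prints a plain table rather than a VCF, which is why the usage line writes to a .tsv file instead of the .vcf name used in the question.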
RamRS wrote (8 months ago):

Why are you using query? You are trying to subset variants, so you should be using bcftools view. The -v option lets you pick SNVs or indels, and -f PASS keeps only records whose FILTER is PASS.

bcftools view -v snps -f PASS vcf_file.vcf >snps_vcf.vcf
bcftools view -v indels -f PASS vcf_file.vcf >indels_vcf.vcf
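
For the per-sample numbers asked about in the question, one possible approach is to pipe the subset into bcftools stats with -s - (all samples) and keep the PSC (per-sample counts) lines. This is a sketch and not something posted in the thread: per_sample_counts is a hypothetical helper name, and the meaning of each column should be checked against the "# PSC" header comment in the full stats output of your bcftools version.

```shell
# per_sample_counts is a hypothetical helper, not part of bcftools.
# $1: variant type accepted by `bcftools view -v` (snps or indels)
# $2: input VCF/BCF
per_sample_counts() {
    bcftools view -v "$1" -f PASS "$2" |
        bcftools stats -s - |
        grep '^PSC'
}

# Usage (file name as in the question):
#   per_sample_counts snps   vcf_file.vcf
#   per_sample_counts indels vcf_file.vcf
```

Piping avoids writing an intermediate VCF; if the subset files are needed anyway, bcftools stats -s - can be run on them directly.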

I was trying to replicate an example from the manual (bcftools filtering).

Example in link:

$ bcftools query -i'QUAL>20 && DP>10' -f'%CHROM %POS %QUAL %DP\n' file.bcf
written 8 months ago by bsmith030465

The -f option is worth noting. bcftools query is principally a formatter. I'm not sure why the -i filter won't work, though.

written 8 months ago by RamRS

So, according to the example, the -f flag would specify which columns to include, right?

Shouldn't '-f PASS vcf_file.vcf' also include which column we want to apply the 'PASS' to?

Sorry for the newbie questions!

written 8 months ago by bsmith030465

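In bcftools query, -f takes a format string made of predefined fields, so you cannot put arbitrary values in it; the examples show that every field in a format string begins with %. In bcftools view, -f is a different option (--apply-filters) that always tests the FILTER column, so -f PASS needs no column name.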

written 8 months ago by RamRS