Question: Bcftools view options for SNPs calling.
3
gravatar for Eddie_I
4.2 years ago by
Eddie_I30
Australia
Eddie_I30 wrote:

Hi there,

Newbie to the bioinformatic area and looking for some help.

I'm trying to use bcftools to do some SNPs calling from a bcf file created out of "samtools mpileup".

In a lot of posts I see the bcftools view options as "-bvcg", but when I try it I get the error 

bcftools: unknown option -- b

In reading I see that bcftools have changed their view options with the more recent versions (i'm runnning 1.2), so was wondering if any one can translate the old options into what to use for the new options values. I'm trying :-

"-O b -v snps -g het"

but since I have never used bcftools before I don't know if that would get me the same output.

Thanks for your assistance in advance.

Eddie

 

mpileup snp samtools bcftools • 18k views
ADD COMMENTlink modified 14 months ago by Kevin Blighe49k • written 4.2 years ago by Eddie_I30
2

Hi!

There is no "-b" option in bcftools manual

"-O b" (in your second command line) means to get output as a compressed BCF file.

 

To do SNP calling, I usually use the command line:

bcftools call --skip-variants indels --multiallelic-caller --variants-only  -O v <output.bcf> -o <output.vcf>

 

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by Evgeniia Golovina990

Thanks for the reply.

I will give bcftools call a try.

yes, I was trying to use the command : "bcftools view -bvcg ...", but it seem those were the valid options for an older version of bcftools.

I found this list of options for bcftools view (an old version, which explained the -bvcg to me):

Usage: bcftools view [options] <in.bcf> [reg]

Input/output options:

       -A        keep all possible alternate alleles at variant sites
       -b        output BCF instead of VCF
       -D FILE   sequence dictionary for VCF->BCF conversion [null]
       -F        PL generated by r921 or before (which generate old ordering)
       -G        suppress all individual genotype information
       -l FILE   list of sites (chr pos) or regions (BED) to output [all sites]
       -L        calculate LD for adjacent sites
       -N        skip sites where REF is not A/C/G/T
       -Q        output the QCALL likelihood format
       -s FILE   list of samples to use [all samples]
       -S        input is VCF
       -u        uncompressed BCF output (force -b)

Consensus/variant calling options:

       -c        SNP calling (force -e)
       -d FLOAT  skip loci where less than FLOAT fraction of samples covered [0]
       -e        likelihood based analyses
       -g        call genotypes at variant sites (force -c)
       -i FLOAT  indel-to-substitution ratio [-1]
       -I        skip indels
       -m FLOAT  alternative model for multiallelic and rare-variant calling, include if P(chi^2)>=FLOAT
       -p FLOAT  variant if P(ref|D)<FLOAT [0.5]
       -P STR    type of prior: full, cond2, flat [full]
       -t FLOAT  scaled substitution mutation rate [0.001]
       -T STR    constrained calling; STR can be: pair, trioauto, trioxd and trioxs (see manual) [null]
       -v        output potential variant sites only (force -c)

Contrast calling and association test options:

       -1 INT    number of group-1 samples [0]
       -C FLOAT  posterior constrast for LRT<FLOAT and P(ref|D)<0.5 [1]
       -U INT    number of permutations for association testing (effective with -1) [0]
       -X FLOAT  only perform permutations for P(chi^2)<FLOAT [0.01]

but the version 1.2 bcftools view options are :

$ bcftools view

About:   VCF/BCF conversion, view, subset and filter VCF/BCF files.
Usage:   bcftools view [options] <in.vcf.gz> [region1 [...]]

Output options:
    -G,   --drop-genotypes              drop individual genotype information (after subsetting if -s option set)
    -h/H, --header-only/--no-header     print the header only/suppress the header in VCF output
    -l,   --compression-level [0-9]     compression level: 0 uncompressed, 1 best speed, 9 best compression [-1]
    -o,   --output-file <file>          output file name [stdout]
    -O,   --output-type <b|u|z|v>       b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
    -r, --regions <region>              restrict to comma-separated list of regions
    -R, --regions-file <file>           restrict to regions listed in a file
    -t, --targets [^]<region>           similar to -r but streams rather than index-jumps. Exclude regions with "^" prefix
    -T, --targets-file [^]<file>        similar to -R but streams rather than index-jumps. Exclude regions with "^" prefix

Subset options:
    -a, --trim-alt-alleles        trim alternate alleles not seen in the subset
    -I, --no-update               do not (re)calculate INFO fields for the subset (currently INFO/AC and INFO/AN)
    -s, --samples [^]<list>       comma separated list of samples to include (or exclude with "^" prefix)
    -S, --samples-file [^]<file>  file of samples to include (or exclude with "^" prefix)
        --force-samples           only warn about unknown subset samples

Filter options:
    -c/C, --min-ac/--max-ac <int>[:<type>]      minimum/maximum count for non-reference (nref), 1st alternate (alt1), least frequent
                                                   (minor), most frequent (major) or sum of all but most frequent (nonmajor) alleles [nref]
    -f,   --apply-filters <list>                require at least one of the listed FILTER strings (e.g. "PASS,.")
    -g,   --genotype [^]<hom|het|miss>          require one or more hom/het/missing genotype or, if prefixed with "^", exclude sites with hom/het/missing genotypes
    -i/e, --include/--exclude <expr>            select/exclude sites for which the expression is true (see man page for details)
    -k/n, --known/--novel                       select known/novel sites only (ID is not/is '.')
    -m/M, --min-alleles/--max-alleles <int>     minimum/maximum number of alleles listed in REF and ALT (e.g. -m2 -M2 for biallelic sites)
    -p/P, --phased/--exclude-phased             select/exclude sites where all samples are phased
    -q/Q, --min-af/--max-af <float>[:<type>]    minimum/maximum frequency for non-reference (nref), 1st alternate (alt1), least frequent
                                                   (minor), most frequent (major) or sum of all but most frequent (nonmajor) alleles [nref]
    -u/U, --uncalled/--exclude-uncalled         select/exclude sites without a called genotype
    -v/V, --types/--exclude-types <list>        select/exclude comma-separated list of variant types: snps,indels,mnps,other [null]
    -x/X, --private/--exclude-private           select/exclude sites where the non-reference alleles are exclusive (private) to the subset samples

So just trying to find the new options variable that will get me the same output.

 

ADD REPLYlink written 4.2 years ago by Eddie_I30

Hi Evgeniia,

Gave bcftools call a try and it worked for me.

Thanks for your help.

Eddie

ADD REPLYlink written 4.2 years ago by Eddie_I30

You are welcome! :)

ADD REPLYlink written 4.2 years ago by Evgeniia Golovina990
1
gravatar for Kevin Blighe
14 months ago by
Kevin Blighe49k
Kevin Blighe49k wrote:

For anybody else arriving at this thread:

1, call variants

samtools mpileup --redo-BAQ --min-BQ 30 --per-sample-mF \
--output-tags DP,AD -f "${Ref_FASTA}" \
--BCF "${repBAM1}" "${repBAM2}" "${repBAM3}" "${repBAM4}" | \
bcftools call --multiallelic-caller --variants-only -Ob > out.bcf ;

2, filter the variants further

#Options:
#   -Q INT  minimum RMS mapping quality for SNPs [10]
#   -d INT  minimum read depth [2]
#   -D INT  maximum read depth [10000000]
#   -a INT  minimum number of alternate bases [2]
#   -w INT  SNP within INT bp around a gap to be filtered [3]
#   -W INT  window size for filtering adjacent gaps [10]
#   -1 FLOAT    min P-value for strand bias (given PV4) [0.0001]
#   -2 FLOAT    min P-value for baseQ bias [1e-100]
#   -3 FLOAT    min P-value for mapQ bias [0]
#   -4 FLOAT    min P-value for end distance bias [0.0001]
#   -e FLOAT    min P-value for HWE (plus F<0) [0.0001]
#   -p      print filtered variants

bcftools view -Ov out.bcf | misc/vcfutils.pl varFilter -d 18 -w 1 -W 3 -a 1 \
-1 0.05 -2 0.05 -3 0.05 -4 0.05 \
-e 0.05 -p > out.filt.vcf ;

misc/vcfutils.pl should come bundled with BCFtools.

You can then add further tags to the VCF/BCF by following this: A: How to use bcftools to calculate AF INFO field from AC and AN in VCF?

Kevin

ADD COMMENTlink modified 8 months ago • written 14 months ago by Kevin Blighe49k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1822 users visited in the last hour