bcftools consensus equivalent for VarScan output
1
0
Entering edit mode
3.2 years ago
RBright21 ▴ 10

I have been carrying out variant calling after alignment using bcftools and extracting a consensus fasta file for small viral genomes for downstream analysis using

bcftools consensus -f ref.fa test.vcf > consensus.fa

I want more control over the variant calling parameters but can't figure out a way to do this using bcftools call from the manual. I have found that VarScan offers more options for minimum coverage, minimum reads, minimum quality etc... but I then can't find a way to extract a consensus fasta file from the output file.

I have two questions

  1. Please could anyone help me to extract a consensus fasta file from a VarScan pileup2cns output OR
  2. Please could anyone help me to find out how I can set parameters such as minimum variant coverage, minimum variant frequency using the bcftools call function?

Thanks

Assembly sequencing varscan • 1.5k views
ADD COMMENT
2
Entering edit mode
3.2 years ago

I would pipe bcftools call output into bcftools view or bcftools filter, and do the filtering there, before ultimately piping into bcftools consensus. Something like:

bcftools mpileup ... |\
  bcftools call ... |\
    bcftools filter ... |\
      bcftools consensus -f ref.fa - > consensus.fa ;

...or, indeed:

bcftools mpileup ... |\
  bcftools call ... |\
    bcftools filter ... -Oz > out.vcf.gz ;
tabix -p vcf out.vcf.gz ;
bcftools consensus -f ref.fa out.vcf.gz > consensus.fa ;

Take a look at the manual page for bcftools filter: http://samtools.github.io/bcftools/bcftools.html#filter

Kevin

ADD COMMENT
0
Entering edit mode

Hi Kevin Thanks for your response. I have looked at bcftools filter in the manual before but if I am completely honest I don't really understand it - I'm new to command line so its quite overwhelming. Do I need to use expressions? Say if I use -i QUAL>20 will that only give me those variants with quality scores above 20?

I'll go back over the manual again and try to make sense of it but if you know of any tutorials for this function which may help I'd be extremely grateful Thanks

ADD REPLY
1
Entering edit mode

Yes, for example, this will filter include those variants with QUAL > 20 and position read-depth > 20:

bcftools filter --include 'QUAL>20 && FORMAT/DP>20' my.vcf ;

There are examples of expressions at the bottom of the manual page, but I admit that it's a lot to digest for those just getting into this: http://samtools.github.io/bcftools/bcftools.html#expressions

For every VCF that you have, you should know what is the data that is encoded in the VCF. You can know this by looking at the information in the header. Common tags are AD, DP, AF, et cetera.

bcftools view -h my.vcf ;

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
...
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##contig=<ID=chr1,length=249250621,assembly=hg19>
...
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Sample_N31_T1
ADD REPLY
1
Entering edit mode

Hi Kevin Thanks so much for this, its really helpful. I will have a go back through the manual with this information and see what I can achieve.

ADD REPLY

Login before adding your answer.

Traffic: 1511 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6