Not returning any alleles when using bcftools view -q
1
0
Entering edit mode
3 months ago
kshaffman • 0

I am trying to filter by allele frequency using bcftools view -q but the output does not give me any sites, all it gives me is the vcf header. I am using a cancer cell line and I am trying to use an allele frequency cut-off to make sure that there would be at least one copy per cell in the population. I chose 0.1 based on my lit review, does this seem reasonable.

Here is the command I used

bcftools view -q 0.1 "my_file.vcf" 

Output was the vcf header. The whole output is pasted at the bottom of the page

Any idea how I can return AF > 0.1. and yes I did make sure that there were AF > 0.1 in my data.

##fileformat=VCFv4.0
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20220429
##source=lofreq call --verbose --ref /cvmfs/data.galaxyproject.org/byhand/hg38/sam_index/hg38.fa --call-indels --min-cov 30 --max-depth 1000000 --use-orphan --min-bq 30 --min-alt-bq 30 --min-mq 20 --max-mq 255 --min-jq 0 --min-alt-jq 0 --def-alt-jq 0 --sig 0.01 --bonf dynamic --no-default-filter -r chr1:1-124478211 -o /jetstream/scratch0/main/jobs/42329427/_job_tmp/lofreq2_call_parallelp63o5c08/0.vcf.gz reads.bam
##reference=/cvmfs/data.galaxyproject.org/byhand/hg38/sam_index/hg38.fa
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">
##INFO=<ID=SB,Number=1,Type=Integer,Description="Phred-scaled strand bias at this position">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Counts for ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=CONSVAR,Number=0,Type=Flag,Description="Indicates that the variant is a consensus variant (as opposed to a low frequency variant).">
##INFO=<ID=HRUN,Number=1,Type=Integer,Description="Homopolymer length to the right of report indel position">
##FILTER=<ID=min_snvqual_78,Description="Minimum SNV Quality (Phred) 78">
##FILTER=<ID=min_indelqual_63,Description="Minimum Indel Quality (Phred) 63">
##FILTER=<ID=min_dp_10,Description="Minimum Coverage 10">
##FILTER=<ID=sb_fdr,Description="Strand-Bias Multiple Testing Correction: fdr corr. pvalue > 0.001000">
##FILTER=<ID=min_snvqual_93,Description="Minimum SNV Quality (Phred) 93">
##FILTER=<ID=min_indelqual_78,Description="Minimum Indel Quality (Phred) 78">
##SnpEffVersion="4.3t (build 2017-11-24 10:18), by Pablo Cingolani"
##SnpEffCmd="SnpEff  -i vcf -o vcf -stats /corral4/main/objects/7/3/3/dataset_733993dd-8ceb-4959-b46f-53737d762bcd.dat -csvStats /corral4/main/objects/e/b/8/dataset_eb87c49d-8609-46f0-b181-4fcdf08aed44.dat GRCh38.p7.RefSeq /corral4/main/objects/3/f/7/dataset_3f74fe07-e0ec-4ce9-bf2a-89b8806d8ca6.dat "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO'">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##SnpSiftVersion="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani"
##SnpSiftCmd="SnpSift Filter -f /corral4/main/objects/6/d/a/dataset_6da04259-64b4-4853-82a2-43ce88b651db.dat -e /corral4/main/jobs/042/523/42523967/configs/tmpo4_pwef0 --inverse"
##FILTER=<ID=SnpSift,Description="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani, Expression used: ( EFF[*].IMPACT = 'LOW')">
##SnpSiftCmd="SnpSift Filter -f /corral4/main/objects/8/3/5/dataset_835a16c9-7404-4049-a838-f5867ab3d8d3.dat -e /corral4/main/jobs/042/523/42523974/configs/tmpcitxn7nn --inverse"
##contig=<ID=chr1>
##contig=<ID=chr2>
##contig=<ID=chr3>
##contig=<ID=chr4>
##contig=<ID=chr5>
##contig=<ID=chr6>
##contig=<ID=chr7>
##contig=<ID=chr8>
##contig=<ID=chr9>
##contig=<ID=chr10>
##contig=<ID=chr11>
##contig=<ID=chr12>
##contig=<ID=chr13>
##contig=<ID=chr14>
##contig=<ID=chr15>
##contig=<ID=chr16>
##contig=<ID=chr17>
##contig=<ID=chr18>
##contig=<ID=chr19>
##contig=<ID=chr20>
##contig=<ID=chr21>
##contig=<ID=chr22>
##contig=<ID=chrX>
##contig=<ID=chrY>
##bcftools_isecVersion=1.11+htslib-1.11
##bcftools_isecCommand=isec -p dir -n -1 -c all /u/home/k/kshaffma/vcf_only_med_and_high/revert_only_med_and_high.vcf.gz /u/home/k/kshaffma/vcf_only_med_and_high/parent_only_med_and_high.vcf.gz; Date=Wed May 25 13:16:57 2022
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##bcftools_viewVersion=1.11+htslib-1.11
##bcftools_viewCommand=view -q 0.1 /u/home/k/kshaffma/Uniq/parents_uniq_with_annotations.vcf; Date=Mon Jun  6 16:53:51 2022
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
frequency bcftools vcf filtering allele • 382 views
ADD COMMENT
0
Entering edit mode
3 months ago
cfos4698 ▴ 510

What version of bcftools are you using? I think with the -q flag bcftools recalculates the allele frequency itself, and is perhaps not getting it right based on the lofreq output.

I just tried filtering a VCF file from lofreq with the following:

bcftools view -q 0.4 -Q 0.5 myfile.vcf.gz

The output should have been a single SNP, but three were returned. Piping the output into bcftools +fill-tags showed that bcftools thought the allele frequencies were of the three SNPs was 0.5 (wrong).

Instead, try using the include flag (-i) to get what you want. For example:

bcftools view -i 'AF > 0.1' yourfile.vcf.gz

ADD COMMENT
0
Entering edit mode

Hi, thank you. This fixed my problem. So is what you are saying @cfos4698 that the -q tag is incompatible with the loqreq alignment tool?

ADD REPLY
0
Entering edit mode

Feel free to mark my answer as accepted ;)

I think -q is just designed with the whole bcftools ecosystem in mind, i.e. that the output of bcftools call will be piped to bcftools view. The -q flag doesn't seem to work well with lofreq's default output from my experiences, and might not work with other tools very well either, unless you reformat the INFO columns to mimic bcftools call's output.

ADD REPLY

Login before adding your answer.

Traffic: 734 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6