Bcftools view. FREQ inclusion error.
2
0
Entering edit mode
4.9 years ago
tcom111 ▴ 10

Dear all, I designed a scipt (running from Jupyter notebook, Python3) that takes vcf files of choice (by some string filters) and generates their gzipped version after filtering using BCFTOOL view -i * -t** supplied by the user. When using an inclusion of depth e.g DP > 100, the script works and filters all instances <=100.

Example:

bcftools view -Oz /path/file.vcf -o /path/file.vcf.gz  -i 'DP>100' -t 5:25000000-25001000


*This yield the expected outcome. A filtered VCF file.

However I get errors when trying FREQ > 0.1 instead:

Example

bcftools view -Oz /path/file.vcf -o /path/file.vcf.gz   -i 'FREQ>0.1' -t 5:25000000-25001000
Wrong operator in string comparison: FREQ > 0.3  [(null),0.39%]
[E::main_vcfindex] unknown filetype; expected bgzip compressed VCF or BCF
[E::main_vcfindex] was the VCF/BCF compressed with bgzip?


I've tried many instances of this line but nothing worked (e.g. -i FMT/FREQ>0.1). Does someone know how to solve this issue? (I originally planned to run both inclusion together with an "&" operator.

Thanks!

bcftools annovar FREQ variant calling • 2.3k views
0
Entering edit mode

Does your VCF even have the FREQ variable encoded? Do you not mean AF (allele frequency)?

0
Entering edit mode

0
Entering edit mode

Can you follow-up on the answer given by sm.hashemin? My time is currently very limited.

Is FREQ defined in your VCF header? Can you paste a few sample variants?

3
Entering edit mode
4.9 years ago
sm.hashemin ▴ 90

Hi there. You must define FREQ as

FORMAT= ID=FREQ,Number=.,Type=Float,Description=...


in the VCF header! As long as FREQ is an integer you can not do arithmetic stuff with it! Besto

0
Entering edit mode

##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">


I switched the "String" to "Float" and also tried replacing the "1" to "." I doesn't work and also generates this error during the generation of the filtered file using the bcftools view command:

[E::vcf_parse_format] Invalid character '%' in 'FREQ' FORMAT field at  ....


Should I omit the "<" as well? Thanks again.

1
Entering edit mode

Hello tcom111,

please use the code button (the one with 101010) to format code. That makes your post more readable.

• There are to many " in your header line. Look at the value of description. That looks weird.
• Take care of upper and lower cases. freq is not the same as FREQ.

So the correct header line should be like this:

##FORMAT=<ID=FREQ,Number=1,Type=Float,Description="variant allele frequency">


fin swimmer

0
Entering edit mode

tcom111, can you please try the suggestions by both finswimmer and sm.hashemin and then please report back here about what has actually solved your problem?

0
Entering edit mode

Hello Kevin,

have you edited tcomm111 post to correct maybe some wrong parsing through to forum software? At the time of writing my comment, the header line that tcom111 presented was written in lower cases and there were to many " Now it isn't anymore and my comment made little sense.

fin swimmer

0
Entering edit mode

I did edit some posts in this thread in response to what you had said, finswimmer, in relation to wrapping text/code via the 101 010 button. When we use this function, however, some formatting can become hidden or lost, including quotation marks and parentheses positioned in a certain way.

I would not worry about some of your comments becoming redundant. This is common on this website. Comments frequently look out of place.

0
Entering edit mode

To clarify: I only edited tcom111's posts by wrapping certain text with the 101 010 button. tcom111 her/himself may have edited further things.

0
Entering edit mode

Hi there try using the

 bcftools filter --include 'FREQ>0.1' inputfile.vcf --output outputfile.vcf


hope in works Best Mo

2
Entering edit mode
4.8 years ago
tcom111 ▴ 10

[[Problem solved]]

Dear all,

Same as before, I switched to Float in accordance with finswimmer's suggestion:

##FORMAT=<ID=FREQ,Number=1,Type=Float,Description="variant allele frequency">


The problem remained. However I figured out that the "%" makes the calculation (<,>,=) faulty. To cope this, I modified the VCF file where the "FREQ" data appears, and removed the "%" character from the entire file. This solved the issue (done in a short python script).

NOTE: It should be considered however when writing the inclusion/exclusion command: Consider the number in the FREQ position is already in it percentage value (e.g. write -i FREQ>50) for VAF>50%. I guess another way to handle it is by calculation of the VAF on the fly with the read integers of REF and ALT (I didn't try it but it should work)

1
Entering edit mode

Great - thanks for coming back with a detailed answer. Feel free to 'Accept' (your own answer) and/or up-vote other comments and answers that have helped. Note that sm.hashemin also suggested to use Float, even though that appeared to be just part of the issue.

0
Entering edit mode

Hello tcom111,

I should have asked this already, but the problem seems to me so obvious that I didn't...

How was your vcf generated? Even if you solved your problem now, could you please post the complete header of the vcf and the first few variants, that we can see how your original, unmodified file looks like?

fin swimmer

0
Entering edit mode

I didnt get how your FREQ field looks like in your vcf file. Is there a % after the number?