Question: Difference between | and || in filtering expressions by bcftools
finswimmer wrote, 23 months ago:

Hello all,

bcftools has these logical operators that can be used in filtering expressions:

&& (same as &), ||,  |

What's the difference between || and |? Can someone provide an example and/or use case for clarification?

The manual has this example:

QUAL>10 |  FMT/GQ>10   .. true for sites with QUAL>10 or a sample with GQ>10, but selects only samples with GQ>10
QUAL>10 || FMT/GQ>10   .. true for sites with QUAL>10 or a sample with GQ>10, plus selects all samples at such sites

But this is not clear to me.

fin swimmer

Tags: bcftools, vcf • modified 22 months ago • written 23 months ago by finswimmer

Sounds like the single | works like the Linux pipe symbol on a command line and the double || as the 'or' operator.

written 23 months ago by lieven.sterck

Hm, if it worked like the Linux pipe, how would it differ from &&?

I guess I also have a problem with the description "selects only/all samples". Normally I select a whole VCF line and not just a sample. Which bcftools subcommand is able to select specific samples based on an expression?

Furthermore, the man page says "&& (same as &)", but in the examples they do not seem to be the same:

FMT/DP>10  & FMT/GQ>10 .. both conditions must be satisfied within one sample
FMT/DP>10 && FMT/GQ>10 .. the conditions can be satisfied in different samples

I'm confused. The only thing that seems clear to me is that | vs. || and & vs. && make a difference in multi-sample VCFs.

fin swimmer

written 23 months ago by finswimmer

Could this be related to short-circuiting by any chance?

written 23 months ago by RamRS

Hm, looks quite good. But I'm still not sure.

I decided to crosspost directly on bcftools github.

Let's wait and see.

written 23 months ago by finswimmer

Tagging: lh3

written 23 months ago by finswimmer

pd3 wrote, 22 months ago:

Say your VCF contains the per-sample depth and genotype quality annotations and you want to include only sites where one or more samples have big enough coverage (DP>10) and genotype quality (GQ>20). The expression -i 'FMT/DP>10 & FMT/GQ>20' selects sites where the conditions are satisfied within the same sample:

bcftools query -i'FMT/DP>10 & FMT/GQ>20' -f'%POS[\t%SAMPLE:DP=%DP GQ=%GQ]\n' file.bcf

49979   SampleA:DP=10 GQ=50     SampleB:DP=20 GQ=40

On the other hand, if you need to include sites where both conditions are met but not necessarily in the same sample, use the && operator rather than &:

bcftools query -i'FMT/DP>10 && FMT/GQ>20' -f'%POS[\t%SAMPLE:DP=%DP GQ=%GQ]\n' file.bcf

31771   SampleA:DP=10 GQ=50     SampleB:DP=40 GQ=20
49979   SampleA:DP=10 GQ=50     SampleB:DP=20 GQ=40

This example is taken from http://samtools.github.io/bcftools/howtos/filtering.html
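
To make the distinction concrete, here is a minimal Python sketch of how the two operators behave on the two example sites above. It is a toy model of the behaviour described in this answer, not how bcftools is implemented; the function names `single_amp` and `double_amp` are made up for illustration.

```python
# Toy model only: mimics the behaviour described above, not bcftools internals.
# Per-sample values are copied from the two example sites in the output above.
sites = {
    31771: {"SampleA": {"DP": 10, "GQ": 50}, "SampleB": {"DP": 40, "GQ": 20}},
    49979: {"SampleA": {"DP": 10, "GQ": 50}, "SampleB": {"DP": 20, "GQ": 40}},
}


def single_amp(samples):
    """'FMT/DP>10 & FMT/GQ>20': both conditions hold within one sample."""
    return any(s["DP"] > 10 and s["GQ"] > 20 for s in samples.values())


def double_amp(samples):
    """'FMT/DP>10 && FMT/GQ>20': each condition holds in some sample."""
    return (any(s["DP"] > 10 for s in samples.values())
            and any(s["GQ"] > 20 for s in samples.values()))


sites_single = [pos for pos, samples in sites.items() if single_amp(samples)]
sites_double = [pos for pos, samples in sites.items() if double_amp(samples)]
print(sites_single)  # [49979]
print(sites_double)  # [31771, 49979]
```

Site 31771 passes only with && because DP>10 is met in SampleB while GQ>20 is met in SampleA.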


EDIT: (inserted by a mod)

Answer given on github:

Well, sorry for demonstrating the difference with & and && instead of | and ||, but it's the same principle.

The manual page says it all:

QUAL>10 |  FMT/GQ>10   .. true for sites with QUAL>10 or a sample with GQ>10, but selects only samples with GQ>10
QUAL>10 || FMT/GQ>10   .. true for sites with QUAL>10 or a sample with GQ>10, plus selects all samples at such sites

Or you can try running it yourself:

$ bcftools query -f'[%POS %SAMPLE %DP\n]\n' -i'FMT/DP=19 | FMT/DP="."' test/view.filter.vcf 
3162006 A 19

3162007 A .
3162007 B .

$ bcftools query -f'[%POS %SAMPLE %DP\n]\n' -i'FMT/DP=19 || FMT/DP="."' test/view.filter.vcf 
3162006 A 19
3162006 B 1

3162007 A .
3162007 B .
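
The same output can be reproduced with a small Python sketch. Again, this is only a toy model under the assumption that | keeps just the matching samples while || keeps every sample at a passing site; the helper names `matches` and `select` are hypothetical and `None` stands in for a missing value (".").

```python
# Toy model only: reproduces the output above, not bcftools internals.
# DP values are copied from the test output; None stands for ".".
sites = {
    3162006: {"A": 19, "B": 1},
    3162007: {"A": None, "B": None},
}


def matches(dp):
    """Per-sample test for FMT/DP=19 or FMT/DP="."."""
    return dp == 19 or dp is None


def select(samples, operator):
    """Return the samples printed for one site, or [] if the site fails."""
    hits = [name for name, dp in samples.items() if matches(dp)]
    if not hits:
        return []  # no sample matched, so the site is not reported
    # '|' keeps only the matching samples, '||' keeps all samples at the site
    return hits if operator == "|" else list(samples)


single_pipe = {pos: select(s, "|") for pos, s in sites.items()}
double_pipe = {pos: select(s, "||") for pos, s in sites.items()}
print(single_pipe)  # {3162006: ['A'], 3162007: ['A', 'B']}
print(double_pipe)  # {3162006: ['A', 'B'], 3162007: ['A', 'B']}
```

Both operators report the same sites here; the difference is that sample B (DP=1) at 3162006 is printed only with ||.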
modified 22 months ago by finswimmer • written 22 months ago by pd3

Hello pd3,

thank you for your response. Unfortunately this isn't an answer to my initial question about the differences between | and ||.

As I wrote a little bit later the difference between & and && is clearly explained in the manual, even if the description "&& (same as &)" is a bit misleading.

But you showed me a use case about selected samples and not only variants. Have to think more about it. But I guess somewhere there is the answer ... :)

fin swimmer

written 22 months ago by finswimmer