Question: bcftools isec -n operators
5 weeks ago, Begonia_pavonina wrote:

I am still very confused by the use of the bcftools isec -n flag.

According to the manual (https://samtools.github.io/bcftools/bcftools.html#isec):

 -n, --nfiles [+-=]INT|~BITMAP
    output positions present in this many (=), this many or more (+), this many or fewer (-), or the exact same (~) files

But after a few trials, I have realized that -n+2 and -n=10 with an input of 10 files do not output the same results.

Does anyone have any clues about how these operators are meant to be used?

modified 5 weeks ago • written 5 weeks ago by Begonia_pavonina

-n+2 and -n=10 with an input of 10 files do not output the same results

Of course not. -n+2 translates to "present in 2 or more among the 10 files". -n=10 translates to "present in all 10 files". Why do you expect them to have the same results?
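
To make the difference concrete, here is a toy Python model of the three INT operators. It is only a sketch of the semantics described in the manual, not how bcftools is implemented; the site names, the `presence` table and the `select_sites` helper are all made up for illustration.

```python
# Toy model only: each site maps to the set of input files (1-based) that contain it.
# It mimics the meaning of -n with +, = and -; it is not bcftools code.
import operator

OPS = {"+": operator.ge, "=": operator.eq, "-": operator.le}

def select_sites(presence, op, n):
    """Return the sites whose file count satisfies the given -n operator."""
    return sorted(site for site, files in presence.items() if OPS[op](len(files), n))

# Hypothetical sites across 10 input files.
presence = {
    "chr1:100": set(range(1, 11)),  # present in all 10 files
    "chr1:200": {1, 4, 7},          # present in 3 files
    "chr1:300": {2, 9},             # present in 2 files
    "chr1:400": {5},                # present in 1 file
}

at_least_two = select_sites(presence, "+", 2)   # like -n+2
exactly_ten = select_sites(presence, "=", 10)   # like -n=10
at_most_two = select_sites(presence, "-", 2)    # like -n-2

print(at_least_two)
print(exactly_ten)
print(at_most_two)
```

In this model, every site kept by -n=10 is also kept by -n+2, but not the other way round, which is why the two commands give different outputs.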

written 5 weeks ago by RamRS

Thank you RamRS, I simply had not understood the correct function of the operator.

So, to reformulate, in a command with 10 files:

-n+2 translates to "variants present in 2 or more files among the 10 files" or "all the variants that are in at least 2 files"

-n=2 translates to "variants present in exactly 2 files among the 10 files"

-n-2 translates to "variants present in 2 or fewer files among the 10 files" or "variants not shared by more than 2 files"

-n~2 translates to "the variants that are altogether shared by 2 files among the 10 files"

Is this correct?

I think I am still a bit confused by the ~ operator.

modified 5 weeks ago • written 5 weeks ago by Begonia_pavonina

The ~ operator is used with a BITMAP, not with an INT. The example given in that section of the manual shows how it can be used. If you have a bunch of files (10 in your case), and you wish to say not only how many files, but also which files the entry should be a part of, you can use the ~ operator.

Find all entries present in exactly 5 files (any 5): -n=5

Find all entries present in the 2nd, 3rd, 6th, 7th and 9th files and in no others: -n~0110011010 (see how the 1s denote the files to be used)
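
If it helps, the bitmap can be thought of as one character per input file, in command-line order. The Python sketch below builds such a string and checks a site against it. The helper names `bitmap_for` and `matches_bitmap` are hypothetical, and the matching rule is my reading of "the exact same (~) files" in the manual rather than bcftools' actual code.

```python
def bitmap_for(selected, n_files):
    """Build a BITMAP string: character i is '1' if file i (1-based) is selected."""
    return "".join("1" if i in selected else "0" for i in range(1, n_files + 1))

def matches_bitmap(files_with_site, bitmap):
    """True only if the site is in exactly the files marked '1' and in no others."""
    wanted = {i for i, bit in enumerate(bitmap, start=1) if bit == "1"}
    return set(files_with_site) == wanted

# The 2nd, 3rd, 6th, 7th and 9th of 10 files.
bitmap = bitmap_for({2, 3, 6, 7, 9}, 10)
print(bitmap)

print(matches_bitmap({2, 3, 6, 7, 9}, bitmap))      # same files as the bitmap
print(matches_bitmap({2, 3, 6, 7, 9, 10}, bitmap))  # also in file 10
print(matches_bitmap({2, 3}, bitmap))               # only a subset of the files
```

Under this reading, only the first of the three sites is kept: being in an extra file, or missing from one of the marked files, is enough to drop a site.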

written 5 weeks ago by RamRS

Thanks a lot RamRS, it is very clear now!

It would be handy if the bcftools manual included the very example you have provided.

written 5 weeks ago by Begonia_pavonina

There is a better example in there:

Print a list of records which are present in A and B but not in C and D

bcftools isec -n~1100 -c all A.vcf.gz B.vcf.gz C.vcf.gz D.vcf.gz
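
In set terms, that command amounts to (A ∩ B) minus (C ∪ D). A minimal Python sketch with made-up positions follows. It assumes records are compared by position only, which is how the manual describes -c all, and it ignores everything else bcftools does with alleles and output.

```python
# Hypothetical positions found in each of the four files.
A = {"chr1:100", "chr1:200", "chr1:300"}
B = {"chr1:100", "chr1:200", "chr1:400"}
C = {"chr1:200", "chr1:500"}
D = {"chr1:600"}

# Bitmap 1100: must be in A and in B, must not be in C or in D.
in_a_and_b_only = (A & B) - (C | D)
print(sorted(in_a_and_b_only))
```

Here chr1:200 is shared by A and B but is dropped because it also appears in C.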
  
written 5 weeks ago by RamRS