bcftools isec -n operators
4.3 years ago

I am still very confused by the use of the bcftools isec -n flag.

According to the manual (https://samtools.github.io/bcftools/bcftools.html#isec):

 -n, --nfiles [+-=]INT|~BITMAP
    output positions present in this many (=), this many or more (+), this many or fewer (-), or the exact same (~) files

But after a few trials, I have realized that -n+2 and -n=10 with an input of 10 files do not output the same results.

Does anyone have any clues about how these operators work?

bcftools intersect • 4.0k views

-n+2 and -n=10 with an input of 10 files do not output the same results

Of course not. -n+2 translates to "present in 2 or more among the 10 files". -n=10 translates to "present in all 10 files". Why do you expect them to have the same results?
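To make the difference concrete, here is a minimal sketch that mimics the count-based operators on toy data. It does not call bcftools: the three position lists and the helper functions `counts` and `isec_n` are hypothetical stand-ins, assuming each position appears at most once per file.

```shell
# Toy stand-ins for three VCFs: each variable lists the positions found in one file.
# (Hypothetical data; real inputs would be bgzipped, indexed VCF/BCF files.)
file1="100 200 300"
file2="100 200 400"
file3="100 500"

# Count in how many files each position occurs, printing "position count".
counts() {
    printf '%s\n' $file1 $file2 $file3 | sort -n | uniq -c | awk '{print $2, $1}'
}

# Mimic -n+INT, -n-INT and -n=INT by filtering on that count.
isec_n() {   # usage: isec_n OPERATOR INT
    counts | awk -v op="$1" -v n="$2" '
        (op == "+" && $2 >= n) || (op == "-" && $2 <= n) || (op == "=" && $2 == n) {print $1}'
}

echo "-n+2: $(isec_n + 2 | paste -sd' ' -)"   # positions in 2 or more files
echo "-n=3: $(isec_n = 3 | paste -sd' ' -)"   # positions in all 3 files
echo "-n-1: $(isec_n - 1 | paste -sd' ' -)"   # positions in at most 1 file
```

With these toy lists, `-n+2` keeps 100 and 200, whereas `-n=3` keeps only 100, which is why the two settings cannot be expected to agree.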


Thank you RamRS, I simply had not understood the correct function of the operator.

So, to reformulate, in a command with 10 files:

-n+2 translates to "variants present in 2 or more files among the 10 files" or "all the variants that are in at least 2 files"

-n=2 translates to "variants present in exactly 2 files among the 10 files"

-n-2 translates to "variants present in 2 or fewer files among the 10 files" or "variants not shared by more than 2 files"

-n~2 translates to "the variants that are shared altogether by 2 files among the 10 files"

Is it correct?

I think I am still a bit confused by the operator ~


The ~ operator is used with a BITMAP, not the INT part. The example given in the section shows how it can be used. If you have a bunch of files (10 in your case), and you wish to say not only how many files, but also which files the entry should be a part of, you can use the ~ operator.

Find all entries present in 5 files: -n=5

Find all entries present in exactly the 2nd, 3rd, 6th, 7th and 9th files, and in none of the others: -n~0110011010 (see how the 1s denote the files the entry must be in, and the 0s the files it must be absent from)
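Typing a long bitmap by hand is error-prone, so here is a small sketch that builds one from 1-based file positions. `make_bitmap` is a hypothetical helper written for this illustration, not something bcftools provides; it assumes the positions refer to the order in which the files are given on the command line.

```shell
# Build a -n~BITMAP string from 1-based file positions.
# make_bitmap is a hypothetical helper, not part of bcftools.
make_bitmap() {   # usage: make_bitmap TOTAL_FILES POSITION [POSITION...]
    total=$1
    shift
    bitmap=""
    i=1
    while [ "$i" -le "$total" ]; do
        bit=0
        # Set the bit to 1 if the current position was requested.
        for idx in "$@"; do
            if [ "$idx" -eq "$i" ]; then
                bit=1
            fi
        done
        bitmap="${bitmap}${bit}"
        i=$((i + 1))
    done
    printf '%s\n' "$bitmap"
}

make_bitmap 10 2 3 6 7 9    # prints 0110011010
```

The result can then be passed to the flag, for example as `-n~"$(make_bitmap 10 2 3 6 7 9)"`.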


@Ram Clear and Concise. Thanks for the comment!


Thanks a lot RamRS, it is very clear now!

It would be handy if the bcftools manual included the very example you have provided.


There is a better example in there:

Print a list of records which are present in A and B but not in C and D

bcftools isec -n~1100 -c all A.vcf.gz B.vcf.gz C.vcf.gz D.vcf.gz
  