bcftools isec -n operators
3.2 years ago

I am still very confused by the use of the bcftools isec -n flag.

According to the manual (https://samtools.github.io/bcftools/bcftools.html#isec):

 -n, --nfiles [+-=]INT|~BITMAP
output positions present in this many (=), this many or more (+), this many or fewer (-), or the exact same (~) files


But after a few trials, I have realized that -n+2 and -n=10 with an input of 10 files do not output the same results.

Does anyone have any clues about how these operators work?


-n+2 and -n=10 with an input of 10 files do not output the same results

Of course not. -n+2 translates to "present in 2 or more among the 10 files". -n=10 translates to "present in all 10 files". Why do you expect them to have the same results?
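
To make the difference concrete, here is a toy model of the INT form of -n in plain shell. It is only a sketch of the rule quoted from the manual, not bcftools code, and keeps_site is a made-up helper name: it takes an operator spec such as +2 and the number of files a site appears in, and succeeds if that site would be kept.

```shell
# Toy model of -n [+-=]INT; this is NOT bcftools code.
# keeps_site SPEC COUNT -> exit status 0 if a site present in COUNT files is kept.
keeps_site() {
    spec=$1
    count=$2
    op=${spec%"${spec#?}"}   # first character of SPEC: +, - or =
    n=${spec#?}              # the integer that follows the operator
    case $op in
        +) [ "$count" -ge "$n" ] ;;   # this many or more
        -) [ "$count" -le "$n" ] ;;   # this many or fewer
        =) [ "$count" -eq "$n" ] ;;   # exactly this many
        *) return 2 ;;                # anything else is outside this sketch
    esac
}

# A site found in 3 of the 10 input files:
keeps_site +2 3 && echo "-n+2 keeps it"
keeps_site =10 3 || echo "-n=10 drops it"
```

So a site seen in 3 of the 10 files passes -n+2 but fails -n=10, which is why the two runs give different output.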


Thank you RamRS, I simply had not understood the correct function of the operator.

So, to reformulate, in a command with 10 files:

-n+2 translates to "variants present in 2 or more of the 10 files", i.e. "all the variants that are in at least 2 files"

-n=2 translates to "variants present in exactly 2 of the 10 files"

-n-2 translates to "variants present in 2 or fewer of the 10 files", i.e. "variants not shared by more than 2 files"

-n~2 translates to "the variants that are altogether shared by 2 files among the 10 files"

Is this correct?

I think I am still a bit confused by the operator ~


The ~ operator is used with a BITMAP, not the INT part. The example given in the section shows how it can be used. If you have a bunch of files (10 in your case), and you wish to say not only how many files, but also which files the entry should be a part of, you can use the ~ operator.

Find all entries present in exactly 5 files, whichever files those are: -n=5

Find all entries present in exactly the 2nd, 3rd, 6th, 7th and 9th files, and in none of the others: -n~0110011010 (the 1s mark the files that must contain the entry)
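
If typing the bitmap by hand feels error-prone, a small shell helper can build it from the file positions. This is just a sketch: make_bitmap is a hypothetical function, not something shipped with bcftools, and it assumes positions are 1-based and follow the order of the files on the command line.

```shell
# Hypothetical helper (not part of bcftools): build the BITMAP for -n~
# from the total number of input files and the 1-based positions of the
# files a record must be present in.
make_bitmap() {
    total=$1
    shift
    bitmap=""
    i=1
    while [ "$i" -le "$total" ]; do
        bit=0
        for pos in "$@"; do
            if [ "$pos" -eq "$i" ]; then
                bit=1
            fi
        done
        bitmap="$bitmap$bit"   # append one digit per input file
        i=$((i + 1))
    done
    printf '%s\n' "$bitmap"
}

make_bitmap 10 2 3 6 7 9   # prints 0110011010
```

The result can then be passed along as -n~"$(make_bitmap 10 2 3 6 7 9)" in the bcftools isec call.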


@Ram Clear and concise. Thanks for the comment!


Thanks a lot RamRS, it is very clear now!

It would be handy to have the very same example you have provided in the bcftools manual.


There is a better example in there:

Print a list of records which are present in A and B but not in C and D

bcftools isec -n~1100 -c all A.vcf.gz B.vcf.gz C.vcf.gz D.vcf.gz