I have been using BEDtools intersect to determine on-target proportions of reads from BAM files in a BED. I used the -c
flag which specifies that "For each entry in A, report the number of hits in B" which is exactly what I want. Output is in the form:
chr1 9999 10062 HISEQ:529:C71NKANXX:3:1305:12386:7195/2 0 ...
So in this case, the read does not overlap any feature in my BED file. I looked into this as colleagues were querying the method and found that the output from BEDtools stops at 60. Is this a local maximum to save time? There is nothing about this in documentation.
Also while I am asking, is there a better way to determine on-target reads and bases? Intersect seemed to be able to give me both (using -wao
flag and increment on col 16 for bases). Picard HSmetrics is good but lacks on-target reads values.
I do not think there is such limit:
Output:
I did test as you have and get no limit, using the -abam flag for BAM input may output MAPQ, limit of which is 60. Haven't had time to make a 'fake' BAM to tet if this is what is output, had hoped someone else came across this issue and could clarify.
Thanks for you input.
Bedtools intersect has
-abam
option ?If you want to get the on target reads, then maybe
That should give you all the reads in the required region (e.g. on target reads)
I used:
and just divide by total reads to get off-target, which gives on-target obviously (half time of samtools method).
Thanks for suggestion though.