Question: Count reads within region
2
gravatar for Karol Pal jr.
5.7 years ago by
Czech Republic
Karol Pal jr.20 wrote:

Hi,

I'd like to ask for help with finding an efficient way of counting reads from a bam file that lie within an interval (from a bed file). The problem is I only want reads that lie entirely within a given interval (no matter how long they are, or what percentage of the given interval they cover). The intervals may be overlapping. I'm dealing with amplicon sequencing. Currently, the only way I am able to do this is by separately intersecting (bedtools) the bam with each region in my bed file and then using samtools -c . This approach however takes too long. 

To me this seems like a very basic problem which I believe must have been solved but I'm unable to google the right solution.

Thanks for any suggestions.

ADD COMMENTlink modified 5.7 years ago by Alex Reynolds31k • written 5.7 years ago by Karol Pal jr.20
5
gravatar for Pierre Lindenbaum
5.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum134k wrote:

using my tool samjs https://github.com/lindenb/jvarkit/wiki/SamJS

 

samtools view -bu -F 4  input.bam seq2:250-300 |\
java -jar jvarkit-git/dist-1.133/samjs.jar -e 'record.alignmentStart >= 250 && record.alignmentEnd <= 300'  |\
samtools view -Sc -
ADD COMMENTlink modified 5.7 years ago • written 5.7 years ago by Pierre Lindenbaum134k
5
gravatar for Alex Reynolds
5.7 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

Using BEDOPS bedmap --count:

$ bedmap --echo --count --fraction-map 1 intervals.bed <(bam2bed < reads.bam) > answer.bed

The file answer.bed contains each interval and the number of reads that are contained entirely within that interval.

As long as inputs are not nested (overlapping is fine), you can add the --faster option and gain a further performance boost:

$ bedmap --faster ...
ADD COMMENTlink modified 16 months ago by Ram32k • written 5.7 years ago by Alex Reynolds31k

Thanks,

your solution is indeed the fastest. However, I eventually returned to my original bedtools intersect solution adding Pierre's tweak with samtools view -bu which is a fast command on an indexed bam and reduces the searching space efficiently. It also allowed me to play with the flag option and print forward and reverse reads separately. Thanks again guys.

ADD REPLYlink modified 16 months ago by Ram32k • written 5.7 years ago by Karol Pal jr.20

Thanks for the feedback!

ADD REPLYlink written 5.7 years ago by Alex Reynolds31k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1726 users visited in the last hour
_