Question: Getting Number Of Reads In Intervals With Bedtools
gravatar for user
7.6 years ago by
United States
user850 wrote:

What is the correct way to get the total number of reads strictly contained in each interval in a GFF from a BAM file while enforcing strandedness? What I am looking for is very close to this intersectBed feature:

-c    For each entry in A, report the number of overlaps with B.
    - Reports 0 for A entries that have no overlap with B.
    - Overlaps restricted by -f and -r.

Except that I'd like the number of overlaps in A for each entry in B (i.e. the other way around). If I do:

intersectBed -abam mybam.bam -b mygff.gff -s -f 1 -wb

Then my understanding is that this will report the entry in B for each overlap with A. But I'd like each entry in B to be outputted exactly once, with the number of reads from A that are contained strictly within it. I'm not sure how to enforce strict containment here.

Is coverageBed the solution to this? Or multicov? I'm not sure how to enforce strict containment using coverageBed - it's not clear to me if that's the default from the docs. Thanks.

bedtools bam samtools rna-seq • 6.4k views
ADD COMMENTlink modified 7.6 years ago by Aaronquinlan11k • written 7.6 years ago by user850
gravatar for Alex Reynolds
7.6 years ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

Another way to do this is to use the bedmap application from the BEDOPS suite for this, where your intervals are reference regions and reads are the map regions.

First, convert the GFF to sorted BED with the gff2bed script and sort-bed:

$ < mygff.gff | sort-bed - > myGff.bed

Convert the reads from BAM to sorted BED with the bam2bed script:

$ bam2bed.csh < mybam.bam | sort-bed - > myBam.bed

With bedmap, add the --echo operator to output each reference interval from myGff.bed. Add the --fraction-map 1 operator to enforce the overlap criteria that mapped reads must be contained entirely within a reference interval. Add the --count operator to yield the number of mapped reads within each reference interval, which are printed adjacent to the reference interval data.

In sum, as an example:

$ bedmap --echo --fraction-map 1 --count myGff.bed myBam.bed > myAnswer.bed

If you need to apply strandedness criteria, such as shifting reverse-strand reads by a base, the bedops --range L:R operator can be used to pre-adjust element coordinates asymmetrically, e.g. bedops --range -1:-1 --everything foo.bed > bar.bed will reduce each start and stop coordinate by one base. Here, you might split your inputs by strand, pre-adjust a stranded input, run two bedmap operations on each input and then collate the two results at the end.

ADD COMMENTlink modified 9 months ago by RamRS27k • written 7.6 years ago by Alex Reynolds30k

Thanks, but I'd like to do this with bedtools if possible...

ADD REPLYlink written 7.6 years ago by user850

No problem. Hopefully it helps someone else trying to do the same thing.

ADD REPLYlink written 7.6 years ago by Alex Reynolds30k
gravatar for Aaronquinlan
7.6 years ago by
United States
Aaronquinlan11k wrote:

Currently, the coverage tool cannot enforce fractional coverage. However, one strategy would be to use intersect to enforce the strict containment and then pass only those alignments that meet this criteria to the coverage tool. The example below demonstrates this strategy.

bedtools intersect -abam mybam.bam -b mygff.gff -s -f 1 -ubam | \
    bedtools coverage -abam - -b mygff.gff
ADD COMMENTlink written 7.6 years ago by Aaronquinlan11k

thanks. are there plans to add fractional coverage to coverage? would be nice because then you can use the pipe to pass an on-the-fly processed gff, e.g. cmd gff | bedtools coverage -abam mybam.bam -b - - right now the pipe is taken up by the bam

ADD REPLYlink written 7.6 years ago by user850

Perhaps look into mkfifo and named pipes:

ADD REPLYlink written 7.6 years ago by Alex Reynolds30k

Fantastic! Thank you again

ADD REPLYlink written 7.5 years ago by user850
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1627 users visited in the last hour