Hi there,
This question is very basic but I need to ensure that I'm going on the right way. I need to calculate the number of reads falling inside my bed intervals and the number of reads falling outside them. After reading this thread (Counting Number Of Bam Reads Directly Within Set Of Intervals With Bedtools), I decided to try with this command:
intersectBed -abam my_file.bam -b my_file.bed -wa -f 1 | coverageBed -abam stdin -b my_file.bed
I would like to know what is the difference between using the previous command, or using only the second part:
coverageBed -abam my_file.bam -b my_file.bed
The output is quite different for some hits:
- First command output:
1 50331576 50331667 (.. gene names..) 0 0 91 0.0000000 1 39845848 39846030 (..gene names..) 70 178182 0.9780220
- Second command output:
1 50331576 50331667 (..gene names..) 47 91 91 1.0000000 1 39845848 39846030 (..gene names..) 143 182182 1.0000000
I think that for first command I get only those reads falling strictly within interval, while for the second one also include reads that partially covering the intervals? This is true?
For other hand, I would like also to get the number of reads falling outside the intervals. I can make a new bed file using bedtools complement, but if I use -v option of bedtools intersect would be OK? Like this:
intersectBed -v -abam my_file.bam -b my_file.bed -wa -f 1 | coverageBed -abam stdin -b my_file.bed
Thanks in advance.
Hi Pierre, one question, using your first command, I get the reads mapping only within interval (strictly within interval), or also include reads that partially covering the intervals? Thank you