Bedtools Coverage Read Counts
Asked by gtasource, 6.1 years ago

Using bedtools makewindows, I split the genome into 5000 kb windows. I then used bedtools coverage to find how many reads from a specific BAM file fell into each of these windows. Looking at the bedtools coverage results, I see that the following pieces of information are given for each window:
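To make the window layout concrete, here is a minimal Python sketch of fixed-width, non-overlapping windows. It is not bedtools' own implementation: the function name `make_windows` and the toy chromosome sizes are hypothetical, and it assumes BED-style 0-based, half-open coordinates with the last window on each chromosome truncated at the chromosome end.

```python
def make_windows(chrom_sizes, width):
    """Yield (chrom, start, end) windows of `width` bp per chromosome.

    Coordinates are 0-based and half-open, as in BED. The last window
    on each chromosome stops at the chromosome end.
    """
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, width):
            yield chrom, start, min(start + width, size)


# Hypothetical toy chromosome sizes, not a real genome.
toy_sizes = {"chr1": 2500, "chr2": 1000}
for window in make_windows(toy_sizes, 1000):
    print(*window, sep="\t")
```

Widths are in base pairs, so 5000 kb windows would correspond to `width=5_000_000` in this sketch.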

1.) The number of features in B that overlapped (by at least one base pair) the A interval.
2.) The number of bases in A that had non-zero coverage from features in B.
3.) The length of the entry in A.
4.) The fraction of bases in A that had non-zero coverage from features in B.

For example, for the window on chromosome 1 from position 0 to 1000, I might see the following output:

CHR1 0 1000 3 300 1000 0.3000000

Here, 3 is the number of features in B that overlapped the A interval by at least one base pair; 300 is the number of bases in A that had non-zero coverage from features in B; 1000 is the length of the entry in A; and 0.3000000 is the fraction of bases in A that had non-zero coverage from features in B (300/1000).
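A small Python sketch for reading these four columns by name. It assumes the A file is plain BED3 (chrom, start, end), so the four coverage columns are the last four fields of each line; `CoverageRecord` and `parse_coverage_line` are hypothetical helpers, and the input line is illustrative rather than real output.

```python
from typing import NamedTuple


class CoverageRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    n_overlaps: int     # 1.) features in B overlapping A by >= 1 bp
    bases_covered: int  # 2.) bases in A with non-zero coverage from B
    length: int         # 3.) length of the A interval
    fraction: float     # 4.) bases_covered / length


def parse_coverage_line(line: str) -> CoverageRecord:
    """Parse one line of default `bedtools coverage` output.

    Assumes A is BED3, so the coverage columns are the last four fields.
    """
    fields = line.split()
    n, covered, length, frac = fields[-4:]
    return CoverageRecord(
        fields[0], int(fields[1]), int(fields[2]),
        int(n), int(covered), int(length), float(frac),
    )


record = parse_coverage_line("chr1\t0\t1000\t3\t300\t1000\t0.3000000")
# Sanity checks: column 3 should equal end - start, and column 4 should
# equal column 2 divided by column 3.
assert record.length == record.end - record.start
assert abs(record.fraction - record.bases_covered / record.length) < 1e-6
print(record.n_overlaps)  # read count for this window → 3
```

The two checks are a quick way to catch a row (or a hand-written example) whose length and fraction columns disagree with its coordinates.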

If I only care about the number of reads that fall into a specific window, should I focus only on #1 (the number of features in B that overlapped the A interval by at least one base pair), i.e., the 3 in this example?
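To see why column 1 is the read count and column 2 is not, here is a brute-force Python sketch that recomputes all four columns for a single window from a list of read intervals. It is an illustration only, not how bedtools computes coverage: `coverage_columns` and the toy reads are hypothetical, coordinates are assumed 0-based and half-open, and details such as strandedness or split (spliced) reads are ignored. If the count is all you need, `bedtools coverage` also has a `-counts` option that reports only that column.

```python
def coverage_columns(window, reads):
    """Return (count, bases_covered, length, fraction) for one window.

    `window` and each read are (chrom, start, end) tuples in 0-based,
    half-open coordinates. A read counts if it overlaps by >= 1 bp.
    """
    chrom, w_start, w_end = window
    # Overlapping reads, clipped to the window boundaries.
    hits = [
        (max(start, w_start), min(end, w_end))
        for c, start, end in reads
        if c == chrom and start < w_end and end > w_start
    ]
    # Merge clipped intervals so overlapping reads are not double-counted
    # when measuring how many distinct bases are covered.
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(hits):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    length = w_end - w_start
    return len(hits), covered, length, covered / length


# Hypothetical reads: two overlap each other, one spills past the window
# end, and one is on another chromosome.
reads = [("chr1", 100, 200), ("chr1", 150, 250),
         ("chr1", 900, 1100), ("chr2", 0, 100)]
print(coverage_columns(("chr1", 0, 1000), reads))  # → (3, 250, 1000, 0.25)
```

Three reads are counted even though only 250 distinct bases (25%) are covered, which is why the first column, not the second, answers "how many reads fell into this window".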

Answer, 6.1 years ago:

Yes. For your situation, you want the first number: 3 features of B overlapped the A feature (chr1:0-1000) by at least 1 base. You can, of course, change the required level of overlap. Would it make sense, for example, to count a read that overlaps one of your windows by just a single base? This is where you may also want to use the final figure (0.3), which indicates that only 30% of the A feature was covered by B features. Combining the two could work as a two-pass filtering procedure.
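A hypothetical Python sketch of that two-pass idea: first count only reads whose overlap with the window reaches a minimum fraction of the read's own length, then keep only windows that pass both a read-count threshold and a covered-fraction threshold. In bedtools itself the required overlap is set with options such as `-f`/`-F` (check `bedtools coverage -h` for your version); the helper names, window labels, and thresholds below are assumptions for illustration.

```python
def count_with_min_overlap(window, reads, min_read_fraction):
    """Count reads overlapping `window` by at least `min_read_fraction`
    of the read's own length (0-based, half-open coordinates)."""
    chrom, w_start, w_end = window
    n = 0
    for c, start, end in reads:
        if c != chrom:
            continue
        overlap = min(end, w_end) - max(start, w_start)
        if overlap > 0 and overlap >= min_read_fraction * (end - start):
            n += 1
    return n


def two_pass_filter(rows, min_count, min_fraction):
    """Keep (window_name, count, fraction_covered) rows that pass both
    the read-count and the covered-fraction thresholds."""
    return [r for r in rows if r[1] >= min_count and r[2] >= min_fraction]


reads = [("chr1", 100, 200), ("chr1", 150, 250), ("chr1", 900, 1100)]
window = ("chr1", 0, 1000)
print(count_with_min_overlap(window, reads, 0.0))   # any overlap → 3
print(count_with_min_overlap(window, reads, 0.75))  # drops the half-outside read → 2

# Hypothetical per-window rows: (name, count, fraction covered).
rows = [("chr1:0-1000", 3, 0.3), ("chr1:1000-2000", 4, 0.02)]
print(two_pass_filter(rows, min_count=2, min_fraction=0.1))
```

The second window has plenty of reads but almost no breadth of coverage, so the fraction threshold removes it; whether that is desirable depends on the experiment.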

This simple logic is essentially the same as that used by, for example, featureCounts, which counts reads over the features in a GTF/GFF file. Colleagues and I have used BEDTools coverage in the past to produce raw counts from Cufflinks/StringTie-generated GTFs and BAMs, and for some RNA-seq experiments it gives the same counts as featureCounts.

Kevin
