Question: Bedtools Coverage Read Counts
19 months ago, gtasource wrote:

Using Bedtools makewindows, I generated a file that splits the genome into 5000 kb windows. Using Bedtools Coverage, I then counted how many reads from a specific BAM file fell into each of these 5000 kb windows. Looking at the Bedtools Coverage results, I see that the following pieces of information are reported for each window:

1.) The number of features in B that overlapped (by at least one base pair) the A interval.
2.) The number of bases in A that had non-zero coverage from features in B.
3.) The length of the entry in A.
4.) The fraction of bases in A that had non-zero coverage from features in B.

For example, at Chromosome 1, loci 0 to 1000, I may see an output of the following:

CHR1 0 1000 3  30  100 0.3000000

Here, 3 is the number of features in B that overlapped (by at least one base pair) the A interval; 30 is the number of bases in A that had non-zero coverage from features in B; 100 is the length of the entry in A; and 0.3000000 is the fraction of bases in A that had non-zero coverage from features in B.
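The column layout described above can be sketched in a few lines of Python. This is only an illustration, assuming the default bedtools coverage output in which the four summary values are appended after the columns of the A file; the names `CoverageRecord` and `parse_coverage_line` are hypothetical and not part of bedtools.

```python
from typing import NamedTuple


class CoverageRecord(NamedTuple):
    """One window from bedtools coverage output (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    n_features: int        # 1.) features in B overlapping the A interval
    covered_bases: int     # 2.) bases in A with non-zero coverage
    interval_length: int   # 3.) length of the entry in A
    covered_fraction: float  # 4.) fraction of A with non-zero coverage


def parse_coverage_line(line: str) -> CoverageRecord:
    # Real output is tab-separated; split() also tolerates runs of spaces.
    fields = line.split()
    # Assumption: the last four columns are the values appended by coverage.
    n_features, covered, length, fraction = fields[-4:]
    return CoverageRecord(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        n_features=int(n_features),
        covered_bases=int(covered),
        interval_length=int(length),
        covered_fraction=float(fraction),
    )


# The example line from the question, parsed as-is.
record = parse_coverage_line("CHR1 0 1000 3  30  100 0.3000000")
print(record.n_features)
```

If only the per-window read count matters, `record.n_features` is the value to keep.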

If I only care about the number of reads that fall into a specific window, should I focus only on #1 (the number of features in B that overlapped the A interval by at least one base pair)? In this case, that would be the number 3?

19 months ago, Kevin Blighe wrote:

Yes, for your situation, you want the first number: 3 features of B overlap the A feature (chr1:0-1000) by at least 1 base. You can, of course, change the required level of overlap. Would it make sense, for example, to count a read that overlaps a 5000bp window by just a single base? This is where you may additionally want to use the final figure (0.3), which indicates that only 30% of the A feature was covered by B features. You could combine the two in something like a 2-pass filtering procedure.
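One possible reading of the 2-pass filtering idea, sketched in Python under stated assumptions: the input lines are default bedtools coverage output with the four summary columns last, and the thresholds `min_reads` and `min_fraction` as well as the function name `filter_windows` are hypothetical. The example values are made up for illustration. Note that the final column is the fraction of the whole window that is covered, not the overlap of any individual read.

```python
def filter_windows(lines, min_reads=1, min_fraction=0.0):
    """Keep windows that pass a read-count filter, then a coverage filter."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        n_reads = int(fields[-4])     # features in B overlapping the window
        fraction = float(fields[-1])  # fraction of the window covered
        # Pass 1: require a minimum number of overlapping reads.
        if n_reads < min_reads:
            continue
        # Pass 2: require a minimum covered fraction of the window.
        if fraction < min_fraction:
            continue
        kept.append((fields[0], int(fields[1]), int(fields[2]), n_reads, fraction))
    return kept


# Made-up 5000bp windows, not taken from real data.
example = [
    "chr1\t0\t5000\t3\t1500\t5000\t0.3000000",
    "chr1\t5000\t10000\t1\t50\t5000\t0.0100000",
    "chr1\t10000\t15000\t0\t0\t5000\t0.0000000",
]

kept = filter_windows(example, min_reads=1, min_fraction=0.1)
for window in kept:
    print(window)
```

With these thresholds only the first window survives: the second has a read but covers too little of the window, and the third has no reads at all.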

This simple counting logic is the same as that used by, for example, featureCounts, which counts reads over the features in a GTF/GFF file. Colleagues and I have used BEDTools coverage in the past to produce raw counts from Cufflinks / StringTie-generated GTFs and BAMs. For particular RNA-seq experiments, BEDTools coverage does exactly the same as featureCounts.

Kevin
