I need to calculate coverages for each position in several regions. I use following command:
coverageBed -abam bam_file -b name_of_bed -d
But finally the coordinates from .bed file are not sorted, in the result file! It reduces the opportunity to use cost-efficient algorithms.
Could you tell me what I am doing wrong? Is there a better way to calculate coverages for each position and keep them in the same order as in original bed file?
BED-file starts with:
chr1 762097 762270 chr1 865591 865791
(and so on, ascending order of coordinates in chromosome)
So I would like to have coverage output to be in the same order. What I have:
first line in the output file:
1 39845644 39846247 1 0
1 917230 917836 516 11
(weird order, neither ascending nor descending)
I re-sorted bam file just in case it was not sorted. It did not help.
UPDATE: tried to calculate coverage using bedTools v. 2.25 and
-sorted flag. Previously output was:
1 39845744 39846147 1 4 1 39845744 39846147 2 4 1 39845744 39846147 3 4
1 10001 10077 HWUSI-EAS1693_0012:7:41:10492:2084#0/1 0 + 10001 10077 0,0,0 1 76, 0, 1 0 1 10001 10077 HWUSI-EAS1693_0012:7:41:10492:2084#0/1 0 + 10001 10077 0,0,0 1 76, 0, 2 0 1 10001 10077 HWUSI-EAS1693_0012:7:41:10492:2084#0/1 0 + 10001 10077 0,0,0 1 76, 0, 3 0
WTF is this?
HWUSI? Code for v.2.15:
coverageBed -abam Sample.realigned.dm.recalibrated.bam -b Regions.bed -d > output.txt
Code for v.2.25:
covBed -abam Sample.realigned.dm.recalibrated.bam -b Regions.bed -d -sorted > output.sorted.txt
How was your bed file sorted? Couldn't you just use the sort command to sort them how you'd like? Or if you'd rather re-sort them to match your original bed file you could just read both into R and use the match function (merge should work too).
They are sorted lexicographically. The smaller position on chromosome is - the higher the region is.
So to get this right, your bam and bed input is sorted, but your output doesn't come out sorted (because bedtools automatically sorts in its preferred sort order) as you'd like? Could you post a small section of the resulting bedfile and the input bed file and an example of what you want the output to be so we can help.
I updated my question. And I have one bam file and one bed file. So I would like to obtain number of reads from bam file covering probes from bed file and to have them sorted as in the bed file.
Try @dariober's solution and if that does not work i'll try to give you a R solution.
I'm trying now. The files are approx 5 GB each. I can sort output files with a script, but I a) do not want to re-write the file with coverages (more memory + time consumption) b) do not understand how and why bedtools mixes the order of bed specified regions. It makes no sense.
Your bed file is 5GB!? In this case you definitely want to use the -sorted option that comes with latest versions of bedtools otherwise it's going to take an awful amount of memory and time (and you get sorted output). By the way, if you stream the output of coverageBed to sort I think the overhead of re-sorting is going to be relatively small compared to the actual counting.
Sorry, had a limit of 5 messages per day on this forum so could not answer yesterday. Yes, they are exome-seq data. I tried your gnu version - it worked for ages, I waited several hours and then interrupted it. Now will try 2.25 version, hope it will work.