How to bin bed files (non-standard) into 10000bp window size?
12 months ago
kabir.deb ▴ 80

Hi Biostars,

I have one thousand non-standard BED files that were exported from GRanges objects in R.

library(GenomicRanges)
library(rtracklayer)
bedgr <- GRanges(bed[,1], IRanges(bed[,2], bed[,3]))
export.bed(bedgr, "tmp.bed", "bed")

I've been looking for a way to bin these files into 10000 bp windows while keeping the bins identical across samples. The new BED file should have four columns, with the fourth giving the number of reads in each bin.

I found an example post in Biostars: How can I bin my bed files into 500bp bins?

To try this, I first created a small BED file for testing:

head file.txt
    chr1    33  92  
    chr1    52  118  
    chr1    53  99  
    chr1    361 405 
    chr1    632 688 
    chr1    2000    2100
    chr2    1   91  
    chr2    22  118 
    chr2    93  199  
    chr2    319 425 

Then I ran the following command from the earlier Biostars post:

time srun bedtools makewindows -g file.txt -w 500 > test.win.bed

The problem is that the resulting test.win.bed does not keep the separate chromosomes in the first column; every entry becomes chr1. The same happens with the full 24-chromosome file: all chromosome names become chr1, and the output looks like this:

head test.win.bed

chr1    0   500
chr1    500 1000
chr1    1000    1500
chr1    1500    2000
chr1    0   500
chr1    500 1000
chr1    1000    1500
chr1    1500    2000
chr1    0   500
chr1    500 1000

Furthermore, when I run the map command:

time srun bedtools map -a file.txt  -b test.win.bed -c 4 -o sum

it cannot read file.txt and reports this error:

Error: unable to open file or unable to determine types for file file.txt

srun: error: hal0279: task 0: Exited with exit code 1
- Please ensure that your file is TAB delimited (e.g., cat -t FILE).
- Also ensure that your file has integer chromosome coordinates in the
  expected columns (e.g., cols 2 and 3 for BED).

Any suggestions would be really appreciated.

Thanks in advance,
Deb

12 months ago

The file you pass as -g file.txt is not a valid genome file for bedtools makewindows.

A genome file must list each chromosome name and its size as two tab-separated fields, one chromosome per line:

chr1 2000
chr2 1000
chr3 4000
[...]
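
If no chromosome-size table is at hand, a rough genome file could be derived from the BED files themselves. The sketch below is only an approximation under a stated assumption: it takes the largest end coordinate seen on each chromosome as that chromosome's size. The function name make_genome_file and the demo file names are made up for illustration. True chromosome sizes from the reference (for example, columns 1 and 2 of a samtools faidx .fai index) would be the better source.

```shell
# Hypothetical helper: approximate a genome file from BED intervals by taking
# the largest end coordinate per chromosome (an assumption, not true sizes).
make_genome_file() {
    # "$@" is one or more BED files; output is chrom<TAB>size, sorted by name.
    # awk splits on any whitespace here, so stray trailing spaces are tolerated.
    awk 'BEGIN { OFS = "\t" }
         NF >= 3 { if ($3 + 0 > size[$1]) size[$1] = $3 + 0 }
         END { for (chrom in size) print chrom, size[chrom] }' "$@" | sort -k1,1
}

# Small tab-delimited demo input modelled on the test file in the question
demo_dir=$(mktemp -d)
printf 'chr1\t33\t92\nchr1\t2000\t2100\nchr2\t1\t91\nchr2\t319\t425\n' > "$demo_dir/demo.bed"

make_genome_file "$demo_dir/demo.bed" > "$demo_dir/demo.genome"
cat "$demo_dir/demo.genome"
```

To get bins that are identical across all samples, the same genome file has to be used for every sample, so it is safer to build it once (from the reference, or from all BED files together) than once per file.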

What you passed as the genome file appears to be another BED file.
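
With a valid genome file, the counting step can be sketched as well. In bedtools the usual route would be `bedtools makewindows -g genome.txt -w 10000 > windows.bed` followed by `bedtools intersect -a windows.bed -b sample.bed -c`, with the windows as -a so that each window gets a count. The bedtools map call in the question asks for column 4 of a three-column file, and the error message is also consistent with the input not being cleanly tab-delimited, though that is a guess from the output shown. The awk version below is a stand-in that needs no bedtools; count_per_window and the demo file names are hypothetical, and it assumes 0-based, half-open BED coordinates.

```shell
# Hypothetical helper: count how many BED intervals overlap each fixed-width
# window. Windows come from the genome file, so every sample gets the same
# bins, including bins with zero reads.
count_per_window() {
    # $1 = genome file (chrom<TAB>size), $2 = BED file, $3 = window size in bp
    awk -v w="$3" 'BEGIN { OFS = "\t" }
        # first file: remember chromosome sizes and their order
        NR == FNR { size[$1] = $2 + 0; order[++n] = $1; next }
        # second file: add one to every window the interval overlaps
        NF >= 3 && ($1 in size) {
            if ($3 + 0 <= $2 + 0) next
            for (b = int($2 / w); b <= int(($3 - 1) / w); b++) count[$1, b]++
        }
        END {
            for (i = 1; i <= n; i++) {
                chrom = order[i]
                for (b = 0; b * w < size[chrom]; b++) {
                    start = b * w
                    stop = (start + w < size[chrom]) ? start + w : size[chrom]
                    print chrom, start, stop, count[chrom, b] + 0
                }
            }
        }' "$1" "$2"
}

# Demo on the small test file from the question, with 500 bp windows
demo_dir=$(mktemp -d)
printf 'chr1\t2100\nchr2\t425\n' > "$demo_dir/genome.txt"
printf 'chr1\t33\t92\nchr1\t52\t118\nchr1\t53\t99\nchr1\t361\t405\nchr1\t632\t688\nchr1\t2000\t2100\nchr2\t1\t91\nchr2\t22\t118\nchr2\t93\t199\nchr2\t319\t425\n' > "$demo_dir/sample.bed"

count_per_window "$demo_dir/genome.txt" "$demo_dir/sample.bed" 500
```

For the real data the window size would be 10000, and the function would be run once per sample against the same genome file. An interval that spans a window boundary is counted in each window it touches, which matches overlap counting rather than assigning each read to a single bin.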


Hi,

Thank you for correcting me; now I know where I went wrong.

Thanks,

Deb
