Question: Split bed file per sequence length
cg2827 (United States) wrote 4.3 years ago:

I need to split my BED file into several files that each cover the same total sequence length. My input is not a whole chromosome but a list of annotations with variable lengths and gaps between them. BedTools windowMaker splits a fragment into windows of the requested size only if the fragment is larger than the window, so in my case it does not do what I want.

For instance, suppose I have as an input the following:

chr1    0      90
chr1    149    200
chr1    249    300
chr1    310    510

I want BED files that each contain 100 bp in total, such as:

File 1:

chr1    0      90
chr1    149    159

File 2:

chr1    159    200
chr1    249    300
chr1    310    318

And so on...

Or something like:

chr1    0      90     block1
chr1    149    159    block1
chr1    159    200    block2
chr1    249    300    block2
chr1    310    318    block2

Bedtools outputs this instead:

chr1    0      90
chr1    149    200
chr1    249    300
chr1    310    410
chr1    410    510

Is there a way to define the output by cumulative sequence length instead of by fixed windows?

 


Sorry, but I don't get what you want to achieve. In file 1 the coordinates run from 0 to 159 and the lengths are 90 and 10. In file 2 the coordinates run from 159 to 318 and the lengths are 41, 51 and 8. So where is this "100bp"?

Reply written 4.3 years ago by PoGibas

90+10=100

41+51+8=100

What I want is for each file to contain exactly 100 bp in total, however many lines that takes.

Reply written 4.3 years ago by cg2827

Alex Reynolds (Seattle, WA USA) wrote 4.3 years ago:

You could write a script in awk or Perl to do this pretty easily. Just read the start and stop positions, track how many bases are "left" in a current window/"block" to print out, and loop through all your elements until there are none left.
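
The bookkeeping described above can be sketched as follows. This is only an illustration in Python rather than awk or Perl, and the names `split_into_blocks`, `block_size`, and `example` are made up for this sketch. It assumes the intervals are already in the order you want them consumed, and that a final block shorter than the requested size is acceptable.

```python
from typing import Iterable, Iterator, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def split_into_blocks(intervals: Iterable[Interval],
                      block_size: int) -> Iterator[List[Interval]]:
    """Yield lists of intervals whose lengths sum to block_size.

    The last block may be shorter if the input runs out of bases.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    block: List[Interval] = []
    remaining = block_size  # bases still needed to fill the current block
    for chrom, start, end in intervals:
        while start < end:
            # Take as much of this interval as the current block can hold.
            take = min(remaining, end - start)
            block.append((chrom, start, start + take))
            start += take
            remaining -= take
            if remaining == 0:
                yield block
                block = []
                remaining = block_size
    if block:
        yield block  # leftover bases that did not fill a whole block


# The input from the question.
example = [("chr1", 0, 90), ("chr1", 149, 200),
           ("chr1", 249, 300), ("chr1", 310, 510)]
blocks = list(split_into_blocks(example, 100))
for number, block in enumerate(blocks, start=1):
    for chrom, start, end in block:
        print(f"{chrom}\t{start}\t{end}\tblock{number}")
```

To write one file per block instead of a fourth column, the inner loop could open a file named after `number` and write the three BED columns there.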


Thanks Alex. I thought about using bedtools to make 1 bp windows and then splitting the result by the number of sites I want per output file, say 100 lines (= 100 bp) as in the example, but this might be cumbersome. Do you envision something more straightforward?
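
For comparison, the 1 bp idea can be sketched in Python without bedtools. This is a rough illustration, not a recommended pipeline: the names `per_base`, `merge_run`, and `chunk_by_bases` are hypothetical, and the sketch assumes the input intervals are sorted and non-overlapping. It expands every interval into single bases, takes them 100 at a time, and merges consecutive bases back into intervals.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def per_base(intervals: Iterable[Interval]) -> Iterator[Tuple[str, int]]:
    """Expand each interval into one (chrom, position) pair per base."""
    for chrom, start, end in intervals:
        for pos in range(start, end):
            yield chrom, pos


def merge_run(bases: List[Tuple[str, int]]) -> List[Interval]:
    """Merge consecutive positions on the same chromosome into intervals.

    Note: input intervals that abut exactly (end == next start) are merged too.
    """
    merged: List[Interval] = []
    for chrom, pos in bases:
        if merged and merged[-1][0] == chrom and merged[-1][2] == pos:
            merged[-1] = (chrom, merged[-1][1], pos + 1)
        else:
            merged.append((chrom, pos, pos + 1))
    return merged


def chunk_by_bases(intervals: Iterable[Interval],
                   bases_per_file: int) -> Iterator[List[Interval]]:
    """Yield merged intervals for each run of bases_per_file bases."""
    stream = per_base(intervals)
    while True:
        chunk = list(islice(stream, bases_per_file))
        if not chunk:
            return
        yield merge_run(chunk)


annotations = [("chr1", 0, 90), ("chr1", 149, 200),
               ("chr1", 249, 300), ("chr1", 310, 510)]
chunks = list(chunk_by_bases(annotations, 100))
for number, chunk in enumerate(chunks, start=1):
    for chrom, start, end in chunk:
        print(f"{chrom}\t{start}\t{end}\tblock{number}")
```

This does one step of work per base rather than per interval, so it is likely to be slow on genome-scale input; tracking the bases left in the current block avoids the expansion.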

Reply written 4.3 years ago by cg2827