Question: How to extend variable length intervals to the same final length?
Ian (5.4k), University of Manchester, UK, wrote:

I have a large set of footprint intervals that range from 11 to 25 bp. For the purpose of motif discovery I would like to extend all intervals to a common length, for example 50 bp. Intervals should be extended equally from both sides. I would usually use 'bedtools slop' for fixed-length intervals, but this does not appear to work with variable-length intervals.

It would be great if anyone could advise me how to use bedtools, or something else. I have a nagging feeling I am missing something obvious, so apologies in advance!

modified 2.0 years ago by Alex Reynolds (27k) • written 2.0 years ago by Ian (5.4k)
Alex Reynolds (27k), Seattle, WA USA, wrote:

Here's a way that I think should extend both ends of BED elements to the desired target length:

$ TARGET_LENGTH=50
$ awk -vF=${TARGET_LENGTH} 'BEGIN{ OFS="\t"; }{ len=$3-$2; diff=F-len; flank=int(diff/2); upflank=downflank=flank; if (diff%2==1) { downflank++; }; print $1, $2-upflank, $3+downflank; }' in.bed | sort-bed - > out.bed

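The two flanks will be unequal whenever the difference between the target length and the element length is odd (for example, an odd-length element with an even target length); the extra base then goes on the downstream side. Sounds like this is not a problem.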

You might adjust the logic to randomly pick which of upflank or downflank gets the extra base in this case, so that you don't impart a bias from this adjustment (esp. if original elements are stranded, like footprints that will ultimately be mapped to TF binding sites or other stranded elements), e.g.:

$ TARGET_LENGTH=50
$ awk -vF=${TARGET_LENGTH} 'BEGIN{ OFS="\t"; }{ len=$3-$2; diff=F-len; flank=int(diff/2); upflank=downflank=flank; if (diff%2==1) { if (rand() >= 0.5) { downflank++; } else { upflank++; } }; print $1, $2-upflank, $3+downflank; }' in.bed | sort-bed - > out.bed
modified 2.0 years ago • written 2.0 years ago by Alex Reynolds (27k)

Thank you for your answer! I was going to ask how it handles odd lengths. It is OK if one side has an extra base, as long as the final length is the same.

written 2.0 years ago by Ian (5.4k)
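
One way to confirm that every interval ended up at the same final length is to count the lines whose length differs from the target. This is only a sketch: the file name sample_out.bed and its two lines are hypothetical stand-ins for the real output, so that the check can be run as-is.

```shell
TARGET_LENGTH=50
# Hypothetical stand-in for the real output file; point the check at out.bed instead.
printf 'chr1\t80\t130\nchr1\t187\t237\n' > sample_out.bed
# Count intervals whose length (end - start) differs from the target.
# "n + 0" forces a numeric 0 to be printed when no line mismatches.
awk -v F="${TARGET_LENGTH}" '($3 - $2) != F { n++ } END { print n + 0 }' sample_out.bed
```

A count of 0 means all intervals have the target length; anything else is the number of intervals that need another look.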

Thanks for the addition. A colleague pointed out this morning that finding the mid-point of each region and then extending outwards works equally well. I knew I had missed something!

written 2.0 years ago by Ian (5.4k)

Yes, either way gets you to the same answer, but you'd still need to shift the midpoint up or down a base when dividing an even-numbered length in half.

modified 2.0 years ago • written 2.0 years ago by Alex Reynolds (27k)
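
The midpoint approach discussed in these replies might look roughly like the sketch below. It assumes three-column BED input with 0-based, half-open coordinates; the two sample lines written to in.bed are hypothetical. Integer division rounds the midpoint down, which is where the one-base shift ends up, and clamping the start at 0 is an added assumption that slides the window rather than shortening it. Chromosome ends are not checked.

```shell
TARGET_LENGTH=50
# Hypothetical sample input; replace with your own BED file.
printf 'chr1\t100\t111\nchr1\t200\t225\n' > in.bed
awk -v F="${TARGET_LENGTH}" 'BEGIN { OFS="\t" } {
    mid = int(($2 + $3) / 2);      # midpoint, rounded down when start+end is odd
    start = mid - int(F / 2);      # upstream edge of the fixed-length window
    if (start < 0) { start = 0 }   # assumption: slide the window instead of going negative
    print $1, start, start + F;    # end is always start + F, so the length is exactly F
}' in.bed > out.bed
```

The output can be piped through sort-bed, as in the answer above, if sorted BED is needed downstream.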