How to adjust the size of intervals in BED12 format?
7.3 years ago
mt1022 ▴ 310

I have a file containing genomic intervals in BED12 format. I want to adjust the size of each interval (specifically, remove N bases from both ends of each interval). bedtools slop seems to work only with BED6. Is there a convenient way to achieve this for BED12 intervals?

bedtools alignment bed • 2.4k views
7.3 years ago

When you say from both ends, do you mean from the start and end (2nd and 3rd columns)?

If so, this should work with awk for N=10 (it only shifts columns 2 and 3 and leaves the other ten columns untouched):

awk 'BEGIN {FS = OFS = "\t"} {$2 += 10; $3 -= 10; print}' file

EDIT: So the answer to my question was "No, I want to do that for all blocks". This is a little harder, but here is a possible solution:

cat -n file.bed12 |         # add a unique identifier as the first column
bedtools expand -c 12,13 |  # expand file based on blocks (sizes and starts are now columns 12 and 13)
# drop blocks of 20 nt or less and shrink the others by 10 nt on each end
awk 'BEGIN {OFS="\t"}; $12 >20 {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12-20,$13+10}' |
bedtools groupby -g 1,2,3,4,5,6,7,8,9,10,11 -c 12,13 -o collapse,collapse | # collapse back into bed12 file
cut -f 2-13 > file.final.bed12 # suppress identifier

It ran on this test (note that column 10 still reports 3 blocks although only two remain after filtering):

# original bed12 entry :
echo "chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,9,1189, 0,739,1347," | tr " " "\t"
chr1    11873   14409   uc001aaa.3  0   +   11873   11873   0   3   354,9,1189, 0,739,1347,

# cropped blocks
echo "chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,9,1189, 0,739,1347," | tr " " "\t"| cat -n | bedtools expand -c 12,13 | awk 'BEGIN {OFS="\t"}; $12 >20 {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12-20,$13+10}' | bedtools groupby -g 1,2,3,4,5,6,7,8,9,10,11 -c 12,13 -o collapse,collapse | cut -f 2-13
chr1    11873   14409   uc001aaa.3  0   +   11873   11873   0   3   334,1169    10,1357

Hi. You should consider the size of each block (column 11). If the first block is shorter than 10, then the start should be adjusted based on the original start and the start position of the next block (the start position of each block is in column 12).


I see. I don't know of any tool that can do that directly. If you know a bit of R or Python, you can always parse the file manually, but I guess you wanted to avoid that if you asked this question in the first place.

Where does your BED12 file come from? If it is an option, it could be easier to go back to an expanded file (where the blocks are on separate lines, like the exons in a GTF file), do the "slop", then convert it back to BED12.
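For anyone who wants to try the expand / modify / collapse route in Python rather than with bedtools, here is a rough sketch. The function names `bed12_to_blocks` and `blocks_to_bed12` are made up for this example and do not come from any library. It assumes each record is already split into 12 string fields, and on the way back it simply sets thickStart/thickEnd to the full span and itemRgb to 0, so that information is not preserved.

```python
def bed12_to_blocks(fields):
    """Split one BED12 record (list of 12 strings) into BED6-like rows, one per block."""
    chrom, chrom_start = fields[0], int(fields[1])
    name, score, strand = fields[3], fields[4], fields[5]
    # blockSizes / blockStarts may end with a trailing comma, so skip empty items
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    return [
        [chrom, chrom_start + s, chrom_start + s + z, name, score, strand]
        for s, z in zip(starts, sizes)
    ]


def blocks_to_bed12(blocks):
    """Merge the BED6-like rows of ONE feature back into a BED12 record."""
    blocks = sorted(blocks, key=lambda b: b[1])
    chrom, start, _, name, score, strand = blocks[0]
    end = blocks[-1][2]
    sizes = ",".join(str(b[2] - b[1]) for b in blocks) + ","
    starts = ",".join(str(b[1] - start) for b in blocks) + ","
    # Assumption: thickStart/thickEnd become the full span, itemRgb becomes 0
    return [chrom, str(start), str(end), name, score, strand,
            str(start), str(end), "0", str(len(blocks)), sizes, starts]


record = "chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,9,1189, 0,739,1347,".split()
for row in bed12_to_blocks(record):
    print(*row, sep="\t")
```

Any per-block edit would go between the two calls. Rebuilding blockStarts from the new chromStart keeps the first block start at 0, which BED12 requires.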


I didn't intend to slop each block. For example, if the first block is 8 nt long, I want to discard this block and then remove 2 nt from the next block, so that I remove 8 + 2 = 10 nt in total. Anyway, I have written a script for this: bed12_slop.py. I haven't tested it on all possible extreme conditions yet.
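To make the rule above concrete, here is a minimal Python sketch of that kind of trimming. It is not the bed12_slop.py script mentioned above, just an illustration. The function name `trim_bed12` is hypothetical, and two choices are my own assumptions: records with 2*N or fewer bases in their blocks are dropped (returned as `None`), and thickStart/thickEnd are clamped to the new interval.

```python
def trim_bed12(fields, n):
    """Remove n bases of block sequence from each end of one BED12 record.

    fields: list of 12 strings. Returns a new list of 12 strings, or None
    when the blocks hold 2*n bases or fewer (assumption: such records are dropped).
    """
    chrom_start = int(fields[1])
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if sum(sizes) <= 2 * n:
        return None
    # Work with absolute [start, end) coordinates for every block
    blocks = [[chrom_start + s, chrom_start + s + z] for s, z in zip(starts, sizes)]

    left = n
    while left > 0:  # eat whole blocks from the 5' side, then part of the next one
        length = blocks[0][1] - blocks[0][0]
        if length <= left:
            blocks.pop(0)
            left -= length
        else:
            blocks[0][0] += left
            left = 0

    right = n
    while right > 0:  # same from the 3' side
        length = blocks[-1][1] - blocks[-1][0]
        if length <= right:
            blocks.pop()
            right -= length
        else:
            blocks[-1][1] -= right
            right = 0

    new_start, new_end = blocks[0][0], blocks[-1][1]
    out = list(fields)
    out[1], out[2] = str(new_start), str(new_end)
    # Assumption: thickStart/thickEnd are clamped into the trimmed interval
    out[6] = str(min(max(int(fields[6]), new_start), new_end))
    out[7] = str(min(max(int(fields[7]), new_start), new_end))
    out[9] = str(len(blocks))
    out[10] = ",".join(str(e - s) for s, e in blocks) + ","
    out[11] = ",".join(str(s - new_start) for s, e in blocks) + ","
    return out


record = "chr1 100 200 x 0 + 100 200 0 3 8,20,30, 0,30,70,".split()
print("\t".join(trim_bed12(record, 10)))
```

In the toy record the first block is 8 nt, so it is discarded and 2 more nt are taken from the second block, as described above. Strand is ignored because the same N is removed from both ends.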


OK, sorry, I misunderstood. Good job on your script.

Carlo
