Question: How to adjust the size of intervals in bed12 format?
Posted 2.5 years ago by mt1022 (China):

I have a file containing genomic intervals in bed12 format. I want to adjust the size of each interval (specifically, remove N bases from both ends of each interval). bedtools slop seems to work only with bed6. Is there a convenient way to achieve this for bed12 intervals?

Answer posted 2.5 years ago by Carlo Yague (Belgium):

When you say "from both ends", do you mean from the start and end (2nd and 3rd columns)?

If so, this should work with awk (here N=10):

awk 'BEGIN {FS = OFS = "\t"} {$2 += 10; $3 -= 10; print}' file   # columns 4-12 (including blockSizes/blockStarts) are printed unchanged

EDIT: So the answer to my question was "No, I want to do that for all blocks". This is a little harder, but here is a possible solution:

cat -n file.bed12 |         # add a unique identifier (the line number) as column 1
bedtools expand -c 12,13 |  # one line per block (blockSizes and blockStarts are now columns 12 and 13)
# drop blocks of 20 bp or less and trim 10 bp from each end of the others
awk 'BEGIN {OFS = "\t"} $12 > 20 {$12 -= 20; $13 += 10; print}' |
# collapse back into bed12, recounting the surviving blocks for blockCount
bedtools groupby -g 1,2,3,4,5,6,7,8,9,10 -c 12,12,13 -o count,collapse,collapse |
cut -f 2-13 > file.final.bed12  # drop the identifier

It worked with this test:

# original bed12 entry:
echo "chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,9,1189, 0,739,1347," | tr " " "\t"
chr1    11873   14409   uc001aaa.3  0   +   11873   11873   0   3   354,9,1189, 0,739,1347,

# cropped blocks:
echo "chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,9,1189, 0,739,1347," | tr " " "\t" | cat -n | bedtools expand -c 12,13 | awk 'BEGIN {OFS = "\t"} $12 > 20 {$12 -= 20; $13 += 10; print}' | bedtools groupby -g 1,2,3,4,5,6,7,8,9,10 -c 12,12,13 -o count,collapse,collapse | cut -f 2-13
chr1    11873   14409   uc001aaa.3  0   +   11873   11873   0   2   334,1169    10,1357

Note that chromStart and chromEnd (columns 2 and 3) are left as they were, so the first block now starts at offset 10 instead of 0; strictly, BED12 requires the first blockStart to be 0.

mt1022 replied (2.5 years ago):

Hi. You should consider the size of each block (column 11). If the first block is shorter than 10 bp, the new start should be adjusted based on the original start and the start position of the next block (the start position of each block is in column 12).
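
To make the coordinate arithmetic in this reply concrete: in BED12, column 11 holds the block sizes and column 12 the block starts, and those starts are offsets relative to chromStart (column 2), not genomic positions. Below is a minimal awk sketch, wrapped in a shell function whose name (bed12_blocks) is only for illustration, that prints every block as an absolute interval. It assumes tab-separated input with a correct blockCount in column 10.

```shell
#!/bin/sh
# bed12_blocks is a hypothetical helper name, not an existing tool.
# It prints each block of a BED12 record as an absolute interval:
# chrom, blockStart, blockEnd, name, block number.
# Column 10 = blockCount, 11 = blockSizes, 12 = blockStarts (relative to chromStart).
bed12_blocks() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    split($11, size, ",")              # a trailing comma only adds an empty last element
    split($12, offset, ",")
    for (i = 1; i <= $10; i++) {       # loop over blockCount, ignoring that empty element
      start = $2 + offset[i]           # absolute start = chromStart + relative offset
      print $1, start, start + size[i], $4, i
    }
  }'
}

# example record from the thread (blocks of 354, 9 and 1189 nt)
printf 'chr1\t11873\t14409\tuc001aaa.3\t0\t+\t11873\t11873\t0\t3\t354,9,1189,\t0,739,1347,\n' | bed12_blocks
```

For the example record this prints the three blocks 11873-12227, 12612-12621 and 13220-14409; the last one ends exactly at chromEnd, as it should.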

Carlo Yague replied (2.5 years ago):

I see. I don't know of any tool that can do that directly. If you know a bit of R or Python, you can always parse the file yourself, but I guess you want to avoid that if you asked this question in the first place.

Where does your bed12 file come from? If it is an option, it could be easier to go back to an expanded file (where the blocks are on separate lines, like the exons in a GTF file), do the "slop", then convert it back to bed12.

mt1022 replied (2.5 years ago):

I didn't intend to slop each block. For example, if the first block is 8 nt long, I want to discard it and then remove 2 nt from the next block, so that 8 + 2 = 10 nt are removed in total. Anyway, I have written a script for this: bed12_slop.py. I haven't tested it on all possible edge cases yet.
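
The trimming rule described here (remove N nt of the spliced feature from each end, discarding any block that is used up and carrying the remainder into the next block) can be sketched in awk. This is not mt1022's bed12_slop.py, whose code is not shown in the thread; the function name bed12_trim is hypothetical, and the handling of thickStart/thickEnd (clamped into the new interval, or collapsed to the new start when no thick part survives) is an assumption. Records whose total block length is 2N or less are dropped.

```shell
#!/bin/sh
# bed12_trim is a hypothetical name (this is NOT bed12_slop.py).
# It removes n nt of spliced sequence from each end of every BED12 record,
# discarding blocks that are used up and carrying the remainder into the
# next block. Records with a total block length of 2n or less are dropped.
# Usage: bed12_trim N < in.bed12 > out.bed12
bed12_trim() {
  awk -v n="$1" 'BEGIN { FS = OFS = "\t" }
  {
    nb = $10
    split($11, size, ",")
    split($12, off, ",")
    for (k = 1; k <= nb; k++) {          # absolute block coordinates
      s[k] = $2 + off[k]
      e[k] = s[k] + size[k]
    }
    # left end: drop whole blocks while they fit in the budget, then shorten one
    rem = n + 0; first = 1
    while (first <= nb && rem > 0) {
      len = e[first] - s[first]
      if (len <= rem) { rem -= len; first++ }
      else { s[first] += rem; rem = 0 }
    }
    # right end: same, walking backwards
    rem = n + 0; last = nb
    while (last >= first && rem > 0) {
      len = e[last] - s[last]
      if (len <= rem) { rem -= len; last-- }
      else { e[last] -= rem; rem = 0 }
    }
    if (first > last) next               # nothing left: skip this record
    start = s[first]; stop = e[last]
    # assumption: clamp thickStart/thickEnd into the new interval, and mark
    # the record as non-coding (both = start) if no thick part survives
    ts = ($7 > start) ? $7 : start
    te = ($8 < stop) ? $8 : stop
    if ($7 == $8 || ts >= te) { ts = start; te = start }
    sizes = ""; starts = ""
    for (k = first; k <= last; k++) {    # rebuild lists relative to the new start
      sizes = sizes (k > first ? "," : "") (e[k] - s[k])
      starts = starts (k > first ? "," : "") (s[k] - start)
    }
    print $1, start, stop, $4, $5, $6, ts, te, $9, last - first + 1, sizes, starts
  }'
}

# example record from the thread, trimmed by 10 nt at each end
printf 'chr1\t11873\t14409\tuc001aaa.3\t0\t+\t11873\t11873\t0\t3\t354,9,1189,\t0,739,1347,\n' | bed12_trim 10
# -> chr1 11883 14399 uc001aaa.3 0 + 11883 11883 0 3 344,9,1179 0,729,1337 (tab-separated)
```

Because the new chromStart is taken from the first surviving block, the first blockStart is always 0 and the last block ends at chromEnd, as BED12 requires; block lists are written without a trailing comma. For the case mt1022 describes, a record whose first block is 8 nt long loses that block plus 2 nt of the next block when N = 10.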

Carlo Yague replied (2.5 years ago):

OK, sorry, I misunderstood. Good job on your script.

Carlo