Question: How To Intersect A Range With Single Positions
3
gravatar for roll
6.3 years ago by
roll310
United Kingdom
roll310 wrote:

Hi,

Is there any tool that will find whether a position fall into a certain region other than awk or bedtools intersect? That is, lets's assume i have a range with position

chr1:10000-20000

and i have a list of positions

chr1 100 1 2 3
chr1 12000 1 2 3
chr1 15000 1 2 3
chr1 25000 1 2 3

So that it will catch chr1 12000 1 2 3 and chr1 15000 1 2 3

I am trying this with awk but it is too slow considering i have lots of positions and ranges. I thought about bedtools intersect but then again, i have some extra information that i dont know how to keep that extra information if i convert the single positions into bed format.

Is there any tools that does this faster than awk?

bed intersect • 3.8k views
ADD COMMENTlink modified 6.3 years ago by Neilfws48k • written 6.3 years ago by roll310

If you want to keep that information in a BED file, you can just separate the last 3 columns by commas and make them the 4th column. You'll have to reparse the output, of course.

ADD REPLYlink written 6.3 years ago by Devon Ryan95k

OR write a short script that will merge the new intersect file with the original file with the extra information.

ADD REPLYlink written 6.3 years ago by Ashutosh Pandey12k
4
gravatar for dariober
6.3 years ago by
dariober11k
WCIP | Glasgow | UK
dariober11k wrote:

Convert your positions to bed as you said and use intersectBed to give all the columns in your a and b files with options -wa and -wb (or I miss something?):

cat region.bed 
chr1    10000    20000

cat pos.txt 
chr1    100    1    2    3
chr1    12000    1    2    3
chr1    15000    1    2    3
chr1    25000    1    2    3

 awk 'BEGIN{OFS="\t"} {print $1, $2-1, $2, $3, $4, $5}' pos.txt \
    | intersectBed -a - -b region.bed -wa -wb

chr1    11999    12000    1    2    3    chr1    10000    20000
chr1    14999    15000    1    2    3    chr1    10000    20000

Hope this helps!
Dario

ADD COMMENTlink modified 8 months ago by RamRS27k • written 6.3 years ago by dariober11k

Great. Thanks for the info. It has been a while since I have updated myself :-)

ADD REPLYlink written 6.3 years ago by Ashutosh Pandey12k

I've been trying to use this example for my situation and it prints to the screen well but when I try to output it to a file, it doesn't work. This is what I've tried:

awk 'BEGIN{OFS="\t"} \
    {print $1, $2-1,$2, $3, $4, $5, $6, $7, $8, $9, $10}' Nonsensetranslated.txt \
    | intersectBed -a - -b final_starts_stops.bed -wa -wb > done.txt

My error informs me:

awk: can't open file
 input record number 15577, file
 source line number 1
ADD REPLYlink modified 8 months ago by RamRS27k • written 5.1 years ago by christylynn0020
1
gravatar for Alex Reynolds
6.3 years ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

Using BEDOPS bedmap might be helpful, to instead get an answer in a more concise range-to-mapped-positions format:

$ sort-bed ranges.unsorted.bed > ranges.bed
$ awk '{print $1"\t"$2"\t"$2+1"\t"$3"\t"$4"\t"$5}' positions.txt \
    | sort-bed - \ 
    | bedmap --echo --echo-map ranges.bed - \
    > answer.bed

Results will show each range, along with any mapped positions that overlap that range:

chr1   10000    20000 | chr1   12000 12001  1  2  3;  chr1   15000 15001  1  2  3

You can strip the extra column out with downstream tools (another awk statement, for instance).

ADD COMMENTlink modified 8 months ago by RamRS27k • written 6.3 years ago by Alex Reynolds30k
1
gravatar for Neilfws
6.3 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

Another solution: the findOverlaps() method from the R package IRanges. It's very fast, once you get everything into the right format and figure out the R syntax.

Some example code, using your example data:

library(IRanges)
# first, create data frames
df1 <- data.frame(chr = "chr1", start = 10000, end = 20000)
df2 <- data.frame(chr = rep("chr1", 4), posn = c(100, 12000, 15000, 250000), x = rep(1, 4), y = rep(2, 4), z = rep(3, 4))
# next, create IRanges objects
df1.ir <- IRanges(start = df1$start, end = df1$end, names = df1$chr)
df2.ir <- IRanges(start = df2$posn, end = df2$posn, names = df2$chr)
# convert the "subject" to an interval tree (much faster)
df1.tree <- IntervalTreedf1.ir)
# now find the overlaps
o <- findOverlapsdf2.ir, df1.tree, type = "within")
df2[o@queryHits,]
#       chr  posn x y z
#   2  chr1 12000 1 2 3
#   3  chr1 15000 1 2 3
ADD COMMENTlink modified 8 months ago by RamRS27k • written 6.3 years ago by Neilfws48k

Just adding on to this answer. Please see here. Basically its better to use GRanges when chromosomal context is required.

ADD REPLYlink written 19 months ago by satyanarayan.iiitm0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 737 users visited in the last hour