Question: Filtering data based on multiple conditions
4.3 years ago, 5594487 wrote:

Hello everyone, I am trying to filter the data set "Data2" based on whether its "start" value falls into the range between the "start" and "end" values of a row in Data1. The two rows must also be on the same chromosome (the value of the first column, "chr"). Here's what I have:

Data 1:

 chr    start      end 
 chr1  4543784  4543829     
 chr1  9760745  9760786    
 chr1  9898702  9898959 
 chr1 12578847 12578879 
 chr1 12662062 12662207 
 chr1 12797766 12798818
 ..........
 chr9 123344149 123345127  
 chr9 123388337 123389640 
 chrY    347178    347228 
 chrY   2876752   2877980 
 chrY   2886982   2888373
 chrY   2890052   2892628

Data 2:

 chr   start
 chr1 3102347 
 chr1 3111668 
 chr1 3521852 
 chr1 3681676
 chr1 3801983 
 chr1 3802020
 ................
 chrY 2891128 
 chrY 2891544
 chrY 2892532 
 chrY 2892627 
 chrY 2895794 
 chrY 2896222

The "chr" value must be the same, so I tried the following:

Filtered<- Data2[Data2$chr == Data1$chr, Data2$start >= Data1$start, Data2$start <= Data1$end]
Filtered<- Data2[Data2$chr == Data1$chr | Data2$start >= Data1$start | Data2$start <= Data1$end]
Filtered<- subset(Data2, Data2$chr == Data1$chr | Data2$start >= Data1$start | Data2$end <= Data1$end)

None of them work. I started using R only last week, so this question may seem silly to many. I have been googling and scratching my head since yesterday. Thank you very much in advance for any advice!
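For reference, a note on why the attempts above do not express the intended filter: `Data2$chr == Data1$chr` compares row i of Data2 with row i of Data1 (recycling the shorter vector), rather than testing each Data2 row against every Data1 interval. In addition, `|` means "or" while the three conditions need to hold together (`&`), and Data2 has no `end` column. The following is a minimal base-R sketch of the intended logic, assuming inclusive interval ends and using a small toy subset of the rows shown above; the name `in_any_interval` is made up for illustration.

```r
# Toy subset of the rows shown in the question (not the full data).
Data1 <- data.frame(
  chr   = c("chr1", "chrY", "chrY", "chrY"),
  start = c(4543784, 2876752, 2886982, 2890052),
  end   = c(4543829, 2877980, 2888373, 2892628),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  chr   = c("chr1", "chrY", "chrY", "chrY", "chrY"),
  start = c(3102347, 2891128, 2892627, 2895794, 2896222),
  stringsAsFactors = FALSE
)

# For each row of Data2, test whether ANY Data1 interval on the same
# chromosome contains its start position (both ends inclusive).
in_any_interval <- mapply(
  function(chr, pos) {
    any(Data1$chr == chr & Data1$start <= pos & pos <= Data1$end)
  },
  Data2$chr, Data2$start,
  USE.NAMES = FALSE
)

# Keep only the Data2 rows that fall inside some Data1 interval.
Filtered <- Data2[in_any_interval, ]
print(Filtered)
```

This compares every Data2 row with every Data1 row, so it is fine for small tables but will likely be slow on genome-scale data, where an interval-aware tool is a better fit.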

R • 1.0k views • modified 4.3 years ago by RamRS • written 4.3 years ago by 5594487

Why not bedtools intersect? Here.

written 4.3 years ago by venu

or at the very least, Bioconductor::GenomicRanges

written 4.3 years ago by russhh
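
To illustrate the GenomicRanges suggestion, here is a rough sketch rather than a definitive recipe. It assumes the Bioconductor package GenomicRanges is installed, treats each Data2 position as a width-1 range, and uses a toy subset of the rows from the question; the variable names `gr1`, `gr2`, and `keep` are made up for illustration.

```r
library(GenomicRanges)

# Toy subset of the rows shown in the question (not the full data).
Data1 <- data.frame(
  chr   = c("chr1", "chrY", "chrY", "chrY"),
  start = c(4543784, 2876752, 2886982, 2890052),
  end   = c(4543829, 2877980, 2888373, 2892628),
  stringsAsFactors = FALSE
)
Data2 <- data.frame(
  chr   = c("chr1", "chrY", "chrY", "chrY", "chrY"),
  start = c(3102347, 2891128, 2892627, 2895794, 2896222),
  stringsAsFactors = FALSE
)

# Data1 intervals as genomic ranges.
gr1 <- GRanges(seqnames = Data1$chr,
               ranges   = IRanges(start = Data1$start, end = Data1$end))

# Data2 positions as width-1 genomic ranges.
gr2 <- GRanges(seqnames = Data2$chr,
               ranges   = IRanges(start = Data2$start, width = 1))

# TRUE for every Data2 position that overlaps at least one Data1 interval
# on the same chromosome.
keep <- overlapsAny(gr2, gr1)

Filtered <- Data2[keep, ]
print(Filtered)
```

Because overlaps are only reported between ranges with the same sequence name, the "same chromosome" condition is handled without an explicit comparison of the `chr` columns.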