Filtering data based on multiple conditions
Asked 7.7 years ago by 5594487

Hello everyone, I am trying to filter the data set "Data2" based on whether its "start" value falls within the range between the "start" value and the "end" value of Data1. The rows must also be on the same chromosome (the value of the first column, "chr"). Here's what I have:

Data 1:

 chr    start      end 
 chr1  4543784  4543829     
 chr1  9760745  9760786    
 chr1  9898702  9898959 
 chr1 12578847 12578879 
 chr1 12662062 12662207 
 chr1 12797766 12798818
 ..........
 chr9 123344149 123345127  
 chr9 123388337 123389640 
 chrY    347178    347228 
 chrY   2876752   2877980 
 chrY   2886982   2888373
 chrY   2890052   2892628

Data 2:

 chr   start
 chr1 3102347 
 chr1 3111668 
 chr1 3521852 
 chr1 3681676
 chr1 3801983 
 chr1 3802020
 ................
 chrY 2891128 
 chrY 2891544
 chrY 2892532 
 chrY 2892627 
 chrY 2895794 
 chrY 2896222

The "chr" value must be the same, so I tried the following:

Filtered<- Data2[Data2$chr == Data1$chr, Data2$start >= Data1$start, Data2$start <= Data1$end]
Filtered<- Data2[Data2$chr == Data1$chr | Data2$start >= Data1$start | Data2$start <= Data1$end]
Filtered<- subset(Data2, Data2$chr == Data1$chr | Data2$start >= Data1$start | Data2$end <= Data1$end)

None of them work. I started using R last week, so this question may seem silly to many. I have been googling and scratching my head since yesterday. Thank you very much in advance for any advice!
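A few things go wrong in the attempts above. In `Data2[a, b, c]`, the data frame `[` operator reads the three arguments as row index, column index and `drop`, so the second and third conditions never act as row filters. With a single index, `Data2[cond]` selects columns rather than rows. `|` means OR, so a row would pass if any one condition held, whereas all three must hold (`&`). `Data2$chr == Data1$chr` compares the two columns element by element, so row i of Data2 is only ever checked against row i of Data1 instead of against every interval. Finally, the `subset()` call refers to `Data2$end`, but Data2 has no `end` column. A minimal base-R sketch of the intended filter, using only the rows visible in the excerpts above (the helper name `in_any_interval` is hypothetical), checks each Data2 row against all Data1 intervals:

```r
# Sketch: keep the rows of Data2 whose start position lies inside at least
# one Data1 interval [start, end] on the same chromosome.
# The data frames hold only the rows visible in the excerpts above.
Data1 <- data.frame(
  chr   = c(rep("chr1", 6), "chr9", "chr9", rep("chrY", 4)),
  start = c(4543784, 9760745, 9898702, 12578847, 12662062, 12797766,
            123344149, 123388337, 347178, 2876752, 2886982, 2890052),
  end   = c(4543829, 9760786, 9898959, 12578879, 12662207, 12798818,
            123345127, 123389640, 347228, 2877980, 2888373, 2892628),
  stringsAsFactors = FALSE  # compare chromosomes as plain strings
)
Data2 <- data.frame(
  chr   = c(rep("chr1", 6), rep("chrY", 6)),
  start = c(3102347, 3111668, 3521852, 3681676, 3801983, 3802020,
            2891128, 2891544, 2892532, 2892627, 2895794, 2896222),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if one position falls inside any interval on
# its chromosome. The three conditions are joined with & (all must hold).
in_any_interval <- function(chr, pos, intervals) {
  any(intervals$chr == chr &
        intervals$start <= pos &
        intervals$end >= pos)
}

# Check every Data2 row against all Data1 intervals, then subset the rows.
keep <- mapply(in_any_interval, Data2$chr, Data2$start,
               MoreArgs = list(intervals = Data1), USE.NAMES = FALSE)
Filtered <- Data2[keep, ]
print(Filtered)
```

On this excerpt only the four chrY positions between 2890052 and 2892628 survive; boundaries are inclusive, matching the `>=` / `<=` tests in the question. This performs one comparison per (Data2 row, Data1 interval) pair, which is fine for small tables but slow genome-wide, where dedicated interval tools are a better fit.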


Why not bedtools intersect? Here.


Or, at the very least, the Bioconductor package GenomicRanges.
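A minimal sketch of the GenomicRanges route, assuming the Bioconductor package is installed (for example via `BiocManager::install("GenomicRanges")`) and using a few rows from the excerpts above. Each Data2 position becomes a width-1 range, and `overlapsAny()` flags the positions that overlap at least one Data1 interval on the same seqname (chromosome). The coordinates are treated as 1-based closed intervals, which matches the `>=` / `<=` test in the question; if Data1 actually came from a 0-based BED file, its starts would need shifting by one.

```r
# Assumes the Bioconductor package GenomicRanges is installed.
suppressPackageStartupMessages(library(GenomicRanges))

# A few rows from the excerpts above.
Data1 <- data.frame(chr   = c("chr1", "chrY", "chrY"),
                    start = c(4543784, 2886982, 2890052),
                    end   = c(4543829, 2888373, 2892628))
Data2 <- data.frame(chr   = c("chr1", "chrY", "chrY", "chrY"),
                    start = c(3102347, 2891128, 2892627, 2895794))

# Intervals from Data1; each Data2 position becomes a width-1 range.
intervals <- GRanges(seqnames = Data1$chr,
                     ranges = IRanges(start = Data1$start, end = Data1$end))
positions <- GRanges(seqnames = Data2$chr,
                     ranges = IRanges(start = Data2$start, width = 1))

# One logical per position: TRUE if it overlaps any interval on the same
# chromosome (seqnames must match for an overlap to count).
keep <- overlapsAny(positions, intervals)
Filtered <- Data2[keep, ]
print(Filtered)
```

The overlap search is indexed internally, so it scales to genome-sized tables far better than comparing every row against every interval.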
