Question: find overlapping sequences and report min start and max end positions
4 months ago, onkarnath89 wrote:

I have a FASTA/BED file in which some regions overlap each other.

For example:

1   57267   59067

1   63165   63758

1   63298   64137

1   67285   67596

Here you can see that "1 63298 64137" overlaps "1 63165 63758". I want to produce one file in which each group of overlapping regions is replaced by a single region with the minimum start position and the maximum end position. Expected output:

1   57267   59067

1   63165   64137

1   67285   67596

This is just an example from the file. At some locations, 4-5 regions overlap.

Kindly help.

rna-seq sequence next-gen R • 132 views
4 months ago, Nicolas Rosewick (Belgium, Brussels) wrote:

In R, use GenomicRanges and its reduce() function:

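library(GenomicRanges)
# read the three-column, tab-separated BED file
a <- read.table("file.bed", sep="\t", as.is=TRUE, header=FALSE)
colnames(a) <- c("chr","start","end")
a.gr <- GRanges(a$chr, IRanges(a$start, a$end))
# use reduce() to merge overlapping ranges
res <- reduce(a.gr)
# write.table() takes the data first, then the output file name
write.table(as.data.frame(res), "out.bed", col.names=FALSE, row.names=FALSE, quote=FALSE, sep="\t")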

Here is the result (columns: chromosome, start, end, width, strand):

   1 57267 59067  1801      *
   1 63165 64137   973      *
   1 67285 67596   312      *

Thank you, Nicolas, for this suggestion. It worked.

I found one more solution which was quite helpful:

bedtools merge -i input.bed -c 4 -o collapse > merged_unique.bed

Hope this will help others too.

written 4 months ago by onkarnath89
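
A note on the bedtools command above: bedtools merge expects input sorted by chromosome and then by start, and `-c 4 -o collapse` operates on a fourth column. Assuming the file has only three columns, as in the example in the question, those two options have nothing to act on and can presumably be dropped. For readers without bedtools, the same sweep can be sketched with sort and awk. This is only a minimal sketch, not what bedtools does internally; `merge_bed` and `example.bed` are hypothetical names chosen for illustration. As with the bedtools merge default, intervals that merely touch (start equal to the previous end) are merged too.

```shell
# Merge overlapping intervals in a 3-column, tab-separated file (chrom, start, end).
# merge_bed is a hypothetical helper name, not part of bedtools.
merge_bed() {
  # Sort by chromosome, then numerically by start, so overlaps become adjacent.
  sort -k1,1 -k2,2n "$1" | awk 'BEGIN { OFS = "\t" }
    # First record: open the first merged interval.
    NR == 1 { chrom = $1; start = $2 + 0; end = $3 + 0; next }
    # Same chromosome and start inside the current interval: extend the end if needed.
    $1 == chrom && ($2 + 0) <= end { if (($3 + 0) > end) end = $3 + 0; next }
    # Otherwise emit the finished interval and open a new one.
    { print chrom, start, end; chrom = $1; start = $2 + 0; end = $3 + 0 }
    # Emit the last open interval, if the input was not empty.
    END { if (NR > 0) print chrom, start, end }'
}

# Example input taken from the question.
printf '1\t57267\t59067\n1\t63165\t63758\n1\t63298\t64137\n1\t67285\t67596\n' > example.bed
merge_bed example.bed
```

On the four example lines this prints the three merged regions given as the expected output in the question.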

As you tagged the question with the R tag, I assumed the answer should be in R. But using bedtools is also good.

written 4 months ago by Nicolas Rosewick