12.6 years ago
Rad
Hello everyone,
I have a BED file with 5 columns, where the 4th is a unique ID and the 5th is a gene ID.
I was trying to use bedtools to cluster this BED file by gene ID and output a single line per gene giving the overall range (chr, start, end) of its intervals, from the smallest start to the largest end. Basically, I want to cluster intervals by gene. Example:
chr1 10 1000 ID1 GeneID1
chr1 20 1300 ID2 GeneID1
chr1 1400 1600 ID3 GeneID1
I'm trying to get output like this:
chr1 10 1600 GeneID1
Can anyone tell me whether bedtools is the best way to do this, or is it possible with just awk? Any ideas?
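For comparison, here is a minimal Python sketch of the same logic. It is a swapped-in language, not a bedtools or awk solution: it keeps the minimum start and maximum end seen for each gene ID. It assumes whitespace-separated columns in the order shown above and that all intervals of a gene lie on one chromosome. The function name `gene_spans` and the script name `gene_spans.py` are hypothetical.

```python
import sys


def gene_spans(lines):
    """Collapse BED records to one (chrom, start, end, gene_id) span per gene.

    Assumes columns: chrom, start, end, id, gene_id (extra columns ignored),
    and that every interval of a gene is on the same chromosome.
    """
    spans = {}  # gene_id -> [chrom, min_start, max_end]; dict keeps first-seen order
    for line in lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, _id, gene = line.split()[:5]
        start, end = int(start), int(end)
        if gene not in spans:
            spans[gene] = [chrom, start, end]
            continue
        span = spans[gene]
        if span[0] != chrom:
            raise ValueError(f"{gene} has intervals on more than one chromosome")
        span[1] = min(span[1], start)  # widen to the smallest start
        span[2] = max(span[2], end)    # widen to the largest end
    return [(c, s, e, g) for g, (c, s, e) in spans.items()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gene_spans.py input.bed
    with open(sys.argv[1]) as fh:
        for chrom, start, end, gene in gene_spans(fh):
            print(chrom, start, end, gene, sep="\t")
```

It does not need sorted input, and it deliberately bridges gaps between non-overlapping intervals of the same gene (1300 to 1400 in the example), which matches the expected output.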
Thank you
Awesome! I added "drop table if exists t;" before the table creation so that it can be used within a script. Thanks, man!
You can also add indexes on chrom and name if you have a large input.
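The SQL answer these comments reply to is not included in this excerpt. As a hypothetical reconstruction only, the sketch below uses Python's built-in `sqlite3` module (an assumption; the thread may have used a different SQL engine). The table name `t` and the columns `chrom` and `name` come from the comments. The columns `chromStart`, `chromEnd`, and `id`, and the function `gene_spans_sql`, are illustrative names. It shows the `DROP TABLE IF EXISTS` guard and an index on chrom and name around a `GROUP BY` with `MIN`/`MAX`.

```python
import sqlite3


def gene_spans_sql(rows, db_path=":memory:"):
    """Return (chrom, min start, max end, name) per gene using SQLite.

    rows: iterable of (chrom, chromStart, chromEnd, id, name) tuples.
    Column names other than chrom and name are hypothetical.
    """
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        # Dropping first lets the script be re-run against a file database.
        cur.execute("DROP TABLE IF EXISTS t")
        cur.execute(
            "CREATE TABLE t (chrom TEXT, chromStart INTEGER, "
            "chromEnd INTEGER, id TEXT, name TEXT)"
        )
        cur.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
        # Composite index on (chrom, name) to help the GROUP BY on large inputs.
        cur.execute("CREATE INDEX IF NOT EXISTS idx_t_chrom_name ON t (chrom, name)")
        result = cur.execute(
            "SELECT chrom, MIN(chromStart), MAX(chromEnd), name "
            "FROM t GROUP BY chrom, name "
            "ORDER BY chrom, MIN(chromStart)"
        ).fetchall()
        conn.commit()
        return result
    finally:
        conn.close()
```

Grouping by both chrom and name means a gene ID that appears on two chromosomes yields two rows instead of one span across chromosomes.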
I just could not believe this awesomeness.