command to make a coordinate file from the header
6 weeks ago
harry ▴ 30

I have a file that looks like the one below. Each line has a name before and after the pipe (|). From each line I want to isolate the coordinates and make a file of the chromosome, start and end coordinates.

exon2_ENST00000559151_15-44409519-44409847|>exon2_ENST00000559151_15-44409519-44409847
exon8_ENST00000264596_4-177353308-177353728|>exon8_ENST00000264596_4-177353308-177353728
exon4_ENST00000261220_12-95494056-95494217|exon6_ENST00000261220_12-95496004-95496098
exon6_ENST00000438023_9-6880012-6880061|exon8_ENST00000438023_9-6893095-6893232
exon5_ENST00000219481_16-410243-410367|exon7_ENST00000219481_16-410972-411076
exon6_ENST00000244230_2-71139795-71139862|exon8_ENST00000244230_2-71144428-71144538
exon2_ENST00000316218_3-123089168-123089294|exon3_ENST00000316218_3-123092355-123092442
exon2_ENST00000309794_17-82459992-82460072|exon5_ENST00000309794_17-82472564-82472698
exon2_ENST00000462685_2-73932598-73932683|exon5_ENST00000462685_2-73958146-73958245


For example, in the 3rd row the coordinates before the pipe (|) are 12-95494056-95494217 and the coordinates after it are 12-95496004-95496098. I want to make a 3-column file in which the 1st column is the chromosome (12), the 2nd column is the lowest number from before the pipe, and the 3rd column is the highest number from after the pipe, i.e. 12 95494056 95496098. The same should be done for every line, giving the chromosome, start and end columns as below.

15  44409519    44409847
4   177353308   177353728
12  95494056    95496098
9   6880012 6893232
16  410243  411076
2   71139795    71144538
3   123089168   123092442
17  82459992    82472698
2   73932598    73958245


Is it possible to do this with a command? I searched but couldn't find anything that does this. Thanks in advance.

chromosome coordinate • 216 views
6 weeks ago
awk -F "[_|-]+"  '{print $3,(int($4)<int($9)?$4:$9),(int($5)>int($10)?$5:$10)}' < input.txt

6 weeks ago

$ awk -F '_|-|\|' -v OFS="\t" '{print $3,$4,$5;print$8,$9,$10}' test.txt  | datamash -g 1 min 2 max 3

15  44409519    44409847
4   177353308   177353728
12  95494056    95496098
9   6880012 6893232
16  410243  411076
2   71139795    71144538
3   123089168   123092442
17  82459992    82472698
2   73932598    73958245
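
If datamash is not available, the same idea can probably be written in awk alone. This is only a minimal sketch, not a tested replacement for the commands above: it splits each line on the pipe first, then takes the last "_"-separated piece of each side as chromosome-start-end. The function name `coords` is made up for illustration, and the sketch assumes chromosome names contain no "_" or "-". Each input line is handled on its own, so neighbouring lines on the same chromosome are never merged.

```shell
# Hypothetical helper: print chromosome, lowest start and highest end per line.
# Assumes each side of the pipe ends in _<chr>-<start>-<end>.
coords() {
  awk -F'|' -v OFS='\t' '
    {
      # The last "_"-separated piece of each side is "<chr>-<start>-<end>".
      n = split($1, a, "_"); split(a[n], left, "-")
      m = split($2, b, "_"); split(b[m], right, "-")
      # Adding 0 forces a numeric rather than a string comparison.
      lo = (left[2] + 0 < right[2] + 0) ? left[2] : right[2]
      hi = (left[3] + 0 > right[3] + 0) ? left[3] : right[3]
      print left[1], lo, hi
    }' "$@"
}
```

It reads from a file argument or from stdin, so it can be called as coords input.txt > coords.tsv.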