Split and recombine data
2.8 years ago
ersan ▴ 10

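I have data like the following. I want to drop the last |-separated field on each side of the ~, then merge what remains: fields joined with commas, the two halves joined with </br>.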

aa1|bb1|cc1|dd1|ee1|ff1 ~ aa2|bb2|cc2|dd2|ee2|ff2
zz1|yy1|xx1|vv1|uu1|tt1 ~ zz2|yy2|xx2|vv2|uu2|tt2

Desired output:

aa1,bb1,cc1,dd1,ee1 </br> aa2,bb2,cc2,dd2,ee2
zz1,yy1,xx1,vv1,uu1 </br> zz2,yy2,xx2,vv2,uu2

Bioinformatics context - Orphanet:

Input:

590|Congenital myasthenic syndrome|1-9 / 1 000 000|Autosomal dominant<br>or&nbsp;Autosomal recessive|Infancy<br>Neonatal|254190 254210 254300 601462 603034 605809 608930 608931 610542 614198 614750 615120 616040 616227 616228 616304 616313 616314 616321 616322 616323 616324 616325 616326 616330 616720 617143~98913|Postsynaptic congenital myasthenic syndromes|-|-|-|254300 601462 605809 608930 608931 614198 615120 616304 616313 616314 616321 616322 616323 616324 616325 616326 616720~98914|Presynaptic congenital myasthenic syndromes|-|Autosomal dominant<br>or&nbsp;Autosomal recessive|-|254210 615120 616040 616330 616720 617143

Output:

https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=590" target="_blank">Congenital myasthenic syndrome [1-9 / 1 000 000] [Infancy-Neonatal]
https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=98913" target="_blank">Postsynaptic congenital myasthenic syndromes [-] [-]
https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=98914" target="_blank">Presynaptic congenital myasthenic syndromes [-] [-]

Side note: </br> is not a valid HTML tag; <br> is. Like <img> and <hr>, <br> is a void element: it has no content and takes no closing tag. The self-closing form <br /> is an XHTML convention; HTML5 accepts it but does not require it.

2.8 years ago
zx8754 10k

Split on "~", then split on "|", and paste all bits together:

Note: this website renders the output incorrectly, so what you see in your R console may look different. The approach is still valid; adjust the paste0() call to match your exact required output.

x <- "590|Congenital myasthenic syndrome|1-9 / 1 000 000|Autosomal dominant<br>or&nbsp;Autosomal recessive|Infancy<br>Neonatal|254190 254210 254300 601462 603034 605809 608930 608931 610542 614198 614750 615120 616040 616227 616228 616304 616313 616314 616321 616322 616323 616324 616325 616326 616330 616720 617143~98913|Postsynaptic congenital myasthenic syndromes|-|-|-|254300 601462 605809 608930 608931 614198 615120 616304 616313 616314 616321 616322 616323 616324 616325 616326 616720~98914|Presynaptic congenital myasthenic syndromes|-|Autosomal dominant<br>or&nbsp;Autosomal recessive|-|254210 615120 616040 616330 616720 617143"

# Split into records on "~", then split each record into fields on "|"
sapply(strsplit(x, "~", fixed = TRUE)[[ 1 ]], function(i){
  d <- strsplit(i, "|", fixed = TRUE)[[ 1 ]]
  # d[1] = ID, d[2] = name, d[3] = prevalence, d[5] = age of onset ("<br>" shown as "-")
  paste0('https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=', d[ 1 ],
         '" target="_blank">', d[ 2 ],
         ' [', d[ 3 ], '] [', gsub("<br>", "-", d[ 5 ], fixed = TRUE), ']')
}, USE.NAMES = FALSE)

[1] "https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=590\" target=\"_blank\">Congenital myasthenic syndrome [1-9 / 1 000 000] [Infancy-Neonatal]"
[2] "https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=98913\" target=\"_blank\">Postsynaptic congenital myasthenic syndromes [-] [-]"
[3] "https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=98914\" target=\"_blank\">Presynaptic congenital myasthenic syndromes [-] [-]"
2.8 years ago

output:

$ sed 's/\(.*\)|\w\+\s~\s\(.*\)|.*/\1 \<\/br\> \2/g;s/|/,/g' test.txt 

aa1,bb1,cc1,dd1,ee1 </br> aa2,bb2,cc2,dd2,ee2
zz1,yy1,xx1,vv1,uu1 </br> zz2,yy2,xx2,vv2,uu2

input:

$ cat test.txt 

aa1|bb1|cc1|dd1|ee1|ff1 ~ aa2|bb2|cc2|dd2|ee2|ff2
zz1|yy1|xx1|vv1|uu1|tt1 ~ zz2|yy2|xx2|vv2|uu2|tt2
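The sed one-liner above relies on GNU extensions (\w, \s and \+ in a basic regex). A portable alternative, sketched here in POSIX awk, splits each line on " ~ ", drops the last |-field of every part, and rejoins. It assumes the separator is exactly " ~ " with one space on each side, as in test.txt; the function name drop_last_and_join is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each input line, split on " ~ ", drop the last
# |-separated field of every part, join the remaining fields with commas,
# and join the parts with a separator (default "</br>", as in the question).
drop_last_and_join() {
  if [ "$#" -gt 0 ]; then sep=$1; else sep='</br>'; fi
  awk -v sep="$sep" '
  {
    n = split($0, part, " ~ ")          # multi-char separator is treated as a regex
    out = ""
    for (p = 1; p <= n; p++) {
      m = split(part[p], f, "|")        # single-char separator is taken literally
      joined = f[1]
      for (i = 2; i < m; i++)           # stopping at i < m skips the last field
        joined = joined "," f[i]
      out = (p == 1) ? joined : (out " " sep " " joined)
    }
    print out
  }'
}

# Example on the sample lines from test.txt (for the file: drop_last_and_join < test.txt)
printf '%s\n' \
  'aa1|bb1|cc1|dd1|ee1|ff1 ~ aa2|bb2|cc2|dd2|ee2|ff2' \
  'zz1|yy1|xx1|vv1|uu1|tt1 ~ zz2|yy2|xx2|vv2|uu2|tt2' | drop_last_and_join
```

Passing '<br>' as the first argument emits the valid HTML tag mentioned in the side note instead of </br>.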

for the URL version:

output:

$ sed 's/\~/\n/g;s/|/\t/g' file.txt | cut -f1,2 | sed 's/\(.*\)\t\(.*\)/https\:\/\/www.orpha.net\/consor\/cgi-bin\/OC_Exp\.php\?lng=EN\&Expert\=\1" target="_blank">\2 \[-\]\[-\]/g'

https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=590" target="_blank">Congenital myasthenic syndrome [-][-]
https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=98913" target="_blank">Postsynaptic congenital myasthenic syndromes [-][-]
https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=98914" target="_blank">Presynaptic congenital myasthenic syndromes [-][-]

input:

$ cat file.txt 

590|Congenital myasthenic syndrome|1-9 / 1 000 000|Autosomal dominant<br>or&nbsp;Autosomal recessive|Infancy<br>Neonatal|254190 254210 254300 601462 603034 605809 608930 608931 610542 614198 614750 615120 616040 616227 616228 616304 616313 616314 616321 616322 616323 616324 616325 616326 616330 616720 617143~98913|Postsynaptic congenital myasthenic syndromes|-|-|-|254300 601462 605809 608930 608931 614198 615120 616304 616313 616314 616321 616322 616323 616324 616325 616326 616720~98914|Presynaptic congenital myasthenic syndromes|-|Autosomal dominant<br>or&nbsp;Autosomal recessive|-|254210 615120 616040 616330 616720 617143
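The URL pipeline above hard-codes [-][-], so the prevalence and age of onset of the first record (1-9 / 1 000 000 and Infancy<br>Neonatal) do not appear in its output. A POSIX awk sketch that fills both brackets from fields 3 and 5, replacing <br> in the onset with "-" to match the requested output, could look like this. The name orpha_links is hypothetical, and the field positions are assumed from the single example line in file.txt.

```shell
#!/bin/sh
# Hypothetical helper: one Orphanet line in, one link line per "~"-separated record out.
# Assumed field positions (from the file.txt example):
#   1 = Orphanet ID, 2 = name, 3 = prevalence, 5 = age of onset
orpha_links() {
  awk '
  {
    n = split($0, rec, "~")
    for (r = 1; r <= n; r++) {
      split(rec[r], f, "|")
      onset = f[5]
      gsub(/<br>/, "-", onset)          # "Infancy<br>Neonatal" becomes "Infancy-Neonatal"
      printf "https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=%s\" target=\"_blank\">%s [%s] [%s]\n", f[1], f[2], f[3], onset
    }
  }'
}

# Example on a shortened copy of the first two records (for the file: orpha_links < file.txt)
printf '%s\n' '590|Congenital myasthenic syndrome|1-9 / 1 000 000|Autosomal dominant<br>or&nbsp;Autosomal recessive|Infancy<br>Neonatal|254190 254210~98913|Postsynaptic congenital myasthenic syndromes|-|-|-|254300' | orpha_links
```

Because the names and values are passed to printf as %s arguments rather than built into the format string, any % characters in the data are printed as-is.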