parsing problem (awk/bash/python/R)
6.5 years ago
lessismore ★ 1.3k
AT1G05230   ppa Pp3c2_10790
Pp3c14_15980
Pp3c1_24020
Pp3c17_16910
AT1G05230   ppe Prupe.3G218500
AT1G05230   ptr Potri.002G230200
Potri.014G152000

Hello, here you have three columns (query ID, ortholog species, ortholog ID). When a gene has multiple orthologs, the additional ones are listed on the following lines without the first two columns.

How can I get this output?

AT1G05230   ppa Pp3c2_10790
AT1G05230   ppa Pp3c14_15980
AT1G05230   ppa Pp3c1_24020
AT1G05230   ppa Pp3c17_16910
AT1G05230   ppe Prupe.3G218500
AT1G05230   ptr Potri.002G230200
AT1G05230   ptr Potri.014G152000

Thanks in advance.

bash awk python • 928 views

Correct me if I'm wrong, but this just looks like you would like to sort the file by the first column?


Nope, the first column contains the query:

AT1G05230   ppa Pp3c2_10790
Pp3c14_15980
Pp3c1_24020
Pp3c17_16910

I want to repeat the query and the target species ID on the last three lines, which in this case contain the other three orthologs.

6.5 years ago
5heikki 11k
awk 'BEGIN{OFS=FS="\t"}{if(NF==3){FIRST=$1;SECOND=$2;print $1,$2,$3}else{print FIRST,SECOND,$1}}' input.tsv

Edit: the less code the better.

awk 'BEGIN{OFS=FS="\t"}{if(NF==3){FIRST=$1;SECOND=$2}{print FIRST,SECOND,$NF}}' input.tsv
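
Since the question is also tagged python, here is a rough equivalent of the same fill-down idea as a minimal sketch. The function name `fill_down` is made up for illustration. Unlike the awk one-liners, which split strictly on tabs, this version splits on any whitespace, so it assumes none of the IDs contain spaces. Any line that does not have exactly three fields is treated as a continuation line.

```python
import sys


def fill_down(lines):
    """Yield (query, species, ortholog) tuples, repeating the most recent
    query ID and species code on continuation lines."""
    query = species = None
    for line in lines:
        fields = line.split()  # assumption: IDs contain no whitespace
        if not fields:
            continue  # skip blank lines
        if len(fields) == 3:
            # Full row: remember the first two columns for later lines.
            query, species = fields[0], fields[1]
        elif query is None:
            raise ValueError("continuation line before any full row: %r" % line)
        # The ortholog ID is always the last field on the line.
        yield query, species, fields[-1]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the file named on the command line, write tab-separated rows.
    with open(sys.argv[1]) as handle:
        for row in fill_down(handle):
            print("\t".join(row))
```

Saved under a name of your choice (say `fill_down.py`, hypothetical), it could be run as `python fill_down.py input.tsv > output.tsv`.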