Replacing MSA sequence IDs with IDs from a list
6.5 years ago
lessismore ★ 1.3k

Hey all,

I have two files. The first is an MSA:

>tr|B4GH81|B4GH81_MAIZE
    MCGILAVLGC----------SDCS--QARR-AR----ILACSRR------------------------------------LKHRGPDWSGLYQH------------------------------------------------------------------EGNFLAQQRLAIVSPLSGDQPLFNEDRTVVV-------VANGEIYNHKNVR--KQFT-GAH--    
>tr|A0A1BAAANT6|A0A1BAAANT6_SORBI
    MCGILAVLGC----------SDWS--QARR-AR----VLACARR------------------------------------LKHRGPDWSGLYQH------------------------------------------------------------------EGNFLAQQRLAIVSPLSGDQPLFNEDRTVVV-------VANGEIYNHKNIR--KQFT-GTH--NFTTGSDCEVIIPLYEKYGENFVDMLDGVFAFVLYDTRDRT------YVAARDAIGVNPLYIGWFVVG--------------------LE-GSPDLKAAREVADYLGTIHHEFHFTV-----    
>tr|K3TTT2J1|K3TTT2J1_SETIT
    --------------------------------------------------------------------------------LRHRGPDWSGLHCH------------------------------------------------------------------QDCYLAHQRLAIVDPTSGDQLLYNEDKSVVV-------TVNGEIYNHEELK--AKL--TTH--KFQTVSDCEVIAHLYEEYGEEFVDMLDGMFAFVLLDTRDKS------FIAARDAIGICPLYMGWGLDGSVWFSSEMKALSDDCERFITFPPGHLRWYLHIKKG-SGLRRWFNLPWFL-----E--SI-PST-PYNPLLLQGMFEK----------------

It goes on like that for a few hundred sequences. The second file is a list of IDs that should replace the ones shown above in the MSA. In this case:

B4GH81_Zea mays
A0A1BAAANT6_Sor bicolor
K3TTT2J1_Set italica

So basically I want to write something that, for each line in the list file, splits the new ID at the underscore and searches for the first part in the MSA file. If it finds a match, it replaces that header with the ID from the list.

I hope this is clear. Thanks in advance.

bash awk python • 1.1k views
6.5 years ago

Use the -f option of sed, which reads its editing commands from a file. Here awk generates one substitution command per line of the ID list:

 sed -f <( awk -F _  '{printf("/^>/s/|%s|/|%s|/\n",$1,$2);}' file2 ) file1

It gives me this:

>tr|Zea mays|B4GH81_MAIZE

I solved it by piping the output through

sed 's\tr|\\g' | cut -d '_' -f1 | awk -F '\||>' '{if (/^>/) print ">" $3 "_" $2; else print $0}'

after your command. Thanks.
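
Since the thread is also tagged python, here is a minimal sketch of the same idea in Python, in case a single script is easier to maintain than the sed/cut/awk pipeline. It assumes every header has the form >tr|ACCESSION|NAME and that the part of each list entry before the first underscore is the accession. The function names load_id_map and rename_headers are made up for illustration and the inline data is shortened from the question.

```python
def load_id_map(lines):
    """Map accession (text before the first underscore) to the full new ID."""
    id_map = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the list file
        accession = line.split("_", 1)[0]
        id_map[accession] = line
    return id_map


def rename_headers(msa_lines, id_map):
    """Yield MSA lines, replacing each header whose accession is in id_map."""
    for line in msa_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Assumed header layout: >tr|B4GH81|B4GH81_MAIZE
            # The accession is the second |-separated field.
            fields = line[1:].split("|")
            accession = fields[1] if len(fields) > 1 else fields[0]
            if accession in id_map:
                line = ">" + id_map[accession]
        yield line


# Small demo with shortened sequences; the last header has no entry in
# the list, so it is passed through unchanged.
msa = [
    ">tr|B4GH81|B4GH81_MAIZE",
    "MCGILAVLGC--",
    ">tr|K3TTT2J1|K3TTT2J1_SETIT",
    "----LRHRGP",
    ">tr|XXXXXX|XXXXXX_UNKNOWN",
    "MCGI------",
]
ids = ["B4GH81_Zea mays", "K3TTT2J1_Set italica"]

renamed = list(rename_headers(msa, load_id_map(ids)))
print("\n".join(renamed))
```

For real files, the two lists can be replaced by open file handles, e.g. rename_headers(open("file1"), load_id_map(open("file2"))). Using a dictionary lookup means each header is checked once, instead of running one sed substitution per list entry against every line.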
