Question: Translate long names of proteins to short names
18 months ago, muk.smita wrote:

I have a list of proteins that are identified as follows:

sp|P20930|FILA_HUMAN Filaggrin OS=Homo sapiens OX=9606 GN=FLG PE=1 SV=3
sp|Q5D862|FILA2_HUMAN Filaggrin-2 OS=Homo sapiens OX=9606 GN=FLG2 PE=1 SV=1
sp|P29508|SPB3_HUMAN Serpin B3 OS=Homo sapiens OX=9606 GN=SERPINB3 PE=1 SV=2
sp|Q08188|TGM3_HUMAN Protein-glutamine gamma-glutamyltransferase E OS=Homo sapiens OX=9606 GN=TGM3 PE=1 SV=4
sp|P31025|LCN1_HUMAN Lipocalin-1 OS=Homo sapiens OX=9606 GN=LCN1 PE=1 SV=1
sp|P62805|H4_HUMAN Histone H4 OS=Homo sapiens OX=9606 GN=HIST1H4A PE=1 SV=2

Can I translate these identifiers into a more manageable form?


What do you mean by 'more manageable'? Would that be the UniProt accession (e.g., P20930), the full name (e.g., Filaggrin), or the gene symbol (e.g., FLG)? Please be more specific.

Reply written 18 months ago by Benn

I would like to know how I can shorten each protein identifier to the full name and also to the gene symbol.

Thank you

Reply written 18 months ago by muk.smita

You can use awk to extract these.

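For the full name, something like this will work. Note that FS has to be set in a BEGIN block in both awk calls; otherwise the first line is still split on whitespace.

awk 'BEGIN { FS="HUMAN " } { print $2 }' file.txt | awk 'BEGIN { FS=" OS=" } { print $1 }'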

For the gene symbols, something like this (again with FS set in BEGIN in both calls):

awk 'BEGIN { FS="GN=" } { print $2 }' file.txt | awk 'BEGIN { FS=" PE=" } { print $1 }'
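If both values are wanted in one pass, the two extractions could be folded into a single awk call. This is a minimal sketch, assuming every line follows the UniProt header layout shown in the question (a first token without spaces, then the name, then " OS=", with the gene given as "GN=..."). extract_names is a hypothetical helper name, and the printf lines merely stand in for file.txt.

```shell
# extract_names (hypothetical name): reads UniProt-style header lines from
# the given file(s), or from stdin, and prints "full name<TAB>gene symbol".
extract_names() {
  awk '{
    name = $0
    sub(/^[^ ]+ /, "", name)   # drop the leading "sp|ACCESSION|ENTRY" token
    sub(/ OS=.*/, "", name)    # drop everything from " OS=" onward

    gene = ""
    if (match($0, /GN=[^ ]+/)) {
      # keep what follows "GN=" up to the next space
      gene = substr($0, RSTART + 3, RLENGTH - 3)
    }

    print name "\t" gene
  }' "$@"
}

# Two sample lines from the question stand in for file.txt.
printf '%s\n' \
  'sp|P20930|FILA_HUMAN Filaggrin OS=Homo sapiens OX=9606 GN=FLG PE=1 SV=3' \
  'sp|P29508|SPB3_HUMAN Serpin B3 OS=Homo sapiens OX=9606 GN=SERPINB3 PE=1 SV=2' |
  extract_names
```

With a real file this would be called as extract_names file.txt. Unlike splitting on "HUMAN ", the sketch does not depend on the species suffix, and a line without a GN= field simply gets an empty second column.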
Reply written 18 months ago by Benn

18 months ago, Nicolas Rosewick wrote:

Use cut to extract the second field of each line in your file, i.e. the UniProt accession: P20930, Q5D862, etc.

cut -d "|" -f 2

-d defines the field separator, here "|".

-f defines which field to select, here the 2nd one.
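The same idea can be stretched to also keep the short entry name (e.g. FILA_HUMAN) that follows the accession. This is a minimal sketch, assuming the entry name is always the third "|"-separated field and ends at the first space. short_ids is a hypothetical helper name, and the printf lines stand in for the input file.

```shell
# short_ids (hypothetical name): prints "accession<TAB>entry name" for each
# UniProt-style header line read from the given file(s) or from stdin.
short_ids() {
  cut -d '|' -f 2,3 "$@" |   # keep "ACCESSION|ENTRY_NAME rest of line"
    cut -d ' ' -f 1 |        # cut at the first space: "ACCESSION|ENTRY_NAME"
    tr '|' '\t'              # turn the remaining "|" into a tab
}

# Two sample lines from the question stand in for the real file.
printf '%s\n' \
  'sp|P20930|FILA_HUMAN Filaggrin OS=Homo sapiens OX=9606 GN=FLG PE=1 SV=3' \
  'sp|Q5D862|FILA2_HUMAN Filaggrin-2 OS=Homo sapiens OX=9606 GN=FLG2 PE=1 SV=1' |
  short_ids
```

With a real file this would be called as short_ids file.txt.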
