Question: Mapping gene names to GO ids
Ric (Australia) wrote, 3.3 years ago:

Hello, I downloaded goa_uniprot_all.gaf (4.4 GB), which looks like this:

!gaf-version: 2.1
!
!This file contains all GO annotations and gene product information for proteins in the UniProt KnowledgeBase (UniProtKB),
!IntAct protein complexes, and RNAcentral identifiers.
!
!Generated: 2017-09-25 14:48 
!GO-version: http://purl.obolibrary.org/obo/go/releases/2017-09-23/extensions/go-plus.owl
!
UniProtKB       OEL25522.1  moeA5           GO:0003824      GO_REF:0000002  IEA     InterPro:IPR015421|InterPro:IPR015422   F       MoeA5   A0A000_9ACTN|moeA5      protein taxon:35758     20170923        InterPro                
UniProtKB       XP_021321391.1  moeA5           GO:0003870      GO_REF:0000002  IEA     InterPro:IPR010961      F       MoeA5   A0A000_9ACTN|moeA5      protein taxon:35758     20170923        InterPro                    
UniProtKB       ABQ44355.1  moeA5           GO:0009058      GO_REF:0000002  IEA     InterPro:IPR004839      P       MoeA5   A0A000_9ACTN|moeA5      protein taxon:35758     20170923        InterPro                   
UniProtKB       XP_004953070.1  moeA5           GO:0030170      GO_REF:0000002  IEA     InterPro:IPR004839|InterPro:IPR010961   F       MoeA5   A0A000_9ACTN|moeA5      protein taxon:35758     20170923        InterPro                
UniProtKB       XP_004953070.1  moeA5           GO:0033014      GO_REF:0000002  IEA     InterPro:IPR010961      P       MoeA5   A0A000_9ACTN|moeA5      protein taxon:35758     20170923        InterPro

I also have a file that contains my Trinity contigs mapped to Swiss-Prot, as shown below:

target_id       ens_gene
lcl|ScwjSwM_1   OEL25522.1
lcl|ScwjSwM_2   XP_021321391.1
lcl|ScwjSwM_3   ABQ44355.1
lcl|ScwjSwM_4   XP_004953070.1

To build the mapping, the 2nd columns of both files could be used as the join key to create a file with two columns (contig name, GO ids), e.g. lcl|ScwjSwM_4,GO:0030170|GO:0033014. What would be the best way to do this?

Thank you in advance.

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 3.3 years ago:

"To get the mapping the 2nd columns from both files could be used to create a file which has the following 2 columns (contig names, GO ids) e.g. lcl|ScwjSwM_4,GO:0030170|GO:0033014. What would be the best way to do it?"

Use join:

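# Join on column 2 of both files; skip the GAF "!" comment lines and the TSV header line
join -t $'\t' -1 2 -2 2 \
  <(grep -v '^!' goa_uniprot_all.gaf | sort -t $'\t' -k2,2) \
  <(tail -n +2 swissprot.tsv | sort -t $'\t' -k2,2)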

followed by a cut command to extract the column...
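A minimal sketch of the whole join-then-cut pipeline, in case it helps. It assumes both files are tab-delimited, that the GO id sits in column 5 (as in GAF 2.1), and that the mapping file has one header line. The function name map_contigs_to_go and the temporary file names are hypothetical, chosen only for illustration. The final awk step, which collapses several GO ids per contig into one "|"-separated field, is an addition to the join and cut steps described above.

```shell
#!/bin/sh
# Sketch only: map contig names to GO ids with join + cut, then collapse per contig.
# Usage (hypothetical helper): map_contigs_to_go <gaf file> <contig/accession tsv>
map_contigs_to_go() {
  gaf=$1
  mapping=$2
  tab=$(printf '\t')
  tmpdir=$(mktemp -d)

  # GAF: drop "!" comment lines, keep accession (col 2) and GO id (col 5),
  # and sort on the accession so join can use it as the key.
  grep -v '^!' "$gaf" | cut -f 2,5 \
    | LC_ALL=C sort -t "$tab" -k1,1 > "$tmpdir/gaf.tsv"

  # Mapping: drop the header line and put the accession first,
  # so both inputs are keyed on column 1.
  tail -n +2 "$mapping" | awk -F '\t' -v OFS='\t' '{ print $2, $1 }' \
    | LC_ALL=C sort -t "$tab" -k1,1 > "$tmpdir/map.tsv"

  # join emits: accession, contig, GO id. cut keeps contig and GO id,
  # sort -u removes duplicate pairs, awk joins the GO ids of each contig with "|".
  LC_ALL=C join -t "$tab" "$tmpdir/map.tsv" "$tmpdir/gaf.tsv" \
    | cut -f 2,3 \
    | LC_ALL=C sort -u \
    | awk -F '\t' '
        { if ($1 in go) go[$1] = go[$1] "|" $2; else go[$1] = $2 }
        END { for (c in go) print c "," go[c] }' \
    | LC_ALL=C sort

  rm -rf "$tmpdir"
}
```

It could then be called as map_contigs_to_go goa_uniprot_all.gaf swissprot.tsv > contig2go.csv (the output file name is just an example). Sorting a 4.4 GB file takes a while, so filtering the GAF down to the accessions of interest first may be worthwhile.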

e.rempel (Germany, Heidelberg) wrote, 3.3 years ago:

Hi,

if you would like to use R for your analysis, you could use the merge() function to join the two tables on the shared accession column.
