Question: Is there any script for retrieving GO entries from PFAM GO entries
0
gravatar for vahapel
4.9 years ago by
vahapel170
Turkey
vahapel170 wrote:

Dear All,

Recently, I did de novo RNA-seq experiment for a non-model fish species and assembled transcripts for functional annotations. Before GO annotation, I blasted my assembled transcripts (1e-10) to InterProScan 5 and got PFAM IDs (File 1). Then, I downloaded GO annotations for each PFAM entries (from there; http://www.geneontology.org/page/download-mappings) and got PFAM IDs with corresponding GO mappings (File 2). I want to compare these two files and save the PFAM IDs with GO annotations for my file 1 entries:

File 1: (my file for analyses, as you notice that PF00015 and PF00016 are not absent in my file)

PF00001
PF00002
PF00003
PF00004
PF00005
PF00006
PF00007
PF00008
PF00009
PF00010
PF00011
PF00012
PF00013
PF00014
PF00017
PF00018

File 2: (downloaded file from the database PFAM IDs with GO annotations)

PF00001    GO:0004930
PF00001    GO:0007186
PF00001    GO:0016021
PF00002    GO:0004930
PF00002    GO:0007186
PF00002    GO:0016021
PF00003    GO:0004930
PF00003    GO:0007186
PF00003    GO:0016021
PF00004    GO:0005524
PF00005    GO:0005524
PF00005    GO:0016887
PF00006    GO:0005524
PF00008    GO:0005515
PF00009    GO:0003924
PF00009    GO:0005525
PF00010    GO:0046983
PF00013    GO:0003723
PF00014    GO:0004867
PF00015    GO:0004871
PF00015    GO:0007165
PF00015    GO:0016020
PF00016    GO:0000287
PF00017    GO:0005515
PF00018    GO:0005515

My question is that is there any perl or phyton script for comparing these two files and generating an output including File 1 PFAM IDs with GO annotations.

Thank you all in advance

Best regards

-vahap-

rna-seq • 1.7k views
ADD COMMENTlink modified 4.9 years ago by SES8.2k • written 4.9 years ago by vahapel170
1
gravatar for pld
4.9 years ago by
pld4.8k
United States
pld4.8k wrote:

Here's an approach:

import sys
import csv

def add_safe(dict, key, val):
    try:
        dict[key].append(val)

    except KeyError:
        dict[key] = [val]

def __main__():
    infi  = sys.argv[1]
    mapfi = sys.argv[2]
    outfi = sys.argv[3]

    with open(infi, 'rb') as fi:
        raw_pfams = [x[0] for x in list(csv.reader(fi, delimiter='\t'))]

    with open(mapfi, 'rb') as fi:
        raw_map = list(csv.reader(fi, delimiter='\t'))[:-1]

    id_map = dict()
    for id in raw_map:
        add_safe(id_map, id[0], id[1])

    with open(outfi, 'w') as fi:
        for id in raw_pfams:
            try:
                fi.write(id + '\t' + ';'.join(id_map[id]) + '\n')
            except KeyError:
                print 'ERROR: {} not found in reference map!'.format(id)

__main__()

 

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by pld4.8k

Dear joe.cornish826,

Many thanks for the prompt reply. But, unfortunately, I could not execute this script :( , because I am newbie ! Do you give me a bit additional info for how to execute it ? Many thanks again.

Best regards,

-vahap-

ADD REPLYlink written 4.9 years ago by vahapel170

Usage is as follows, assuming you save the script as pfam-to-go.py

python pfam-to-go.py <file 1 from your post> <file 2 from your post> <desired output file>
 

ADD REPLYlink written 4.9 years ago by pld4.8k
1
gravatar for SES
4.9 years ago by
SES8.2k
Vancouver, BC
SES8.2k wrote:

You should be able to do this with a simple join:

join <(sort -nk1.3,1.7 file1) <(sort -nk1.3,1.7 file2)

The sort is just to be safe, and perhaps this command could even be shortened/improved. Note that you can get at these mappings pretty easy with HMMER2GO without developing custom scripts/wrappers. Assuming you have a file of sequences to search for domains, you can find matches with:

hmmer2go search -i genes_orfs.faa -d Pfam-A.hmm

and then you can map GO terms to your matches:

hmmer2go mapterms -i genes_orfs_Pfam-A.tblout -p pfam2go -o genes_orfs_Pfam-A_GO.tsv

I didn't provide much description about what is going on with those commands, but hopefully that helps (feel free to ask questions!).

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by SES8.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1273 users visited in the last hour