Protein to gene name conversion
2.7 years ago
mm2568 • 0

I have an R data frame containing peptides labelled in UniProtKB accession format. I used the UniProt online ID mapping tool to convert these IDs to gene names. The output is a .txt file with the protein ID in column 1 and the gene name in column 2. The tool didn't find gene names for all of the protein IDs, so the list is not one-to-one: some peptides may have no gene name assigned to them.

Examples of the peptide names: 
Q6SPF0
P52701
Q8NAF0
H0Y6H0
P19338
P19338

I want to go through the protein ID column of my data frame in R and, wherever an ID has a match in the .txt file, put the corresponding gene name in a new column of the data frame. I would appreciate some help with how to do this.

Is there a way I could do this all in R rather than having to use the online converter tool?

ID R • 514 views

Using Entrez Direct (EDirect):

$ more id.txt
Q6SPF0
P52701
Q8NAF0
P19338

$ for i in $(cat id.txt); do printf '%s\t' "$i"; esearch -db protein -query "$i" | elink -target gene | esummary | xtract -pattern DocumentSummary -element Name; done
Q6SPF0  SAMD1
P52701  MSH6
Q8NAF0  ZNF579
P19338  NCL

If there is no entry for a particular accession, the pipeline prints a nasty error for that ID. I don't know of a way around that.
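
For the R part of the question, here is a minimal sketch of adding the gene names to the data frame with `match()`, assuming you already have a two-column mapping (from the loop above or from the online tool). The names `peptides`, `mapping`, `protein_id` and `gene_name` are made up for illustration, and the file name in the comment is hypothetical. H0Y6H0 is treated as unmapped here purely to show what happens to IDs with no gene name.

```r
# Hypothetical stand-in for your existing data frame of peptides.
peptides <- data.frame(
  protein_id = c("Q6SPF0", "P52701", "Q8NAF0", "H0Y6H0", "P19338", "P19338"),
  stringsAsFactors = FALSE
)

# Stand-in for the two-column mapping file. With a real file you would read
# it instead, e.g. (file name and header setting are assumptions):
# mapping <- read.delim("mapping.txt", header = FALSE,
#                       col.names = c("protein_id", "gene_name"),
#                       stringsAsFactors = FALSE)
mapping <- data.frame(
  protein_id = c("Q6SPF0", "P52701", "Q8NAF0", "P19338"),
  gene_name  = c("SAMD1", "MSH6", "ZNF579", "NCL"),
  stringsAsFactors = FALSE
)

# For each peptide ID, match() gives the row of its first hit in the mapping,
# or NA when the ID is absent. Indexing with NA yields NA, so unmapped IDs
# end up with NA in the new column and the row order is left unchanged.
idx <- match(peptides$protein_id, mapping$protein_id)
peptides$gene_name <- mapping$gene_name[idx]

print(peptides)
```

Repeated IDs such as P19338 each get the same gene name, and no rows are dropped or reordered. To skip the external lookup altogether for human proteins, one option that may be worth a look is the Bioconductor `org.Hs.eg.db` annotation package queried through `AnnotationDbi::mapIds()` with `keytype = "UNIPROT"` and `column = "SYMBOL"`. I have not checked how well it covers unreviewed accessions.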
