Question: How to get protein IDs from gene IDs (batch Entrez)
alansoffan (United Kingdom), 5.2 years ago, wrote:

Hi

Can someone suggest how to get protein IDs from gene IDs in batch (Entrez)?

I have hundreds of gene names like AaeL_AAEL004207 (gene ID 5564359). I can look up the protein IDs manually one by one, but with hundreds of genes that is obviously not practical. Can anyone suggest a better way?

thanks


Thanks a lot for the suggestions. I haven't tried them yet, but hopefully they will work.

(reply by alansoffan, 5.2 years ago)
5heikki (Finland), 5.2 years ago, wrote:

With Entrez Direct:

epost -db gene -id 5564359 | elink -target protein | efetch -format uid
157105044
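For a list that is not too long, one simple (if slower) option is to run the same single-ID pipeline once per gene, so every protein UID stays next to the gene it was linked from. This is a minimal sketch assuming Entrez Direct is on $PATH; the function names lookup_proteins and pair_gene_ids are hypothetical. It makes one set of NCBI requests per gene, so it is slower and heavier on NCBI than batching.

```shell
#!/bin/bash
# Sketch (hypothetical helpers): pair each gene ID with its own protein UIDs
# by running the single-ID pipeline once per gene.
# Assumes Entrez Direct (epost, elink, efetch) is on $PATH.

# Print the protein UIDs linked to a single gene ID.
lookup_proteins() {
  epost -db gene -id "$1" | elink -target protein | efetch -format uid
}

# Read gene IDs (one per line) from FILE and print "geneID<TAB>proteinUID".
# A gene linked to several proteins gives several lines; a gene with no
# linked protein gives none.
pair_gene_ids() {
  local gene prot
  while IFS= read -r gene || [ -n "$gene" ]; do
    gene=${gene%$'\r'}              # tolerate Windows line endings
    [ -z "$gene" ] && continue      # skip blank lines
    # </dev/null stops the lookup from reading the rest of FILE via stdin
    lookup_proteins "$gene" < /dev/null | while IFS= read -r prot; do
      printf '%s\t%s\n' "$gene" "$prot"
    done
  done < "$1"
}
```

Usage: put one gene ID per line in a file (say genes.txt) and run pair_gene_ids genes.txt > pairs.tsv after sourcing the functions.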

You can pass many gene IDs at once (500 or more) to the -id option as a comma-separated list. Here's a script that does this in batches of 500:

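#!/bin/bash
# Stop early if Entrez Direct is not installed
if ! command -v epost >/dev/null 2>&1
then
    echo "Entrez Direct not in \$PATH" >&2
    exit 1
fi

if [ -n "$1" ]
then
    # Split the ID list into chunks of 500 lines: input.aa, input.ab, ...
    split -l 500 "$1" input.

    for f in input.*
    do
        # Join the IDs in this chunk with commas (no trailing comma)
        ids=$(paste -sd, "$f")
        epost -db gene -id "$ids" | elink -target protein | efetch -format uid > "$f.output"
        # Caution: pasting line by line assumes each gene links to exactly one
        # protein and that the UIDs come back in input order; neither is
        # guaranteed, so check the result before relying on the pairing
        paste "$f" "$f.output" > "$f.result"
        rm "$f" "$f.output"
    done

    cat input.*.result > "$1.output"
    rm input.*.result

else
    printf 'Usage: sh convertGeneIDs listOfGeneIDs\nOutput: geneID\tproteinID\n'
fi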
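The batching step (500 IDs per request, joined with commas) can also be isolated into a small function and tested without contacting NCBI. This is a sketch, not part of the original script; make_batches is a hypothetical name, and it uses grep and awk instead of split and temporary files.

```shell
#!/bin/bash
# Sketch (hypothetical helper): join a file of gene IDs (one per line) into
# comma-separated batches, printing one batch per line. Blank lines and
# Windows carriage returns are dropped; no batch ends with a trailing comma.

# Usage: make_batches FILE [BATCH_SIZE]   (BATCH_SIZE defaults to 500)
make_batches() {
  local file=$1 size=${2:-500}
  grep -v '^[[:space:]]*$' "$file" | awk -v n="$size" '
    { sub(/\r$/, "") }                                        # strip Windows line endings
    NR > 1 { printf "%s", (((NR - 1) % n == 0) ? "\n" : ",") } # separator before every ID after the first
    { printf "%s", $0 }
    END { if (NR > 0) printf "\n" }'
}
```

Each output line can then be fed to the pipeline, e.g. make_batches genes.txt | while IFS= read -r ids; do epost -db gene -id "$ids" < /dev/null | elink -target protein | efetch -format uid; done (the < /dev/null guards against epost reading the remaining batches from the loop's input). Note that a batch still comes back as one pooled list of protein UIDs, so batching alone does not tell you which protein came from which gene.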

I was puzzled by the line

if [ -n "$1" ]

which turns out to mean "if the string is non-empty".

(reply by Nancy Ouyang, 5.2 years ago)

More precisely: a non-empty first argument ;)

(reply by 5heikki, 5.2 years ago)
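The [ -n "$1" ] test discussed here is true only when the first argument is a non-empty string, and the quotes around $1 matter. A small demonstration; describe_arg and unquoted_check are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Sketch: what the test [ -n "$1" ] checks.
# -n STRING is true when STRING is non-empty (-z is the opposite).

describe_arg() {
  if [ -n "$1" ]; then
    echo "first argument: $1"
  else
    echo "no first argument (or it is empty)"
  fi
}

# Same test without quotes: with no (or an empty) argument, $1 expands to
# nothing, the test becomes [ -n ], and a one-argument test is true for any
# non-empty word, so this wrongly reports "non-empty".
unquoted_check() {
  if [ -n $1 ]; then
    echo "non-empty"
  else
    echo "empty"
  fi
}
```

For example, describe_arg genes.txt prints "first argument: genes.txt", while unquoted_check with no argument prints "non-empty". This is why the script quotes "$1".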