Question: Remove/identify deprecated IDs from a list of Entrez IDs programmatically
3.2 years ago, salamandra wrote:

If I have a list of Entrez IDs, how do I programmatically identify the deprecated ones so that I can remove them from the list?

 

entrez deprecated remove • 1.3k views

Can you give an example of a deprecated ID, please?

written 3.2 years ago by Pierre Lindenbaum

638800 is an example: http://www.ncbi.nlm.nih.gov/gene/?term=638800

Another example is 639384: http://www.ncbi.nlm.nih.gov/gene/?term=639384

written 3.2 years ago by salamandra

So it's an Entrez Gene ID.

written 3.2 years ago by Pierre Lindenbaum
3.2 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

NCBI provides ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz, described as "comprehensive information about GeneIDs that are no longer current".

Extract and sort the list of discontinued IDs (column 3; `tail -n +2` skips the header line):

curl "ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz" | gunzip -c | tail -n +2 | cut -f 3 | LC_ALL=C sort

Sort your own list on the ID column, using the same LC_ALL=C setting so both files are in the same order.

Then use the Linux join command to remove those IDs from your list: http://linux.die.net/man/1/join
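The join step could look something like the sketch below. It is only an illustration, not necessarily the exact command the answer had in mind: it assumes both inputs are plain text files with one GeneID per line, and the function name `remove_discontinued` and the file names are made up for the example. The `-v 2` option tells join to print only the lines of the second file that have no match in the first.

```shell
# Sketch: drop discontinued GeneIDs from a one-ID-per-line list.
# The function name and its arguments are hypothetical placeholders.
remove_discontinued() {
    # $1: file of discontinued GeneIDs (one per line)
    # $2: your own list of GeneIDs (one per line)
    discontinued_sorted=$(mktemp)
    mine_sorted=$(mktemp)

    # join needs both inputs sorted in the same (byte-wise) order
    LC_ALL=C sort -u "$1" > "$discontinued_sorted"
    LC_ALL=C sort -u "$2" > "$mine_sorted"

    # -v 2: print only lines of the second file with no match in the first
    LC_ALL=C join -v 2 "$discontinued_sorted" "$mine_sorted"

    rm -f "$discontinued_sorted" "$mine_sorted"
}
```

Usage, assuming discontinued_ids.txt holds column 3 of gene_history.gz and my_ids.txt is your list:

remove_discontinued discontinued_ids.txt my_ids.txt > current_ids.txt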
