Question: Remove/Identify deprecated IDs from a list of Entrez IDs programmatically
4.0 years ago, salamandra (230) wrote:

If I have a list of Entrez IDs, how do I programmatically identify those that are deprecated so that I can remove them from the list?

entrez deprecated remove • 1.6k views
modified 4.0 years ago by Pierre Lindenbaum (121k) • written 4.0 years ago by salamandra (230)

An example of a deprecated ID, please?

written 4.0 years ago by Pierre Lindenbaum (121k)

638800 is an example: http://www.ncbi.nlm.nih.gov/gene/?term=638800.

Another example is 639384: http://www.ncbi.nlm.nih.gov/gene/?term=639384

written 4.0 years ago by salamandra (230)

So it's an Entrez Gene ID.

written 4.0 years ago by Pierre Lindenbaum (121k)
4.0 years ago, Pierre Lindenbaum (121k; France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

NCBI provides ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz, described as "comprehensive information about GeneIDs that are no longer current".

Extract and sort the list of discontinued IDs (column 3 of the file; tail -n +2 skips the header line):

curl "ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz" | gunzip -c | tail -n +2 | cut -f 3 | LC_ALL=C sort

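Sort your own list on the ID column with the same collation (LC_ALL=C sort), then use Linux join to remove those IDs from your list. The -v option prints only the lines that have no match in the other file: http://linux.die.net/man/1/join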
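
The sort-and-join step could look something like the minimal sketch below. The function name remove_deprecated and its two arguments are hypothetical, and it assumes each input file holds exactly one GeneID per line (your list in the first file, the discontinued IDs extracted from gene_history.gz in the second).

```shell
# Hypothetical helper (not part of any NCBI tool): print the IDs from the
# first file that do NOT appear in the second file.
#   $1 = your Entrez Gene IDs, one per line (assumed layout)
#   $2 = discontinued GeneIDs, one per line (e.g. column 3 of gene_history)
remove_deprecated() {
    rd_ids=$(mktemp)
    rd_old=$(mktemp)

    # join needs both inputs sorted with the same collation, so force the
    # C locale for the sorts and for join itself.
    LC_ALL=C sort -u "$1" > "$rd_ids"
    LC_ALL=C sort -u "$2" > "$rd_old"

    # -v 1: print only the lines of file 1 that have no partner in file 2,
    # i.e. the IDs that are still current.
    LC_ALL=C join -v 1 "$rd_ids" "$rd_old"

    rm -f "$rd_ids" "$rd_old"
}
```

Dropping -v 1 makes join print the IDs found in both files instead, which identifies the deprecated entries rather than removing them. Note that the output is in C-locale sort order, not the original order of your list.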

written 4.0 years ago by Pierre Lindenbaum (121k)