Question: Identify and remove deprecated IDs from a list of Entrez IDs programmatically
salamandra wrote (3.0 years ago):

If I have a list of Entrez IDs, how can I programmatically identify the deprecated ones so that I can remove them from the list?


Tags: entrez, deprecated, remove

Pierre Lindenbaum replied: Can you give an example of a deprecated ID, please?

salamandra replied: 638800 is an example (http://www.ncbi.nlm.nih.gov/gene/?term=638800), and 639384 is another (http://www.ncbi.nlm.nih.gov/gene/?term=639384).

Pierre Lindenbaum replied: So it's an Entrez Gene ID.
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 3.0 years ago:

NCBI's ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz contains "comprehensive information about GeneIDs that are no longer current".

Extract and sort the discontinued IDs (column 3 of that file):

curl "ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz" | gunzip -c | tail -n +2 | cut -f 3 | LC_ALL=C sort
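To try the extraction step offline, here is a minimal sketch that runs the same gunzip | tail | cut | sort pipeline on a small local file. It assumes the gene_history layout (tab-separated, one header line, discontinued GeneID in column 3); the header names follow that file's format, but the data rows and the function name discontinued_ids are invented for illustration and are not real NCBI records.

```shell
#!/bin/sh
# Sketch: the extraction step run on a local file instead of the FTP download.
# Assumes gene_history's layout: tab-separated, one header line,
# discontinued GeneID in column 3. discontinued_ids is a hypothetical name.
set -eu

# discontinued_ids FILE.gz
# Prints the discontinued GeneIDs (column 3) with the header skipped,
# sorted in byte order (LC_ALL=C) so the result can be fed to join.
discontinued_ids() {
  gunzip -c "$1" | tail -n +2 | cut -f 3 | LC_ALL=C sort -u
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Invented rows for illustration only; they are not real NCBI records.
{
  printf '#tax_id\tGeneID\tDiscontinued_GeneID\tDiscontinued_Symbol\tDiscontinue_Date\n'
  printf '9999\t-\t3003\tFAKE3\t20150101\n'
  printf '9999\t5005\t1001\tFAKE1\t20150101\n'
} | gzip -c > "$workdir/gene_history.gz"

# Prints 1001 and 3003, one per line.
discontinued_ids "$workdir/gene_history.gz"
```

`tail -n +2` starts output at line 2, which drops the header; with `tail -n +1` the output would start at line 1, so the column name Discontinued_GeneID would end up in the ID list too. The `-u` on sort just removes duplicate IDs.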

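Sort your own list on its ID column with the same LC_ALL=C collation (join requires both inputs to be sorted on the join field),

then use join -v 1, which prints only the unpairable lines of file 1, with your sorted list as file 1 and the sorted deprecated IDs as file 2. That removes the deprecated IDs from your list. http://linux.die.net/man/1/join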
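A minimal sketch of the join step, assuming the user's list holds the Gene ID in its first whitespace-separated field and the deprecated list has one ID per line. The function name remove_deprecated and the file names are hypothetical; 638800 and 639384 are the deprecated IDs quoted in this thread, and 7157 and 1017 simply stand in for other IDs in the list.

```shell
#!/bin/sh
# Sketch: drop deprecated Entrez Gene IDs from a list with `join -v 1`.
# Assumptions: the user's list has the Gene ID in its first field; the
# deprecated list has one ID per line. All names here are hypothetical.
set -eu

# remove_deprecated MY_LIST DEPRECATED_LIST
# Prints the lines of MY_LIST whose first field does not appear in
# DEPRECATED_LIST. Output comes back sorted by ID (C collation).
remove_deprecated() {
  tmp=$(mktemp -d)
  # join needs both inputs sorted on the join field in the same collation,
  # so sort both under LC_ALL=C and run join under LC_ALL=C as well.
  LC_ALL=C sort -k1,1 "$1" > "$tmp/mine"
  LC_ALL=C sort -u "$2" > "$tmp/deprecated"
  # -v 1: print only the lines of file 1 that have no match in file 2.
  LC_ALL=C join -v 1 "$tmp/mine" "$tmp/deprecated"
  rm -rf "$tmp"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '7157\n638800\n1017\n639384\n' > "$workdir/my_ids.txt"
printf '638800\n639384\n' > "$workdir/deprecated_ids.txt"

# Prints 1017 and 7157, one per line.
remove_deprecated "$workdir/my_ids.txt" "$workdir/deprecated_ids.txt"
```

Two side effects to be aware of: the surviving lines come out in sorted order rather than the original order, and join rewrites fields separated by blanks as single spaces. For a tab-separated list you would pass a literal tab as the separator, e.g. `-t "$(printf '\t')"`, to both sort and join.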
