Remove/Identify deprecated IDs from a list of Entrez IDs programmatically
6.3 years ago
salamandra ▴ 410

If I have a list of Entrez IDs, how do I programmatically identify those that are deprecated so that I can remove them from the list?

 

entrez remove deprecated • 2.2k views

An example of a deprecated ID, please?


638800 is an example: http://www.ncbi.nlm.nih.gov/gene/?term=638800.

Another example is this: 639384 http://www.ncbi.nlm.nih.gov/gene/?term=639384


So it's a Gene/Entrez ID.

6.3 years ago

ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz  "comprehensive information about GeneIDs that are no longer current"

Extract and sort this list of IDs:

curl "ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz" | gunzip -c | tail -n +2 | cut -f 3 | LC_ALL=C sort

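Sort your list on the ID column with the same LC_ALL=C sort,

then use the Linux join command with the -v option, which prints only the unpaired lines, to remove those IDs from your list. http://linux.die.net/man/1/join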
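The join step could look roughly like the sketch below. It assumes your list holds one GeneID per line. The function names remove_deprecated and list_deprecated and the file names my_ids.txt and discontinued_ids.txt are made up for illustration and are not part of any NCBI tooling. Both inputs are sorted with LC_ALL=C because join expects the two files in the same collation order.

```shell
# Sketch only: the function and file names here are hypothetical.
# Both functions expect files with one GeneID per line.

# remove_deprecated LIST DISCONTINUED
# Prints the IDs from LIST that are NOT present in DISCONTINUED.
remove_deprecated() {
    tmp=$(mktemp) || return 1
    LC_ALL=C sort -u "$2" > "$tmp"
    # -v 1 prints only the lines of the first input that found no partner.
    LC_ALL=C sort -u "$1" | LC_ALL=C join -v 1 - "$tmp"
    rm -f "$tmp"
}

# list_deprecated LIST DISCONTINUED
# Prints the IDs from LIST that ARE present in DISCONTINUED.
list_deprecated() {
    tmp=$(mktemp) || return 1
    LC_ALL=C sort -u "$2" > "$tmp"
    # A plain join prints only the IDs found in both inputs.
    LC_ALL=C sort -u "$1" | LC_ALL=C join - "$tmp"
    rm -f "$tmp"
}

# Possible usage (needs network access, so it is left commented out):
#   curl "ftp://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz" \
#       | gunzip -c | tail -n +2 | cut -f 3 > discontinued_ids.txt
#   remove_deprecated my_ids.txt discontinued_ids.txt > current_ids.txt
#   list_deprecated   my_ids.txt discontinued_ids.txt > deprecated_ids.txt
```

Because the output is sorted bytewise rather than numerically, pipe it through sort -n if you need numeric order.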
