Find all gene IDs from a genome through Entrez
10.5 years ago
maxi_zu • 0

Hey,

I want to find all the genes of a specific genome, starting from the genome's accession number.

I need to find all the Gene IDs.

I am trying to do this with Entrez and Biopython, so I first did this:

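from Bio import Entrez

# NCBI asks for a contact address on every E-utilities request
Entrez.email = "you@example.com"  # placeholder, replace with your own

handle = Entrez.esearch(db="nuccore", term="NC_003210")
record = Entrez.read(handle)
handle.close()
print(record)
{'Count': '1', 'RetMax': '1', 'IdList': ['16802048'], 'TranslationSet': [], 'RetStart': '0', 'QueryTranslation': ''}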

I only get a single entry in IdList (the genome record itself), and I can't find a way to get all the gene IDs for this genome.

Could someone help me, please?

gene entrez ncbi • 3.4k views
10.5 years ago

NC_003210 is Listeria monocytogenes EGD-e (taxon ID 169963):

http://www.ncbi.nlm.nih.gov/gene/?term=txid169963[Organism:noexp]

so the esearch command line to get the gene IDs would be something like:

(curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=txid169963%5BOrganism:noexp%5D&retmax=5000' | xmllint --xpath "translate(normalize-space(//IdList),' ',',')" - && echo) | tr "," "\n"

(...)

984470
984464
984454
984452
984451
984449
984448
984443
984438
984407
984406
984405
984398
984395
984394
984391
984382
984380
984366
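
For anyone who would rather stay in Python, here is a minimal sketch of the same esearch query using only the standard library. The helper names build_esearch_url and parse_gene_ids are made up for illustration, and the parser assumes the reply has the usual eSearchResult > IdList > Id layout. The network call is left commented out, so treat this as a starting point rather than a tested pipeline.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(taxid, retmax=5000):
    """Build the esearch URL listing Gene records for one taxon ID."""
    params = {
        "db": "gene",
        "term": f"txid{taxid}[Organism:noexp]",
        "retmax": retmax,
    }
    # urlencode percent-escapes the brackets and the colon in the term
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def parse_gene_ids(xml_text):
    """Return the IDs found under <IdList> in an esearch XML reply."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]


# Usage (needs network access, so it is left commented out):
# from urllib.request import urlopen
# with urlopen(build_esearch_url(169963)) as reply:
#     gene_ids = parse_gene_ids(reply.read())
# print(len(gene_ids))
```

With Biopython, the equivalent should be Entrez.esearch(db="gene", term="txid169963[Organism:noexp]", retmax=5000), then Entrez.read(handle)["IdList"]. If the Count field in the reply is larger than retmax, the list is truncated and retmax needs to be raised (or the results paged with retstart).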
10.5 years ago
5heikki 11k

The first few lines of output with Entrez Direct:

esearch -db nuccore -query NC_003210 | elink -target gene | efetch -format id

1. inlA
internalin A[Listeria monocytogenes EGD-e]
Other Aliases: lmo0433
Annotation: NC_003210.1 (454534..456936)
ID: 985151

2. inlB
internalin B[Listeria monocytogenes EGD-e]
Other Aliases: lmo0434
Annotation: NC_003210.1 (457021..458913)
ID: 986892

3. prfA
peptide chain release factor 1[Listeria monocytogenes EGD-e]
Other Aliases: lmo2543
Annotation: NC_003210.1 (2619415..2620491, complement)
ID: 986736

4. hly
listeriolysin O precursor[Listeria monocytogenes EGD-e]
Other Aliases: lmo0202
Annotation: NC_003210.1 (205819..207408)
ID: 987033

5. sigB
RNA polymerase sigma factor SigB[Listeria monocytogenes EGD-e]
Other Aliases: lmo0895
Annotation: NC_003210.1 (930671..931450)
ID: 986527

6. prfA
listeriolysin positive regulatory protein[Listeria monocytogenes EGD-e]
Other Aliases: lmo0200
Annotation: NC_003210.1 (203640..204353, complement)
ID: 987031

7. actA
actin-assembly inducing protein precursor[Listeria monocytogenes EGD-e]
Other Aliases: lmo0204
Annotation: NC_003210.1 (209470..211389)
ID: 987035

8. fri
non-heme iron-binding ferritin[Listeria monocytogenes EGD-e]
Other Aliases: lmo0943
Annotation: NC_003210.1 (979059..979529)
ID: 986847

9. iap
invasion associated secreted endopeptidase[Listeria monocytogenes EGD-e]
Other Aliases: lmo0582
Annotation: NC_003210.1 (618932..620380, complement)
ID: 985140

10. inlC
internalin C[Listeria monocytogenes EGD-e]
Other Aliases: lmo1786
Annotation: NC_003210.1 (1860200..1861090, complement)
ID: 985945

11. plcA
phosphatidylinositol-specific phospholipase c[Listeria monocytogenes EGD-e]
Other Aliases: lmo0201
Annotation: NC_003210.1 (204624..205577, complement)
ID: 987032
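
If you only need the IDs (plus the symbol and locus tag) out of a report like the one above, a small parser is enough. This is a rough sketch that assumes each record starts with a line like "1. inlA" and carries "Other Aliases:" and "ID:" lines exactly as shown here. The function name parse_gene_report is hypothetical, and other report formats will need different patterns.

```python
import re

RECORD_START = re.compile(r"^\d+\.\s+(\S+)")       # e.g. "1. inlA"
ALIASES = re.compile(r"^Other Aliases:\s*(.+)$")   # e.g. "Other Aliases: lmo0433"
GENE_ID = re.compile(r"^ID:\s*(\d+)")              # e.g. "ID: 985151"


def parse_gene_report(text):
    """Turn the numbered text report into a list of dicts, one per gene."""
    genes = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        start = RECORD_START.match(line)
        if start:
            # A numbered line opens a new record
            current = {"symbol": start.group(1), "aliases": [], "gene_id": None}
            genes.append(current)
            continue
        if current is None:
            continue  # skip anything before the first record
        aliases = ALIASES.match(line)
        if aliases:
            current["aliases"] = [a.strip() for a in aliases.group(1).split(",")]
        gene_id = GENE_ID.match(line)
        if gene_id:
            current["gene_id"] = gene_id.group(1)
    return genes


# Two records copied from the report above
SAMPLE = """1. inlA
internalin A[Listeria monocytogenes EGD-e]
Other Aliases: lmo0433
Annotation: NC_003210.1 (454534..456936)
ID: 985151

2. inlB
internalin B[Listeria monocytogenes EGD-e]
Other Aliases: lmo0434
Annotation: NC_003210.1 (457021..458913)
ID: 986892
"""

for gene in parse_gene_report(SAMPLE):
    print(gene["gene_id"], gene["symbol"], ",".join(gene["aliases"]))
```

Keeping the symbol and locus tag next to each ID makes it easier to spot cases like the two prfA entries above, which share a symbol but have different IDs.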