I have several Excel files with 3000+ "feature IDs" from next-generation sequencing experiments. The feature IDs look like this:
LOC733603 MS4A7 CRISP3 RETN TNFAIP6 ALPL MMP8 IRG1 LTF KCNJ15 HCRTR1
Basically, I would like to gather the following information about each of these features for Sus scrofa:
- Gene name
- Gene description
- Protein name
- Amino acid sequence
I am using Python, mainly the urllib2 module, to make HTTP requests to the NCBI Gene database.
I can easily get the gene name and gene description by querying NCBI's Gene database. I then want to use the associated gene ID to query either NCBI's Protein database or UniProt, but I am not sure which is the wiser approach. Has anyone else been in the same situation and have any useful advice, or know of other ways to obtain the data I am interested in?
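For context, the gene lookup step can be sketched roughly as below. This is only a minimal sketch, assuming the NCBI E-utilities `esearch.fcgi` endpoint and its `[Gene Name]` and `[Organism]` search fields; the helper name `gene_search_url` is made up for illustration, and the code is written for Python 3 (in Python 2, `urlencode` lives in `urllib` and `urlopen` in `urllib2`).

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_search_url(symbol, organism="Sus scrofa"):
    """Build an esearch URL for one gene symbol in one organism.

    gene_search_url is a hypothetical helper, not part of any library.
    """
    # Restrict the search to the gene symbol and the organism of interest.
    term = "{0}[Gene Name] AND {1}[Organism]".format(symbol, organism)
    params = [("db", "gene"), ("term", term), ("retmode", "json")]
    return "{0}/esearch.fcgi?{1}".format(EUTILS, urlencode(params))


if __name__ == "__main__":
    # Print the query URLs for a few of the feature IDs listed above.
    for feature_id in ["MS4A7", "CRISP3", "RETN"]:
        print(gene_search_url(feature_id))
```

Fetching each URL (for example with `urllib.request.urlopen`) should return the matching gene IDs, if any. With 3000+ IDs it is probably worth pausing between requests, since NCBI limits clients without an API key to about three requests per second.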
Even better, is there a way to retrieve the related NCBI protein information directly from an NCBI gene ID?
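One candidate I am aware of is the E-utilities `elink.fcgi` endpoint, which maps records in one Entrez database to linked records in another. Below is a rough sketch of how that might look, assuming the `gene_protein_refseq` link name applies to these genes and that `efetch.fcgi` with `rettype=fasta` returns the amino acid sequences. The helper names and the gene and protein IDs in the usage example are placeholders, not real records.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_to_protein_url(gene_id):
    """Build an elink URL that asks for RefSeq proteins linked to a gene ID.

    Hypothetical helper; the link name is an assumption to verify.
    """
    params = [
        ("dbfrom", "gene"),
        ("db", "protein"),
        ("id", str(gene_id)),
        ("linkname", "gene_protein_refseq"),
        ("retmode", "json"),
    ]
    return "{0}/elink.fcgi?{1}".format(EUTILS, urlencode(params))


def protein_fasta_url(protein_ids):
    """Build an efetch URL that requests FASTA text for protein IDs."""
    # efetch accepts several IDs in one request as a comma-separated list.
    id_list = ",".join(str(pid) for pid in protein_ids)
    params = [
        ("db", "protein"),
        ("id", id_list),
        ("rettype", "fasta"),
        ("retmode", "text"),
    ]
    return "{0}/efetch.fcgi?{1}".format(EUTILS, urlencode(params))


if __name__ == "__main__":
    # Placeholder IDs, for illustration only.
    print(gene_to_protein_url(12345))
    print(protein_fasta_url([111, 222]))
```

If this works as I expect, the FASTA header line would also carry the protein name, so one extra request per gene might cover both the protein name and the sequence. I have not confirmed that every Sus scrofa gene has RefSeq protein links, though.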