Retrieving Length of mRNA and Proteins Given Gene ID
13.0 years ago
Radu ▴ 50

I'm looking to pull information from data made available on the NCBI site. So far I've used the geneinfo and gene2accession datasets from ftp://ftp.ncbi.nih.gov/gene, so I have GeneIDs plus the accession versions and GIs for the nucleotide, mRNA and protein sequences associated with each GeneID. I could get the actual sequences from gene2refseq, but is there any way to get just the lengths of the various transcripts?

I can't use Entrez, I need a copy of the raw data.

ncbi eutils • 5.1k views
13.0 years ago

Take a look at the NCBI eutils. You can use, with the ID as query:

  • esummary on the sequence database to get information about the sequence
  • efetch on the sequence database to get the sequence
  • efetch on the gene database to link proteins to e.g. genes or transcripts

edit to address the large number of sequences:

  • If you use esummary on a list of mRNA IDs you can query a lot of IDs with only one request and get the length of all of them
  • Also, NCBI does allow a large number of queries as long as you don't exceed 3 per second and it is recommended that you do it during US nighttime - so a couple of thousand requests should be no problem.
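As a rough illustration of the batching idea above, here is a minimal Python sketch that sends many IDs per esummary request and pauses between requests to stay under 3 per second. The helper names, the batch size of 200 and the 0.34 s delay are my own choices for illustration, and the sketch assumes the JSON document summary for a sequence record carries its length in a field named `slen`; check a real response before relying on that.

```python
import json
import time
import urllib.parse
import urllib.request

# Public E-utilities endpoint for document summaries.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def chunked(ids, size):
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_esummary_url(ids, db="nuccore"):
    """Build one esummary URL covering a whole batch of sequence IDs."""
    query = urllib.parse.urlencode(
        {"db": db, "id": ",".join(ids), "retmode": "json"}
    )
    return f"{EUTILS}?{query}"


def extract_lengths(payload):
    """Map each UID to its sequence length.

    Assumption: each record exposes the length as `slen`.
    Records without that field are skipped.
    """
    result = payload.get("result", {})
    lengths = {}
    for uid in result.get("uids", []):
        record = result.get(uid, {})
        if "slen" in record:
            lengths[uid] = int(record["slen"])
    return lengths


def fetch_lengths(ids, db="nuccore", batch_size=200, delay=0.34):
    """Query esummary batch by batch, sleeping between requests.

    `batch_size` and `delay` are illustrative values, not NCBI-mandated ones;
    the delay keeps the request rate below 3 per second.
    """
    lengths = {}
    for batch in chunked(list(ids), batch_size):
        with urllib.request.urlopen(build_esummary_url(batch, db)) as response:
            lengths.update(extract_lengths(json.load(response)))
        time.sleep(delay)
    return lengths
```

Calling `fetch_lengths(list_of_mrna_ids)` would return a dictionary of ID to length; passing `db="protein"` should do the same for protein IDs, under the same `slen` assumption.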

If that is still not enough for you and you're using human sequences, it might be better to download the latest release of the Ensembl database and execute your (SQL) queries there.


Thanks for the answer Michael but I can't use that due to the requirements. I'm going to be making way too many requests for too much data so I require the actual dataset.


NCBI does allow a large number of queries as long as you don't exceed 3 per second and it is recommended that you do it during US nighttime - so a couple of thousand should be no problem.
