Question: Retrieving length of mRNA and proteins given Gene ID
7.6 years ago, Radu wrote:

I'm looking to pull information from the data made available on the NCBI site. So far I've used the gene_info and gene2accession datasets from ftp://ftp.ncbi.nih.gov/gene, so I have GeneIDs plus the accession versions/GIs for the nucleotide, mRNA and protein sequences associated with each GeneID. I could get the actual sequences from gene2refseq, but is there any way to get just the lengths of the various transcripts?

I can't use Entrez, I need a copy of the raw data.

ncbi eutils • 3.3k views
7.6 years ago, Michael Schubert (Cambridge, UK) wrote:

Take a look at the NCBI eutils. With the ID as the query, you can use:

  • esummary on the sequence database to get information about the sequence
  • efetch on the sequence database to get the sequence
  • efetch on the gene database to link proteins to e.g. genes or transcripts

Edit, to address the large number of sequences:

  • If you use esummary on a list of mRNA IDs you can query a lot of IDs with only one request and get the length of all of them
  • Also, NCBI does allow a large number of queries as long as you don't exceed 3 per second and it is recommended that you do it during US nighttime - so a couple of thousand requests should be no problem.
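
To make the batching idea concrete, here is a minimal sketch in Python using only the standard library. It assumes the esummary JSON output lists the returned UIDs under `result.uids` and reports the sequence length in a field called `slen`; please check both against a real response before relying on them. The helper names (`chunked`, `build_request`, `parse_lengths`, `fetch_lengths`), the batch size of 200 and the 0.34 s pause are illustrative choices of mine, picked only to stay under the 3 requests per second mentioned above.

```python
import json
import time
import urllib.parse
import urllib.request

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def chunked(ids, size):
    """Yield consecutive slices of `ids` holding at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_request(ids, db="nuccore"):
    """Return (url, POST body) for one esummary call covering all `ids`."""
    params = {"db": db, "id": ",".join(ids), "retmode": "json"}
    return ESUMMARY_URL, urllib.parse.urlencode(params).encode("ascii")


def parse_lengths(payload):
    """Map each UID in a decoded esummary payload to its sequence length.

    Assumption: the length is stored under the key "slen".
    Records without that key are skipped.
    """
    result = payload.get("result", {})
    lengths = {}
    for uid in result.get("uids", []):
        record = result.get(uid, {})
        if "slen" in record:
            lengths[uid] = int(record["slen"])
    return lengths


def fetch_lengths(ids, db="nuccore", batch_size=200, delay=0.34):
    """Query esummary batch by batch, pausing `delay` seconds between calls."""
    lengths = {}
    for batch in chunked(list(ids), batch_size):
        url, body = build_request(batch, db)
        # Passing `data` makes this a POST, which copes with long ID lists.
        with urllib.request.urlopen(url, data=body) as response:
            lengths.update(parse_lengths(json.load(response)))
        time.sleep(delay)
    return lengths
```

Calling `fetch_lengths(my_mrna_ids)` would cover the transcripts, and `fetch_lengths(my_protein_ids, db="protein")` the proteins, where both ID lists are whatever you extracted from gene2accession.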

If that is still not enough for you and you're working with human sequences, it might be better to download the latest release of the Ensembl database and run your SQL queries there.
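
As a rough idea of what such a query could look like, the sketch below computes spliced transcript lengths by summing exon lengths. It runs against a tiny in-memory SQLite stand-in, not a real Ensembl dump (Ensembl ships as MySQL). The table and column names (`transcript`, `exon`, `exon_transcript`, `seq_region_start`, `seq_region_end`, `stable_id`) are modeled on my understanding of the Ensembl core schema and are reduced to the columns needed here, so verify them against the release you download. The `TOY_*` identifiers and coordinates are made up.

```python
import sqlite3

# Simplified stand-in for three Ensembl core tables (assumed names, see above).
SCHEMA = """
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, stable_id TEXT);
CREATE TABLE exon (exon_id INTEGER PRIMARY KEY,
                   seq_region_start INTEGER,
                   seq_region_end INTEGER);
CREATE TABLE exon_transcript (exon_id INTEGER, transcript_id INTEGER);
"""

# Spliced length = sum of the lengths of all exons in the transcript.
# Coordinates are inclusive, hence the +1.
LENGTH_QUERY = """
SELECT t.stable_id,
       SUM(e.seq_region_end - e.seq_region_start + 1) AS length
FROM transcript t
JOIN exon_transcript et ON et.transcript_id = t.transcript_id
JOIN exon e ON e.exon_id = et.exon_id
GROUP BY t.stable_id
ORDER BY t.stable_id
"""


def build_demo_db():
    """Create an in-memory database holding two made-up transcripts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO transcript VALUES (?, ?)",
                     [(1, "TOY_T1"), (2, "TOY_T2")])
    conn.executemany("INSERT INTO exon VALUES (?, ?, ?)",
                     [(10, 100, 199), (11, 300, 349)])
    # TOY_T1 uses both exons, TOY_T2 only the first one.
    conn.executemany("INSERT INTO exon_transcript VALUES (?, ?)",
                     [(10, 1), (11, 1), (10, 2)])
    return conn


def transcript_lengths(conn):
    """Return {transcript stable_id: spliced length in bases}."""
    return dict(conn.execute(LENGTH_QUERY).fetchall())


if __name__ == "__main__":
    print(transcript_lengths(build_demo_db()))
```

Against a local Ensembl mirror the same SELECT should carry over with little change, but treat that as something to test rather than a given.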


Thanks for the answer, Michael, but I can't use that given my requirements. I'm going to be making far too many requests for too much data, so I need the actual dataset.

written 7.6 years ago by Radu

NCBI does allow a large number of queries as long as you don't exceed 3 per second and it is recommended that you do it during US nighttime - so a couple of thousand should be no problem.

written 7.6 years ago by Michael Schubert