Question: Assembled Human ESTs For Annotation
7.8 years ago by Daniel Standage (3.9k), Davis, California, USA:

I am annotating a region of the human genome and am looking for a set of assembled ESTs I can use as evidence for constructing gene models. A quick search on the SRA shows 4,318 human RNA-seq data sets, so I understand there is no shortage of data. I am hoping, however, to save the time that would be required to search through this massive amount of data, select an appropriate subset, and assemble the ESTs myself. I'm sure this has been done many times before, and for this particular task it doesn't make sense to repeat this process.

Is there any sort of general (i.e. not tissue-specific) reference EST/transcriptome assembly available for Homo sapiens?

UPDATE: I expanded my search to the UCSC Genome Browser and found this page. Three files caught my eye immediately: est.fa.gz, mrna.fa.gz, and refMrna.fa.gz. I'm guessing that:

  • est.fa.gz represents raw, unassembled EST sequences
  • mrna.fa.gz represents a comprehensive, redundant set of assembled transcripts
  • refMrna.fa.gz represents a non-redundant set of assembled transcripts

Is this correct?

Tags: gene, est, human
7.8 years ago by Larry_Parnell (16k), Boston, MA, USA:


You may find the UniGene data at NCBI useful for assembled ESTs. I am not certain that the two mRNA datasets you listed are redundant and non-redundant, respectively, but I don't think it matters for your purpose: you could use both to annotate the genomic region you have.
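
If you want to see how much the two mRNA files overlap before combining them, something along these lines might help. It is only a rough sketch: it assumes both files have been downloaded locally and that the first whitespace-delimited token of each FASTA header is the sequence identifier. The function names (fasta_ids, ids_from_gz, compare) are made up for illustration. Shared identifiers would suggest one file is contained in the other, but they say nothing about redundancy at the sequence level.

```python
import gzip
import sys
from typing import Dict, Iterator, Set, TextIO


def fasta_ids(handle: TextIO) -> Iterator[str]:
    """Yield the first whitespace-delimited token of each FASTA header line."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip headers that carry no identifier at all
                yield fields[0]


def ids_from_gz(path: str) -> Set[str]:
    """Collect the set of identifiers found in a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return set(fasta_ids(handle))


def compare(path_a: str, path_b: str) -> Dict[str, int]:
    """Count identifiers unique to each file and identifiers present in both."""
    ids_a = ids_from_gz(path_a)
    ids_b = ids_from_gz(path_b)
    return {
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
        "shared": len(ids_a & ids_b),
    }


if __name__ == "__main__":
    # Assumed usage: python compare_ids.py mrna.fa.gz refMrna.fa.gz
    if len(sys.argv) == 3:
        print(compare(sys.argv[1], sys.argv[2]))
```

If "shared" comes out close to the size of the smaller file, that file is probably a subset of the other by identifier, and using both as evidence would mostly add duplicate alignments rather than new information.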
