Question

Assembled Human ESTs For Annotation

12.4 years ago

Daniel Standage 4.1k

I am annotating a region of the human genome and am looking for a set of assembled ESTs I can use as evidence for constructing gene models. A quick search on the SRA shows 4,318 human RNA-seq data sets, so I understand there is no shortage of data. I am hoping, however, to save the time that would be required to search through this massive amount of data, select an appropriate subset, and assemble the ESTs myself. I'm sure this has been done many times before, and for this particular task it doesn't make sense to repeat this process.

Is there any sort of general (i.e. not tissue-specific) reference EST/transcriptome assembly available for Homo sapiens?

UPDATE: I expanded my search to the UCSC Genome Browser and found this page. Three files caught my eye immediately: est.fa.gz, mrna.fa.gz, and refMrna.fa.gz. I'm guessing that:

est.fa.gz represents raw, unassembled EST sequences
mrna.fa.gz represents a comprehensive, redundant set of assembled transcripts
refMrna.fa.gz represents a non-redundant set of assembled transcripts

Is this correct?

gene human est • 2.3k views

updated 12.4 years ago by Larry_Parnell 16k • written 12.4 years ago by Daniel Standage 4.1k

score 1 · Answer 1 · 2011-12-07

12.4 years ago

Larry_Parnell 16k

Daniel,

You may find the UniGene data at NCBI useful for assembled ESTs. I am not certain that the two mRNA datasets you listed are redundant and non-redundant, respectively, but I don't think it matters for your purpose: you could use both to annotate your genomic region.
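If you want to check the overlap between the two mRNA files yourself, something like the following might help. It is only a rough sketch: the helper names (`read_fasta_ids`, `summarize`, `shared_ids`) are made up for illustration, and it assumes the identifier is the first whitespace-delimited token on each FASTA header line. It compares identifiers only, so it says nothing about redundancy at the sequence level.

```python
import gzip


def read_fasta_ids(path):
    """Yield the identifier of each FASTA record in a plain or gzipped file.

    Assumption: the identifier is the first whitespace-delimited token
    after the '>' on each header line.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                yield fields[0] if fields else ""


def summarize(path):
    """Count the records and the distinct identifiers in one FASTA file."""
    ids = list(read_fasta_ids(path))
    return {"records": len(ids), "unique_ids": len(set(ids))}


def shared_ids(path_a, path_b):
    """Return the identifiers that appear in both FASTA files."""
    return set(read_fasta_ids(path_a)) & set(read_fasta_ids(path_b))
```

With the files downloaded to the working directory, `summarize("mrna.fa.gz")` and `summarize("refMrna.fa.gz")` would give the record counts, and `shared_ids("mrna.fa.gz", "refMrna.fa.gz")` would list any identifiers present in both. Note that this holds all identifiers in memory, which could be a problem for a file as large as est.fa.gz.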
