I am annotating a region of the human genome and am looking for a set of assembled ESTs I can use as evidence for constructing gene models. A quick search on the SRA shows 4,318 human RNA-seq data sets, so I understand there is no shortage of data. I am hoping, however, to save the time that would be required to search through this massive amount of data, select an appropriate subset, and assemble the ESTs myself. I'm sure this has been done many times before, and for this particular task it doesn't make sense to repeat this process.
Is there any sort of general (i.e. not tissue-specific) reference EST/transcriptome assembly available for Homo sapiens?
UPDATE
I expanded my search to the UCSC genome browser and found this page. Three files caught my eye immediately: est.fa.gz, mrna.fa.gz, and refMrna.fa.gz. I'm guessing that
- est.fa.gz represents raw, unassembled EST sequences
- mrna.fa.gz represents a comprehensive, redundant set of assembled transcripts
- refMrna.fa.gz represents a non-redundant set of assembled transcripts
Is this correct?
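As a rough sanity check on these guesses, I could compare record counts and typical sequence lengths across the three files. The following is only a minimal sketch, assuming the files have been downloaded into the working directory; the function name `summarize_fasta` is my own and not part of any UCSC tool.

```python
import gzip
import os
from statistics import median


def summarize_fasta(path):
    """Return (record_count, median_length, first_header) for a gzipped FASTA file."""
    lengths = []          # one entry per sequence record
    first_header = None   # header text of the first record, without the '>'
    current = None        # running length of the record being read

    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current is not None:
                    lengths.append(current)
                current = 0
                if first_header is None:
                    first_header = line[1:]
            elif current is not None:
                current += len(line.strip())

    # Close the final record.
    if current is not None:
        lengths.append(current)

    med = median(lengths) if lengths else 0
    return len(lengths), med, first_header


if __name__ == "__main__":
    # File names as listed on the UCSC download page; skipped if not present locally.
    for name in ("est.fa.gz", "mrna.fa.gz", "refMrna.fa.gz"):
        if os.path.exists(name):
            print(name, summarize_fasta(name))
```

If the guesses hold, I would expect est.fa.gz to have by far the most records with the shortest median length, and refMrna.fa.gz to have the fewest records. The headers should also hint at where each set of sequences comes from.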