Processed RNA-SEQ Datasets
8.9 years ago

Hi there,

I'm looking for a repository of RNA-Seq data in a ready-to-analyze form (a patients-by-genes matrix). I already know about TCGA, but the data there is not in a ready-to-process form. Does anyone know of machine learning studies that use this kind of data and make it available in an easy-to-use format?

Cheers!

Clustering RNA-Seq Datasets • 4.0k views
8.9 years ago
matted 7.7k

How about ReCount?

"ReCount is an online resource consisting of RNA-seq gene count datasets built using the raw data from 18 different studies. ... The count tables, ExpressionSets, and phenotype tables are ready to use and freely available here. By taking care of several preprocessing steps and combining many datasets into one easily-accessible website, we make finding and analyzing RNA-seq data considerably more straightforward."

Thanks for the comment. However, the number of cases in ReCount is still too small for the kind of analysis I want to perform. Moreover, the phenotype data from those studies does not seem useful to me. I think I'll stick with the TCGA data and do all the preprocessing myself.
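
For anyone taking the same route, the core of that preprocessing is joining per-sample tables into one samples-by-genes matrix. Below is a minimal sketch, not the TCGA pipeline itself. It assumes each sample comes as a two-column, tab-delimited table (gene ID, value) with no header row, which may not match the actual download format. The names `read_sample_table` and `build_matrix`, and the toy IDs and values, are hypothetical.

```python
import csv
import io


def read_sample_table(handle):
    """Read a two-column, tab-delimited table (gene ID, value) into a dict."""
    values = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        values[row[0]] = float(row[1])
    return values


def build_matrix(samples):
    """Combine {sample_id: {gene: value}} into (sample_ids, genes, rows).

    Only genes present in every sample are kept, so every row has the
    same length and the same column order.
    """
    if not samples:
        return [], [], []
    sample_ids = sorted(samples)
    shared = set.intersection(*(set(samples[s]) for s in sample_ids))
    genes = sorted(shared)
    rows = [[samples[s][g] for g in genes] for s in sample_ids]
    return sample_ids, genes, rows


# Toy stand-ins for per-sample files (hypothetical IDs and values).
toy_files = {
    "patient_A": "GENE1\t10\nGENE2\t0\nGENE3\t5\n",
    "patient_B": "GENE1\t7\nGENE2\t3\n",
}
samples = {
    sid: read_sample_table(io.StringIO(text)) for sid, text in toy_files.items()
}
sample_ids, genes, matrix = build_matrix(samples)
```

Keeping only the genes shared by all samples is one possible choice; filling missing genes with zeros or a missing-value marker is another, depending on the downstream analysis.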

8.9 years ago

If you're just looking for ready-made expression data to test your analysis, you should definitely try the Illumina Body Map Baseline Expression Atlas: http://www.ebi.ac.uk/gxa/help/baseline-atlas.html

The direct link: http://www.ebi.ac.uk/arrayexpress/files/E-MTAB-1733/E-MTAB-1733.processed.1.zip

It is tissue-specific, so it would be easier to test machine learning using tissue-specific genes rather than vague cancer signatures :)
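
Once a processed archive like this is downloaded, reading it into memory is short. The sketch below assumes the zip contains at least one tab-delimited text file with a header row. The member names and column layout inside the real archive are not known here, so the function simply takes the first member with a matching suffix. `read_first_table` and the toy archive are hypothetical.

```python
import csv
import io
import zipfile


def read_first_table(zip_source, suffix=".txt"):
    """Return (member_name, header, data_rows) for the first matching member.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = [n for n in archive.namelist() if n.endswith(suffix)]
        if not names:
            raise ValueError("no member ending in %r found" % suffix)
        with archive.open(names[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows = list(csv.reader(text, delimiter="\t"))
    if not rows:
        return names[0], [], []
    return names[0], rows[0], rows[1:]


# Build a toy archive in memory so the example runs without a download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("toy.txt", "gene\tliver\tbrain\nG1\t1\t2\n")
buffer.seek(0)

name, header, data = read_first_table(buffer)
```

For a real download, pass the path of the saved zip file instead of the in-memory buffer, and check the header before assuming which columns hold expression values.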

8.9 years ago

If the data was submitted with a table of RPKM expression values (or some transformation of RPKM/FPKM/TPM, etc.), you might be able to find something like this in the series matrix file in GEO:

http://www.ncbi.nlm.nih.gov/gds/?term=RNA-Seq
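
A series matrix file is plain text: metadata lines start with `!`, and the data table sits between the `!series_matrix_table_begin` and `!series_matrix_table_end` markers. Here is a minimal sketch of pulling that table out, assuming the first column holds the feature ID and the rest are per-sample values. The table may be empty if the values were deposited only as supplementary files. `parse_series_matrix` and the toy content are hypothetical.

```python
import csv
import io


def to_number(text):
    """Convert a field to float, or None if it is empty or non-numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_series_matrix(handle):
    """Return (sample_ids, {feature_id: [values]}) from a series matrix file."""
    in_table = False
    sample_ids = None
    rows = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
            continue
        if line.startswith("!series_matrix_table_end"):
            break
        if not in_table or not line:
            continue  # metadata or blank line outside the table
        fields = next(csv.reader([line], delimiter="\t"))
        if sample_ids is None:
            sample_ids = fields[1:]  # header row: ID_REF, then sample IDs
        else:
            rows[fields[0]] = [to_number(x) for x in fields[1:]]
    return sample_ids or [], rows


# Toy file content (hypothetical series, samples, and values).
toy = (
    '!Series_title\t"Toy series"\n'
    "!series_matrix_table_begin\n"
    '"ID_REF"\t"GSM1"\t"GSM2"\n'
    '"GENE1"\t1.5\t2.0\n'
    '"GENE2"\t\t0.25\n'
    "!series_matrix_table_end\n"
)
sample_ids, table = parse_series_matrix(io.StringIO(toy))
```

The result is features by samples; transpose it if the analysis expects samples as rows.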

Also, the published TCGA studies make it a little easier to find the relevant data than the HTTP directory does (although I don't remember whether the RNA-Seq data has been concatenated into a matrix):

https://tcga-data.nci.nih.gov/docs/publications/
