Largest RNAseq dataset other than L1000?
6.2 years ago
Dirk ▴ 100

Does anyone know of any datasets other than the L1000/CMap dataset (https://clue.io/) that have RNA-seq data for cell lines and a large number of compounds/perturbagens? I know there are a lot of "one-off" datasets for particular cell lines/tissues and a handful of compounds, but I am primarily interested in exploring machine learning methods on the biggest dataset I can get my hands on.

RNA-Seq machine learning L1000 • 2.2k views

Note: I am aware of datasets from the likes of COSMIC, but I am currently part of a commercial institution, so I would need data that is under a flexible license or is simply free to use.


Have you looked at GTEx?


So far as I can tell, GTEx is just a collection of user-submitted assays (different conditions, tissues, cells, compounds, etc.) that primarily focuses on tissue-level RNA-seq data. Is there a large collection of cellular assays within GTEx?


Lymphoblasts are cell lines, I guess.

6.2 years ago

What exactly are you looking to do with the RNA-Seq data? I'm working on a similar project, so feel free to message me.


I'm primarily interested in creating structure-activity relationships (SAR) for compounds, with differential gene expression as the target (e.g. given a compound, predict the differential expression of all human genes, likely just as a classification of up- or down-regulated). A number of papers have taken interesting approaches (https://academic.oup.com/bioinformatics/article/32/12/1832/1743989 and http://pubs.acs.org/doi/abs/10.1021/acs.jcim.6b00260), and I'd like to try something similar. Optimally, the dataset would have similar experimental conditions across a large number of genes and compounds tested. I've found the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/gds), but it's unclear whether it contains the kind of datasets I care about.

Edit: to be clear, I'm hoping for datasets that exist for each of the prominent cell lines (e.g. NCI-60, or something similar).
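
For what it's worth, the up/down classification target described above could be sketched roughly as follows. This is only a minimal illustration: the function name `label_regulation`, the log2 fold-change cutoff of 1.0, and the toy gene values are all assumptions made for the example, not something taken from the linked papers or from any particular dataset.

```python
from typing import Dict


def label_regulation(log2_fold_changes: Dict[str, float],
                     threshold: float = 1.0) -> Dict[str, int]:
    """Map each gene's log2 fold change to +1 (up), -1 (down), or 0 (unchanged).

    The threshold is a hypothetical cutoff; a real analysis would likely
    also take statistical significance into account.
    """
    labels = {}
    for gene, lfc in log2_fold_changes.items():
        if lfc >= threshold:
            labels[gene] = 1      # up-regulated
        elif lfc <= -threshold:
            labels[gene] = -1     # down-regulated
        else:
            labels[gene] = 0      # no call either way
    return labels


# Toy, made-up profile for a single compound treatment
toy_profile = {"GENE_A": 2.3, "GENE_B": -1.7, "GENE_C": 0.2}
toy_labels = label_regulation(toy_profile)
```

A vector of labels like this, one per compound, would then be the prediction target for a model that takes compound structure features as input.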


Sounds good. I haven't started this part of my project yet, but it's something I'll need to think about soon, so let's see if we can work this out.

How many data sets do you need?

I found: http://www.roadmapepigenomics.org/data/tables/all

460 here: https://www.encodeproject.org/matrix/?type=Experiment&assay_title=total+RNA-seq

You might also consider reading through this post; let me know what you find:

What Databases Are Available For Rna-Seq Datasets?
