Question: Largest RNAseq dataset other than L1000?
Dirk80 wrote (2.2 years ago):

Does anyone know of any datasets other than the L1000/CMap dataset (https://clue.io/) that have RNAseq data for cell lines and a large number of compounds/perturbagens? I know there are a lot of "one-off" datasets for particular cell lines/tissues and a handful of compounds, but I am primarily interested in exploring machine learning methods on the biggest dataset I can get my hands on.

modified 2.2 years ago by edmund.a.miller0 • written 2.2 years ago by Dirk80

Note: I am aware of datasets from the likes of COSMIC, but I am currently a part of a commercial institution, so I would need data that is under a flexible license or is just free.

written 2.2 years ago by Dirk80

Have you looked at GTEx?

written 2.2 years ago by WouterDeCoster43k

So far as I can tell, GTEx is just a collection of user-submitted assays (different conditions, tissues, cells, compounds, etc.) that primarily focuses on tissue-level RNA-seq data. Is there a large collection of cellular assays within GTEx?

written 2.2 years ago by Dirk80

Lymphoblasts are cell lines I guess.

written 2.2 years ago by WouterDeCoster43k

edmund.a.miller0 wrote (2.2 years ago):

What exactly are you looking to do with the RNA-Seq data? I'm working on a similar project, so feel free to message me.

written 2.2 years ago by edmund.a.miller0

I'm primarily interested in creating structure-activity relationships (SAR) for compounds, with differential gene expression as the target (e.g. given a compound, predict the differential expression, likely just a classification of up/down-regulated, for all human genes). A number of papers have taken interesting approaches (https://academic.oup.com/bioinformatics/article/32/12/1832/1743989 and http://pubs.acs.org/doi/abs/10.1021/acs.jcim.6b00260), and I'd like to take a similar approach. Optimally, the dataset would have similar experimental conditions across a large number of genes and compounds. I've found the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/gds), but it's unclear whether this is the kind of dataset collection I care about.

Edit: to be clear, I'm hoping for datasets that exist for each of the prominent cell lines (e.g. NCI-60, or something similar).
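
To make the "classification of up/down-regulated" target concrete, here is a minimal sketch of how per-gene labels could be derived from differential expression values. Everything in it is an assumption for illustration: the function name `label_regulation`, the |log2 fold change| >= 1 cutoff, and the toy numbers are all made up and do not come from L1000 or any other dataset.

```python
from typing import Dict


def label_regulation(
    log2_fold_changes: Dict[str, Dict[str, float]],
    threshold: float = 1.0,  # assumed cutoff; pick whatever suits the data
) -> Dict[str, Dict[str, int]]:
    """Map each compound's per-gene log2 fold change to a class label.

    +1 = up-regulated, -1 = down-regulated, 0 = no call.
    """
    labels: Dict[str, Dict[str, int]] = {}
    for compound, genes in log2_fold_changes.items():
        labels[compound] = {
            gene: 1 if lfc >= threshold else -1 if lfc <= -threshold else 0
            for gene, lfc in genes.items()
        }
    return labels


# Toy, made-up values (compound -> gene -> log2 fold change vs. control).
toy = {
    "compound_A": {"TP53": 1.8, "MYC": -2.1, "GAPDH": 0.1},
    "compound_B": {"TP53": -0.3, "MYC": 1.2, "GAPDH": 0.0},
}
labels = label_regulation(toy)
```

Each compound then gets one label per gene, which is the multi-label target a structure-based model would be trained to predict from compound features.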

modified 2.2 years ago • written 2.2 years ago by Dirk80

Sounds good. I haven't started this part of my project yet but it's something I'll need to think about soon so let's see if we can work this out.

How many data sets do you need?

I found: http://www.roadmapepigenomics.org/data/tables/all

460 here: https://www.encodeproject.org/matrix/?type=Experiment&assay_title=total+RNA-seq

You might also consider reading through this post; let me know what you find:

What Databases Are Available For Rna-Seq Datasets?

written 2.2 years ago by edmund.a.miller0