Question

Forum: Are We At A Point Yet Where We Can Do Bioinformatics Research Based On Public Datasets?

10.5 years ago

Damian Kao 16k

Is there enough existing data out there to do meaningful research on? Let's say I am financially secure enough not to need to work and have reasonable funds to buy or rent computational time. Could I just mine existing datasets and publish meaningful work on arXiv? Would anyone actually take me seriously without institutional or academic backing?

There are a lot of datasets in very specific research domains. Can we actually make them all comparable? For example, can we take ChIP-seq data from one study and RNA-seq data from another study and analyze them together? Are the biological samples similar enough, and are the library prep steps described well enough, to make the data comparable?

public-datasets • 3.3k views

written 10.5 years ago by Damian Kao 16k • updated 13 months ago by Ram 43k

From my perspective, the short answer is probably yes and no. Yes - there are likely a ton of important findings hiding in datasets like TCGA, just waiting to be mined with the right approach and targeted questions. No - you probably won't be able to pull it off without institutional or academic backing. For one thing, to access controlled datasets you have to go through official channels. And any discoveries you make and want to validate will require access to additional samples and so on. But, most importantly, the scientific community will be prejudiced against findings published only in places like arXiv. Someone really clever and determined, with sufficient existing recognition in the community, could probably pull it off, though. Is it worth doing this off the reservation, just to prove it can be done? Maybe...

10.5 years ago by Obi Griffith 20k

score 4 · Answer 1 · 2013-10-20

Lots of bioinformaticians analyze existing public datasets or use them to benchmark new tools. That said, while arXiv is useful, no: unless you publish your work in peer-reviewed journals, you are unlikely to be taken seriously. In fact, most people won't even see or know about your work.

score 1 · Answer 2 · 2013-10-20

I have the following points to add. Most (if not all) of the databases are not very well curated and may not have all the information in one place. It takes a lot of effort and groundwork before anyone can actually get at these datasets. Then one piece of information is missing, so you have to go to a second dataset, and so on.

These databases are the bread and butter of some research groups. For example, the TCGA papers were published with a lot of fanfare (mostly in Nature), with a number of riders placed on other researchers, but now they want to create another analysis group to publish almost the same data with an integrative analysis. Why was that not done in the first place? The same data is also scattered in fragments across various databases. Whether anyone would want to invest so much time and resources is something to consider seriously before jumping in. On the face of it, this arrangement is good for those few groups, who can use it to argue for funding.

Second, the technology (including the chemistry) keeps changing. New commercial platforms keep rolling out, which adds yet another layer of jargon to reckon with, and new databases need to be created. I would take a balanced approach that will also satisfy funding agencies: form your hypothesis from your own experiments first, and then back it up with a meta-analysis of these databases. It is unfortunate that most (if not all) funding agencies refer investigators to such databases.

score 0 · Answer 3 · 2013-10-20

10.5 years ago

akshayb04 ▴ 30

I think that by integrating data from different databases one could reach certain milestones in this kind of research. Many attempts have been made to integrate and expand data. One such project I have used is the Taverna workflow system. You build a flow chart of the pipeline you need to run, and it helps you integrate all kinds of databases and datasets. The project also has a social networking site called myExperiment, where users can upload their workflow diagrams and get comments and suggestions for improving them. After all, sharing is the only thing left in this egoistic scientific environment!

I agree with Nitin's and Dan's points to a certain extent. These databases are a good source of funding for a few groups, such as TCGA, but basing your research or your future on them may not be a good idea.

10.5 years ago by rob.costa1234 ▴ 310

score 0 · Answer 4 · 2013-10-27

Atul Butte's research group at Stanford is often used as an example of a group doing research by reusing and integrating public data.

They outsource the wet-lab assays they occasionally need, and reportedly publish two papers a month in peer-reviewed journals.

http://buttelab.stanford.edu/

http://stanmed.stanford.edu/2012summer/article3.html

http://scholar.google.nl/citations?hl=en&user=NDyEvlQAAAAJ&sortby=pubdate&view_op=list_works&cstart=20