Forum: Reuse of publicly available datasets
5.3 years ago
sanathoi ▴ 10

Hi, I am a novice at RNA-Seq and am currently doing some RNA-Seq analysis on publicly available reads. I was wondering whether it is possible to publish work based on those publicly available reads.

If it's public then it's public: you add a citation to give credit where it's due, and that's it. Entire meta-analysis approaches and benchmarking papers make a living from this.

This post could use a different title.

Perhaps "Fair-use of publicly available reads" or "Re- or Meta-analysis of publicly available reads" would be more suitable?

5.3 years ago
GenoMax 115k

Publicly available data sometimes still carries restrictions. These may be presented in a "click through" agreement that you agree to (don't we all) as you breeze by. This is important for data that may still be unpublished or under analysis. A common restriction is that you can't use the data for an independent publication (especially if it is still unpublished).

In short, you should do your due diligence before assuming that an accessible dataset is public and running with it.

Interesting point. Say I download all assemblies from Trace in search of a single gene, or download raw data from SRA and make my own assembly?

There are also several documents on open data policies:

e.g. the Fort Lauderdale meeting discussing community resource projects, the resulting NHGRI policy statement, and the Toronto statement.

That may run afoul of this tenet from the Toronto statement:

Respecting the scientific etiquette that allows data producers to publish the first global analyses of their data set

I think there are two situations where I could still use it, though I am not sure about the first:

1. If the data has not been published yet, but I am looking only for 1 gene (not a global analysis)
2. If the data has been published, I am free to use it for whatever I like.

Case #1 could be considered "fair use" as long as you state the exact search you ran on the Trace Archive so that it can be reproduced.

I see another problem here, and I hope it is not a real one... If I search the whole Trace Archive and only use assemblies with citations, how would I fit 1000+ citations into that article?

Maybe like we do in population genetics. It's hard to cite >300 papers, so we just put the citations in the supplement and short-cite them alongside a table with the data. Another solution is to cite just the database; so far no one has pointed to citing just UCSC as wrongdoing.
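
For what it's worth, here is a minimal sketch of how such a supplementary citation table could be put together. Everything in it is an assumption for illustration: the accession IDs and citations are placeholders, and `build_citation_table` and `write_supplement` are hypothetical helper names, not part of any archive's tooling. It groups datasets by their source publication, writes one row per citation, and sets aside datasets with no publication so their terms of use can be checked by hand.

```python
import csv
import io
from collections import defaultdict

# Placeholder (accession, citation) pairs; in practice these would come from
# whatever metadata you collected while downloading the data.
records = [
    ("ACC0001", "Author A et al. 2015"),
    ("ACC0002", "Author B et al. 2017"),
    ("ACC0003", "Author A et al. 2015"),
    ("ACC0004", ""),  # no associated publication found
]


def build_citation_table(records):
    """Group accessions by citation; collect unpublished ones separately."""
    by_citation = defaultdict(list)
    unpublished = []
    for accession, citation in records:
        if citation:
            by_citation[citation].append(accession)
        else:
            unpublished.append(accession)
    return dict(by_citation), unpublished


def write_supplement(by_citation, handle):
    """Write one tab-separated row per citation to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["citation", "n_datasets", "accessions"])
    for citation in sorted(by_citation):
        accessions = by_citation[citation]
        writer.writerow([citation, len(accessions), ";".join(accessions)])


by_citation, unpublished = build_citation_table(records)
buffer = io.StringIO()
write_supplement(by_citation, buffer)
print(buffer.getvalue())
print("No publication found, check terms of use:", unpublished)
```

Writing to a file instead of `io.StringIO` only needs `open("supplement.tsv", "w", newline="")` as the handle. The unpublished list is the part worth a second look, given the restrictions discussed above.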

5.3 years ago
Emily 23k

If in doubt, contact the owners of the data.

5.3 years ago
Benn 8.3k

I guess it is. Of course, explain in the methods section of your paper how you got the data and what you did with it.

Indeed, you'll have to cite and acknowledge your data sources and see if there is a license agreement which you have to follow.

Yeah, that last part is quite important. I think it would be hard to lay hands on a licensed dataset just like that, but nevertheless make sure you fulfill the agreement. Often a fully public dataset is released with no strings attached, and you can use it as is for scientific publication. Very often, though, parts of such datasets are not fully public, and you need permission to obtain them and must use them as intended.

Thanks, everyone, for clearing up my doubts. Indeed, citation and acknowledgement are most important in scientific publication.