Forum:Reuse of publicly available datasets
3
1
5.3 years ago
sanathoi ▴ 10

Hi, I am a novice in RNA-Seq and am currently doing some RNA-Seq analysis on publicly available reads. I was wondering whether it is possible to publish work based on those publicly available reads.

RNA-Seq sequencing publication Forum • 1.4k views
2

If it's public, then it's public: you add a citation to give credit where it's due, and that's it. Whole meta-analysis approaches and benchmarking papers make a living from this.

2

This post could use a different title.

Perhaps "Fair-use of publicly available reads" or "Re- or Meta-analysis of publicly available reads" would be more suitable?

4
5.3 years ago
GenoMax 115k

Publicly available data sometimes still carries restrictions. These may be presented in a "click through" agreement that you agree to (don't we all) as you breeze by. This is important for data that may still be unpublished or under analysis. A common restriction is that you can't use the data for an independent publication (especially if it is still unpublished).

In short, you would want to do your due diligence before assuming an accessible dataset is public and running with it.

2

Interesting point. Say you download all assemblies from Trace in search of a single gene, or download raw data from SRA and make your own assembly?

There are also several documents on open data policies:

e.g. the Fort Lauderdale meeting discussing community resource projects, the resulting NHGRI policy statement, and the Toronto statement.

1

That may run afoul of this tenet from the Toronto statement:

Respecting the scientific etiquette that allows data producers to publish the first global analyses of their data set

1

I think there are two situations where I could still use it, though I am not sure about the first:

1. If the data has not been published yet, but I am looking for only one gene (not a global analysis).
2. If the data has been published, I am free to use it for whatever I like.
1

Case #1 could be considered "fair use" as long as you indicate the exact search you did on the Trace Archive so that it can be reproduced.
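
One way to keep that search reproducible is to save a small record of it alongside your results. This is only a minimal sketch: the function name `search_record`, the field names, the query string, the accession IDs and the date are all hypothetical placeholders, not anything prescribed by the archive.

```python
import json
from datetime import date


def search_record(database, query, accessions, run_on=None):
    """Bundle what a reader would need to repeat the search."""
    return {
        "database": database,
        "query": query,
        # Record when the search was run, since archive contents change.
        "date": (run_on or date.today()).isoformat(),
        # De-duplicate and sort so the record is stable between runs.
        "accessions": sorted(set(accessions)),
    }


record = search_record(
    database="Trace Archive",
    query="hypothetical query string",           # placeholder
    accessions=["ACC0002", "ACC0001", "ACC0002"],  # placeholder IDs
    run_on=date(2020, 1, 1),                       # placeholder date
)
print(json.dumps(record, indent=2))
```

A file like this could go into the supplement next to the methods description.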

0

I see another problem here, and I hope it is not a real one... If I searched the whole Trace Archive and used only assemblies with citations, how would I fit 1000+ citations into the article?

1

Maybe like we do in population genetics. It is hard to cite >300 papers, so we just put the citations in the supplement and cite them briefly alongside a table with the data. Another solution is simply to cite the database; so far no one has pointed out citing just UCSC as wrongdoing.
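
For the supplement route, the table can be generated from whatever mapping of dataset to citation you already keep. A rough sketch, assuming a plain dictionary of accession to citation; the function name `supplementary_table`, the accession IDs and the citation strings are made up for illustration.

```python
import csv
import io


def supplementary_table(sources):
    """Render accession/citation pairs as a TSV string, sorted by accession."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "citation"])
    for accession, citation in sorted(sources.items()):
        writer.writerow([accession, citation])
    return buf.getvalue()


# Placeholder entries; real ones would come from the database records.
sources = {
    "ACC0002": "Placeholder et al., year",
    "ACC0001": "Example et al., year",
}
table = supplementary_table(sources)
print(table)
```

The resulting TSV can be attached as a supplementary file and referenced from the data table in the main text.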

3
5.3 years ago
Emily 23k

If in doubt, contact the owners of the data.

2
5.3 years ago
Benn 8.3k

I guess it is. Of course, explain in the methods section of your paper how you got your data and what you did with it.

0

Indeed, you'll have to cite and acknowledge your data sources and check whether there is a license agreement that you have to follow.

0

Yeah, that last part is quite important. I think it would be hard to lay hands on a licensed data set just like that, but nevertheless make sure you fulfill the agreement. Often a fully public data set is released as is, and you can use it as it is for a scientific publication, but very often such data sets have parts that are not fully public, and you have to get permission to obtain and use them as intended.

0

Thanks, everyone, for clearing up my doubts. Indeed, citation and acknowledgement are most important in scientific publication.