Question: A Universal Format For Expression Data?
8.5 years ago by
Zjk10 wrote:

I read Timtico's answer here.

And I wonder: why is there no universal format for expression data?

For every sample:

geneID  mRNA quantity
gene0   N0
gene1   N1
gene2   N2

is what we really want to know.

If I understand it right, the mapping coverage in RNA-Seq is the Ni. What about microarray data? Can we convert it into this format?
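To make the proposal concrete, here is a minimal sketch of reading the two-column per-sample layout shown in the question. The function name `parse_expression_table`, the shortened header, and the example values are assumptions made for illustration; they do not come from any existing standard.

```python
def parse_expression_table(text):
    """Parse 'geneID <whitespace> quantity' lines into a dict.

    Assumes the first line is a header and every other line has
    exactly two whitespace-separated fields (hypothetical layout).
    """
    values = {}
    for line in text.strip().splitlines()[1:]:  # skip the header line
        gene_id, quantity = line.split()
        values[gene_id] = float(quantity)
    return values


# Made-up example values, one sample.
sample = """geneID quantity
gene0 120
gene1 30
gene2 0
"""

table = parse_expression_table(sample)
print(table)
```

Note that this only captures one number per gene per sample; it says nothing about how that number was measured or normalized.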

gene • 1.6k views
ADD COMMENTlink modified 8.5 years ago by Chris Evelo10.0k • written 8.5 years ago by Zjk10

You have given the answer yourself in your example: you ask for a universal format, which would have to cover all possible use cases. Your example, by contrast, is heavily reduced and simplistic. Think about the myriad interesting questions your format doesn't cover (exons, introns, arbitrary regions, intergenic regions, other molecules such as small RNA, protein expression, discrete counts vs. ratios, the need for sample annotation, ...) and try to add these to your format.

ADD REPLYlink written 8.5 years ago by Michael Dondrup47k

If you think about it, even if the Ns represent coverage or counts, you still need normalization or other procedures before you can compare the values between samples. It's all about statistics!
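As a rough illustration of why raw counts are not directly comparable, the sketch below scales each sample to counts per million (CPM), one simple normalization among many. The sample dictionaries and the function name `counts_per_million` are made up for this example, and real analyses usually need more than library-size scaling.

```python
def counts_per_million(counts):
    """Scale raw per-gene counts by the sample's total count (library size)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


# Hypothetical samples: sample_b was sequenced twice as deeply as sample_a.
sample_a = {"gene0": 100, "gene1": 300, "gene2": 600}
sample_b = {"gene0": 200, "gene1": 600, "gene2": 1200}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

# The raw counts differ by a factor of two, but the scaled values agree.
print(cpm_a["gene0"], cpm_b["gene0"])
```

A format that stored only the raw Ns would leave this step, and the choice of method, to every reader of the file.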

ADD REPLYlink written 8.5 years ago by Vitis2.3k
8.5 years ago by
iw9oel_ad6.0k wrote:

By coincidence, today's XKCD explains why there is no universal format for expression data.

Devising even simple data formats is a complex task and the more general the application, the harder it gets to elegantly incorporate everyone's use-cases.

Most expression microarray experiments measure relative intensities between different conditions/treatments (which we interpret as being related to relative RNA levels), so I would not expect to know an "mRNA quantity" unless I had used spike-in controls. Even then, a microarray experiment is not the way to quantitate RNA.

ADD COMMENTlink modified 8.5 years ago • written 8.5 years ago by iw9oel_ad6.0k

My computer doesn't prefer XML. Just a matter of proper education.

ADD REPLYlink written 8.5 years ago by Lyco2.3k

The problem is that biologists want text files but computers prefer XML files.

ADD REPLYlink written 8.5 years ago by Pasta1.3k
8.5 years ago by
Chris Evelo10.0k
Maastricht, The Netherlands
Chris Evelo10.0k wrote:

Well, the XKCD answer mentioned by Keith is good to keep in mind, but of course it is not a reason to give up all efforts to create standards or to try to improve data and software interoperability.

Keith is right though that microarray data do not easily give absolute expression values and it is almost impossible to compare absolute expression values across experiments and across technologies. Microarray data are indeed more suited for comparisons between conditions than for absolute expression values.

But then you can of course ask zjk's question again. Can't you easily create a standardized format, or at least standardized content, for microarray expression data comparison? Of course, in that case you would need to specify which conditions (and thus which groups of arrays) you want to compare. And that calls for another type of standardization: the experiment description files. In fact there are interesting efforts for that, like the now almost traditional MAGE-TAB format or the newer ISA-Tab, which also covers other types of omics.

The standardized result should be based on standardized data treatment (QC, normalization, and so on) and should then give you a p-value for the statistics (an ANOVA is a good start if you have more than one comparison) and the fold changes for each comparison. This, plus the probeset identifiers (which can easily be resolved to genes while keeping the raw information), was precisely what we stored as a network-wide standard in NuGO. The same type of information is also provided by the Gene Expression Atlas. Since that is a repository of selected good-quality expression studies (from ArrayExpress) with data treatment and result storage in a standardized way, I would suggest using the Gene Expression Atlas data format as the de facto standard for microarray comparison data, unless you have a really good reason to do otherwise. In fact, that is what we plan to do in the dbNP project.
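To illustrate the kind of content described above (probeset identifier, fold change, p-value per comparison), here is a minimal sketch of one result record. The class name, field names, and example values are hypothetical; this is not the actual NuGO or Gene Expression Atlas format, and the p-value is a placeholder rather than the output of a real test.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class ComparisonResult:
    """One row of standardized comparison output (hypothetical layout)."""
    probeset_id: str         # kept as measured; can be resolved to a gene later
    comparison: str          # which groups of arrays were compared
    log2_fold_change: float
    p_value: float


def log2_fold_change(treated, control):
    """Log2 ratio of group means, assuming normalized linear-scale intensities."""
    return math.log2(mean(treated) / mean(control))


# Made-up intensities for a single probeset in two groups of arrays.
result = ComparisonResult(
    probeset_id="probeset_0001",
    comparison="treated_vs_control",
    log2_fold_change=log2_fold_change([400.0, 400.0], [100.0, 100.0]),
    p_value=0.01,  # placeholder; would come from the ANOVA or similar test
)
print(result)
```

The record is only meaningful together with an experiment description that defines what "treated" and "control" are, which is where formats like MAGE-TAB or ISA-Tab come in.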

ADD COMMENTlink modified 8.5 years ago • written 8.5 years ago by Chris Evelo10.0k

Is the ArrayExpress Atlas format in JSON/XML?

ADD REPLYlink written 8.5 years ago by lh331k

I think both positions, Keith's and Chris's, are totally justified. The difference lies in the adjective 'universal'. Regarding 'universal models or tools', my position is that something that promises to be suitable for everything will in fact be fully adequate for nothing.

ADD REPLYlink written 8.5 years ago by Michael Dondrup47k