Question: Can I use DESeq on non-transcriptome data?
Asked by robert.murphy, 8 months ago:

I have a matrix of species abundance per sample (across 49 samples) from a metagenomic dataset. Could this be used in DESeq2 to test for differential abundance of bins between samples?


I'm new to RNA-seq and have yet to fully understand the statistics behind DESeq myself, but I could imagine that the assumptions DESeq makes about RNA-seq datasets might not hold in your situation (e.g. are you certain that more than 50% of species have identical abundance between conditions?).

Also, how did you calculate abundance? You mentioned non-integer data, so is it percentages? The absolute size of the numbers is actually relevant: a table with every entry multiplied by 100 will yield a different result from a table with every entry multiplied by 1,000.
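Why scale matters can be shown with a quick simulation. This is a minimal sketch that assumes a simple Poisson model for raw counts (a simplification; DESeq2 actually fits a negative binomial with estimated dispersions). For raw counts the variance is roughly equal to the mean, but multiplying every entry by a constant c multiplies the variance-to-mean ratio by c, so the same data rescaled by 100 or by 1,000 no longer looks like the same kind of count data to a count model.

```python
import numpy as np

def variance_to_mean(x):
    """Index of dispersion: close to 1 for Poisson-distributed counts."""
    x = np.asarray(x, dtype=float)
    return x.var() / x.mean()

# Simulated raw counts for one feature across many replicates
# (assumption for illustration: Poisson with mean 5)
rng = np.random.default_rng(seed=0)
raw_counts = rng.poisson(lam=5.0, size=100_000)

for factor in (1, 100, 1000):
    scaled = raw_counts * factor
    # Variance grows with factor**2 while the mean grows with factor,
    # so the ratio grows in proportion to the scaling factor
    print(f"x{factor}: variance/mean = {variance_to_mean(scaled):.1f}")
```

This should print ratios close to 1, 100 and 1,000 respectively: the scaling factor, not the biology, sets the apparent noise level.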

(Comment by Tom, 8 months ago)
Answer by i.sudbery (Sheffield, UK), 8 months ago:

DESeq2 can in theory be used to analyse any dataset that compares two or more sets of counts from an experiment where the counts measure something that also varies biologically between replicates. I don't know whether there are particular biases to account for in metagenomic analysis, but I'm pretty sure people have used DESeq for this.


https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531

(Comment by genomax, 8 months ago)

Thanks :) Looks like it is definitely possible then!

(Reply by robert.murphy, 8 months ago)

I will give it a try then :)

(Reply by robert.murphy, 8 months ago)

How would you suggest I deal with the non-integer nature of my data, given that DESeq2 requires integers?

(Reply by robert.murphy, 8 months ago)

Hmmm... how have you calculated your abundances? I would have thought you had read counts, which would be integer.

(Reply by i.sudbery, 8 months ago)

I was given this abundance matrix, so I am not 100% sure, but the abundances were calculated by taking all scaffolds in a bin and looking at their abundance in a sample. They were controlled against a normalised depth, which I think is where the non-integers come from. I have just multiplied the whole matrix by 100 to remove values below 1 and then taken the integer part, but this does not feel like the correct thing to do. In any case, after looking further, I am not sure I can satisfy DESeq's requirement that not every gene contains at least one zero, i.e. that at least one bin is non-zero in every sample.

(Reply by robert.murphy, 8 months ago)

Removing genes that are all zeros shouldn't be a problem. But you are correct that multiplying by 100 is not the correct way to make your data integer: DESeq needs raw count data so that its statistical models are valid; it won't work on normalised data.
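Note that "no bin is zero in every sample" and "at least one bin is non-zero in every sample" are different conditions. DESeq2's default size-factor calculation needs the second, because it takes a geometric mean of each feature across samples and a single zero makes that mean zero. Below is a minimal numpy sketch on a made-up toy matrix; the helper names are hypothetical and not part of DESeq2. It drops all-zero bins and then checks the second condition.

```python
import numpy as np

def drop_all_zero_rows(counts):
    """Remove features (rows) whose count is zero in every sample."""
    counts = np.asarray(counts)
    return counts[counts.any(axis=1)]

def has_zero_free_row(counts):
    """True if at least one feature is non-zero in every sample."""
    return bool((np.asarray(counts) > 0).all(axis=1).any())

# Toy bins x samples matrix (made-up numbers for illustration only)
counts = np.array([
    [0, 0, 0],  # all-zero bin: removed by the filter
    [0, 5, 3],
    [4, 0, 7],
])

filtered = drop_all_zero_rows(counts)
print(filtered.shape)               # (2, 3)
print(has_zero_free_row(filtered))  # False: every remaining bin still has a zero
```

So in a sparse metagenomic table, filtering out all-zero bins on its own may not be enough for the default size factors to be computed.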

(Reply by i.sudbery, 8 months ago)

The normalisation was just to control for different sequencing depths, as not all sequences came from the same run. Multiplying by 100 should not be viewed as a form of normalisation, should it? Isn't it just shifting the counts up by a couple of orders of magnitude?

(Reply by robert.murphy, 8 months ago)

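It's not the multiplying by 100 that violates DESeq's assumptions, but the normalising by read depth. This is because in count statistics 10/100 is not the same as 100/1000: both are a proportion of 0.1, but the second is based on ten times as many reads and is therefore much more certain.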
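The 10/100 versus 100/1000 point can be made concrete with the binomial standard error of a proportion, sqrt(p(1 - p)/n). This is a simplified illustration using a binomial model rather than DESeq2's negative binomial model: both ratios give p = 0.1, but their uncertainty differs by a factor of sqrt(10), and that information is lost once counts have been divided by depth.

```python
import math

def proportion_standard_error(successes, total):
    """Binomial standard error of the observed proportion successes/total."""
    p = successes / total
    return math.sqrt(p * (1 - p) / total)

for k, n in [(10, 100), (100, 1000)]:
    se = proportion_standard_error(k, n)
    print(f"{k}/{n}: proportion = {k / n:.2f}, standard error = {se:.4f}")
# 10/100: proportion = 0.10, standard error = 0.0300
# 100/1000: proportion = 0.10, standard error = 0.0095
```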

DESeq has its own methods for accounting for differences in read depth.
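DESeq2's default method is the median-of-ratios size factor: each sample's counts are compared with a per-gene geometric mean across all samples, and the median of those ratios becomes that sample's scaling factor. Below is a minimal numpy sketch of that idea, not DESeq2's actual code; the function name is hypothetical. DESeq2 keeps the raw counts and uses the size factors inside its model, which is why giving it pre-normalised values defeats the purpose.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Size factors per sample; counts has rows = features, columns = samples."""
    counts = np.asarray(counts, dtype=float)
    # Only features that are non-zero in every sample have a finite log geometric mean
    usable = np.all(counts > 0, axis=1)
    if not usable.any():
        raise ValueError("every feature contains at least one zero; "
                         "cannot compute log geometric means")
    log_counts = np.log(counts[usable])
    log_geo_means = log_counts.mean(axis=1)           # per-feature log geometric mean
    log_ratios = log_counts - log_geo_means[:, None]  # each sample relative to that mean
    # The median ratio across features gives the sample's size factor
    return np.exp(np.median(log_ratios, axis=0))

sf = median_of_ratios_size_factors([[10, 20], [100, 200], [30, 60]])
print(sf)
```

In this toy matrix the second sample is the first sample sequenced at twice the depth, so the size factors come out at about 0.71 and 1.41, a ratio of 2.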

(Reply by i.sudbery, 8 months ago)