Question: Can I use DESeq on non-transcriptome data?

robert.murphy • **10** wrote:

I have a matrix of species abundance per sample (across 49 samples) from a metagenomic dataset. Could this be used in DESeq2 to test for differential abundance of bins between samples?

i.sudbery ♦ **9.4k** wrote:

DESeq2 can in theory be used to analyse any dataset that consists of two or more sets of counts to be compared, from an experiment where the counts measure something that also varies biologically between replicates. I don't know if there are particular biases to account for in metagenomic analysis, but I'm pretty sure people have used DESeq for this.

How would you suggest I deal with the non-integer nature of my data, given that DESeq2 requires integers?

Hmmm... how have you calculated your abundances? I would have thought you had read counts, which would be integer.

The abundances were calculated by taking all scaffolds in a bin and looking at their abundance in a sample. I was given this abundance matrix, so I am not 100% sure, but it was controlled against a normalized depth, which I think is where the non-integers come from. I have just multiplied the whole matrix by 100 to remove sub-1 numbers and then converted the values to integers. However, this does not feel correct. Anyway, after looking, I am not sure I can satisfy the fact that DESeq requires every gene to not contain at least one zero.

Removing genes that are all zeros shouldn't be a problem. But you are correct that multiplying by 100 is not the right way to make your data integer: DESeq needs raw count data for its statistical models to be valid, and it won't work on normalised data.
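To make the point about zeros concrete, here is a rough Python sketch of median-of-ratios size factor estimation, the kind of normalisation DESeq2 applies to raw counts by default. It is a simplification for illustration, not DESeq2's actual code: the function name `size_factors` and the toy counts are made up. Rows containing a zero are skipped because their log geometric mean is undefined, and if every row contains a zero there is nothing left to compute from.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (simplified, hypothetical helper).

    counts: list of rows (one per species/gene), each a list of raw
    integer counts, one per sample.
    """
    n_samples = len(counts[0])
    # log(0) is undefined, so rows with any zero cannot contribute
    # to the per-row geometric mean and are left out.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every row contains at least one zero")
    # Log of the geometric mean of each usable row across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each count to its row's geometric mean, on the log scale.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [0, 5],      # skipped: contains a zero
    [30, 60],
]
factors = size_factors(counts)
print(factors)
```

In this toy example the second sample's size factor comes out at twice the first, which matches the doubled depth. Feeding in values that were already depth-normalised would make the method estimate factors for a depth difference that is no longer in the data.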

The normalisation was just to control for different sequencing depths, as not all sequences came from the same run. Multiplying by 100 should not be viewed as a form of normalisation, should it? It just moves the space the counts occupy up a couple of orders of magnitude.

I'm new to RNA-seq and have yet to fully understand the statistics behind DESeq myself. But I could imagine that the prerequisite assumptions about RNA datasets might not hold true for your situation (e.g. are you certain that more than 50% of species have identical abundance between conditions?).

Also, how did you calculate abundance? You mentioned non-integer data, so is it percentages? The absolute size of the numbers is actually relevant: a table with every entry multiplied by 100 will yield different results than a table with every entry multiplied by 1,000.
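A quick way to see why the scale matters is a toy calculation. This is not DESeq2's negative binomial model. It is just a simple Poisson-style z-statistic, used here as a stand-in, and the abundance values and the helper name `poisson_z` are made up. For count data the sampling noise is tied to the size of the count, so the same two-fold difference looks far more significant once it is multiplied by a larger constant.

```python
import math


def poisson_z(a, b):
    """Approximate z-statistic for the difference between two Poisson counts."""
    return (a - b) / math.sqrt(a + b)


# Hypothetical normalised abundances of one species in two samples.
abundances = (0.5, 1.0)

# The same values turned into pseudo-counts with two different multipliers.
z_100 = poisson_z(*(round(x * 100) for x in abundances))
z_1000 = poisson_z(*(round(x * 1000) for x in abundances))

print(z_100, z_1000)
```

The fold change is identical in both cases, but the test statistic grows with the square root of the multiplier. The apparent evidence for a difference is therefore set by an arbitrary scaling choice, not by how many reads were actually observed.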
