Question

Can I use DESeq2 on non-transcriptome data?

0

4.2 years ago

robert.murphy ▴ 80

I have a matrix of species abundance per sample (across 49 samples) from a metagenomic dataset. Could this be used in DESeq2 to test for differential abundance of bins between samples?

genome • 1.4k views

ADD COMMENT • link 4.2 years ago by robert.murphy ▴ 80

0

I'm new to RNA-seq and have yet to fully understand the statistics behind DESeq myself, but I could imagine that the prerequisite assumptions about RNA datasets might not hold true for your situation (e.g. are you certain that more than 50% of species have identical abundance between conditions?).

Also, how did you calculate abundance? You mentioned non-integer data, so is it percentages? The absolute size of the numbers is actually relevant: a table with every entry multiplied by 100 will yield different results than a table with every entry multiplied by 1,000.
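As a rough illustration of why the absolute size of the numbers matters, here is a minimal Python sketch. It is not DESeq2's test; it simply computes a Pearson chi-square statistic on a made-up 2x2 table of counts and then on the same table multiplied by 100 and by 1,000. The table values and the function name `chi_square_2x2` are assumptions for illustration only.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical table: rows = condition,
# columns = (reads for one species, reads for everything else).
base = (3, 7, 5, 5)

for scale in (1, 100, 1000):
    scaled = [x * scale for x in base]
    # The proportions are identical in every scaled table,
    # but the test statistic is not.
    print(scale, chi_square_2x2(*scaled))
```

The proportions never change, yet the statistic grows in proportion to the multiplier, so an arbitrary scaling factor effectively decides how significant the difference looks.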

ADD REPLY • link 4.2 years ago by Tom ▴ 540

score 0 · Answer 1 · 2020-01-28

0

4.2 years ago

i.sudbery 19k

DESeq2 can in theory be used to analyse any dataset that compares two or more sets of counts from an experiment, where the counts measure something that also varies biologically between replicates. I don't know if there are particular biases to account for in metagenomic analysis, but I'm pretty sure people have used DESeq for this.

ADD COMMENT • link 4.2 years ago by i.sudbery 19k

0

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531

ADD REPLY • link 4.2 years ago by GenoMax 141k

0

Thanks :) Looks like it is definitely possible then!

ADD REPLY • link 4.2 years ago by robert.murphy ▴ 80

0

I will give it a try then :)

ADD REPLY • link 4.2 years ago by robert.murphy ▴ 80

0

How would you suggest I deal with the non-integer nature of my data, given that DESeq2 requires integers?

ADD REPLY • link 4.2 years ago by robert.murphy ▴ 80

0

Hmmm... how have you calculated your abundances? I would have thought you had read counts, which would be integer.

ADD REPLY • link 4.2 years ago by i.sudbery 19k

1

The abundances were calculated by taking all scaffolds in a bin and looking at their abundance in a sample. I was given this abundance matrix, so I am not 100% sure, but it was controlled against a normalized depth, so I think this is where the non-integers come from. I have just multiplied the whole matrix by 100 to remove sub-1 numbers and then taken the integer part. However, this does not feel correct. Anyway, after looking into it, I am not sure I can satisfy DESeq's requirement that every gene not contain at least one zero.

ADD REPLY • link 4.2 years ago by robert.murphy ▴ 80

0

Removing genes that are all zeros shouldn't be a problem. But you are right that multiplying by 100 is not the correct way to make your data integer: DESeq needs raw count data for its statistical models to be valid, and it won't work on normalised data.
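A minimal Python sketch of the two checks mentioned here, assuming the matrix is held as a plain list of rows (one row per gene or bin, one column per sample). The helper names `is_raw_count_matrix` and `drop_all_zero_rows` and the example values are hypothetical, and this is a pre-check one might run before handing data to DESeq2, not part of DESeq2 itself.

```python
def is_raw_count_matrix(counts):
    """True if every entry is a non-negative integer (i.e. looks like raw counts)."""
    return all(isinstance(c, int) and c >= 0 for row in counts for c in row)

def drop_all_zero_rows(counts):
    """Keep only rows (genes/bins) that have at least one non-zero count."""
    return [row for row in counts if any(c != 0 for c in row)]

# Hypothetical raw read counts: 4 bins x 3 samples.
raw = [
    [12, 0, 7],
    [0, 0, 0],    # all zeros: safe to drop
    [3, 5, 0],
    [40, 38, 51],
]

# Hypothetical depth-normalised abundances: not valid input for a count model.
normalised = [[0.12, 0.0, 0.07], [0.03, 0.05, 0.0]]

print(is_raw_count_matrix(raw))          # True
print(is_raw_count_matrix(normalised))   # False
print(drop_all_zero_rows(raw))           # the all-zero row is removed
```

Note that passing the integer check does not prove the values are raw counts; a normalised matrix that has been scaled and truncated would pass too, which is why going back to the original read counts is the safer route.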

ADD REPLY • link 4.2 years ago by i.sudbery 19k

0

The normalisation was just to control for different sequencing depths, as not all sequences came from the same run. Multiplying by 100 should not be viewed as a form of normalisation, should it? It just moves the space the counts occupy up a couple of factors.

ADD REPLY • link 4.2 years ago by robert.murphy ▴ 80

0

It's not the multiplying by 100 that violates DESeq's assumptions, but the normalising by read depth. This is because in count statistics 10/100 is not the same as 100/1000.
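To make the 10/100 versus 100/1000 point concrete, here is a small Python sketch using a simple binomial model. That model is an assumption chosen for illustration; DESeq2 itself uses a negative binomial model, but the same intuition applies. The function name `proportion_with_se` is hypothetical.

```python
import math

def proportion_with_se(count, total):
    """Return the observed proportion and its binomial standard error."""
    p = count / total
    se = math.sqrt(p * (1 - p) / total)
    return p, se

for count, total in [(10, 100), (100, 1000)]:
    p, se = proportion_with_se(count, total)
    print(f"{count}/{total}: proportion={p:.3f}, standard error={se:.4f}")
```

Both cases give a proportion of 0.1, but the standard error is about 0.03 for 10/100 and about 0.0095 for 100/1000. Dividing by read depth throws away exactly the information that tells the model how precise each measurement is.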

DESeq has its own methods for accounting for differences in read depth.
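For readers wondering what such a method looks like, below is a pure-Python sketch of the median-of-ratios idea described in the DESeq paper. It is a simplified illustration, not DESeq2's actual implementation, and the function name `size_factors` and the example counts are made up. It also shows why zeros matter: rows containing a zero have no usable log geometric mean, so they are skipped, and if every row contains a zero no size factor can be computed this way.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors for a rows x samples matrix of raw counts.

    `counts` is a list of rows (one per gene/bin), each a list of
    non-negative integers (one per sample).
    """
    # Only rows without zeros have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every row contains at least one zero")
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        # Log ratio of each row's count in sample j to that row's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical raw counts: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [50, 100],
    [0, 4],      # skipped: contains a zero
    [200, 400],
]
print(size_factors(counts))
```

In this toy example the second size factor is twice the first, reflecting the doubled depth, while the raw integer counts stay untouched for the statistical model.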

ADD REPLY • link 4.2 years ago by i.sudbery 19k