Question: Normalization methods for metagenomics WGS data (not 16S data)
David wrote, 23 months ago:

Hi, I have Illumina 2x150 bp reads from a metagenomics experiment. This is not 16S data but whole-genome shotgun (WGS) data.

I was wondering whether DESeq2 or edgeR can be used for data normalization in this context? I know these two are normally used for RNA-seq data, but it looks like they might also perform well with DNA data, based on the article "Statistical evaluation of methods for identification of differentially abundant genes in comparative metagenomics".

Thanks for your comments, David

Tags: edgeR, DESeq2, normalization

If you have a table of read counts, then DESeq2 should work; the main issue here is what reference to use.
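A minimal sketch of what that could look like, assuming a tab-delimited taxa-by-sample count table and a simple two-group design (the file name, number of samples, and condition labels below are placeholders, not details from this thread):

```r
# Minimal DESeq2 sketch for a taxa-by-sample count table.
# File name, sample order and condition labels are hypothetical.
library(DESeq2)

counts <- as.matrix(read.delim("taxa_counts.tsv", row.names = 1))

coldata <- data.frame(
  row.names = colnames(counts),
  condition = factor(c("control", "control", "treated", "treated"))
)

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ condition)

dds <- DESeq(dds)     # size factors, dispersion estimation, Wald tests
res <- results(dds)   # log2 fold changes, treated vs control
head(res[order(res$padj), ])
```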

— Asaf, 23 months ago

I have the table with read counts. What do you mean by "reference to use"?

— David, 23 months ago

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

— genomax, 23 months ago

Perhaps you can clarify what you are trying to do in more detail?

— Brian Bushnell, 23 months ago

I have a read count table corresponding to a metagenomics experiment. I have cleaned the reads (removed contaminants and filtered out low-quality reads), assigned taxonomy to the reads, and built a table in which each column is a sample and each row is a taxon. I want to compare samples (some are control samples and others are treated samples). Before running the comparison I need to normalize the data, which is why I'm asking whether DESeq2 would be suitable (which seems to be the case based on the paper I posted). I just want to know whether others have done this.
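Since edgeR was also mentioned in the question, a comparable sketch of running the same kind of taxa-by-sample table through edgeR's TMM normalization (file name, sample order and group labels are again illustrative assumptions):

```r
# edgeR / TMM sketch on a taxa-by-sample count table (names illustrative).
library(edgeR)

counts <- as.matrix(read.delim("taxa_counts.tsv", row.names = 1))
group  <- factor(c("control", "control", "treated", "treated"))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y, method = "TMM")   # TMM normalization factors
y <- estimateDisp(y)                      # dispersion estimation
et <- exactTest(y)                        # treated vs control
topTags(et)

# If you only want a normalized abundance table rather than tests:
cpm_table <- cpm(y, normalized.lib.sizes = TRUE)
```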

— David, 23 months ago

I don't think normalization should be necessary, but if you want to do it, the simplest method would be to subsample all of your samples to the same number of reads (the number of reads in your smallest sample). But since you already have the data in a table, you can "normalize" by multiplying all of the entries in each sample by (#reads in smallest sample)/(#reads in this sample).
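A short sketch of that scaling in R, assuming counts is the taxa-by-sample matrix from the earlier sketches:

```r
# Scale every sample down to the depth of the smallest sample
# (counts is assumed to be a numeric taxa-by-sample matrix).
lib_sizes <- colSums(counts)                        # reads per sample
scaled    <- sweep(counts, 2, min(lib_sizes) / lib_sizes, `*`)

colSums(scaled)   # every column now sums to min(lib_sizes)
```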

— Brian Bushnell, 23 months ago

DESeq2's normalization procedure assumes that most of the entities (here, taxa) have similar abundance across all of the samples. If you think that assumption holds, then you're good to go with DESeq2 and can carry out the entire analysis with it. If, however, that is not the case and the populations are completely different, then you have a problem: you wouldn't be able to state "species X is more abundant in condition A than in B", because you don't know whether it actually increased in A or was just diluted by other species. You would still be able to look at ratios between pairs of species and make statements about those ratios.
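For reference, that assumption enters through DESeq2's median-of-ratios size factors; a short sketch of extracting the size factors and a normalized table, reusing the dds object from the earlier sketch:

```r
# Size factors via DESeq2's median-of-ratios method; this is where the
# "most taxa are not changing" assumption matters (dds as built above).
dds <- estimateSizeFactors(dds)
sizeFactors(dds)                                # one scaling factor per sample

norm_counts <- counts(dds, normalized = TRUE)   # normalized taxa table
```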

— Asaf, 23 months ago