Normalization methods for metagenomics WGS data (not 16S data)
4.0 years ago
David ▴ 200

Hi, I have Illumina 2x150 bp reads from a metagenomics experiment. This is WGS (shotgun) data, not 16S data.

I was wondering whether either DESeq2 or edgeR can be used for data normalization in this context? I know these two are normally used for RNA-seq data, but it looks like they might also perform well with DNA data (based on the article "Statistical evaluation of methods for identification of differentially abundant genes in comparative metagenomics").

deseq2 edgeR normalization • 2.5k views

If you have a table of read counts then DESeq2 should work; the main issue here is which reference to use.

I have the table of read counts. What do you mean by "reference to use"?

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

Perhaps you can clarify what you are trying to do in more detail?

I have a read count table corresponding to a metagenomics experiment. I have cleaned the reads (removed contaminants and filtered out low-quality reads), assigned taxonomy to the reads, and built a table in which each column is a sample and each row is a taxon. I want to compare samples (some are control samples and others are treated samples). Before running the comparison I need to normalize the data, and that is why I'm asking whether DESeq2 would be suitable for that (which seems to be the case from the paper I posted). I just want to know whether others have done so.

I don't think normalization should be necessary, but if you want to do it, the simplest method would be to subsample all of your samples to the same number of reads (the number of reads in your smallest sample). But since you already have the data in a table, you can "normalize" by multiplying every entry of a sample by (#reads in smallest sample)/(#reads in this sample).
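
The scaling described above can be sketched in a few lines of Python. This is only an illustration: the function name `scale_to_smallest`, the sample names, and the counts are made up, and the table is assumed to be held as one list of per-taxon counts per sample.

```python
def scale_to_smallest(counts):
    """Scale every sample to the read total of the smallest sample.

    counts: dict mapping sample name -> list of per-taxon read counts
            (hypothetical layout; taxa are in the same order in every list).
    """
    totals = {sample: sum(values) for sample, values in counts.items()}
    if min(totals.values()) == 0:
        raise ValueError("at least one sample has no reads")
    target = min(totals.values())  # #reads in the smallest sample
    # Multiply each entry by (#reads in smallest sample) / (#reads in this sample).
    return {
        sample: [value * target / totals[sample] for value in values]
        for sample, values in counts.items()
    }


# Made-up example data: two samples, three taxa.
counts = {
    "control_1": [120, 30, 50],
    "treated_1": [400, 100, 500],
}
scaled = scale_to_smallest(counts)
```

After scaling, every sample sums to the same total, so the values are effectively proportions expressed on the scale of the smallest sample. Unlike subsampling, the results are no longer integer counts.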

DESeq2's normalization assumes that most of the entities have the same abundance in all of the samples. If you think this assumption applies, then you're good to go with DESeq2, and you can go ahead and do the entire analysis with it. If, however, this is not the case and the populations are completely different, then you have a problem: you wouldn't be able to state "species X is more abundant in condition A than in B", because you don't know whether it increased in A or was just diluted by other species in B. You would still be able to look at the ratios between pairs of species and make statements about those ratios.
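
To make the assumption concrete, here is a rough Python sketch of median-of-ratios size factors, the idea behind DESeq2's default normalization. It is a simplified illustration, not DESeq2's code; the function name `median_of_ratios_size_factors` and the table are made up.

```python
import math
from statistics import median


def median_of_ratios_size_factors(table):
    """table: list of rows (taxa), each row a list of counts, one per sample."""
    n_samples = len(table[0])
    # Only taxa observed in every sample are used, since log(0) is undefined.
    usable = [row for row in table if all(count > 0 for count in row)]
    if not usable:
        raise ValueError("no taxon is present in every sample")
    # Log of the per-taxon geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [
            math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)
        ]
        # The median ratio is the size factor; it only makes sense if
        # most taxa are unchanged between samples.
        factors.append(math.exp(median(log_ratios)))
    return factors


# Made-up table: sample 2 was sequenced twice as deeply as sample 1.
table = [
    [100, 200],
    [50, 100],
    [10, 20],
    [0, 500],  # absent from sample 1, so skipped when computing the factors
]
size_factors = median_of_ratios_size_factors(table)
normalized = [[c / f for c, f in zip(row, size_factors)] for row in table]
```

If most taxa really do differ between conditions, the median ratio no longer reflects sequencing depth, which is exactly the dilution problem described above.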

16 months ago
shengwei ▴ 30

You can try ALDEx2 as well; it deals with compositional data using a log-ratio transformation.
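
For illustration, a centered log-ratio (CLR) transform, the kind of log-ratio transformation ALDEx2 builds on, might look like the sketch below. This is not ALDEx2 itself: the function name `clr` is made up, and the fixed pseudocount of 0.5 for zero counts is an assumption chosen to keep the example short.

```python
import math


def clr(sample_counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's per-taxon counts.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logs = [math.log(count + pseudocount) for count in sample_counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    # Each value is the log-ratio of a taxon to the sample's geometric mean.
    return [value - mean_log for value in logs]


# Made-up samples: the second has 10x the depth but the same composition.
shallow = clr([1, 10, 100], pseudocount=0)
deep = clr([10, 100, 1000], pseudocount=0)
with_zero = clr([120, 30, 50, 0])
```

Because every value is a ratio to the sample's own geometric mean, multiplying a whole sample by a constant (for example, sequencing it more deeply) leaves the CLR values unchanged.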