Hello MEGAN users! I have some questions for you. I couldn't find the answers in the MEGAN User Manual, in papers, in online tutorials or even here in the specialized forums. I have been using MEGAN for over 2 years.

My dataset consists of 7 metagenomic samples (454) from a water reservoir: 4 taken over an annual sampling and 3 spatially distributed at one of the annual sampling points, so the sampling design looks like a cross, + (one of the samples is the intersection). I usually treat these samples as 2 sub-datasets, one for temporal analysis and another for spatial analysis, since the relationships I expect to see are related first to time and second to space. I know this seems obvious, but you will see why I am telling you this.

Concerning the comparison of multiple samples: first of all, I am using the square root normalization; any advice on that is welcome. The issue is that when I run a multiple-sample comparison over my entire dataset (time and space together), and then run the comparison separately for time and for space, I get discrepancies in the assignments. For example, when comparing the entire dataset, the ortholog group K02977 hits samples 1 and 2. When comparing just the time samples, it hits 1 and 3. Worse than this: K02997 hits 2 in the time comparison, 4 in the space comparison, and 5 and 8 in the entire dataset!

OK, I know that this is about the reordering and reassignment inherent to the algorithm, but I would like to understand better why it happens, so that I can be sure whether to use time and space separately or the entire dataset. For now I am looking at everything together, because my goal is in fact to look for uniqueness over the entire dataset, regardless of time and space, but for other goals it could be better to treat them separately. Am I right? Is it a matter of the goal? Is it not an error or a failure, but just a matter of interpretation and understanding? Could anyone please explain why these differing assignments happen? Should I keep looking at the data in all 3 ways: all together, time versus space, and each sample on its own?

PS: I read in an online tutorial that this kind of skew can happen in multiple-sample comparisons, mainly for less common assignments, but to what extent could it be a real problem? Should I give up on the less common assignments, or give up using the Compare dialog, or is it OK and, if I look carefully, not a problem?

Thank you very much

Best,

Marcele

Your question is too long and too complicated. To be successful in getting an answer you should simplify it greatly. Right now it is too difficult to follow all the details, and most of them seem unnecessary for answering the question.

I don't fully understand your question myself, but if you are asking why some results differ when you use a subset of the data rather than the full set, that has a very simple explanation. A subset has a different background and different frequencies, and the method is likely accounting for that.
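
To make that concrete, here is a minimal sketch in Python. It assumes the comparison rescales every sample to the read total of the smallest sample included and then rounds to whole counts; I have not verified that this is exactly what MEGAN does internally, and the sample names, the group name K_rare and all counts are made up for illustration.

```python
def normalize_to_smallest(counts_by_sample):
    """Rescale each sample to the read total of the smallest sample
    in this particular comparison, rounding to whole counts.
    (Assumed scheme for illustration, not MEGAN's confirmed algorithm.)"""
    totals = {s: sum(c.values()) for s, c in counts_by_sample.items()}
    target = min(totals.values())  # depends on which samples are compared
    normalized = {}
    for sample, counts in counts_by_sample.items():
        factor = target / totals[sample]
        normalized[sample] = {k: round(v * factor) for k, v in counts.items()}
    return normalized


# Hypothetical raw counts: one rare group plus everything else.
samples = {
    "A": {"K_rare": 3, "other": 9997},   # 10000 reads
    "B": {"K_rare": 0, "other": 8000},   # 8000 reads
    "C": {"K_rare": 0, "other": 1000},   # 1000 reads
}

# Comparison over all three samples: everything is scaled to 1000 reads.
full = normalize_to_smallest(samples)

# Comparison over a subset: everything is scaled to 8000 reads.
subset = normalize_to_smallest({s: samples[s] for s in ("A", "B")})

print("full set, A:", full["A"]["K_rare"])
print("subset,   A:", subset["A"]["K_rare"])
```

Under these assumptions the rare group keeps a non-zero count in sample A when only A and B are compared, but rounds down to zero once the much smaller sample C joins the comparison. Rare assignments are the ones most sensitive to this, which fits what the tutorial says about less common assignments.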