Question: Correcting for functional annotation database bias
3.5 years ago, adityabandla wrote:

In metagenomics datasets, it is standard practice to (a) correct samples for differences in sequencing effort (library size) and (b) normalise gene counts by the total annotated hits per sample to obtain relative abundances.
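
For concreteness, step (b) can be sketched as below. This is only a minimal illustration, not any particular pipeline's method: the function name `relative_abundance` and the gene names and counts are made up for the example.

```python
def relative_abundance(counts):
    """Convert raw annotated hit counts for one sample into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no annotated hits")
    return {gene: n / total for gene, n in counts.items()}


# Hypothetical sample: annotated hits per gene (illustrative numbers only)
sample = {"geneA": 50, "geneB": 30, "geneC": 20}
rel = relative_abundance(sample)
```

Because every value is divided by the total annotated hits, any category that attracts more annotations takes a larger share by construction.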

However, most functional gene databases, such as SEED or KEGG, are biased in that genes involved in central metabolism are better annotated. Hence, categories such as carbohydrate metabolism and protein synthesis often dominate functional profiles as a result of this bias. Most articles do not correct for this database bias.

What are the common ways of accounting for this bias?

Tags: seed, kegg, metagenomics
3.5 years ago, Josh Herr (University of Nebraska) wrote:

We recognize the bias and discuss it. I'm not aware of any efforts to account for or quantify these biases yet.

Much of our ability to understand this bias relies on laboratory validation and experiments to identify unknown functions, so we have a lot of work to do to fill in these gaps.


Hi Josh

Thanks for the answer. Here is where I am coming from. People generally use arbitrary cutoffs for shortlisting genes, say for plotting: for example, keeping only SEED subsystems with a relative abundance of 0.1% or greater. In this case, subsystems with more annotated genes, and hence more classified reads, tend to dominate and drown out the other subsystems.

Is it OK to consider each subsystem separately and calculate relative abundances of genes per subsystem instead of normalising against total reads?
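
To make the two options concrete, here is a minimal sketch comparing normalisation against all annotated reads with normalisation within each subsystem. The subsystem names, gene names, counts, and both function names are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical counts keyed by (subsystem, gene); numbers are illustrative only
counts = {
    ("Carbohydrates", "geneA"): 600,
    ("Carbohydrates", "geneB"): 300,
    ("Nitrogen Metabolism", "nifH"): 60,
    ("Nitrogen Metabolism", "amoA"): 40,
}


def normalise_by_total(counts):
    """Relative abundance of each gene against all annotated reads in the sample."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def normalise_within_subsystem(counts):
    """Relative abundance of each gene against the reads of its own subsystem."""
    subsystem_totals = defaultdict(int)
    for (subsystem, _gene), n in counts.items():
        subsystem_totals[subsystem] += n
    return {
        (subsystem, gene): n / subsystem_totals[subsystem]
        for (subsystem, gene), n in counts.items()
    }


by_total = normalise_by_total(counts)
within = normalise_within_subsystem(counts)
```

With these numbers, nifH is 6% of all annotated reads but 60% of its own subsystem. Note that values normalised within a subsystem sum to 1 per subsystem, so they describe composition inside a subsystem and are not directly comparable across subsystems.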

Reply written 3.5 years ago by adityabandla

