Question: Limma/edgeR or other for metagenomic (non-RNA) data?
0
gravatar for gaio.transposon
5 months ago by
gaio.transposon0 wrote:

Hi everyone,

I have been stuck a few days on this problem, so I thought I'd better ask. This is not just any edgeR question as the input data is different than usual (non-RNA). So if anyone knows of a better tool to run this analysis please tell me, otherwise just assume the data is RNA counts data.

Data: unnormalized counts (derived from depth i.e. mapped reads against a metagenomic bin). A metagenomic bin corresponds to an organism (bacterium species) so obviously its abundance is correlated with the abundance of other species within the same sample, but it may not be relevant to the question here.

Data structure: (every subject belongs to either C or T group and is sampled twice at t=0 and t=1)

df <- data.frame(timepoint = rep_len(0:1, length.out=16),
         subjectID = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8),
         group = c(rep("C", 8), rep("T", 8)))

Aim: find differentially abundant metagenomic bins between the two cohorts (C vs T) i.e. which genomes go up/down between t=0 and t=1 in a different way between the two cohorts?

My code so far:

#clean dataframe by selecting rows with only a certain amount of NA values
input_df <- df[rowSums( is.na(df ) <=80, ]
#replace missing values with zeros
input_df[is.na(input_df)] <- 0

Prepare the count data (convert countdata first column (genomes) to rownames):

count_data <- data.frame(input_df[,-1], row.names=input_df[,1], check.names = FALSE)

Run voom:

count_data_2 <- voom(count_data)

Prepare design matrix: (I am creating 4 groups: C_t0, C_t1, T_t0, T_t1)

Treat <- factor(paste(metadata$cohort,metadata$date,sep="."))
design <- model.matrix(~0+Treat)
colnames(design) <- levels(Treat)

Block on subject: (gives me warnings)

corfit <- duplicateCorrelation(count_data_2,design,block=metadata$subject)
corfit$consensus

Fit and make contrasts:

fit <- lmFit(count_data_2,design,block=metadata$subject,correlation=corfit$consensus)
cm <- makeContrasts(
  T1vT0 = Treatment.1-Treatment.0,
  C1vC0 = Control.2-Control.1,
  T0vC0 = Treatment.1-Control.1,
  T1vC1 = (Treatment.1-Treatment.0)-(Control.2-Control.1),
  levels = design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
top20 <- topTable(fit2, n = 20, coef="T1vC1", adjust.method = "BH")

Q: Is this setup/my code correct? Can it be improved? Do I get in this way what I am after?

I also tried edgeR (instead of limma/voom) and I was getting seemingly more real p-values (as in: closer to p=0.005, whereas now with limma/voom I am getting suspiciously tiny p-values).

Thank you so much for your help!

Kind Regards Dany

ADD COMMENTlink modified 5 months ago by Fabio Marroni2.5k • written 5 months ago by gaio.transposon0
0
gravatar for Fabio Marroni
5 months ago by
Fabio Marroni2.5k
Italy
Fabio Marroni2.5k wrote:

I suggest you use the fitZig algorithm in metagenomeSeq package, which has been developed esplicitly for this, and that always gave me good results.

ADD COMMENTlink written 5 months ago by Fabio Marroni2.5k

Thank you Fabio. I remember reading a paper that compared metagenomeseq to edgeR among others and metagenomeseq would not perform really well: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2386-y

Did you use metegenomeseq when you had two groups to compare, each with => 2 timepoints?

ADD REPLYlink written 5 months ago by gaio.transposon0

Sorry, not yet (only single time-point). I will do it shortly, and post developments if I find something that might be of help.

ADD REPLYlink written 5 months ago by Fabio Marroni2.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 975 users visited in the last hour