Question

Limma/edgeR or other for metagenomic (non-RNA) data?

0

Entering edit mode

4.5 years ago

gaio.transposon • 0

Hi everyone,

I have been stuck a few days on this problem, so I thought I'd better ask. This is not just any edgeR question as the input data is different than usual (non-RNA). So if anyone knows of a better tool to run this analysis please tell me, otherwise just assume the data is RNA counts data.

Data: unnormalized counts (derived from depth i.e. mapped reads against a metagenomic bin). A metagenomic bin corresponds to an organism (bacterium species) so obviously its abundance is correlated with the abundance of other species within the same sample, but it may not be relevant to the question here.

Data structure: (every subject belongs to either C or T group and is sampled twice at t=0 and t=1)

df <- data.frame(timepoint = rep_len(0:1, length.out=16),
         subjectID = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8),
         group = c(rep("C", 8), rep("T", 8)))

Aim: find differentially abundant metagenomic bins between the two cohorts (C vs T) i.e. which genomes go up/down between t=0 and t=1 in a different way between the two cohorts?

My code so far:

#clean dataframe by selecting rows with only a certain amount of NA values
input_df <- df[rowSums( is.na(df ) <=80, ]
#replace missing values with zeros
input_df[is.na(input_df)] <- 0

Prepare the count data (convert countdata first column (genomes) to rownames):

count_data <- data.frame(input_df[,-1], row.names=input_df[,1], check.names = FALSE)

Run voom:

count_data_2 <- voom(count_data)

Prepare design matrix: (I am creating 4 groups: C_t0, C_t1, T_t0, T_t1)

Treat <- factor(paste(metadata$cohort,metadata$date,sep="."))
design <- model.matrix(~0+Treat)
colnames(design) <- levels(Treat)

Block on subject: (gives me warnings)

corfit <- duplicateCorrelation(count_data_2,design,block=metadata$subject)
corfit$consensus

Fit and make contrasts:

fit <- lmFit(count_data_2,design,block=metadata$subject,correlation=corfit$consensus)
cm <- makeContrasts(
  T1vT0 = Treatment.1-Treatment.0,
  C1vC0 = Control.2-Control.1,
  T0vC0 = Treatment.1-Control.1,
  T1vC1 = (Treatment.1-Treatment.0)-(Control.2-Control.1),
  levels = design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
top20 <- topTable(fit2, n = 20, coef="T1vC1", adjust.method = "BH")

Q: Is this setup/my code correct? Can it be improved? Do I get in this way what I am after?

I also tried edgeR (instead of limma/voom) and I was getting seemingly more real p-values (as in: closer to p=0.005, whereas now with limma/voom I am getting suspiciously tiny p-values).

Thank you so much for your help!

Kind Regards Dany

metagenomics edgeR voom paired data • 1.8k views

ADD COMMENT • link updated 4.5 years ago by Fabio Marroni ★ 3.0k • written 4.5 years ago by gaio.transposon • 0

score 0 · Answer 1 · 2019-10-28

0

Entering edit mode

4.5 years ago

Fabio Marroni ★ 3.0k

I suggest you use the fitZig algorithm in metagenomeSeq package, which has been developed esplicitly for this, and that always gave me good results.

ADD COMMENT • link 4.5 years ago by Fabio Marroni ★ 3.0k

0

Entering edit mode

Thank you Fabio. I remember reading a paper that compared metagenomeseq to edgeR among others and metagenomeseq would not perform really well: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2386-y

Did you use metegenomeseq when you had two groups to compare, each with => 2 timepoints?

ADD REPLY • link 4.5 years ago by gaio.transposon • 0

0

Entering edit mode

Sorry, not yet (only single time-point). I will do it shortly, and post developments if I find something that might be of help.

ADD REPLY • link 4.5 years ago by Fabio Marroni ★ 3.0k