Question: Bio-statistics for metatranscriptomics
2.1 years ago
amelie.rouger0 wrote:

Hi everybody, I am trying to identify differentially expressed (DE) genes in several bacterial metatranscriptomes.

To sum up the experimental design: we have 2 bacterial communities (named E and U), each inoculated under 3 conditions: A, B and C (the reference condition). I collected both DNA and RNA at times 7 and 9, and the experiment was performed in duplicate. We sequenced RNA from all 24 samples and a metagenome from pooled DNA of several samples.

I assembled and annotated the metagenome (141,921 contigs) and used it as a reference to map the RNA reads with bowtie2.
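Between mapping and DESeq2/edgeR, the alignments have to be turned into a read count per gene. A minimal sketch of that step, assuming the gene coordinates from the annotation and the alignment start positions (the SAM RNAME and POS fields) have already been parsed into Python structures; the contig and gene names are hypothetical, and genes are assumed not to overlap. Dedicated tools such as featureCounts or htseq-count do this properly, with explicit rules for reads that overlap several features.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical annotation: contig -> list of (start, end, gene_id),
# 1-based inclusive, sorted by start, assumed non-overlapping.
GENES = {
    "contig_1": [(1, 900, "gene_A"), (1000, 2500, "gene_B")],
    "contig_2": [(50, 700, "gene_C")],
}

# Hypothetical alignments: (contig, leftmost mapped position),
# e.g. the RNAME and POS fields of a bowtie2 SAM file.
ALIGNMENTS = [
    ("contig_1", 10), ("contig_1", 850), ("contig_1", 950),
    ("contig_1", 1200), ("contig_2", 60), ("contig_3", 5),
]

def count_reads_per_gene(genes, alignments):
    """Assign each read to the gene whose interval contains its start position."""
    starts = {contig: [g[0] for g in glist] for contig, glist in genes.items()}
    counts = Counter()
    for contig, pos in alignments:
        if contig not in genes:
            continue  # read mapped to a contig with no annotated gene
        # index of the last gene starting at or before pos
        i = bisect_right(starts[contig], pos) - 1
        if i >= 0:
            start, end, gene_id = genes[contig][i]
            if start <= pos <= end:  # intergenic reads are ignored
                counts[gene_id] += 1
    return counts

print(count_reads_per_gene(GENES, ALIGNMENTS))
```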

I ran DESeq2 on the resulting counts and a large fraction of the genes came out as DE. So I switched to edgeR to try other normalization methods (I tested TMM, RLE and upperquartile), and obtained around 10,000 DE genes for each comparison (A vs C, B vs C and A vs B).
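For reference, the RLE (median-of-ratios) idea behind DESeq2's size factors, which edgeR also offers as `method = "RLE"` in `calcNormFactors`, can be sketched as below. This is an illustrative re-implementation on a toy matrix, not the packages' own code: each sample's factor is the median, over genes with no zero count, of the ratio between that sample's count and the gene's geometric mean across samples.

```python
import math
from statistics import median

def rle_size_factors(counts):
    """Median-of-ratios (RLE) size factors, one per sample.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero in any sample are skipped, because their
    geometric mean would be zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # log of each usable gene's geometric mean across samples
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy matrix: sample 2 was sequenced about twice as deeply as sample 1.
toy = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 7],  # skipped: zero count in sample 1
]
print(rle_size_factors(toy))
```

Because these methods only rescale whole libraries, they tend to agree when most genes are not DE, which is consistent with the DE count changing little between TMM, RLE and upperquartile.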

The problem is that I don't know what the next step is. With a short list of DE genes, I would BLAST each sequence against NCBI, try to identify which bacterial species it came from, and reconstruct the metabolic pathways by hand. But with such a large number of DE genes, I don't know how to proceed.

Do you think I need to find another normalization method to reduce the number of DE genes? Or do you know of any software I could use to group these genes by "category" or "class" in order to explore the metabolic pathways?
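Grouping the DE genes by functional category and asking which categories are over-represented is a common way to summarize a long list. A minimal sketch of a one-sided hypergeometric test per category, assuming each gene has already been assigned a single category (for example a COG or KEGG category derived from the annotation; the gene names and categories below are hypothetical toy data). At the scale of this dataset, `scipy.stats.hypergeom.sf` would be faster than exact integer arithmetic, and the p-values should be corrected for testing many categories.

```python
from collections import Counter
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    hi = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return hits / comb(N, n)

def category_enrichment(all_genes, de_genes, category):
    """Return {category: (DE count, category size, p-value)} for over-representation."""
    N, n = len(all_genes), len(de_genes)
    size = Counter(category.get(g, "unannotated") for g in all_genes)
    de_count = Counter(category.get(g, "unannotated") for g in de_genes)
    return {
        cat: (de_count[cat], K, hypergeom_upper_tail(de_count[cat], N, K, n))
        for cat, K in size.items()
    }

# Hypothetical toy data: 10 tested genes, 5 of them DE.
ALL = [f"g{i}" for i in range(10)]
DE = {"g0", "g1", "g2", "g3", "g4"}
CAT = {g: ("Energy production" if int(g[1:]) < 5 else "Translation") for g in ALL}

for cat, (k, K, p) in sorted(category_enrichment(ALL, DE, CAT).items()):
    print(f"{cat}: {k}/{K} DE, p = {p:.4g}")
```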

Thanks for your help.


Did you map the RNA to the DNA? Maybe some whole genomes are DE, which is interesting in itself, though in a different context; this might narrow down the list of genes.

Asaf
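To check whether whole genomes shift, as suggested above, per-gene counts can be summed per genome. A minimal sketch, assuming contigs have already been grouped into genome bins by some binning step (the `gene_to_bin` mapping and all counts are hypothetical): it compares each bin's share of the reads between two samples as a log2 ratio. A bin whose share changes strongly may explain many of its genes appearing DE at once.

```python
import math
from collections import defaultdict

def reads_per_bin(gene_counts, gene_to_bin):
    """Sum per-gene read counts into per-bin (genome) totals for one sample."""
    totals = defaultdict(int)
    for gene, n in gene_counts.items():
        totals[gene_to_bin.get(gene, "unbinned")] += n
    return dict(totals)

def bin_log2_shift(sample_a, sample_b, gene_to_bin, pseudocount=1):
    """log2(share in sample_b / share in sample_a) for every bin.

    A pseudocount is added to each bin so that a bin absent from one
    sample does not cause a division by zero or a log of zero.
    """
    a = reads_per_bin(sample_a, gene_to_bin)
    b = reads_per_bin(sample_b, gene_to_bin)
    bins = set(a) | set(b)
    denom_a = sum(a.values()) + pseudocount * len(bins)
    denom_b = sum(b.values()) + pseudocount * len(bins)
    return {
        bin_id: math.log2(((b.get(bin_id, 0) + pseudocount) / denom_b)
                          / ((a.get(bin_id, 0) + pseudocount) / denom_a))
        for bin_id in sorted(bins)
    }

# Hypothetical toy data: genes g1/g2 belong to bin1, g3/g4 to bin2.
GENE_TO_BIN = {"g1": "bin1", "g2": "bin1", "g3": "bin2", "g4": "bin2"}
COND_C = {"g1": 50, "g2": 50, "g3": 50, "g4": 50}
COND_A = {"g1": 150, "g2": 150, "g3": 25, "g4": 25}

for bin_id, shift in bin_log2_shift(COND_C, COND_A, GENE_TO_BIN).items():
    print(f"{bin_id}: log2 shift {shift:+.2f}")
```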

How did you annotate the metagenome assembly?

h.mon

Thanks for answering. Yes, I used the metagenome to map the RNA reads. I assembled the metagenome with IDBA and annotated it with PROKKA.

When I mapped the RNA reads against the genomes available on NCBI, only 30% of the reads aligned; that is why we used the metagenome as the mapping reference.

