Question: How To Do Pathway Analysis With RNA Data Which Are Not Pure
7.1 years ago by
Charly60 wrote:

Hello all,

I have a problem and would be very grateful for your assistance.

We have some transcriptome (RNA) sequencing data of a “new” eukaryote species with no reference genome. The RNA data we have are not pure.

We have two different groups of Illumina reads: in the first group, the “new” species is contaminated by a particular bacterium, and in the second, this bacterium is not present. Both groups, however, also contain other eukaryotic or prokaryotic species. We were able to assemble the reads (from both groups) into a single large file of contigs.

The goal of my analysis is to:

(1) Perform PATHWAY ANALYSIS to determine/annotate the genes involved in the production of a particular protein, thus determining some kind of connection/network among genes. We need to know the statistics (p-value, threshold, abundance, transcripts related to a gene…) attached to each pathway, something that KEGG-KAAS does not offer.

Could you please suggest other software to use? Ingenuity is out of the question since it is not free.

(2) My second task is to determine the genes which are differentially expressed between the two groups, that is, the “new” species with the bacteria group and the one which doesn't contain the bacteria.

Please feel free to help by suggesting some interesting resources to use.

Thank you,


7.1 years ago by
Richard Smith-Unna130 wrote:

First, you should decontaminate your reads. If you know that you've got bacterial contamination, and you know which bacteria are contaminating, you can filter out those reads by mapping them against the bacterial genomes using bowtie2 with the --un-conc setting to output the reads which don't align concordantly.

For example, if you have three bacteria that you know are contaminating, you can concatenate all their genomes into one FASTA file. Then use bowtie2 to keep only the clean reads:

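bowtie2-build bacterialgenomes.fa bacterialgenomes
bowtie2 -x bacterialgenomes -1 leftreads.fq -2 rightreads.fq --un-conc cleaned.%.fq -S /dev/null

Here -1 and -2 mark the two paired read files, the % in the --un-conc path is replaced by the mate number, and -S /dev/null discards the alignments themselves. You'll then have cleaned.1.fq and cleaned.2.fq, which hold the read pairs that did not align concordantly to the bacterial genomes.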
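Before reassembling, it may be worth checking that the two filtered files still contain the same number of reads, since most assemblers expect properly paired input. The following is a minimal sketch, not part of the original answer: the function names are made up for illustration, and it assumes uncompressed FASTQ with the standard four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes uncompressed, 4-lines-per-record FASTQ.

# Print the number of records in a FASTQ file.
count_fastq_records() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Print both counts and succeed only if the two mate files match.
check_pairs() {
    local n1 n2
    n1=$(count_fastq_records "$1")
    n2=$(count_fastq_records "$2")
    echo "mate1=$n1 mate2=$n2"
    [ "$n1" -eq "$n2" ]
}
```

Call it as `check_pairs <mate 1 file> <mate 2 file>` on the two filtered outputs; a non-zero exit status means the files are out of sync.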

You can do the same for the contaminating eukaryotic genomes if you know which they are.
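Combining several contaminant genomes into one FASTA file, as described above, can be done with a plain `cat`. The sketch below is a slightly more defensive variant: it adds a newline after any input that lacks a trailing one, so that a header line never gets glued onto the previous sequence. The function name and the file names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine_fasta OUTPUT INPUT1 [INPUT2 ...]
combine_fasta() {
    local out=$1
    shift
    : > "$out"                      # create or truncate the output file
    local f
    for f in "$@"; do
        cat "$f" >> "$out"
        # tail -c 1 prints the last byte; command substitution strips a
        # trailing newline, so a non-empty result means none was present.
        if [ -n "$(tail -c 1 "$f")" ]; then
            echo >> "$out"
        fi
    done
    # Report how many sequences (header lines) the combined file contains.
    grep -c '^>' "$out"
}
```

For example, `combine_fasta contaminants.fa genomeA.fa genomeB.fa genomeC.fa` writes the combined file and prints the number of sequences in it, which can then be indexed with bowtie2-build.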

Then redo the assembly.

Then you can do your differential expression analysis. I would recommend using eXpress or RSEM to quantify expression (results are almost the same between the two), and then baySeq, DESeq or EBSeq to test for differential expression.
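As a rough sketch of the quantification step with RSEM (one of the two tools suggested above), the commands might look like the following. This assumes RSEM and bowtie2 are installed and that the reads are paired-end. The wrapper functions, the `DRY_RUN` switch and all file and sample names are hypothetical, added only so the commands can be inspected before running them.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build an RSEM reference from the assembled contigs (needed once).
prepare_reference() {
    local contigs=$1 ref=$2
    run rsem-prepare-reference --bowtie2 "$contigs" "$ref"
}

# Quantify one paired-end sample against that reference.
quantify_sample() {
    local ref=$1 r1=$2 r2=$3 sample=$4
    run rsem-calculate-expression --paired-end --bowtie2 "$r1" "$r2" "$ref" "$sample"
}
```

For instance, `DRY_RUN=1 quantify_sample ref a_1.fq a_2.fq sampleA` only prints the command. Run for real, RSEM writes per-sample results files whose expected counts can be collected into a contig-by-sample matrix for the differential expression tools mentioned above.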


Thank you, dear Richard, for your feedback. Unfortunately, I cannot clean my reads because we don't know which eukaryotic and prokaryotic organisms are contaminating them.

Since we don't have a genome or DNA data, it is difficult to use most of the software you suggested. At the moment, I am looking at MEGAN and Pathway-Guide.

Reply written 7.1 years ago by Charly60