Question

How To Do Pathway Analysis With RNA Data Which Are Not Pure

10.6 years ago

Charly ▴ 70

Hello all,

I have a problem and would be very grateful for your assistance.

We have some transcriptome (RNA) sequencing data of a “new” eukaryote species with no reference genome. The RNA data we have are not pure.

We have two different groups of Illumina reads: in the first group, the “new” species is contaminated by a particular bacterium, and in the second, this bacterium is not present. However, both groups of data also contain other eukaryotic or prokaryotic species. We were able to assemble the reads (from both groups) into a single large file of contigs.

The goal of my analysis is to:

(1) Perform pathway analysis to identify and annotate the genes involved in the production of a particular protein, and thus determine the connections (network) among those genes. We need to know the statistics attached to each pathway (p-value, threshold, abundance, transcripts related to a gene…), which is something that KEGG-KAAS does not offer.

Could you please suggest other software to use? Ingenuity is out of the question since it is not free.

(2) My second task is to determine the genes which are differentially expressed between the two groups, that is, the group in which the “new” species carries the bacteria and the group in which the bacteria are absent.

Please feel free to help by suggesting some interesting resources to use.

Thank you,

Merleau

analysis • 3.3k views

updated 10.6 years ago by Richard Smith-Unna ▴ 140 • written 10.6 years ago by Charly ▴ 70

Ram · Answer 1 · 2013-09-15

First, you should decontaminate your reads. If you know that you've got bacterial contamination, and you know which bacteria are contaminating, you can filter out those reads by mapping all reads against the bacterial genomes with bowtie2 and using the --un-conc option to write out the read pairs which don't align concordantly.

For example, if you have three bacteria that you know are contaminating, you can concatenate all their genomes into one FASTA file. Then use bowtie2 to keep only the clean reads:

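bowtie2-build bacterialgenomes.fa bacterialgenomes
bowtie2 -x bacterialgenomes -1 leftreads.fq -2 rightreads.fq --un-conc cleaned.%.fq -S /dev/null

Then you'll have cleaned.1.fq and cleaned.2.fq, which hold the left and right reads of the pairs that did not align concordantly to the bacterial genomes (bowtie2 replaces the % in the --un-conc path with 1 or 2).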
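After filtering, it can be worth confirming that the two cleaned mate files still contain the same number of reads, since most assemblers and quantifiers expect properly paired input. The following is only a minimal sketch: `demo_R1.fq` and `demo_R2.fq` are hypothetical toy files created so the snippet runs on its own, and the count assumes the standard four-lines-per-read FASTQ layout. Substitute the names of your own cleaned files.

```shell
# Count FASTQ records, assuming exactly 4 lines per read.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Toy mate files (hypothetical) so the sketch is self-contained.
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' > demo_R1.fq
printf '@r1/2\nCGTA\n+\nIIII\n@r2/2\nGGCA\n+\nIIII\n' > demo_R2.fq

left=$(count_reads demo_R1.fq)
right=$(count_reads demo_R2.fq)

# Report whether the two mate files agree on the number of reads.
if [ "$left" -eq "$right" ]; then
    echo "mate files agree: $left reads each"
else
    echo "mate files differ: $left vs $right reads" >&2
fi
```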

You can do the same for the contaminating eukaryotic genomes if you know which they are.
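Building the combined contaminant reference mentioned above, whether bacterial or eukaryotic, amounts to concatenating the individual FASTA files. Here is a small sketch of that step; the genome file names and sequences are made up for illustration, and `contaminants.fa` is a hypothetical output name.

```shell
# Toy genomes (hypothetical names and sequences) so the sketch is self-contained.
# Each file ends with a newline; otherwise the next header would be glued
# onto the last sequence line after concatenation.
printf '>bacterium_A\nACGTACGT\n' > genome_A.fa
printf '>bacterium_B\nTTGACCAA\n' > genome_B.fa
printf '>bacterium_C\nGGCATTAG\n' > genome_C.fa

# Concatenate all contaminant genomes into one multi-FASTA file.
cat genome_A.fa genome_B.fa genome_C.fa > contaminants.fa

# Sanity check: count the FASTA header lines in the combined file.
n_seqs=$(grep -c '^>' contaminants.fa)
echo "contaminants.fa holds $n_seqs sequences"
```

The combined file can then be indexed with bowtie2-build in the same way as the bacterial example.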

Then redo the assembly.

Then you can do your differential expression analysis. I would recommend using eXpress or RSEM to quantify expression (the results of the two are almost the same), and then baySeq, DESeq or EBSeq to test for differential expression.
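Differential expression tools generally expect one table with a row per transcript and a column per sample. As a rough sketch of how per-sample results could be combined, the snippet below joins two toy tables on the transcript ID. The two-column layout (ID, count), the file names and the values are all assumptions made for illustration; real eXpress or RSEM output has more columns, so the count column would need to be extracted first.

```shell
TAB=$(printf '\t')

# Toy per-sample tables (hypothetical format: transcript ID, tab, count).
printf 'contig_2\t0\ncontig_1\t10\ncontig_3\t7\n' > with_bacteria.tsv
printf 'contig_1\t3\ncontig_3\t8\ncontig_2\t5\n' > without_bacteria.tsv

# join needs both inputs sorted on the key column.
sort -k1,1 with_bacteria.tsv > with_bacteria.sorted.tsv
sort -k1,1 without_bacteria.tsv > without_bacteria.sorted.tsv

# Write a header, then one row per transcript with a column per sample.
{
    printf 'transcript\twith_bacteria\twithout_bacteria\n'
    join -t "$TAB" with_bacteria.sorted.tsv without_bacteria.sorted.tsv
} > counts_matrix.tsv

n_rows=$(grep -c '^contig_' counts_matrix.tsv)
echo "counts_matrix.tsv has $n_rows transcripts"
```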