Dear All
I am trying to analyze co-transcriptome data from two enteric pathogens. These are new clinical isolates (X and Y). I have RNA-Seq reads from each species grown individually (X or Y) and from the co-culture (X+Y). The growth pattern observed in in vitro cultures is that X suppresses the growth rate of Y, and we are trying to find a mechanistic explanation for this pattern.
For this, I used Trinity for de novo transcriptome assembly and then quantified with RSEM (as an example of an alignment-based method) or Kallisto (as an example of an alignment-free method). I then ran DESeq2 on the read counts from both RSEM and Kallisto and compared the differentially expressed genes from each case.
I get contradictory results from the two methods. Using RSEM: > 2000 genes are significantly up-regulated in the co-growth X+Y culture relative to individually grown X, and < 100 are down-regulated. Using Kallisto: > 1500 genes are significantly down-regulated in the co-growth X+Y culture relative to individually grown X, and ~300 are up-regulated.
Also, the skew in the number of DEGs towards being up- or down-regulated is a bit suspicious.
My question is: which method should I follow in this case? What is a good approach for analyzing RNA-Seq data from such an experimental setup?
Any insights or help will be highly appreciated. Thanks.
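Before choosing between the two pipelines, it may help to check whether the same genes are being called in opposite directions, or whether the two gene lists barely overlap. The sketch below is only an illustration: it assumes the DESeq2 results from each pipeline have been exported and parsed into dictionaries of gene ID to (log2FoldChange, padj), and the function names and toy values are hypothetical. If most shared genes turn out to be flipped, one thing worth ruling out is that the reference level or contrast direction differed between the two DESeq2 runs.

```python
def classify(log2fc, padj, alpha=0.05):
    """Label one gene as 'up', 'down' or 'ns' (not significant)."""
    # Missing or NaN adjusted p-values are treated as not significant.
    if padj is None or not (padj < alpha):
        return "ns"
    return "up" if log2fc > 0 else "down"


def direction_concordance(res_a, res_b, alpha=0.05):
    """Cross-tabulate DE calls from two pipelines for the genes they share.

    res_a, res_b: dict mapping gene_id -> (log2FoldChange, padj).
    """
    shared = sorted(set(res_a) & set(res_b))
    table = {}
    for gene in shared:
        call_a = classify(res_a[gene][0], res_a[gene][1], alpha)
        call_b = classify(res_b[gene][0], res_b[gene][1], alpha)
        table[(call_a, call_b)] = table.get((call_a, call_b), 0) + 1
    same = table.get(("up", "up"), 0) + table.get(("down", "down"), 0)
    opposite = table.get(("up", "down"), 0) + table.get(("down", "up"), 0)
    return {"shared": len(shared), "same": same,
            "opposite": opposite, "table": table}


# Toy values only, not real results.
rsem = {"g1": (2.1, 0.001), "g2": (1.5, 0.01), "g3": (-0.2, 0.8)}
kallisto = {"g1": (-2.0, 0.002), "g2": (1.4, 0.02),
            "g3": (0.1, 0.9), "g4": (3.0, 0.001)}

summary = direction_concordance(rsem, kallisto)
print(summary["shared"], summary["same"], summary["opposite"])
```

A large "opposite" count would point at a bookkeeping difference between the runs rather than at the quantifiers themselves, whereas a small overlap with few flips would point at the quantification or the assembly.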
You do not say how many biological replicates you have per condition. Different (correct) methods will arrive at different results, and even more so if the experimental design is insufficient.
Anyway, read the literature, decide on the method, and go ahead. The danger of trying too many methods is later cherry-picking the one with results suiting your expectations about the outcome.
Good advice. Kallisto (and similar tools) has its applications, but in a case where the reference itself is not very solid it may not be the right tool.
That's basically why I thought it would be better to rely on a de novo transcriptome assembly rather than a fragmented genome assembly (since I also have Illumina DNA sequences for the same isolates). Is this assumption correct?
Also, neither method seems to make sense at this point, since the number of differentially expressed genes is unrealistic.
If the organisms are very similar, then this experimental approach is not likely to answer the question being posed.
The explanation in this case may turn out to be mundane, e.g. along the lines of organism X simply growing faster than Y in those culture conditions and out-competing it for nutrients. Have you looked at their growth rates independently in the same conditions? Can you provide some additional details about what the organisms in question are and what experimental conditions are being used?
The organisms are clinical isolates of Vibrio cholerae (VCH) and Enterotoxigenic E. coli (ETEC). The growth rate of ETEC is faster than that of VCH when grown on M9+glucose or LB.
If that is the case then it is going to be difficult to use RNAseq data to find an explanation. Perhaps that is being reflected in the results you are seeing.
Since you have already done the experiment, you could try the solution suggested by @h.mon below and see if that produces any useful results.
Thanks. I have three biological replicates from each culture and co-culture. Unfortunately, most of the literature is about handling co-transcriptomes from two different domains (eukaryote/prokaryote), which is a bit easier since you can treat the reads from one organism as contaminants when quantifying the other.
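One way the "treat the other organism as contaminant" idea might carry over to two bacteria is to assign each assembled transcript to a species and then normalise each species' counts on their own, so that reads from the partner organism in the co-culture do not dilute the comparison. The sketch below is a rough illustration under that assumption. The species assignment itself is not shown (it could, for instance, come from matching transcripts to the isolates' genome assemblies), the function names and toy counts are hypothetical, and the size factors follow the median-of-ratios idea rather than reproducing any particular tool's implementation.

```python
import math
from statistics import median


def split_by_species(counts, species_of):
    """Group a count table by species.

    counts: dict transcript_id -> list of raw counts (one per sample).
    species_of: dict transcript_id -> species label (assumed to be known).
    """
    by_species = {}
    for tid, row in counts.items():
        by_species.setdefault(species_of[tid], {})[tid] = row
    return by_species


def size_factors(counts):
    """Median-of-ratios size factors from one species' transcripts only.

    Transcripts with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    rows = [r for r in counts.values() if all(c > 0 for c in r)]
    if not rows:
        raise ValueError("no transcript has non-zero counts in every sample")
    n_samples = len(rows[0])
    log_geo_means = [sum(math.log(c) for c in r) / n_samples for r in rows]
    return [
        math.exp(median(math.log(r[j]) - g for r, g in zip(rows, log_geo_means)))
        for j in range(n_samples)
    ]


# Toy counts for two samples: [X grown alone, X+Y co-culture].
counts = {
    "x1": [100, 50], "x2": [200, 100], "x3": [50, 25],
    "y1": [0, 300], "y2": [0, 500],
}
species_of = {"x1": "X", "x2": "X", "x3": "X", "y1": "Y", "y2": "Y"}

x_counts = split_by_species(counts, species_of)["X"]
print(size_factors(x_counts))
```

In this toy case the X transcripts have half as many reads in the co-culture simply because Y takes up part of the library; normalising on the X transcripts alone removes that shift, whereas a naive comparison of raw counts could make X look broadly down-regulated.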
Have you checked the Trinity assembly to see if it looks reasonable? Since you are working with bacteria, you don't expect splicing (Trinity is designed for eukaryotic transcriptomes). I wonder if you may be better off doing a normal assembly with SPAdes (or rnaSPAdes) instead.