Correlation between two datasets: RNA-seq results and presence/absence of Type III Secretion System effectors
7 weeks ago

Dear All,

I have a "how would you solve this" kind of question. I have two tables: (1) a log2 fold-change table and (2) an effector table.

First, the log2 fold-change table was obtained by running DESeq on each of the 14 infected samples against Control, extracting the log2 fold-change values for each comparison, and merging the 14 log2FoldChange columns into a single table keyed on gene (each row is a unique gene). The table is roughly 22,000 × 14: 22,414 unique genes for 14 different strains.
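A minimal pandas sketch of that merging step, assuming each strain's DESeq log2FoldChange column has already been loaded as a Series indexed by gene ID (the strain names and values below are made up). An outer join keeps genes that are missing from some strains as NaN instead of silently dropping them:

```python
import pandas as pd

# Hypothetical per-strain results: one log2FoldChange Series per strain
# (infected vs. Control), indexed by gene ID. Real values would come from
# each strain's DESeq results table.
per_strain = {
    "strain01": pd.Series({"geneA": 1.2, "geneB": -0.4, "geneC": 0.0}),
    "strain02": pd.Series({"geneA": 0.8, "geneC": 2.1}),  # geneB absent here
}

# Outer join on the gene index: one row per unique gene, one column per strain.
# Genes not reported for a strain become NaN rather than being dropped.
lfc = pd.concat(per_strain, axis=1)
print(lfc.shape)
```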

Second, a presence/absence effector list for the same 14 strains, which tells us which effectors each strain carries (the strains all have different sets of effectors). This is a 50 × 14 table: each row is a unique effector, and each entry is 1 if the effector is present in that strain and 0 if it is absent.
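Before relating the two tables it is worth aligning the strain columns and dropping effectors that are present in every strain (or in none), since those carry no contrast and cannot be associated with anything. A small sketch with a made-up 3-effector, 3-strain table; `strain_order` is a hypothetical stand-in for the column order of the log2FC table:

```python
import pandas as pd

# Hypothetical effector table: rows = effectors, columns = strains, 1/0 = present/absent.
eff = pd.DataFrame(
    {"strain01": [1, 0, 1], "strain02": [1, 1, 0], "strain03": [1, 0, 0]},
    index=["effX", "effY", "effZ"],
)
strain_order = ["strain01", "strain02", "strain03"]  # assumed log2FC column order

# Put strains in the same column order as the log2FC table and check it is binary.
eff = eff[strain_order]
assert eff.isin([0, 1]).all().all(), "effector table should contain only 0/1"

# Keep only effectors that vary across strains (present in some, absent in others).
n_present = eff.sum(axis=1)
informative = eff[(n_present > 0) & (n_present < eff.shape[1])]
print(informative.index.tolist())
```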

What we want to investigate is whether there is a correlation between the presence/absence of effectors and gene expression in the host. In essence, we would like to obtain the correlation between these two separate datasets.
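The most direct reading of "correlation between the two datasets" is a gene × effector matrix of correlations across strains. With a 0/1 variable, Pearson's r is the point-biserial correlation, so the whole 22,414 × 50 matrix can be computed with one matrix product of z-scored rows. This is a sketch on toy data, not a recommendation: with only 14 strains many genes will correlate with some effector by chance, so the matrix needs a permutation or multiple-testing correction before anything is read into it.

```python
import numpy as np

def gene_effector_correlation(lfc: np.ndarray, eff: np.ndarray) -> np.ndarray:
    """Pearson (point-biserial) correlation of every gene with every effector.

    lfc: genes x strains log2 fold changes; eff: effectors x strains 0/1 matrix.
    Returns a genes x effectors matrix; zero-variance rows give NaN.
    """
    def zscore(m):
        # Center each row and divide by its population SD (ddof=0).
        m = m - m.mean(axis=1, keepdims=True)
        sd = np.sqrt((m ** 2).mean(axis=1, keepdims=True))
        with np.errstate(invalid="ignore", divide="ignore"):
            return m / sd

    z_lfc = zscore(lfc.astype(float))
    z_eff = zscore(eff.astype(float))
    # Mean over strains of the product of z-scores equals Pearson's r.
    return z_lfc @ z_eff.T / lfc.shape[1]

# Toy data: 2 genes, 2 effectors, 4 strains (the second effector is in every strain).
lfc = np.array([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]])
eff = np.array([[0, 0, 1, 1], [1, 1, 1, 1]])
r = gene_effector_correlation(lfc, eff)
print(r)
```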

Any ideas or suggestions on how to approach this problem would be very helpful. My initial idea is to carry out a Canonical Correlation Analysis (CCA), which I am still working on, but I am open to other ideas and suggestions from the community.

Thanks in advance for your time and suggestions.

7 weeks ago
LChart 840

CCA is an interesting idea, but one drawback is that a binary matrix like your effector table typically does not work well with L2 (least-squares) objectives, and I worry you won't get interpretable loadings on your effectors.

If you have enough replicates per strain, you can do this directly in DESeq2 by including effector status as a variable in the design, e.g. ~ effector.A.status + strain + 0 (just cbind the effector matrix onto your sample metadata). Because effector status doesn't vary within a strain, this will essentially "pull out" an estimate of the group average of effector-positive versus effector-negative strains, allowing a direct comparison.
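The DESeq2 model itself runs in R on the raw counts. As a much cruder Python analogue that uses only the log2 fold-change table you already have, one could compare effector-positive and effector-negative strains gene by gene with a Welch t-test and adjust for the ~22,000 tests with Benjamini-Hochberg. This is a swapped-in simplification, not the model described above: it treats each strain's log2FC as a single observation and ignores the per-gene uncertainty DESeq2 estimates. The data below are made up.

```python
import numpy as np
from scipy import stats

def effector_contrast(lfc, status):
    """Compare effector-positive vs. effector-negative strains for every gene.

    lfc: genes x strains log2 fold changes; status: 0/1 vector, one entry per strain.
    Returns (mean difference, Welch t, p-value), each an array with one value per gene.
    """
    status = np.asarray(status).astype(bool)
    pos, neg = lfc[:, status], lfc[:, ~status]
    diff = pos.mean(axis=1) - neg.mean(axis=1)
    t, p = stats.ttest_ind(pos, neg, axis=1, equal_var=False)  # Welch's t-test
    return diff, t, p

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (assumes no NaNs)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

# Toy data: 2 genes x 6 strains; the first three strains carry the effector.
lfc = np.array([
    [2.0, 2.2, 1.8, 0.1, -0.1, 0.0],   # shifted in effector-positive strains
    [0.5, -0.5, 0.2, 0.4, -0.3, 0.1],  # no difference between groups
])
status = [1, 1, 1, 0, 0, 0]
diff, t, p = effector_contrast(lfc, status)
padj = bh_adjust(p)
print(diff, p, padj)
```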

To look for more complex patterns, you could select the genes that are differentially expressed in any strain, cluster the expression matrix on those genes, and overlay effector status (I use a heatmap for this kind of thing).
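A rough sketch of the clustering idea with SciPy. Genes are kept if |log2FC| ≥ 1 in at least one strain, which is only a hypothetical stand-in for "DE in any strain" (in practice you would filter on DESeq2's adjusted p-values). Strains are then hierarchically clustered on correlation distance, and each strain's effector profile is printed in dendrogram order as a text version of the overlay. For an actual heatmap, seaborn's `clustermap` with `col_colors` is one way to draw the effector bar.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

def cluster_strains(lfc, eff, strains, min_abs_lfc=1.0):
    """Cluster strains on changed genes and print effector status in leaf order.

    lfc: genes x strains log2FC; eff: effectors x strains 0/1 matrix.
    min_abs_lfc is a hypothetical filter threshold, not a DESeq2 setting.
    """
    keep = (np.abs(lfc) >= min_abs_lfc).any(axis=1)  # changed in at least one strain
    sub = lfc[keep]
    # Rows of sub.T are strains; average linkage on 1 - Pearson correlation.
    Z = linkage(sub.T, method="average", metric="correlation")
    order = leaves_list(Z)
    for j in order:
        flags = "".join(str(int(v)) for v in eff[:, j])
        print(f"{strains[j]:>10}  {flags}")
    return order

# Toy data: strains 0/1 and 2/3 have opposite response profiles.
lfc = np.array([
    [2.0, 2.1, -2.0, -1.9],
    [1.5, 1.4, -1.5, -1.6],
    [-1.0, -1.2, 1.0, 1.1],
    [0.1, 0.0, 0.2, 0.1],  # filtered out: never reaches |log2FC| >= 1
])
eff = np.array([[1, 1, 0, 0], [0, 1, 0, 1]])
order = cluster_strains(lfc, eff, ["s1", "s2", "s3", "s4"])
```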
