Question: scRNA-seq and ChIP-seq data integration
20 months ago by
carlosalfonsogonzalez610 wrote:

Dear All,

I'm currently trying to integrate single-cell RNA-seq data from the Drosophila embryo with ChIP-seq experiments from the same embryo at the same stage. Which statistical framework would you suggest for establishing a clear relationship between ChIP-seq enrichment (of common epigenetic marks, e.g. H3K27me2) and how it could affect the determination of cell clusters? I'm looking for something beyond a simple correlation test.

Thanks

20 months ago by
Kevin Blighe55k wrote:

Hola Carlos,

Instead of doing simple correlation, you could model the relationship between each epigenetic signal and the expression of genes surrounding the signal. What do I mean by 'model'? I mean build a linear regression model, as follows:

```
lm(NearbyGene1 ~ mark1H3k27me2)
lm(NearbyGene2 ~ mark1H3k27me2)
lm(NearbyGene3 ~ mark1H3k27me2)
lm(NearbyGene4 ~ mark1H3k27me2)
...
lm(NearbyGene1 ~ mark2H3k27me2)
lm(NearbyGene2 ~ mark2H3k27me2)
...
lm(NearbyGene1 ~ mark3H3k27me2)
lm(NearbyGene2 ~ mark4H3k27me2)
```

You will have to set this up as a loop. To use model formulae in a loop, you can create the model equation with `paste()` and then coerce it into a formula acceptable to the `lm()` function with `as.formula()`.

To extract information from a model, use the `summary()` function - there are ways of extracting each individual value via subsetting.
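As a minimal sketch of that loop, assuming you already have one table in which each row is a matched observation (for example a cell cluster or pseudobulk profile) with columns for gene expression and for mark signal: the data here are simulated and the column names are only placeholders borrowed from the formulae above. It builds each formula with `paste()` and `as.formula()`, fits `lm()`, and pulls the slope, p-value, and R-squared out of `summary()`.

```r
# Simulated, purely illustrative data: one row per matched observation.
set.seed(1)
n <- 30
dat <- data.frame(
  mark1H3k27me2 = rnorm(n),
  mark2H3k27me2 = rnorm(n)
)
dat$NearbyGene1 <- 2 - 1.5 * dat$mark1H3k27me2 + rnorm(n, sd = 0.5)
dat$NearbyGene2 <- rnorm(n)

genes <- c("NearbyGene1", "NearbyGene2")
marks <- c("mark1H3k27me2", "mark2H3k27me2")

# One row per gene/mark pair to be tested.
pairs <- expand.grid(gene = genes, mark = marks, stringsAsFactors = FALSE)

results <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  # Build the model equation as a string, then coerce it to a formula.
  f <- as.formula(paste(pairs$gene[i], "~", pairs$mark[i]))
  fit <- lm(f, data = dat)
  s <- summary(fit)
  # s$coefficients is a matrix: one row per term, with columns for the
  # estimate, standard error, t value, and p-value.
  coefs <- s$coefficients
  data.frame(
    gene = pairs$gene[i],
    mark = pairs$mark[i],
    estimate = coefs[pairs$mark[i], "Estimate"],
    p.value = coefs[pairs$mark[i], "Pr(>|t|)"],
    r.squared = s$r.squared
  )
}))

# Many models are fitted, so adjust the p-values for multiple testing.
results$p.adj <- p.adjust(results$p.value, method = "BH")
print(results)
```

The `p.adjust()` step is my own addition rather than part of the suggestion above; with one model per gene/mark pair, some form of multiple-testing correction is usually sensible.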

The benefit of using a model is that you can also adjust for other covariates / confounding factors, for example:

```
lm(NearbyGene1 ~ m2H3k27me2 + TissueType)
```
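To illustrate what adjusting for a covariate looks like in practice, here is a small sketch on simulated data. The two-level `TissueType` factor and the effect sizes are invented for the example; only the model formula comes from the line above. It fits the model with and without the covariate and compares the two.

```r
# Simulated, purely illustrative data with a two-level covariate.
set.seed(2)
n <- 40
dat <- data.frame(
  m2H3k27me2 = rnorm(n),
  TissueType = factor(rep(c("A", "B"), each = n / 2))
)
dat$NearbyGene1 <- 1 - 1.0 * dat$m2H3k27me2 +
  2 * (dat$TissueType == "B") + rnorm(n, sd = 0.5)

unadjusted <- lm(NearbyGene1 ~ m2H3k27me2, data = dat)
adjusted <- lm(NearbyGene1 ~ m2H3k27me2 + TissueType, data = dat)

# Coefficient table of the adjusted model; the factor contributes a
# 'TissueTypeB' row (level A is the reference).
print(summary(adjusted)$coefficients)

# F-test on the nested models: does adding the covariate improve the fit?
print(anova(unadjusted, adjusted))
```

The slope reported for the mark in the adjusted model is its association with expression after accounting for the covariate.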

Take a look here for other information related to linear regression models (and there's tonnes of information across the World Wide Web, too): A: Resources for gene signature creation

Kevin

Hi Kevin, this is a great answer, thank you very much. How many genes would you consider testing for the "Nearby Gene" comparisons?

You could just begin with, literally, each gene that is up- and down-stream of the H3K27 methylation site. If needed, you could extend it to include genes in a larger locus.
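One way to pick those flanking genes, sketched in base R: the gene table, the peak coordinates, and the `flanking_genes()` helper are all hypothetical, and the sketch ignores strand, so "left" and "right" are by genomic coordinate only. For real annotation, the `nearest()`, `precede()`, and `follow()` functions in the Bioconductor package GenomicRanges are probably a better fit.

```r
# Hypothetical gene annotation and a single hypothetical peak.
genes <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  chr = c("2L", "2L", "2L", "3R"),
  start = c(1000, 5000, 9000, 2000),
  end = c(2000, 6000, 9500, 3000),
  stringsAsFactors = FALSE
)
peak <- list(chr = "2L", start = 3000, end = 3500)

# Return the closest gene on each side of the peak (by coordinate, not strand).
# 'window' caps the allowed distance in bp; Inf means no cap.
flanking_genes <- function(peak, genes, window = Inf) {
  same_chr <- genes[genes$chr == peak$chr, ]
  left <- same_chr[same_chr$end < peak$start &
                     peak$start - same_chr$end <= window, ]
  right <- same_chr[same_chr$start > peak$end &
                      same_chr$start - peak$end <= window, ]
  c(
    left = if (nrow(left) > 0) left$gene[which.max(left$end)] else NA_character_,
    right = if (nrow(right) > 0) right$gene[which.min(right$start)] else NA_character_
  )
}

print(flanking_genes(peak, genes))
```

Extending to a larger locus would then mean keeping every gene that falls within the window rather than only the closest one on each side.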

What do you think about using logistic regression?