Question: scRNA-seq and ChIP-seq data integration
20 months ago, carlosalfonsogonzalez6 wrote:

Dear All,

I'm currently trying to integrate single-cell RNA-seq data from the Drosophila embryo with ChIP-seq experiments from the same embryo at the same stage. Rather than a simple correlation test, which statistical framework would you suggest for establishing a clear relationship between ChIP-seq enrichment (of common epigenetic marks, e.g. H3K27me2) and its effect on the determination of cell clusters?

Thanks

20 months ago, Kevin Blighe wrote:

Hi Carlos,

Instead of a simple correlation, you could model the relationship between each epigenetic signal and the expression of the genes surrounding it. What do I mean by 'model'? I mean build a linear regression model for each mark-gene pair, with the gene's expression as the response and the mark's signal as the predictor, as follows:

lm(NearbyGene1 ~ mark1H3k27me2)
lm(NearbyGene2 ~ mark1H3k27me2)
lm(NearbyGene3 ~ mark1H3k27me2)
lm(NearbyGene4 ~ mark1H3k27me2)
...
lm(NearbyGene1 ~ mark2H3k27me2)
lm(NearbyGene2 ~ mark2H3k27me2)
...
lm(NearbyGene1 ~ mark3H3k27me2)
lm(NearbyGene2 ~ mark3H3k27me2)
...

You will have to set this up as a loop. To use model formulae in a loop, you can create the model equation with paste() and then coerce it into a formula acceptable to the lm() function with as.formula().
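A minimal sketch of that loop, assuming a hypothetical data frame `dat` in which each row is one observation unit (e.g. a cell cluster or pseudobulk sample) with matched expression and ChIP-seq signal, and a hypothetical list `mark_to_genes` mapping each mark to its nearby genes. The data are simulated purely for illustration.

```r
# Simulated, hypothetical data: one row per observation unit
# (e.g. a cell cluster or pseudobulk sample) with matched expression and ChIP signal
set.seed(1)
n <- 20
dat <- data.frame(
  mark1H3k27me2 = rnorm(n),
  mark2H3k27me2 = rnorm(n),
  NearbyGene1   = rnorm(n),
  NearbyGene2   = rnorm(n)
)

# Hypothetical mapping of each mark to the genes near it
mark_to_genes <- list(
  mark1H3k27me2 = c("NearbyGene1", "NearbyGene2"),
  mark2H3k27me2 = c("NearbyGene1", "NearbyGene2")
)

fits <- list()
for (mark in names(mark_to_genes)) {
  for (gene in mark_to_genes[[mark]]) {
    # Build the formula as a string, e.g. "NearbyGene1 ~ mark1H3k27me2",
    # then coerce it into a formula object that lm() accepts
    f <- as.formula(paste(gene, "~", mark))
    fits[[paste(gene, mark, sep = "_vs_")]] <- lm(f, data = dat)
  }
}

names(fits)
```

Storing the fits in a named list keeps each model retrievable by its gene/mark pair for later inspection.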

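To extract information from a model, use the summary() function; individual values can then be extracted by subsetting. For example, summary(fit)$coefficients is a matrix with one row per model term and the columns Estimate, Std. Error, t value and Pr(>|t|).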
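As a sketch of the subsetting, the snippet below pulls the slope, its p-value and R² out of a single fit. The data are simulated (a hypothetical gene whose expression decreases with the mark), so the numbers only illustrate the mechanics.

```r
# Simulated, hypothetical data: expression of one gene decreases with the mark
set.seed(2)
dat <- data.frame(mark1H3k27me2 = rnorm(30))
dat$NearbyGene1 <- -0.8 * dat$mark1H3k27me2 + rnorm(30, sd = 0.5)

fit <- lm(NearbyGene1 ~ mark1H3k27me2, data = dat)

# Coefficient matrix: rows = terms, columns = Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

slope   <- coefs["mark1H3k27me2", "Estimate"]
p_value <- coefs["mark1H3k27me2", "Pr(>|t|)"]
r2      <- summary(fit)$r.squared

c(slope = slope, p = p_value, r2 = r2)
```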

The benefit of using a model is that you can also adjust for other covariates / confounding factors, for example:

lm(NearbyGene1 ~ mark2H3k27me2 + TissueType)
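A small simulated sketch of why the covariate matters, assuming a hypothetical three-level `TissueType` factor that influences both the mark and the gene's expression. The same slope is estimated with and without the covariate so the two can be compared.

```r
set.seed(3)
n <- 60
# Hypothetical covariate: three tissue types, 20 observations each
tissue <- factor(rep(c("ectoderm", "mesoderm", "endoderm"), each = n / 3))
g <- as.integer(tissue)                             # level codes 1..3 (levels sorted alphabetically)
mark <- rnorm(n) + g                                # simulated mark level differs between tissues
expr <- -0.5 * mark + 2 * g + rnorm(n, sd = 0.5)    # simulated expression also depends on tissue
dat <- data.frame(NearbyGene1 = expr, mark2H3k27me2 = mark, TissueType = tissue)

unadjusted <- lm(NearbyGene1 ~ mark2H3k27me2, data = dat)
adjusted   <- lm(NearbyGene1 ~ mark2H3k27me2 + TissueType, data = dat)

# Compare the mark coefficient without and with the covariate
coef(unadjusted)[["mark2H3k27me2"]]
coef(adjusted)[["mark2H3k27me2"]]
```

In this simulation both variables vary by tissue, so the unadjusted slope is pulled upward, while the adjusted slope should land close to the simulated value of -0.5.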

For more information on linear regression models, take a look here (there is also plenty of material elsewhere online): A: Resources for gene signature creation

Kevin


Hi Kevin, this is a great answer, thank you very much. How many genes would you consider testing in the "Nearby Gene" comparisons?

(reply by carlosalfonsogonzalez6, 20 months ago)

You could begin with, literally, the genes immediately upstream and downstream of each H3K27 methylation site. If needed, you could extend this to include genes in a larger locus.
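A base-R sketch of picking the immediately flanking genes for one peak. The annotation table `genes` and the helper `flanking_genes()` are hypothetical, with made-up coordinates; strand is ignored, so "upstream" and "downstream" mean genomic coordinate order only, and genes overlapping the peak are not returned.

```r
# Hypothetical gene annotation (chromosome, start, end); coordinates are made up
genes <- data.frame(
  gene  = c("geneA", "geneB", "geneC", "geneD"),
  chr   = c("2L", "2L", "2L", "3R"),
  start = c(1000, 5000, 9000, 2000),
  end   = c(2000, 6000, 9500, 3000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: closest gene ending before the peak and closest gene
# starting after it, on the same chromosome (strand ignored; overlaps skipped)
flanking_genes <- function(peak_chr, peak_start, peak_end, genes) {
  same_chr <- genes[genes$chr == peak_chr, ]
  before <- same_chr[same_chr$end < peak_start, ]
  after  <- same_chr[same_chr$start > peak_end, ]
  c(
    upstream   = if (nrow(before) > 0) before$gene[which.max(before$end)] else NA_character_,
    downstream = if (nrow(after) > 0) after$gene[which.min(after$start)] else NA_character_
  )
}

res <- flanking_genes("2L", 6500, 7000, genes)
res
```

For real annotations, the Bioconductor GenomicRanges package offers strand-aware helpers such as nearest(), precede() and follow() that scale better than this loop-free toy.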

(reply by Kevin Blighe, 20 months ago)

What do you think about using logistic regression instead?

(reply by carlosalfonsogonzalez6, 17 months ago)

Sure, but what would your x and y variables be in the model?

(reply by Kevin Blighe, 17 months ago)
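Logistic regression needs a binary y. One possible framing, purely as an assumption: per gene, y is whether the gene is a marker of one chosen cell cluster (1/0), and x is the H3K27me2 signal at that gene's locus. The sketch below simulates such data and fits it with glm(); the variable names are hypothetical.

```r
set.seed(4)
n_genes <- 200
# Simulated, hypothetical per-gene data: mark signal at each locus and
# whether the gene is a marker of one chosen cluster (1) or not (0)
mark_signal <- rnorm(n_genes)
is_marker <- rbinom(n_genes, size = 1, prob = plogis(-0.5 - 1.5 * mark_signal))
dat <- data.frame(is_marker, mark_signal)

# Logistic regression: binary response, continuous predictor
fit <- glm(is_marker ~ mark_signal, family = binomial, data = dat)
summary(fit)$coefficients

# Odds ratio for a one-unit increase in the mark signal
exp(coef(fit)[["mark_signal"]])
```

The same covariate adjustment as in the linear models works here too, by adding terms to the right-hand side of the formula.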
Powered by Biostar version 2.3.0