Question: chromVAR for single-cell ATAC-seq
jmah wrote (4 months ago):

I want to find the master transcription factors (the transcription factors that govern the highest number of genes) per cell type in my animal. I will have single-cell ATAC-seq and single-cell RNA-seq data. I was thinking of using the RNA-seq data to group cells into metacells, then finding the most highly accessible peaks in those metacells using chromVAR. Would I use the consensus peak set per metacell as input, and the consensus peak set from all cells across all metacells as background? I am still a little confused about how chromVAR works after reading the Nature Methods paper and the online vignette.

AidanQuinn (United States) wrote (4 months ago):

Do you have single-cell mRNA expression and genome accessibility measured in the same cells? If not, you'll need to think carefully about how you are clustering your cells and mapping clusters in one space to those in the other (i.e. how do you say that one cluster of cells in RNA-seq space is biologically equivalent to some other cluster of cells in ATAC-seq space?).

You may want to have a look at the SnapATAC package, which seems to outperform chromVAR in clustering single cells in ATAC-seq space (see this manuscript). Basically, what I would suggest is to first cluster your cells in ATAC-seq space, then call peaks and perform differential accessibility analysis among the clusters, and finally perform your motif analysis on those differentially accessible features.
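As a rough illustration of the last step (motif analysis on differentially accessible peaks), here is a minimal sketch of a one-sided hypergeometric enrichment test in plain Python. It is not what SnapATAC or chromVAR do internally; the function name, argument names, and the toy counts are all assumptions made for this example, and a real analysis would also need to account for GC content and peak accessibility when choosing the background.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_motif, n_da, n_da_motif):
    """One-sided hypergeometric p-value, P(X >= n_da_motif).

    n_total    : number of peaks in the background set
    n_motif    : background peaks containing the motif
    n_da       : differentially accessible (DA) peaks, a subset of the background
    n_da_motif : DA peaks containing the motif
    """
    upper = min(n_motif, n_da)
    denom = comb(n_total, n_da)
    # Sum the probability of seeing n_da_motif or more motif-containing peaks
    # when n_da peaks are drawn at random without replacement.
    tail = sum(
        comb(n_motif, k) * comb(n_total - n_motif, n_da - k)
        for k in range(n_da_motif, upper + 1)
    )
    return tail / denom


# Hypothetical toy counts: 1000 peaks overall, 100 with the motif;
# 50 DA peaks in one cluster, 15 of which carry the motif (5 expected by chance).
p_value = hypergeom_enrichment_p(n_total=1000, n_motif=100, n_da=50, n_da_motif=15)
print(f"enrichment p-value: {p_value:.3g}")
```

Run once per motif and per cluster, this gives a ranked list of candidate factors per cluster; the p-values would still need multiple-testing correction.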

To connect these data to mRNA expression (single-cell or otherwise), I would annotate your ATAC-seq peaks according to some sensible cis-regulatory element/mRNA expression joint network or Hi-C data in the relevant cell type.


That's a really thoughtful response. I hadn't considered that clustering in one space may not correspond to clustering in another. In fact, I would need to check that by clustering by RNA-seq and by motifs, and seeing whether those clusters overlap.

Thanks for your help! You've given me much to think about.
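One way to quantify the overlap check described above is the adjusted Rand index (ARI) between the two sets of cluster labels. The sketch below assumes both labelings refer to the same cells in the same order, which only holds when RNA and accessibility were measured in the same cells or the cells have already been matched; the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells.

    1.0 means identical partitions (cluster names may differ),
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two cells")

    # Count pairs of cells that share a cluster in both, in A, and in B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / comb(n, 2)
    max_index = (in_a + in_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. every cell in one cluster in both labelings.
        return 1.0
    return (both - expected) / (max_index - expected)


# Hypothetical labels for six cells clustered in RNA-seq and in motif space.
rna_clusters = [0, 0, 1, 1, 2, 2]
motif_clusters = ["x", "x", "y", "y", "z", "z"]
print(adjusted_rand_index(rna_clusters, motif_clusters))
```

A cross-tabulation of the two labelings is a useful companion to the single ARI number, since it shows which clusters split or merge between the two spaces.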

Reply written 4 months ago by jmah

Do you have a preferred pipeline for CRE-gene assignments? Hi-C data often do not offer the necessary resolution for reasonable assignments and/or are not available for many cell types. I personally work with correlation-based methods right now, correlating ATAC-seq and RNA-seq counts and testing for significant linear (Pearson) correlation within TAD boundaries, basically as implemented in InTAD (BioC). This obviously requires many replicates, and the samples should cover cell types that show some dynamics in both ATAC-seq and RNA-seq to get reasonable correlations while avoiding false correlations due to outliers, so there are a lot of limitations and pitfalls. If you have good experiences with an alternative strategy, please feel free to share; I would be very interested in that.
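The correlation-within-TAD idea can be sketched in a few lines of plain Python. This is only an illustration of the general approach, not InTAD's implementation: the data layout, function names, and toy values are assumptions, and a permutation test is swapped in for a parametric correlation test so that the example needs nothing beyond the standard library.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal carries no correlation information
    return sxy / sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def tad_index(chrom, pos, tads):
    """Index of the TAD (chrom, start, end) containing pos, or None."""
    for i, (t_chrom, start, end) in enumerate(tads):
        if t_chrom == chrom and start <= pos < end:
            return i
    return None


def link_peaks_to_genes(peaks, genes, tads, atac, rna):
    """Correlate every peak with every gene whose TSS lies in the same TAD."""
    links = []
    for peak, (p_chrom, p_pos) in peaks.items():
        p_tad = tad_index(p_chrom, p_pos, tads)
        if p_tad is None:
            continue
        for gene, (g_chrom, g_tss) in genes.items():
            if tad_index(g_chrom, g_tss, tads) != p_tad:
                continue
            r = pearson_r(atac[peak], rna[gene])
            p = permutation_p(atac[peak], rna[gene])
            links.append((peak, gene, r, p))
    return links


# Hypothetical toy data: two TADs, two peaks, two genes, eight samples.
tads = [("chr1", 0, 1000), ("chr1", 1000, 2000)]
peaks = {"peakA": ("chr1", 200), "peakB": ("chr1", 1500)}
genes = {"gene1": ("chr1", 800), "gene2": ("chr1", 1700)}
atac = {"peakA": [1, 2, 3, 4, 5, 6, 7, 8], "peakB": [5, 1, 4, 2, 8, 3, 7, 6]}
rna = {
    "gene1": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1],
    "gene2": [3.0, 7.5, 1.2, 6.6, 4.1, 2.8, 5.9, 4.4],
}

links = link_peaks_to_genes(peaks, genes, tads, atac, rna)
for peak, gene, r, p in links:
    print(f"{peak} -> {gene}: r={r:.2f}, p={p:.4f}")
```

The toy data make the outlier concern concrete: with only eight samples, a single extreme value can dominate the correlation, so a rank-based correlation or more replicates may be worth considering.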

Reply modified 4 months ago, written 4 months ago by ATpoint