tSNE and UMAP of scATAC-seq data look like spaghetti
6 weeks ago
smurph50 ▴ 20

I would like to use R to cluster my 20k cells from a single-cell ATAC-seq experiment.

I ran PCA and selected the first 50 components, which I passed to Rtsne's normalize_input() and then to Rtsne(). This is the result I get.

I tried multiple perplexities from 5 to 50, numbers of components from 20 to 200, and UMAP. However, the results were roughly the same.

Do you know what could cause this? I did not filter out peaks before running this because I am not sure what cutoff to use.

Show your full code or we're just going to be guessing at what you did. I'd recommend using a framework meant for scATAC analysis like ArchR or Signac.

Thanks Jared! I will use one of those to generate a tsne rather than troubleshooting. Here is my code.

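library(Matrix)  # readMM()
library(Rtsne)   # normalize_input(), Rtsne()

mtx <- as.matrix(readMM("matrix.mtx"))
pca <- prcomp(mtx)
# prcomp() returns a list; the component scores are in pca$x
norm_mtx <- normalize_input(as.matrix(pca$x[, 1:200]))
set.seed(42)
tsne_out <- Rtsne(norm_mtx, perplexity = 50)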

I strongly recommend following a guided tutorial and using a dedicated package, as mentioned above.

Points that may be missing here:

• normalization for at least read depth
• feature selection (informative regions, separate from non-changing regions)
• details on PCA parameters
• reasonable defaults for everything related to scATAC data characteristics (extreme sparseness)
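
As a rough illustration of the first two points, here is a minimal sketch in plain R, not a substitute for ArchR or Signac. It assumes a peaks x cells matrix (transpose first if yours is cells x peaks) and uses simulated data. The `min_cells` cutoff, the 1e4 scale factor, and `n_comp` are placeholder values chosen for the example, and the TF-IDF form shown is just one common variant.

```r
library(Matrix)

set.seed(42)
# Toy stand-in for a peaks x cells count matrix (sizes are hypothetical).
counts <- rsparsematrix(500, 100, density = 0.05,
                        rand.x = function(n) rpois(n, 1) + 1)

# Binarize: each peak is treated as open (1) or closed (0) in a cell.
binary <- (counts > 0) * 1

# Feature selection: drop peaks that are open in very few cells.
# min_cells is a placeholder cutoff, not a recommendation.
min_cells <- 2
binary <- binary[rowSums(binary) >= min_cells, , drop = FALSE]

# Read-depth normalization (term frequency): divide each cell by its total.
depth <- colSums(binary)
stopifnot(all(depth > 0))
tf <- binary %*% Diagonal(x = 1 / depth)

# Inverse document frequency: up-weight peaks that are open in fewer cells.
idf <- ncol(binary) / rowSums(binary)
tfidf <- log1p(Diagonal(x = idf) %*% tf * 1e4)

# SVD on the cells x peaks matrix. Dense svd() is fine for toy data; for
# 20k cells a truncated SVD (e.g. the irlba package) would be needed.
n_comp <- 10
sv <- svd(as.matrix(t(tfidf)), nu = n_comp, nv = 0)
lsi <- sv$u %*% diag(sv$d[seq_len(n_comp)])  # cells x components

# Components that track sequencing depth are usually excluded downstream.
depth_cor <- abs(as.vector(cor(lsi, log10(depth))))
```

On real data, a first component that correlates strongly with `depth` mostly reflects coverage rather than biology, so it is worth checking `depth_cor` before passing `lsi` to t-SNE or UMAP.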

For single-cell RNA-seq I follow https://bioconductor.org/books/release/OSCA/ but I have no hands-on experience with scATAC-seq so far. I guess just following the ArchR or Signac vignettes will save you a lot of trouble.

Have you tried normalizing the data before PCA?

5 weeks ago
James • 0

For UMAP, try increasing negative_sample_rate to 25 or 50. For t-SNE, try a much larger perplexity, e.g. 1000.
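
A minimal sketch of how these two settings might be passed, assuming the Rtsne and uwot packages and a cells x components matrix. `emb` is simulated here, and `perplexity_cap` is a hypothetical helper, not part of either package. Rtsne rejects a perplexity above (number of cells - 1) / 3, so the toy example caps the requested value.

```r
set.seed(42)
# Hypothetical stand-in for a cells x components embedding (e.g. PCA or LSI).
emb <- matrix(rnorm(300 * 10), nrow = 300)

# Rtsne requires 3 * perplexity <= nrow(X) - 1, so cap the requested value.
perplexity_cap <- function(n_cells, requested) {
  min(requested, floor((n_cells - 1) / 3))
}
perp <- perplexity_cap(nrow(emb), 1000)

# t-SNE with a large perplexity; pca = FALSE because emb is already reduced.
if (requireNamespace("Rtsne", quietly = TRUE)) {
  tsne_out <- Rtsne::Rtsne(emb, perplexity = perp, pca = FALSE,
                           check_duplicates = FALSE)
}

# UMAP with more negative samples per positive edge (uwot's default is 5).
if (requireNamespace("uwot", quietly = TRUE)) {
  umap_out <- uwot::umap(emb, negative_sample_rate = 25)
}
```

With 20k cells the cap does not bind, so a perplexity of 1000 is accepted as is, though the run will take noticeably longer than with the default.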