In Scanpy, how to RDS file and merge it with other Scanpy objects.
1
2
Entering edit mode
9 months ago
Andy ▴ 120

I have an RDS file that includes several Seurat objects that I want to use. I want to read it into scanpy and merge it with another file. What I did was convert the RDS file to an h5ad file and then read it into scanpy. However, when I merged all my files together, I encountered an error where my merged object had 0 vars. I hope you could offer me some suggestions.

Thanks Andy

scanpy file RDS • 2.4k views
ADD COMMENT
5
Entering edit mode
9 months ago
fracarb8 ★ 1.6k

I usually export everything I need manually

# save metadata table:
srtObject@meta.data$barcode <- rownames(srtObject@meta.data)
srtObject@meta.data$UMAP_1 <- srtObject@reductions$umap@cell.embeddings[,1]
srtObject@meta.data$UMAP_2 <- srtObject@reductions$umap@cell.embeddings[,2]
write.csv(srtObject@meta.data, file='/data/metadata.csv', quote=F, row.names=F)

# write expression counts matrix
library(Matrix)
counts_matrix <- GetAssayData(srtObject, assay='RNA', slot='counts')
writeMM(counts_matrix, file=paste0("/data/", 'counts.mtx'))

# write dimesnionality reduction matrix
write.csv(srtObject@reductions$pca@cell.embeddings, file='data/pca.csv', quote=F, row.names=F)

# write gene names
write.table(data.frame('gene'=rownames(counts_matrix)),file='data/gene_names.csv', quote=F,row.names=F,col.names=F)

Once everything is saved, I load it into scanpy

# load sparse matrix :
X = io.mmread("data/counts.mtx")

# create anndata object
adata = anndata.AnnData(X=X.transpose().tocsr() )

# load cell metadata:
cell_meta = pd.read_csv("data/metadata.csv")

# load gene names:
with open("data/gene_names.csv", 'r') as f:
    gene_names = f.read().splitlines()

# set anndata observations and index obs by barcodes, var by gene names
adata.obs = cell_meta
adata.obs.index = adata.obs['barcode']
adata.var.index = gene_names

# load dimensional reduction:
pca = pd.read_csv("data/pca.csv")
pca.index = adata.obs.index

# set pca and umap
adata.obsm['X_pca'] = pca.to_numpy()
adata.obsm['X_umap'] = np.vstack((adata.obs['UMAP_1'].to_numpy(), adata.obs['UMAP_2'].to_numpy())).T

# check the uMAP
sc.pl.umap(adata, color='cellType_final', add_outline=True, legend_loc='on data', legend_fontsize=8, legend_fontoutline=2,frameon=False, title='')

Probably a bit old fashion and not optimal for dealing with multiple files, but it can easily be put into a function.

ADD COMMENT
0
Entering edit mode

Thanks! I will try it out!

ADD REPLY

Login before adding your answer.

Traffic: 1640 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6