Question

Using scanpy to merge objects

10 months ago

Andy ▴ 120

Good afternoon,

I have run into a problem merging my objects with scanpy.

As you can see in the following figure, each object has obs × var values, but after I merge them the resulting adata has 0 vars.

The code I used is:

adatas=[sk1, sk2, sk3, b7]
adatas=ad.concat(adatas, merge='same')

I am wondering how to solve the problem. Thanks.

[Figure: screenshot of the objects' dimensions; image not available]

scanpy scRNA-seq single-cell • 4.0k views

updated 7 months ago by yl759 ▴ 120 • written 10 months ago by Andy ▴ 120

I believe I loaded the objects correctly.

For b7, which is a CSV file, I did:

b7=sc.read_text("path/to/file/file.csv", delimiter=',').T

b7.var_names_make_unique()

sk1=sc.read_10x_h5('path/to/file/filtered_feature_bc_matrix.h5', genome=None, gex_only=True)

sk1.var_names_make_unique()

10 months ago by Andy ▴ 120

I suspect that I may have obtained the 'wrong' CSV file. I exported this CSV file from an RDS file, and I believe I made a mistake along the way. I double-checked and realized that I lost the meta.data, which will cause significant trouble in the subsequent steps. Therefore, my question now is: how can I obtain a matrix together with the meta.data from an RDS file?

10 months ago by Andy ▴ 120

score 1 · Answer 1 · 2023-09-14

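My understanding is that your goal is to combine multiple 10x datasets for downstream analysis. In that case, note that ad.concat uses join="inner" by default, so only the variables present in every object are kept; if the objects share no variable names, the result has 0 vars. The merge option is a separate setting: according to the documentation, it decides which annotations that are not aligned to the concatenated axis (such as columns in .var) are kept, and with the option you chose anything that differs between objects is dropped:

"same": Elements that are the same in each of the objects.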

I think you should do:

import anndata as ad

adatas = [sk1, sk2, sk3, b7]
adatas = ad.concat(adatas, join="outer")

To better label your objects for the downstream analysis, you could try to do:

adatas = {"sk1": sk1, "sk2": sk2, "sk3": sk3, "b7": b7}
adatas = ad.concat(adatas, label="dataset_name", join="outer")

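By passing a dictionary and setting the label argument, the concatenated AnnData object gets an additional column in .obs called dataset_name, indicating which original object each cell came from. You can also pass index_unique with a delimiter string (for example index_unique="-") to make the cell index unique by appending the dictionary key to each original name; otherwise, the default keeps the original index unchanged. Also, because you are using the "outer" option, variables missing from an object have to be filled in for its cells: by default sparse matrices are padded with zeros and dense arrays with missing values, and you can choose the value yourself with fill_value (for example fill_value=0).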