Seurat object conversion to AnnData does not create the expected var in the h5ad file
6 weeks ago
akh22 ▴ 50

Hi,

I have the following Seurat object:

> str(pbmc10k@assays)
List of 4
 $ RNA      :Formal class 'Assay' [package "Seurat"] with 8 slots
  .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:24330253] 25 30 32 42 43 44 51 59 60 62 ...
  .. .. .. ..@ p       : int [1:10195] 0 4803 7036 11360 11703 15846 18178 20413 22584 27802 ...
  .. .. .. ..@ Dim     : int [1:2] 36601 10194
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:36601] "MIR1302-2HG" "FAM138A" "OR4F5" "AL627309.1" ...
  .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. .. .. ..@ x       : num [1:24330253] 1 2 1 1 1 3 1 1 1 1 ...
  .. .. .. ..@ factors : list()
  .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:24330253] 25 30 32 42 43 44 51 59 60 62 ...
  .. .. .. ..@ p       : int [1:10195] 0 4803 7036 11360 11703 15846 18178 20413 22584 27802 ...
  .. .. .. ..@ Dim     : int [1:2] 36601 10194
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:36601] "MIR1302-2HG" "FAM138A" "OR4F5" "AL627309.1" ...
  .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. .. .. ..@ x       : num [1:24330253] 0.367 0.634 0.367 0.367 0.367 ...
  .. .. .. ..@ factors : list()
  .. ..@ scale.data   : num [1:2000, 1:10194] -0.0829 -0.2648 -0.195 -0.0133 1.2823 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2000] "PLEKHN1" "HES4" "ISG15" "LINC01342" ...
  .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. ..@ key          : chr "rna_"
  .. ..@ assay.orig   : NULL
  .. ..@ var.features : chr [1:2000] "PTGDS" "IGLC3" "PPBP" "CXCL10" ...
  .. ..@ meta.features:'data.frame':    36601 obs. of  5 variables:
  .. .. ..$ vst.mean                 : num [1:36601] 0 0 0 0.00392 0 ...
  .. .. ..$ vst.variance             : num [1:36601] 0 0 0 0.00391 0 ...
  .. .. ..$ vst.variance.expected    : num [1:36601] 0 0 0 0.00452 0 ...
  .. .. ..$ vst.variance.standardized: num [1:36601] 0 0 0 0.865 0 ...
  .. .. ..$ vst.variable             : logi [1:36601] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. ..@ misc         : NULL
 $ unspliced:Formal class 'Assay' [package "Seurat"] with 8 slots
  .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:20057627] 8 46 53 69 70 71 72 73 84 89 ...
  .. .. .. ..@ p       : int [1:10195] 0 3837 6398 9489 10407 14220 16595 18856 20831 24609 ...
  .. .. .. ..@ Dim     : int [1:2] 36601 10194
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:36601] "BX004987.1" "AC145212.1" "MAFIP" "AC011043.1" ...
  .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. .. .. ..@ x       : num [1:20057627] 1 3 1 1 2 1 1 9 1 1 ...
  .. .. .. ..@ factors : list()
  .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:20057627] 8 46 53 69 70 71 72 73 84 89 ...
  .. .. .. ..@ p       : int [1:10195] 0 3837 6398 9489 10407 14220 16595 18856 20831 24609 ...
  .. .. .. ..@ Dim     : int [1:2] 36601 10194
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:36601] "BX004987.1" "AC145212.1" "MAFIP" "AC011043.1" ...
  .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. .. .. ..@ x       : num [1:20057627] 1 3 1 1 2 1 1 9 1 1 ...
  .. .. .. ..@ factors : list()
  .. ..@ scale.data   : num[0 , 0 ] 
  .. ..@ key          : chr "unspliced_"
  .. ..@ assay.orig   : NULL
  .. ..@ var.features : logi(0) 
  .. ..@ meta.features:'data.frame':    36601 obs. of  0 variables
  .. ..@ misc         : NULL
 $ spliced  :Formal class 'Assay' [package "Seurat"] with 8 slots
  .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:19059316] 48 49 50 54 59 60 61 65 70 72 ...
  .. .. .. ..@ p       : int [1:10195] 0 3840 5465 8968 9150 12398 14145 15854 17504 21746 ...
  .. .. .. ..@ Dim     : int [1:2] 36601 10194
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:36601] "BX004987.1" "AC145212.1" "MAFIP" "AC011043.1" ...
  .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. .. .. ..@ x       : num [1:19059316] 1 1 3 1 1 1 1 1 1 2 ...
  .. .. .. ..@ factors : list()
  .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:19059316] 48 49 50 54 59 60 61 65 70 72 ...
  .. .. .. ..@ p       : int [1:10195] 0 3840 5465 8968 9150 12398 14145 15854 17504 21746 ...
  .. .. .. ..@ Dim     : int [1:2] 36601 10194
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:36601] "BX004987.1" "AC145212.1" "MAFIP" "AC011043.1" ...
  .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. .. .. ..@ x       : num [1:19059316] 1 1 3 1 1 1 1 1 1 2 ...
  .. .. .. ..@ factors : list()
  .. ..@ scale.data   : num[0 , 0 ] 
  .. ..@ key          : chr "spliced_"
  .. ..@ assay.orig   : NULL
  .. ..@ var.features : logi(0) 
  .. ..@ meta.features:'data.frame':    36601 obs. of  0 variables
  .. ..@ misc         : NULL
 $ SCT      :Formal class 'Assay' [package "Seurat"] with 8 slots
  .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:22836556] 12 17 21 22 23 45 59 60 74 79 ...
  .. .. .. ..@ p       : int [1:10195] 0 2675 4907 7438 8739 12423 14750 16985 19156 21564 ...
  .. .. .. ..@ Dim     : int [1:2] 20666 10194
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:20666] "AL627309.1" "AL627309.5" "AL627309.4" "AL669831.2" ...
  .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. .. .. ..@ x       : num [1:22836556] 1 1 1 1 1 1 1 2 1 1 ...
  .. .. .. ..@ factors : list()
  .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. ..@ i       : int [1:22836556] 12 17 21 22 23 45 59 60 74 79 ...
  .. .. .. ..@ p       : int [1:10195] 0 2675 4907 7438 8739 12423 14750 16985 19156 21564 ...
  .. .. .. ..@ Dim     : int [1:2] 20666 10194
  .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. ..$ : chr [1:20666] "AL627309.1" "AL627309.5" "AL627309.4" "AL669831.2" ...
  .. .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. .. .. ..@ x       : num [1:22836556] 0.693 0.693 0.693 0.693 0.693 ...
  .. .. .. ..@ factors : list()
  .. ..@ scale.data   : num [1:3000, 1:10194] -0.1333 -0.5897 -1.0906 -0.0483 -0.0499 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:3000] "PLEKHN1" "HES4" "ISG15" "AL390719.3" ...
  .. .. .. ..$ : chr [1:10194] "AAACCCACATAACTCG-1" "AAACCCACATGTAACC-1" "AAACCCAGTGAGTCAG-1" "AAACCCAGTGCTTATG-1" ...
  .. ..@ key          : chr "sct_"
  .. ..@ assay.orig   : NULL
  .. ..@ var.features : chr [1:3000] "GNLY" "IGKC" "S100A9" "S100A8" ...
  .. ..@ meta.features:'data.frame':    20666 obs. of  6 variables:
  .. .. ..$ sct.detection_rate   : num [1:20666] 0.003924 0.051893 0.001177 0.000687 0.067393 ...
  .. .. ..$ sct.gmean            : num [1:20666] 0.002724 0.038556 0.000816 0.000476 0.050365 ...
  .. .. ..$ sct.variance         : num [1:20666] 0.003909 0.064675 0.001176 0.000686 0.081901 ...
  .. .. ..$ sct.residual_mean    : num [1:20666] -0.00392 -0.01045 0.0048 -0.00904 -0.00616 ...
  .. .. ..$ sct.residual_variance: num [1:20666] 0.779 0.894 1.275 0.75 0.965 ...
  .. .. ..$ sct.variable         : logi [1:20666] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. ..@ misc         :List of 2
  .. .. ..$ vst.out  :List of 12
  .. .. .. ..$ model_str            : chr "y ~ log_umi"
  .. .. .. ..$ model_pars           : num [1:2000, 1:3] 0.137 14.691 2.243 2.874 0.746 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:2000] "B9D1" "XRCC5" "RPS12" "RPS3A" ...
  .. .. .. .. .. ..$ : chr [1:3] "theta" "(Intercept)" "log_umi"
  .. .. .. ..$ model_pars_outliers  : logi [1:2000] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. .. .. ..$ model_pars_fit       : num [1:20666, 1:3] 0.0475 0.4904 0.0527 0.0326 0.6443 ...
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:20666] "AL627309.1" "AL627309.5" "AL627309.4" "AL669831.2" ...
  .. .. .. .. .. ..$ : chr [1:3] "theta" "(Intercept)" "log_umi"
  .. .. .. .. ..- attr(*, "outliers")= logi [1:2000] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. .. .. ..$ model_str_nonreg     : chr ""
  .. .. .. ..$ model_pars_nonreg    : NULL
  .. .. .. ..$ arguments            :List of 24
  .. .. .. .. ..$ latent_var          : chr "log_umi"
  .. .. .. .. ..$ batch_var           : NULL
  .. .. .. .. ..$ latent_var_nonreg   : NULL
  .. .. .. .. ..$ n_genes             : num 2000
  .. .. .. .. ..$ n_cells             : num 5000
  .. .. .. .. ..$ method              : chr "poisson"
  .. .. .. .. ..$ do_regularize       : logi TRUE
  .. .. .. .. ..$ theta_regularization: chr "od_factor"
  .. .. .. .. ..$ res_clip_range      : num [1:2] -101 101
  .. .. .. .. ..$ bin_size            : num 500
  .. .. .. .. ..$ min_cells           : num 5
  .. .. .. .. ..$ residual_type       : chr "pearson"
  .. .. .. .. ..$ return_cell_attr    : logi TRUE
  .. .. .. .. ..$ return_gene_attr    : logi TRUE
  .. .. .. .. ..$ return_corrected_umi: logi TRUE
  .. .. .. .. ..$ min_variance        : num -Inf
  .. .. .. .. ..$ bw_adjust           : num 3
  .. .. .. .. ..$ gmean_eps           : num 1
  .. .. .. .. ..$ theta_estimation_fun: chr "theta.ml"
  .. .. .. .. ..$ theta_given         : NULL
  .. .. .. .. ..$ verbosity           : num 2
  .. .. .. .. ..$ verbose             : NULL
  .. .. .. .. ..$ show_progress       : NULL
  .. .. .. .. ..$ sct.clip.range      : num [1:2] -18.4 18.4
  .. .. .. ..$ genes_log_gmean_step1: Named num [1:2000] -2.4848 0.0795 1.6421 1.4171 -0.0858 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:2000] "B9D1" "XRCC5" "RPS12" "RPS3A" ...
  .. .. .. ..$ cells_step1          : chr [1:5000] "CATGAGTTCGCGTCGA-1" "TCGCAGGCAGTCGTTA-1" "TCGTGCTAGGTTAAAC-1" "TTTGATCTCTCGCTTG-1" ...
  .. .. .. ..$ cell_attr            :'data.frame':  10194 obs. of  10 variables:
  .. .. .. .. ..$ orig.ident        : Factor w/ 1 level "10x10kPBMCdualIndex": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. ..$ nCount_RNA        : num [1:10194] 22575 7758 21733 860 15311 ...
  .. .. .. .. ..$ nFeature_RNA      : int [1:10194] 4803 2233 4324 343 4143 2332 2235 2171 5218 3199 ...
  .. .. .. .. ..$ nCount_spliced    : num [1:10194] 15555 4838 14972 225 9914 ...
  .. .. .. .. ..$ nFeature_spliced  : int [1:10194] 3840 1625 3503 182 3248 1747 1709 1650 4242 2502 ...
  .. .. .. .. ..$ nCount_unspliced  : num [1:10194] 12648 7323 10260 1509 12001 ...
  .. .. .. .. ..$ nFeature_unspliced: int [1:10194] 3837 2561 3091 918 3813 2375 2261 1975 3778 2856 ...
  .. .. .. .. ..$ percent.mt        : num [1:10194] 5.33 8.84 6.32 31.51 7.91 ...
  .. .. .. .. ..$ umi               : num [1:10194] 22575 7758 21733 860 15311 ...
  .. .. .. .. ..$ log_umi           : num [1:10194] 4.35 3.89 4.34 2.93 4.19 ...
  .. .. .. ..$ gene_attr            :'data.frame':  20666 obs. of  5 variables:
  .. .. .. .. ..$ detection_rate   : num [1:20666] 0.003924 0.051893 0.001177 0.000687 0.067393 ...
  .. .. .. .. ..$ gmean            : num [1:20666] 0.002724 0.038556 0.000816 0.000476 0.050365 ...
  .. .. .. .. ..$ variance         : num [1:20666] 0.003909 0.064675 0.001176 0.000686 0.081901 ...
  .. .. .. .. ..$ residual_mean    : num [1:20666] -0.00392 -0.01045 0.0048 -0.00904 -0.00616 ...
  .. .. .. .. ..$ residual_variance: num [1:20666] 0.779 0.894 1.275 0.75 0.965 ...
  .. .. .. ..$ times                :List of 7
  .. .. .. .. ..$ start_time    : POSIXct[1:1], format: "2020-12-12 19:54:11"
  .. .. .. .. ..$ get_model_pars: POSIXct[1:1], format: "2020-12-12 19:54:15"
  .. .. .. .. ..$ reg_model_pars: POSIXct[1:1], format: "2020-12-12 19:54:47"
  .. .. .. .. ..$ get_residuals : POSIXct[1:1], format: "2020-12-12 19:54:47"
  .. .. .. .. ..$ correct_umi   : POSIXct[1:1], format: "2020-12-12 19:54:58"
  .. .. .. .. ..$ get_gene_attr : POSIXct[1:1], format: "2020-12-12 19:55:11"
  .. .. .. .. ..$ done          : POSIXct[1:1], format: "2020-12-12 19:55:15"
  .. .. ..$ umi.assay: chr "RNA"

I converted this Seurat object to AnnData with sceasy as follows:

> convertFormat(pbmc10k, from = "seurat", to="anndata", outFile = "pbmc10k.h5ad")
... storing 'Phase' as categorical
... storing 'hpca.fine' as categorical
... storing 'hpca.main' as categorical
... storing 'monaco.main' as categorical
... storing 'monaco.fine' as categorical
AnnData object with n_obs × n_vars = 10194 × 36601
    obs: 'nCount_RNA', 'nFeature_RNA', 'nCount_spliced', 'nFeature_spliced', 'nCount_unspliced', 'nFeature_unspliced', 'percent.mt', 'nCount_SCT', 'nFeature_SCT', 'SCT_snn_res.0.8', 'seurat_clusters', 'S.Score', 'G2M.Score', 'Phase', 'old.ident', 'hpca.fine', 'hpca.main', 'monaco.main', 'monaco.fine'
    var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
    obsm: 'X_pca', 'X_umap'
Warning message:
In .regularise_df(obj@meta.data, drop_single_values = drop_single_values) :
  Dropping single category variables:orig.ident

As you can see, for some reason the conversion did not pick up $spliced and $unspliced as var; instead, it used $RNA@meta.features as var. I would appreciate an explanation for this and any pointers on how to fix it.

Thanks.

RNA-Seq R Seurat Anndata • 176 views
7 days ago
akh22 ▴ 50

I solved this by manually adding $spliced and $unspliced as layers using the anndata R package.
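
For anyone landing here later, a rough sketch of what that could look like. It is not the exact code I ran. `as_layer` is a hypothetical helper, the output file name is made up, and the `GetAssayData(..., slot = "counts")` call assumes a Seurat v3/v4 style object like the one above. The helper exists because the spliced/unspliced assays list their genes in a different order from RNA, and AnnData layers must be cells x genes in the same order as `obs_names` and `var_names`. If the gene names of the assays do not all match, the helper stops rather than guessing.

```r
library(Matrix)

# Hypothetical helper (not part of sceasy or anndata): reorder a genes x cells
# matrix to the given cell and gene order, then transpose it to cells x genes,
# which is the orientation AnnData layers use.
as_layer <- function(mat, obs_names, var_names) {
  stopifnot(all(obs_names %in% colnames(mat)),
            all(var_names %in% rownames(mat)))
  Matrix::t(mat[var_names, obs_names, drop = FALSE])
}

# Toy stand-in for one assay's counts, with genes deliberately out of order.
toy <- Matrix::sparseMatrix(
  i = c(1, 2, 3), j = c(1, 2, 2), x = c(5, 7, 9), dims = c(3, 2),
  dimnames = list(c("g2", "g1", "g3"), c("c1", "c2"))
)
layer <- as_layer(toy, obs_names = c("c1", "c2"),
                  var_names = c("g1", "g2", "g3"))

# Real data: runs only if pbmc10k exists and Seurat and anndata are installed.
if (exists("pbmc10k") &&
    requireNamespace("Seurat", quietly = TRUE) &&
    requireNamespace("anndata", quietly = TRUE)) {
  ad <- anndata::read_h5ad("pbmc10k.h5ad")
  for (a in c("spliced", "unspliced")) {
    # Seurat v3/v4 style accessor; newer Seurat versions use `layer =`.
    counts <- Seurat::GetAssayData(pbmc10k, assay = a, slot = "counts")
    ad$layers[[a]] <- as_layer(counts, ad$obs_names, ad$var_names)
  }
  ad$write_h5ad("pbmc10k_with_layers.h5ad")  # assumed output file name
}
```

The toy matrix is only there so the reordering can be checked without the real object; the guarded block at the end is the part that touches the h5ad file.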
