I retrieved single-cell data from a GEO dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3489183). The file is in .h5 format, produced by the CellRanger V2.0 pipeline (10x Genomics). To open it and look at the datasets inside, I used the following Python code:
import h5py
import pandas as pd
import numpy as np
f = h5py.File('GSM3489183_IPF_01_filtered_gene_bc_matrices_h5.h5', 'r')
list(f.keys())
dset = f['GRCh38']
list(dset)
['barcodes', 'data', 'gene_names', 'genes', 'indices', 'indptr', 'shape']
According to the CellRanger manual, the 'data' dataset should contain the nonzero UMI counts in column-major order, and the 'shape' dataset is a tuple of (# rows, # columns) giving the matrix dimensions. Each of these datasets has a single column. To view the data, I used this code:
a = np.array(f['GRCh38/data'])
pd.DataFrame(a)
However, I don't see how to get from this data to a table with genes as rows and cells as columns. The 'data' dataset must hold the expression values for each gene in each cell, but since it has only one column, I don't see how to build a table with cells as columns and the corresponding values for each gene. Does anyone have experience with this type of file? Thank you in advance!
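A note on the layout, in case it helps frame the question: the dataset names 'data', 'indices', 'indptr' and 'shape' match the components of a compressed sparse column (CSC) matrix. The sketch below assumes that interpretation, namely that 'indices' holds the row index of each nonzero value and 'indptr' marks where each column starts within 'data'. I have not confirmed this against the file itself. The function name csc_to_dense and the toy arrays are made up for illustration; the code uses plain Python lists and does not touch the .h5 file.

```python
def csc_to_dense(data, indices, indptr, shape):
    """Expand CSC-style arrays into a dense matrix (list of rows).

    Assumed meaning of the arguments (standard CSC convention):
      data    - nonzero values, stored column by column
      indices - row index of each value in `data`
      indptr  - indptr[j]:indptr[j + 1] is the slice of `data` for column j
      shape   - (number of rows, number of columns)
    """
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        # Walk over the nonzero entries that belong to this column
        for k in range(indptr[col], indptr[col + 1]):
            dense[indices[k]][col] = data[k]
    return dense


# Toy example: 3 genes (rows) x 2 cells (columns), 3 nonzero counts
data = [5, 1, 2]
indices = [0, 2, 1]
indptr = [0, 2, 3]
shape = (3, 2)

matrix = csc_to_dense(data, indices, indptr, shape)
for row in matrix:
    print(row)
```

If this reading of the format is right, then for the real file it would probably be better to keep the matrix sparse, for example by passing the three arrays to scipy.sparse.csc_matrix((data, indices, indptr), shape=shape), and to use 'genes' or 'gene_names' as row labels and 'barcodes' as column labels. I have not tested that on this dataset.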