Question: Extract datasets from .h5 file
6 months ago, JulianC wrote:


I retrieved single-cell data from GEO datasets. The file format is .h5, produced by the CellRanger V2.0 pipeline (10x Genomics). To open it and look at the datasets inside, I used the following Python code:

import h5py
import pandas as pd
import numpy as np

f = h5py.File('GSM3489183_IPF_01_filtered_gene_bc_matrices_h5.h5', 'r')


dset = f['GRCh38']

Listing the keys of this group gives the following datasets:

['barcodes', 'data', 'gene_names', 'genes', 'indices', 'indptr', 'shape']

According to the CellRanger manual, the dataset called 'data' should contain the nonzero UMI counts in column-major order, and the 'shape' dataset is a tuple of (# rows, # columns) giving the matrix dimensions. Each of these datasets is one-dimensional. To look at the values themselves I used:

a = np.array(f['GRCh38/data'])

However, I don't see how to build, from this data, a table in which genes are rows and cells are columns. The 'data' dataset must hold the expression values for each gene in each cell, but since it is a single one-dimensional array, I don't see how to map its values back to genes and cells. Does anyone have experience with this type of file? Thank you in advance!

6 months ago, genomax (United States) wrote:

Take a look at this 10x Genomics page that describes how to work with h5 data.
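
As a rough illustration of what working with this layout can look like, here is a minimal sketch. It assumes the file follows the structure listed in the question: 'data', 'indices' and 'indptr' together form a compressed sparse column (CSC) matrix, with genes as rows and cell barcodes as columns. The helper names `csc_to_dense` and `load_10x_h5` are hypothetical and not part of any 10x Genomics tool. The first function is a pure-Python toy that only shows how the three arrays fit together; the second uses h5py and SciPy for a real file and is untested against the GEO file from the question.

```python
def csc_to_dense(data, indices, indptr, shape):
    """Expand CSC arrays into a dense table (a list of rows)."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        # Entries indptr[col] .. indptr[col + 1] - 1 of `data` and `indices`
        # belong to column `col` (one cell barcode).
        for k in range(indptr[col], indptr[col + 1]):
            # indices[k] is the row (gene) of the k-th nonzero count.
            dense[indices[k]][col] = data[k]
    return dense


# Toy example: 3 genes x 2 cells.
# Cell 0 has counts 5 (gene 0) and 1 (gene 2); cell 1 has count 7 (gene 1).
toy = csc_to_dense(data=[5, 1, 7], indices=[0, 2, 1],
                   indptr=[0, 2, 3], shape=(3, 2))
print(toy)  # [[5, 0], [0, 7], [1, 0]]


def load_10x_h5(path, genome="GRCh38"):
    """Read the datasets under `genome` into a sparse genes x cells matrix.

    Assumes the dataset names shown in the question. Third-party imports
    are kept local so the toy example above runs without them.
    """
    import h5py
    from scipy.sparse import csc_matrix

    with h5py.File(path, "r") as f:
        group = f[genome]
        matrix = csc_matrix(
            (group["data"][:], group["indices"][:], group["indptr"][:]),
            shape=tuple(group["shape"][:]),
        )
        # Names and barcodes are assumed to be stored as byte strings.
        genes = [g.decode() for g in group["gene_names"][:]]
        barcodes = [b.decode() for b in group["barcodes"][:]]
    return matrix, genes, barcodes
```

With the returned values, a genes-by-cells table could then be built with something like `pd.DataFrame.sparse.from_spmatrix(matrix, index=genes, columns=barcodes)`. Keeping the matrix sparse is a deliberate choice here, since most entries in single-cell count data are zero and a dense table of the full dataset may not fit in memory.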
