Extract datasets from .h5 file
4.1 years ago
JulianC ▴ 30

Hi!

I retrieved single-cell data from GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3489183). The file is in .h5 format, produced by the CellRanger V2.0 pipeline (10x Genomics). To open it and look at the datasets inside, I used the following Python code:

import h5py
import pandas as pd
import numpy as np

f = h5py.File('GSM3489183_IPF_01_filtered_gene_bc_matrices_h5.h5', 'r')
list(f.keys())

['GRCh38']

dset = f['GRCh38']
list(dset)

['barcodes', 'data', 'gene_names', 'genes', 'indices', 'indptr', 'shape']

According to the CellRanger manual, the dataset called 'data' should contain the nonzero UMI counts in column-major order, and the 'shape' dataset is a tuple of (# rows, # columns) giving the matrix dimensions. Each of these datasets has a single column. To look at the values, I used this code:

a = np.array(f['GRCh38/data'])
pd.DataFrame(a)

However, I don't see how to get from this to a table in which genes are rows and cells are columns. The 'data' dataset must hold the expression values for each gene in each cell, but since it has only one column, I don't see how to build a table with cells as columns and the corresponding values for each gene. Do you have experience with this type of file? Thank you in advance!

Single-cell Cell ranger Python • 5.6k views
4.1 years ago
GenoMax 141k

Take a look at this 10x genomics page that describes how to work with h5 data.
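
As a rough sketch of the idea, assuming the layout listed in the question: 'data', 'indices' and 'indptr' together describe one sparse matrix in compressed sparse column (CSC) form, with 'shape' giving (genes, cells). 'indptr' marks where each cell's values start and stop inside 'data', and 'indices' gives the gene row of each value. The function name tenx_group_to_dataframe and the toy gene IDs and barcodes below are made up for illustration; they are not part of CellRanger or of any library.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix


def tenx_group_to_dataframe(group):
    """Build a genes x cells table from a CellRanger v2-style HDF5 group.

    `group` can be an h5py group such as f['GRCh38'], or any dict-like
    object holding arrays under the same names (hypothetical helper).
    """
    data = np.asarray(group["data"])        # nonzero UMI counts
    indices = np.asarray(group["indices"])  # gene (row) index of each count
    indptr = np.asarray(group["indptr"])    # start offset of each cell (column)
    n_genes, n_cells = (int(n) for n in np.asarray(group["shape"]))

    # Assumption: the matrix is stored column-major (CSC), genes x cells.
    matrix = csc_matrix((data, indices, indptr), shape=(n_genes, n_cells))

    def as_text(values):
        # HDF5 strings usually come back as bytes; decode them for labels.
        return [v.decode("utf-8") if isinstance(v, bytes) else str(v)
                for v in np.asarray(values)]

    # toarray() makes a dense copy; fine for a sketch, heavy for big data.
    return pd.DataFrame(matrix.toarray(),
                        index=as_text(group["genes"]),
                        columns=as_text(group["barcodes"]))


# Toy stand-in for f['GRCh38']: 3 genes x 2 cells, 3 nonzero counts.
toy = {
    "data": np.array([5, 1, 2]),
    "indices": np.array([0, 2, 1]),
    "indptr": np.array([0, 2, 3]),
    "shape": np.array([3, 2]),
    "genes": np.array([b"ENSG_A", b"ENSG_B", b"ENSG_C"]),
    "barcodes": np.array([b"AAAC-1", b"AAAG-1"]),
}
df = tenx_group_to_dataframe(toy)
print(df)
```

With the file from the question, the same call would be tenx_group_to_dataframe(f['GRCh38']). The result is dense, so for a large matrix it may be better to keep the scipy sparse matrix and only convert the part you need. 'genes' is used for the row labels here because the IDs should be unique; 'gene_names' could be attached as an extra column.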
