Question: Extract datasets from .h5 file
JulianC wrote, 6 months ago:

Hi!

I retrieved single-cell data from GEO datasets (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3489183). The file format is .h5, produced by the CellRanger V2.0 pipeline (10x Genomics). To open it and look at the datasets inside, I used the following Python code:

import h5py
import pandas as pd
import numpy as np

f = h5py.File('GSM3489183_IPF_01_filtered_gene_bc_matrices_h5.h5', 'r')
list(f.keys())

['GRCh38']

dset = f['GRCh38']
list(dset)

['barcodes', 'data', 'gene_names', 'genes', 'indices', 'indptr', 'shape']

According to the CellRanger manual, the 'data' dataset should contain the nonzero UMI counts in column-major order, and the 'shape' dataset is a tuple of (# rows, # columns) giving the matrix dimensions. Each of these datasets has a single column. To view the data, I used the following code:

a = np.array(f['GRCh38/data'])
pd.DataFrame(a)

However, I don't see how to build, from this data, a table in which genes are rows and cells are columns. The 'data' dataset must hold the expression values for each gene in each cell, but since it has only one column, I don't see how to assign its values to the right gene and cell. Do you have experience with this type of file? Thank you in advance!

genomax (United States) wrote, 6 months ago:

Take a look at this 10x genomics page that describes how to work with h5 data.
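
For reference, the datasets listed in the question ('data', 'indices', 'indptr', 'shape') match the layout of a compressed sparse column (CSC) matrix, so one way to rebuild the genes-by-cells table is to hand them to scipy. The following is only a minimal sketch, assuming the file follows that layout, with 'indices' holding the row (gene) index of each nonzero count and 'indptr' holding the offset where each column (cell) starts. The function names read_10x_h5_v2 and to_dataframe are made up for illustration and are not part of any 10x tool.

```python
import h5py
import numpy as np
import pandas as pd
from scipy import sparse


def read_10x_h5_v2(path, genome="GRCh38"):
    """Return (sparse genes-by-cells matrix, gene names, barcodes).

    Hypothetical helper; assumes the CSC layout described above.
    """
    with h5py.File(path, "r") as f:
        group = f[genome]
        # [:] copies each HDF5 dataset into a NumPy array in memory.
        data = group["data"][:]        # nonzero UMI counts
        indices = group["indices"][:]  # row (gene) index of each count
        indptr = group["indptr"][:]    # start of each column (cell) in data
        shape = tuple(int(n) for n in group["shape"][:])  # (n_genes, n_cells)
        # Names and barcodes are stored as byte strings, so decode them.
        gene_names = [g.decode() for g in group["gene_names"][:]]
        barcodes = [b.decode() for b in group["barcodes"][:]]
    matrix = sparse.csc_matrix((data, indices, indptr), shape=shape)
    return matrix, gene_names, barcodes


def to_dataframe(matrix, gene_names, barcodes):
    """Wrap the sparse matrix in a DataFrame: genes as rows, cells as columns."""
    return pd.DataFrame.sparse.from_spmatrix(
        matrix, index=gene_names, columns=barcodes
    )


# Usage with the file from the question (not run here):
# matrix, gene_names, barcodes = read_10x_h5_v2(
#     "GSM3489183_IPF_01_filtered_gene_bc_matrices_h5.h5")
# table = to_dataframe(matrix, gene_names, barcodes)
```

The DataFrame stays sparse, which matters because most entries are zero and a dense copy of a full matrix can be very large. Gene symbols in 'gene_names' are not guaranteed to be unique, so passing the IDs from the 'genes' dataset as the index instead may be safer.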
