Decimal numbers in the third column of Gene/cell count matrix (mtx format).
9 days ago

I am trying to run a scRNA-seq analysis with scanpy on data that are already published and publicly available.

I downloaded the three necessary files (gene.tsv, barcode.tsv and matrix.mtx) from :

In the matrix.mtx file, I understand that the first column is the gene identifier and the second column is the sample (cell) identifier, but in this case the third column contains decimal numbers, which contradicts my assumption that it holds the number of counts/reads. Here is an example:

%%MatrixMarket matrix coordinate real general
19712 98047 197107184
1 792 2.336580098246731
1 932 1.843584537092641
1 1032 1.452642393416585
1 1511 1.0351544389422742
1 1728 1.4001400066737717
1 1887 1.9039943314637129
1 2657 2.0057818468348683

I do not understand this very well, and the problem carries over to all subsequent analyses, such as sc.pp.calculate_qc_metrics, where for example total_counts is a decimal number.

Thank you very much in advance and excuse my English.

SingleCell scRNAseqanalysis Scanpy MatrixMarket • 177 views
9 days ago
ATpoint 65k

This is not a human-readable format but a sparse matrix format. Read it into Python (or any framework) with a dedicated function ( and then follow a guided tutorial to get started. Don't try to figure out everything else yourself; that's not productive.
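To illustrate what the header and the three columns mean, here is a minimal standard-library sketch that parses the excerpt from the question. The helper name `read_mtx_entries` is hypothetical and exists only for this illustration; for real work a dedicated reader such as `scipy.io.mmread` or scanpy's `read_mtx` is the better choice. The field keyword in the banner (`real` here, as opposed to `integer`) declares that the stored values are floating point, so decimals in the third column are consistent with the file itself. That the values are normalized or otherwise transformed rather than raw counts is a plausible reading, but it is an assumption that should be checked against the dataset's documentation.

```python
import io


def read_mtx_entries(handle):
    """Parse a MatrixMarket coordinate file (hypothetical helper, sketch only).

    Returns the value field from the banner, the declared dimensions,
    and the list of (row, column, value) entries that were read.
    """
    banner = handle.readline().strip()
    if not banner.startswith("%%MatrixMarket"):
        raise ValueError("not a MatrixMarket file")
    # Banner layout: %%MatrixMarket <object> <format> <field> <symmetry>
    _, _obj, _fmt, field, _symmetry = banner.split()

    # Skip optional comment lines, then read the size line:
    # number of rows (genes), number of columns (cells), number of stored entries.
    line = handle.readline()
    while line.startswith("%"):
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(x) for x in line.split())

    entries = []
    for line in handle:
        if not line.strip():
            continue
        row, col, value = line.split()
        # Row and column are 1-based indices; only non-zero values are stored.
        entries.append((int(row), int(col), float(value)))
    return field, (n_rows, n_cols, n_entries), entries


# Excerpt copied from the question; the declared entry count refers to the
# full file, so it does not match the three lines shown here.
sample = """%%MatrixMarket matrix coordinate real general
19712 98047 197107184
1 792 2.336580098246731
1 932 1.843584537092641
1 1032 1.452642393416585
"""

field, shape, entries = read_mtx_entries(io.StringIO(sample))
# Raw counts would be whole numbers; check whether every value is one.
looks_like_counts = all(value.is_integer() for _, _, value in entries)
print(field, shape, looks_like_counts)
```

On this excerpt the check reports that the values are not whole numbers, which is why `total_counts` from `sc.pp.calculate_qc_metrics` also comes out as a decimal: it sums whatever values the matrix holds.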

