Decimal numbers in the third column of Gene/cell count matrix (mtx format).
19 months ago

I am trying to run a scRNA-seq analysis of published, publicly available data with scanpy.

I downloaded the three necessary files (gene.tsv, barcode.tsv and matrix.mtx) from:

https://singlecell.broadinstitute.org/single_cell/study/SCP1290/molecular-logic-of-cellular-diversification-in-the-mammalian-cerebral-cortex#study-download

As I understand it, in the matrix.mtx file the first column is the gene index and the second column is the sample (cell) index. The third column, however, contains decimal numbers, which does not fit my expectation that it holds the number of counts/reads. Here is an example:

%%MatrixMarket matrix coordinate real general
19712 98047 197107184
1 792 2.336580098246731
1 932 1.843584537092641
1 1032 1.452642393416585
1 1511 1.0351544389422742
1 1728 1.4001400066737717
1 1887 1.9039943314637129
1 2657 2.0057818468348683
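
The first two lines of the excerpt above can be read programmatically. In the MatrixMarket banner, the fourth token is the field type (`real` here, meaning floating-point entries rather than `integer`), and the first non-comment line gives the number of rows, columns, and stored entries. The following is a minimal sketch using only the standard library; `describe_mtx_header` is a hypothetical helper name, and the sample is truncated to the first few entries from the question.

```python
import io

# First lines of the file as posted in the question (entries truncated).
SAMPLE = """%%MatrixMarket matrix coordinate real general
19712 98047 197107184
1 792 2.336580098246731
1 932 1.843584537092641
1 1032 1.452642393416585
"""


def describe_mtx_header(handle):
    """Return the layout, field type, and dimensions from a MatrixMarket header."""
    banner = handle.readline().split()
    # Banner tokens: %%MatrixMarket, object, format, field, symmetry
    if len(banner) != 5 or banner[0] != "%%MatrixMarket":
        raise ValueError("not a MatrixMarket file")
    _, obj, layout, field, symmetry = banner

    # Skip comment lines (starting with '%') and blank lines before the size line.
    line = handle.readline()
    while line and (line.startswith("%") or not line.strip()):
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(tok) for tok in line.split())

    return {
        "object": obj,
        "layout": layout,
        "field": field,
        "symmetry": symmetry,
        "n_rows": n_rows,
        "n_cols": n_cols,
        "n_entries": n_entries,
    }


info = describe_mtx_header(io.StringIO(SAMPLE))
print(info)
```

For this file the sketch reports a `real` field with 19712 rows, 98047 columns, and 197107184 stored entries, so decimal values in the third column are consistent with what the header declares.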

I do not understand this well, and the problem carries over to all subsequent analyses, such as sc.pp.calculate_qc_metrics, where total_counts, for example, comes out as a decimal number.

Thank you very much in advance and excuse my English.

SingleCell scRNAseqanalysis Scanpy MatrixMarket • 535 views
19 months ago
ATpoint 81k

This is a sparse matrix format that is not meant to be read by eye. Read it into Python (or any other framework) with a dedicated function (https://scanpy.readthedocs.io/en/stable/generated/scanpy.read_mtx.html), then follow a guided tutorial to get started. Don't try to figure out everything else yourself; that's not productive.
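
As a rough illustration of that advice, the sketch below loads the file with `scanpy.read_mtx` and checks whether a sample of the stored values are whole numbers, as raw counts would be. Several things here are assumptions, not part of the thread: the file name `matrix.mtx` in the working directory, the genes x cells orientation (taken from the question's description of the columns), and the hypothetical helpers `is_integer_valued` and `load_and_check`.

```python
def is_integer_valued(values, tol=1e-6):
    """True if every value is within tol of a whole number (raw counts would be)."""
    return all(abs(v - round(v)) <= tol for v in values)


def load_and_check(mtx_path="matrix.mtx", n_check=100_000):
    """Load a genes x cells .mtx file with scanpy and test a sample of its values."""
    import scanpy as sc  # imported here so the helper above works without scanpy

    # read_mtx keeps the orientation of the file. Assumption: the file is
    # genes x cells, while scanpy works with cells x genes, hence the transpose.
    adata = sc.read_mtx(mtx_path).T

    # adata.X is a sparse matrix; .data holds only the stored (non-zero) values.
    sample = adata.X.data[:n_check].tolist()
    return adata, is_integer_valued(sample)


# The check applied to two values copied from the question, then to whole numbers.
print(is_integer_valued([2.336580098246731, 1.843584537092641]))  # False
print(is_integer_valued([3.0, 1.0, 7.0]))  # True
```

Calling `adata, looks_raw = load_and_check()` would return the loaded object together with the result of the check. If the check is false, the matrix does not hold raw counts; what transformation was applied is something to confirm in the study's own documentation rather than to guess from the numbers.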
