I am trying to run a scRNA-seq analysis with scanpy on already published, publicly available data.
I downloaded the three necessary files, gene.tsv, barcode.tsv and matrix.mtx, from:
In the matrix.mtx file, I understand that the first column is the gene index (a row in gene.tsv) and the second column is the cell index (a row in barcode.tsv). However, the third column here contains decimal numbers, which does not match my understanding that it should hold the number of counts/reads. Here is an example:
%%MatrixMarket matrix coordinate real general
19712 98047 197107184
1 792 2.336580098246731
1 932 1.843584537092641
1 1032 1.452642393416585
1 1511 1.0351544389422742
1 1728 1.4001400066737717
1 1887 1.9039943314637129
1 2657 2.0057818468348683
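To check what the file actually stores, here is a minimal, stdlib-only sketch. The helper names parse_mtx_coordinate and looks_like_raw_counts are hypothetical (not part of scanpy), and it assumes the non-pattern "coordinate" layout shown above: a banner whose fourth word is the value type ("real" here; Cell Ranger typically writes raw counts as "integer"), then a size line "rows columns non-zeros" (19712 genes, 98047 cells, 197107184 stored values), then one "row col value" triple per line with 1-based indices.

```python
import io


def parse_mtx_coordinate(stream):
    """Parse a MatrixMarket 'coordinate' file (non-pattern) from a text stream.

    Returns (field, shape, nnz, entries): field is the value type from the
    banner ('real', 'integer', ...), entries is a list of (row, col, value)
    triples using the file's 1-based indices.
    """
    banner = stream.readline()
    if not banner.startswith("%%MatrixMarket"):
        raise ValueError("not a MatrixMarket file")
    field = banner.split()[3]
    line = stream.readline()
    while line.startswith("%"):  # skip optional comment lines
        line = stream.readline()
    n_rows, n_cols, nnz = (int(x) for x in line.split())
    entries = []
    for line in stream:
        if line.strip():  # ignore blank lines
            i, j, v = line.split()
            entries.append((int(i), int(j), float(v)))
    return field, (n_rows, n_cols), nnz, entries


def looks_like_raw_counts(entries):
    """True if every stored value is a non-negative whole number."""
    return all(v >= 0 and v.is_integer() for _, _, v in entries)


# The excerpt from the question (only the first 7 of 197107184 entries)
excerpt = """%%MatrixMarket matrix coordinate real general
19712 98047 197107184
1 792 2.336580098246731
1 932 1.843584537092641
1 1032 1.452642393416585
1 1511 1.0351544389422742
1 1728 1.4001400066737717
1 1887 1.9039943314637129
1 2657 2.0057818468348683
"""

field, shape, nnz, entries = parse_mtx_coordinate(io.StringIO(excerpt))
print(field, shape, nnz)               # real (19712, 98047) 197107184
print(looks_like_raw_counts(entries))  # False
```

The banner already declares "real" rather than "integer", so these values are most likely already processed (normalized and/or transformed) rather than raw read counts.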
I do not fully understand this, and the problem carries over into every downstream analysis, such as sc.pp.calculate_qc_metrics, where, for example, total_counts comes out as a decimal number.
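This would also explain the QC output: total_counts is the sum of each cell's values across genes, so if the matrix is already normalized, those sums are decimals too. The toy sketch below assumes one common preprocessing, scaling each cell to target_sum = 10,000 and then applying log1p (similar in spirit to sc.pp.normalize_total followed by sc.pp.log1p); the data and helper names are hypothetical, and the published matrix may have been transformed differently. If the values really are log1p of size-normalized data, applying expm1 and summing per cell should give roughly the same constant for every cell, which is a quick way to test that guess on the real file.

```python
import math
from collections import defaultdict


def total_counts_per_cell(entries):
    """Sum stored values per cell (the column index in the .mtx file).

    This is the quantity calculate_qc_metrics reports as total_counts
    once cells are the observations.
    """
    totals = defaultdict(float)
    for _gene, cell, value in entries:
        totals[cell] += value
    return dict(totals)


def log_normalize(entries, target_sum=1e4):
    """Assumed preprocessing, for illustration only: scale each cell to
    target_sum, then apply log1p."""
    library_size = total_counts_per_cell(entries)
    return [(g, c, math.log1p(v / library_size[c] * target_sum)) for g, c, v in entries]


# Hypothetical raw counts as (gene, cell, count) triples, 1-based like the .mtx file
raw = [(1, 1, 3.0), (2, 1, 1.0), (1, 2, 5.0), (3, 2, 5.0)]
norm = log_normalize(raw)

print(total_counts_per_cell(raw))   # {1: 4.0, 2: 10.0}
print(total_counts_per_cell(norm))  # non-integer totals, as in the question

# Heuristic check: undo log1p with expm1; each cell then sums back to ~target_sum
recovered = total_counts_per_cell([(g, c, math.expm1(v)) for g, c, v in norm])
print(recovered)                    # both cells close to 10000.0
```

Even if that check passes, the original integer counts cannot be recovered without each cell's original library size, so count-based steps would need the raw matrix, if the authors deposited one.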
Thank you very much in advance, and please excuse my English.