ValueError when loading expression matrix into scanpy
Question by c2e09af0, 2.1 years ago

Hello everyone, I am new to bioinformatics and want to build a reference atlas to project my own data onto, using scArches and other packages such as scanpy. However, I'm having trouble loading the reference dataset. I downloaded the exprMatrix.tsv.gz file from https://cells-test.gi.ucsc.edu/?ds=early-brain and used the following code to import the data into Python:

   import scanpy as sc
   adata = sc.read_text("exprMatrix.tsv.gz")

I get this error:

ValueError: could not convert string to float: 'NA'

I tried loading the data in R with the Seurat package, which worked after appending one empty line. Could it be that Python and R represent missing values differently ('NA' vs. 'NaN'), and that is why Python cannot load the file? Can I simply replace the 'NA' values with 'NaN' in the file, or do they have a different meaning?

I would very much appreciate help. Thank you for taking the time!
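A minimal sketch of where the error comes from, using a tiny made-up table in the same genes-by-cells layout (the gene and cell names are hypothetical, not taken from the real file): Python's built-in `float()` rejects the string `NA`, while pandas' default parser recognizes `NA` as a missing-value marker and stores it as `NaN`.

```python
import io

import pandas as pd

# Tiny made-up matrix: first column = gene names, other columns = cells.
# Gene and cell names here are hypothetical placeholders.
tsv = "gene\tcell_1\tcell_2\nGeneA\t1.5\tNA\nGeneB\t0\t2\n"

# Python's float() has no notion of 'NA' and raises the same ValueError.
try:
    float("NA")
except ValueError as err:
    print(err)  # could not convert string to float: 'NA'

# pandas treats 'NA' as a missing-value marker by default and stores NaN.
df = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)
print(int(df.isna().sum().sum()))  # 1
```

In both R and pandas, `NA`/`NaN` stand for a missing value rather than zero expression, so renaming them in the file only changes the spelling; how to treat those entries is a separate decision about the dataset.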

Tags: RNA-seq, Python, scArches, scanpy, scRNAseq
Answer by zorbax, 2.1 years ago

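I think scanpy's `read_text` converts every entry straight to a float, which fails on the string 'NA'. pandas, on the other hand, treats 'NA' as a missing value by default, so you can load the file into a DataFrame and then build the AnnData object from it:

   import scanpy as sc
   import pandas as pd

   # Read the gzipped TSV in chunks; pandas maps the string 'NA' to NaN by default
   chunks = pd.read_table("~/path/to/exprMatrix.tsv.gz", index_col=0, chunksize=1000000)
   df = pd.concat(chunks)

   # Cell Browser matrices are genes x cells, while AnnData expects cells x genes
   adata = sc.AnnData(df.T)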
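Before building the AnnData object it can help to check where the `NA` entries ended up and which axis holds the cells. A minimal sketch on a made-up in-memory table (gene and cell names are hypothetical), assuming the Cell Browser layout of one gene per row and one cell per column:

```python
import io

import pandas as pd

# Hypothetical stand-in for exprMatrix.tsv.gz: genes in rows, cells in columns.
tsv = (
    "gene\tcell_1\tcell_2\n"
    "GeneA\t1.5\tNA\n"
    "GeneB\t0\t2\n"
    "GeneC\tNA\tNA\n"
)

# Same chunked reading strategy as in the answer (tiny chunks for the demo).
chunks = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0, chunksize=2)
df = pd.concat(chunks)

# Count missing values per gene (row) and per cell (column).
na_per_gene = df.isna().sum(axis=1)
na_per_cell = df.isna().sum(axis=0)
print(na_per_gene[na_per_gene > 0])
print(na_per_cell[na_per_cell > 0])

# AnnData/scanpy expect cells as rows (obs) and genes as columns (var),
# so a genes x cells table is transposed first.
cells_by_genes = df.T
print(cells_by_genes.shape)  # (2, 3)
```

On the real file, the same per-gene and per-cell counts show whether the missing values are concentrated in a few genes or cells, which can guide whether to drop them or handle them some other way before creating the AnnData object.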

Thank you so much, it worked with your code!
