Question: Transposing a .tsv count matrix
11 months ago by
Duckula20 wrote:

Dear all, I have a .tsv count matrix from a single-cell experiment and I want to analyze it using the Scanpy package in Python. When I import my data using the following command

mydata=sc.read("counts_expression_matrix.tsv",delimiter='\t',cache=True)

I can see my data is stored as follows:

AnnData object with n_obs × n_vars = 4057 × 66882 obs: 'n_genes' var: 'n_cells'

which I assume is not correct: my var should correspond to my genes and obs to my cells. I am not very familiar with Python or Scanpy. Does anyone have a suggestion?

Thanks

scanpy rna-seq python • 586 views
ADD COMMENTlink modified 11 months ago • written 11 months ago by Duckula20

No, that looks correct.

In Python, everything is an object. Your data has been read into an instance of an "AnnData" object (whatever one of those is).

In order to actually see the data you will probably need to do something like print(mydata) or

for i in mydata:
    print(i)

But it depends whether that object is iterable or not (I'm not familiar with that tool specifically).

ADD REPLYlink written 11 months ago by Joe14k

Hi, thanks for your reply, but as you can see, it indicates that n_vars stands for 'n_cells', and according to that I have 66882 cells with 4057 genes! That doesn't make sense. Also, when I plot my highly variable genes using the provided function, I see cell names instead of gene names in my plot!

ADD REPLYlink written 11 months ago by Duckula20

Oh, I see your concern. Apologies, I didn't read the question fully.

Show us some of your data so we can check if it appears to be formatted right.
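
If the file is too large for a text editor, something like the following might help. It is only a minimal sketch using the standard library: `peek_tsv` is a made-up helper name, and the tab delimiter is taken from the command in the question. It reads just the first few lines and keeps only the first few fields of each, so the whole matrix is never loaded.

```python
from itertools import islice


def peek_tsv(path, n_rows=5, n_cols=5):
    """Return the first n_cols fields of the first n_rows lines of a TSV file."""
    preview = []
    with open(path, "r", newline="") as handle:
        # islice stops after n_rows lines, so the rest of the file is never read
        for line in islice(handle, n_rows):
            fields = line.rstrip("\r\n").split("\t")
            preview.append(fields[:n_cols])
    return preview
```

For example, `peek_tsv("counts_expression_matrix.tsv")` returns a small list of lists that can be pasted here. Whether the first line holds cell names or gene names shows which way round the table is stored.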

ADD REPLYlink written 11 months ago by Joe14k

Well, the problem is that I can't open my file in the usual apps like Notepad++. I have been trying to import my data into R, but it's still running! That's why I switched to Python and Scanpy, which is said to be optimized for huge data. That seems to be true, because I was able to load my data as AnnData in a short time, but then I realized that my formatting is apparently wrong. I don't have any other idea for viewing the table. Maybe with pandas? I am not sure. Do you have any suggestions?

ADD REPLYlink modified 11 months ago • written 11 months ago by Duckula20

Have you googled "transpose tab-separated file"?

ADD REPLYlink modified 11 months ago • written 11 months ago by WouterDeCoster42k

@WouterDeCoster Thanks for your comment! I tried to import my data into R and transpose it there, but I got the error "Error: cannot allocate vector of size 125 Kb", which I took as another sign that my data is oriented the wrong way. Now I can't even import it, let alone transpose it! Do you think Python would be capable of doing it? Do you have any hints?

ADD REPLYlink written 11 months ago by Duckula20

Yes, using pandas for example

  1. read_csv
  2. transpose

I don't know if you have enough memory available.
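
The two steps above could look roughly like this. It is only a sketch: `transpose_tsv` is a made-up helper, the tiny in-memory table stands in for the real file, and `index_col=0` assumes that the first column holds the row labels.

```python
import io

import pandas as pd


def transpose_tsv(source, destination):
    """Read a tab-separated table, swap rows and columns, and write it back out."""
    # Assumption: the first column contains the row labels (gene or cell names)
    counts = pd.read_csv(source, sep="\t", index_col=0)
    transposed = counts.T
    transposed.to_csv(destination, sep="\t")
    return transposed


# Tiny made-up table (2 rows x 3 columns) used in place of the real file
demo = io.StringIO("gene\tcell1\tcell2\tcell3\nGeneA\t0\t3\t1\nGeneB\t5\t0\t2\n")
out = io.StringIO()
result = transpose_tsv(demo, out)
print(result.shape)  # (3, 2): rows and columns have been swapped
```

For the real data you would pass file paths instead, e.g. `transpose_tsv("counts_expression_matrix.tsv", "counts_transposed.tsv")` (the output name is arbitrary). If the data is already loaded in Scanpy, an AnnData object can also be transposed directly with `mydata.T`, which avoids rewriting the file.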

ADD REPLYlink modified 11 months ago • written 11 months ago by WouterDeCoster42k

My concern is that the data itself is too big, and that this has nothing to do with memory, because in R the error message "allocate vector of size" indicates a limit on the object itself. "The number of bytes in a character string is limited to 2^31 - 1 ~ 2*10^9, which is also the limit on each dimension of an array." So I am now confused about how to handle the issue.

ADD REPLYlink written 11 months ago by Duckula20

R memory management, frankly, sucks. It does not like large objects, even if you have enough memory to load them fully. It might work in python with pandas, because numpy doesn't use garbage, outdated methods for memory management. If you really need to use R, you can try importing the object, saving it as an Rdata object, and then restarting R and loading it again, which usually reduces the memory footprint by up to 60-70%. Hadley Wickham has some great notes on R memory management and how it works here.

But I don't think that's really your issue. The vector error is really odd, as that usually only arises when you have tons of rows/columns, which doesn't seem to be the case here. Post the first few lines of your original file so that we can see the actual format.
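
As a rough back-of-the-envelope check, using only the dimensions reported in the question (4057 × 66882), here is the number of values and the size of a dense copy of the matrix. The 4-byte and 8-byte storage sizes are assumptions about how the counts would be held in memory; the actual footprint depends on the tool and on whether a sparse format is used.

```python
# Dimensions taken from the AnnData summary in the question
n_obs, n_vars = 4057, 66882
n_values = n_obs * n_vars

# The limit quoted above from the R documentation
INT32_MAX = 2**31 - 1


def dense_size_gib(n_values, bytes_per_value):
    """Size in GiB of a dense array with the given number of values."""
    return n_values * bytes_per_value / 2**30


print(n_values)                            # 271340274
print(n_values < INT32_MAX)                # True
print(round(dense_size_gib(n_values, 4), 2))  # 1.01 (4 bytes per value)
print(round(dense_size_gib(n_values, 8), 2))  # 2.02 (8 bytes per value)
```

So neither dimension, nor even the total number of values, comes close to 2^31 - 1, and a dense copy is on the order of 1 to 2 GiB before any overhead from parsing or copying.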

ADD REPLYlink written 11 months ago by jared.andrews073.8k

Just try it in Python. Just because it doesn't work in R doesn't mean it won't work in Python.

ADD REPLYlink written 11 months ago by WouterDeCoster42k