Transposing a .tsv count matrix
3.0 years ago
Duckula ▴ 40

Dear all, I have a .tsv count matrix from a single-cell experiment and I want to analyze it using the Scanpy package in Python. When I import my data using the following command

import scanpy as sc
mydata = sc.read("counts_expression_matrix.tsv", delimiter='\t', cache=True)


I can see that my data is stored as follows:

AnnData object with n_obs × n_vars = 4057 × 66882 obs: 'n_genes' var: 'n_cells'

which I assume is not correct, since my var should correspond to my genes and my obs to my cells. I am not very familiar with Python or Scanpy. Does anyone have a suggestion?

Thanks

RNA-Seq Python scanpy • 2.7k views

No, that looks correct.

In Python, everything is an object. Your data has been read into an instance of an "AnnData" object (whatever one of those is).

In order to actually see the data, you will probably need to do something like print(mydata) or

for i in mydata:
    print(i)


But that depends on whether the object is iterable (I'm not familiar with that tool specifically).


Hi, thanks for your reply, but as you can see, it indicates that var stands for 'n_cells', and according to that I have 66882 cells with 4057 genes! That doesn't make sense. Also, when I plot my highly variable genes using the provided function, I see cell names instead of gene names in my plot!


Oh, I see your concern; apologies, I didn't read the question fully.

Show us some of your data so we can check if it appears to be formatted right.


Well, the problem is that I can't open my file in the usual apps like Notepad++. I have been trying to import my data into R, but it's still running! That's actually why I switched to Python and Scanpy, since it is said to be optimized for huge data. That seems to be true, because I was able to load my data as AnnData in a short time, but then I realized that my formatting is apparently wrong. I don't have any other idea for viewing the table. Maybe with pandas? I am not sure. Do you have any suggestions?


Have you googled "transpose tab-separated file"?


@WouterDeCoster Thanks for your comment! I actually tried to import my data into R and then transpose it there, but I got the error "Error: cannot allocate vector of size 125 Kb", which also suggested that my data is transposed the wrong way. Now I can't even import it, let alone transpose it! Do you think Python would be capable of doing it? Do you have any hints?


Yes, using pandas, for example.

I don't know if you have enough memory available.
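A minimal sketch of what that could look like, assuming the first column of the file holds the row labels and that the whole matrix fits in memory. The function name `transpose_tsv` and the output filename are made up for illustration.

```python
import pandas as pd


def transpose_tsv(in_path, out_path):
    """Read a tab-separated matrix, swap rows and columns, write it back.

    Assumes the first column holds the row labels (index_col=0) and the
    first line holds the column labels. The whole table is loaded at once,
    so this only works if it fits in memory.
    """
    df = pd.read_csv(in_path, sep="\t", index_col=0)
    transposed = df.T  # rows become columns and vice versa
    transposed.to_csv(out_path, sep="\t")
    return transposed.shape
```

It would be called as transpose_tsv("counts_expression_matrix.tsv", "counts_transposed.tsv"). If the file already loads in Scanpy, it may be simpler to skip rewriting the file and transpose the loaded object instead, since AnnData objects expose a .T property: mydata = mydata.T.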


My concern is that the data is too big in terms of the size of the object itself, and that this has nothing to do with memory, because in R the error message "allocate vector of size" indicates a limit on the object to be created: "The number of bytes in a character string is limited to 2^31 - 1 ~ 2*10^9, which is also the limit on each dimension of an array." So I am now confused about how to handle the issue.


R memory management, frankly, sucks. It does not like large objects, even if you have enough memory to load them fully. It might work in Python with pandas, because NumPy doesn't use garbage, outdated methods for memory management. If you really need to use R, you can try importing the object, saving it as an Rdata object, and then restarting R and loading it again, which usually reduces the memory footprint by up to 60-70%. Hadley Wickham has some great notes on R memory management and how it works here.

But I don't think that's really your issue. The vector error is really odd, as that usually only arises when you have tons of rows/columns, which doesn't seem to be the case here. Post the first few lines of your original file so that we can see the actual format.
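If the file is too big for a text editor, the first few lines can still be pulled out by streaming it. Here is a rough sketch using only the Python standard library; the helper names `peek_tsv` and `tsv_shape` are made up for illustration, and it assumes a plain tab-separated text file.

```python
import csv
from itertools import islice


def peek_tsv(path, n_rows=5, n_cols=5):
    """Return the first n_rows rows, each cut down to its first n_cols fields."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return [row[:n_cols] for row in islice(reader, n_rows)]


def tsv_shape(path):
    """Return (number of lines, number of fields on the first line).

    Reads the file line by line, so only one line is held in memory at a time.
    """
    n_lines = 0
    n_fields = 0
    with open(path, newline="") as handle:
        for i, line in enumerate(handle):
            if i == 0:
                n_fields = len(line.rstrip("\r\n").split("\t"))
            n_lines += 1
    return n_lines, n_fields
```

Printing the rows returned by peek_tsv("counts_expression_matrix.tsv") shows whether the header line lists cells or genes, and tsv_shape reports the line and field counts (the header line is included in the line count), which should make the orientation of the matrix obvious.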


Just try it in Python; just because it doesn't work in R doesn't mean it won't work in Python.