Integrating TPM and raw count Seurat objects
3.7 years ago
berry ▴ 40

Hi,

I have been given a normalised single-cell count matrix that looks like this:

        Cell1              Cell2       Cell3 .....................
Gene1   0.0000              0          0.0000  
Gene2   0.0000              0          0.0000 
Gene3  155.8516             0          0.0000 
Gene4   0.0000              0          280.9867

I have no access to the raw counts and I want to create a Seurat object from this matrix. I have seen posts about reading TPM into Seurat and log-transforming manually. However, I have no experience with this type of data and I'm not sure whether this is TPM or log(TPM+1) data. Is there a way to find out?

Secondly, after reading this into Seurat, my initial aim is to combine it with my own data (for which I created Seurat objects from raw counts). Do you think it is feasible to integrate Seurat objects created from TPM and from raw counts?

Thank you very much!

single-cell sc-RNAseq Seurat • 3.2k views
3.7 years ago

Hi,

If the counts are in TPM (Transcripts Per Million), each column should sum to 1e+06 (1 million).

So you can sum each column to see whether the data is in TPM or not:

head(colSums(data))

Here colSums sums the values in each column of your matrix (put your matrix in place of data), and head just prints the sums for the first few columns.
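
To check every cell rather than only the first few, something like the following could be used. This is only a sketch on a toy matrix: `looks_like_tpm` and its 1% tolerance are my own hypothetical choices, not part of Seurat or any package.

```r
# Toy matrix standing in for the real data: genes in rows, cells in columns.
# Each column is rescaled to sum to 1e6, as TPM values would.
set.seed(1)
raw <- matrix(rexp(5 * 3), nrow = 5,
              dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:3)))
data <- sweep(raw, 2, colSums(raw), "/") * 1e6

# Summarise the per-cell totals across all columns, not just the first few.
totals <- colSums(data)
print(summary(totals))

# Hypothetical helper: TRUE if every column total is within a relative
# tolerance `tol` of one million.
looks_like_tpm <- function(mat, tol = 0.01) {
  all(abs(colSums(mat) - 1e6) / 1e6 < tol)
}

looks_like_tpm(data)            # TRUE for this toy TPM matrix
looks_like_tpm(log2(data + 1))  # FALSE: log-scale totals are far below 1e6
```

If the totals are identical across cells but not equal to 1e6, that still suggests some per-cell scaling was applied, just not to one million.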

If they are not in TPM but in log(TPM+1), you can go back from log(TPM+1) to TPM by computing 2^(log(TPM+1)), where the base 2 assumes that the counts were log2-transformed. If you do this, sum the columns again and get values around 1e+06, then that was the transformation used.

As for combining data sets with different units/scales, I don't think it is the best approach, but I believe Seurat has pipelines (transformation and integration) for dealing with multiple data sets from different platforms. I think this is described in detail in this paper: https://www.cell.com/cell/fulltext/S0092-8674(19)30559-8 (though I have not read the paper, so be careful).
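
On the scale question specifically: as far as I understand the Seurat documentation, the default "LogNormalize" method of `NormalizeData` divides each cell by its total, multiplies by a scale factor (10,000 by default) and applies `log1p`. The base-R sketch below only mimics that arithmetic (it is not Seurat's code, and `log_normalize` is a hypothetical helper) to illustrate that the per-cell total of the input drops out after this kind of normalisation.

```r
# Hypothetical helper mimicking the arithmetic of log-normalisation:
# per-cell scaling to `scale_factor`, followed by log1p.
log_normalize <- function(mat, scale_factor = 1e4) {
  log1p(sweep(mat, 2, colSums(mat), "/") * scale_factor)
}

set.seed(42)
# Toy raw counts: 6 genes x 4 cells.
counts <- matrix(rpois(6 * 4, lambda = 20), nrow = 6)
# Toy rescaled matrix: the same cells scaled to an arbitrary per-cell total.
rescaled <- sweep(counts, 2, colSums(counts), "/") * 760257.6

norm_counts   <- log_normalize(counts)
norm_rescaled <- log_normalize(rescaled)

# The original per-cell totals cancel, so both results agree.
all.equal(norm_counts, norm_rescaled)
```

Note the limits of this toy: it only shows that a difference in per-cell totals is removed. Real TPM values are also normalised by gene length, which this step does not undo, so TPM and raw-count data can still differ systematically after normalisation.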

I hope this helps,

António


Hi António,

Thank you very much for your reply.

When I sum the columns, I get this: 760257.6

So I tried transforming with 2^(log(TPM+1)), and when I then sum the columns I get this: 157130.48

Do you have any idea why this could be happening?


Do you get this value, 760257.6 or 157130.48, for all the columns?

Just to be clear: in 2^(log2(TPM+1)), the term log2(TPM+1) stands for the values already in your data table. So to reverse log2(TPM+1) you plug each value in directly; for example, the value 155.8516 (Cell1, Gene3) becomes 2^(155.8516). Is this how you did it? (Of course, you need to do this for every entry in your data and only then sum each column separately.)

If so, this means that you have neither TPM nor log2(TPM+1) data. Some other transformation/normalization was used.

How did you obtain the data in the first place? You should talk to the person who transformed/normalized the data to find out exactly which transformation/normalization was used.

António


To go from log2(TPM+1) back to TPM you also need to subtract the pseudocount: (2^x) - 1, where x is the stored log2(TPM+1) value, i.e. TPM plus a pseudocount of one, log2-transformed.
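
A small round-trip on toy data illustrates the back-transformation including the pseudocount. The matrix here is made up purely for illustration; with real data you would apply the same element-wise expression to your own matrix.

```r
# Toy TPM matrix (columns sum to 1e6) and its log2(TPM + 1) version.
set.seed(1)
raw <- matrix(rexp(5 * 3), nrow = 5)
tpm <- sweep(raw, 2, colSums(raw), "/") * 1e6
log_tpm <- log2(tpm + 1)

# Invert element-wise: 2^x undoes log2, then subtract the pseudocount.
recovered <- 2^log_tpm - 1

colSums(recovered)         # each total is back at roughly 1e6
all.equal(recovered, tpm)  # TRUE up to floating-point tolerance
```

If the data had been transformed with the natural log instead, i.e. log(TPM+1), the corresponding inverse in R would be `expm1(x)`.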

António


Sorry António, I was confused. I got 760257.6 for the columns, so I guess this means it is already TPM?

This is public data, and it was very hard to get a clear answer from the authors.

Thank you for your help :)


If it's public, can you share the link to the data? Then, I can check myself. It's easier this way.

António


Can you share your email address please?


Sorry, is the data not public? If it is, you can just share the link to the paper or database. I prefer not to share my e-mail (though if you search you might find it!).

Otherwise, just put the data in your Google Drive or Dropbox and share the link (again, assuming the data is public, you should not be afraid to share it, because it has already been published and is open to anyone).

If the data is not public, I would recommend that you do not share it (though I would not do anything with it).

António
