How to classify cells based on the expression of genes in scRNA-seq
4.0 years ago
tujuchuanli ▴ 100

Hi,

I am analyzing scRNA-seq data from breast cancer. I want to classify the cells into E+ and E- groups based on the expression level of gene E (cells with low expression of gene E are E-, and cells with high expression are E+). However, gene E has several members (isoforms), and I have to combine them to classify the cells. I have two plans:

  1. Simply sum the normalized expression values of these members and classify the cells based on the sum.

  2. Use a clustering algorithm: the group of cells that expresses a specific combinatorial pattern of these members is defined as E+, and the rest are defined as E-.

Do you have any suggestions about my plans, or other approaches to propose?

Thanks

scRNA-seq expression • 966 views

Can you clarify what you mean by isotype? To me this is a term applied to antibodies.


Thanks,

I have edited my post. Gene E has many members in the human genome. Maybe I can call them isoforms. However, I don't mean that these members are generated by alternative splicing of the same pre-mRNA.


Those would be paralogs if they arose from duplication events. Summing across paralogs only makes sense if you know or believe that all the paralogs contribute to the same function or biological outcome of interest.
Because clustering is unsupervised, it doesn't guarantee that all E+ (or E-) cells will fall into one cluster, although you're free to label all members of a cluster as E+ or E- based on some information. However, if you can evaluate and label clusters, then you probably have information that you could use for a more directly supervised approach like logistic regression.
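
To make the supervised suggestion concrete, here is a minimal sketch of logistic regression on paralog expression. Everything in it is an assumption for illustration: the two features stand for normalized expression of two hypothetical paralogs, the training labels (1 for E+, 0 for E-) are assumed to come from whatever outside information lets you label cells, and the learning rate and number of epochs are arbitrary. It uses plain gradient descent in the standard library so it runs anywhere; an established implementation such as scikit-learn's `LogisticRegression` would be the usual choice in practice.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and an intercept by full-batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_label(x, w, b, cutoff=0.5):
    """Label a cell E+ when its predicted probability reaches the cutoff."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return "E+" if p >= cutoff else "E-"

# Hypothetical training cells: [paralog_1, paralog_2] normalized expression.
X_train = [[0.0, 0.2], [0.1, 0.0], [1.5, 2.0], [2.2, 0.9], [0.3, 0.1], [1.8, 1.4]]
y_train = [0, 0, 1, 1, 0, 1]  # assumed known labels: 1 = E+, 0 = E-

w, b = fit_logistic(X_train, y_train)
train_labels = [predict_label(x, w, b) for x in X_train]

# Unlabeled cells to classify with the fitted model.
new_cells = [[0.05, 0.1], [2.0, 1.5]]
new_labels = [predict_label(x, w, b) for x in new_cells]
print(new_labels)
```

The fitted weights also indicate how strongly each paralog drives the call, which a plain sum cannot show because it weights every paralog equally.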


Thank you for the helpful reply,

Gene E has about 15 paralogs, and 8 of them are active. Although the enzyme activity of these 8 paralogs may not be identical, summing across the active paralogs could be the best approach for me.
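
A minimal sketch of the summing approach, under stated assumptions: the paralog names (`E1`, `E2`, `E3`) and the expression values are made-up placeholders, and the median split is just one possible cutoff, not a recommendation. Note also that a sum of log-normalized values is not the same as the log of summed counts, so the stage at which you sum is a choice this sketch does not settle.

```python
from statistics import median

# Hypothetical toy data: normalized expression per cell for the active paralogs.
expression = {
    "cell_a": {"E1": 0.0, "E2": 0.2, "E3": 0.0},
    "cell_b": {"E1": 1.5, "E2": 2.0, "E3": 0.7},
    "cell_c": {"E1": 0.1, "E2": 0.0, "E3": 0.0},
    "cell_d": {"E1": 2.2, "E2": 0.9, "E3": 1.1},
}
active_paralogs = ["E1", "E2", "E3"]  # placeholder names, not real gene symbols

def paralog_sum_scores(expression, paralogs):
    """Sum the expression of the listed paralogs in each cell (missing genes count as 0)."""
    return {
        cell: sum(values.get(gene, 0.0) for gene in paralogs)
        for cell, values in expression.items()
    }

def classify_by_threshold(scores, threshold):
    """Label a cell E+ when its summed score exceeds the threshold, else E-."""
    return {cell: "E+" if score > threshold else "E-" for cell, score in scores.items()}

scores = paralog_sum_scores(expression, active_paralogs)
threshold = median(scores.values())  # assumption: split the cells at the median score
labels = classify_by_threshold(scores, threshold)
print(labels)
```

If the summed score is clearly bimodal across cells, a cutoff placed in the valley between the two modes would be easier to justify than the median.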

I don't expect cells with high expression of gene E to fall into the same cluster. I just want to find cells with a specific combinatorial pattern of gene E paralogs and to examine the differences between cells with different expression patterns. Here I call cells with a specific combinatorial pattern of gene E paralogs E+ cells.
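
One simple way to look for such combinatorial patterns, sketched here under assumptions, is to mark each paralog as detected or not in every cell and then group the cells by the resulting on/off signature. The detection cutoff `min_expr`, the paralog names, and the pattern chosen as E+ are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: normalized expression per cell for three paralogs.
expression = {
    "cell_a": {"E1": 0.0, "E2": 0.2, "E3": 0.0},
    "cell_b": {"E1": 1.5, "E2": 2.0, "E3": 0.7},
    "cell_c": {"E1": 0.1, "E2": 0.0, "E3": 0.0},
    "cell_d": {"E1": 2.2, "E2": 0.1, "E3": 1.1},
    "cell_e": {"E1": 1.9, "E2": 0.0, "E3": 0.8},
}
paralogs = ["E1", "E2", "E3"]  # placeholder names, not real gene symbols

def detection_pattern(values, paralogs, min_expr=0.5):
    """Return a tuple of 0/1 flags: 1 where a paralog exceeds the detection cutoff."""
    return tuple(int(values.get(gene, 0.0) > min_expr) for gene in paralogs)

def group_cells_by_pattern(expression, paralogs, min_expr=0.5):
    """Collect the cells that share the same on/off pattern across the paralogs."""
    groups = defaultdict(list)
    for cell, values in expression.items():
        groups[detection_pattern(values, paralogs, min_expr)].append(cell)
    return dict(groups)

groups = group_cells_by_pattern(expression, paralogs)

# Assumption: the pattern "E1 and E3 on, E2 off" is the one of interest.
e_plus_pattern = (1, 0, 1)
e_plus_cells = groups.get(e_plus_pattern, [])
print(groups)
print(e_plus_cells)
```

Counting the cells per pattern shows which combinations actually occur in the data before any of them is named E+.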

I am very interested in an approach like the logistic regression you mentioned. Do you have a more detailed reference? I don't know how to do it based on what you said above.
