Question

Categorize patients based on continuous variables

1

6.2 years ago

ciemanek ▴ 140

Hi!

My problem concerns categorizing patients to groups based on continuous variables. From previous studies we know that there are continuous differences in the mean expression of two signatures, which are negatively correlated. We are interested in comparing the two extreme groups in terms of differentially expressed genes. Is there any statistical method for determining the cutoff from the data? Maybe some measure of similarity we could use? Would it be reasonable to cluster patients based on those two signatures and choose the extreme groups that way?

Any advice will be appreciated.

Regards, Agata

biostatictics expression • 1.7k views

ADD COMMENT • link updated 6.1 years ago by dariober 14k • written 6.2 years ago by ciemanek ▴ 140

1

Instead of using a single mean value for a signature you could try to cluster the samples using the expression of all genes present in the signature. This could help filter out some of the likely noise coming from genes that are part of the signature but that don't vary much in your data. The approach you described is otherwise reasonable.
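A minimal sketch of that clustering idea, assuming a small made-up matrix of z-scored expression values (patients by signature genes). The patient names, the numbers, and the `average_linkage` helper are all illustrative; on real data a library routine for hierarchical clustering would do the same job.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical z-scored expression per patient: the first two values are
# genes from one signature, the last two are genes from the other.
expr = {
    "P1": [2.1, 1.8, -1.9, -2.2],
    "P2": [1.7, 2.0, -1.5, -1.8],
    "P3": [0.1, -0.2, 0.3, 0.0],
    "P4": [-1.9, -2.1, 2.0, 1.7],
    "P5": [-2.2, -1.6, 1.8, 2.1],
}

def average_linkage(data, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[name] for name in data]

    def cluster_dist(a, b):
        # Average pairwise distance between the members of a and b.
        return sum(dist(data[x], data[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: cluster_dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Three clusters: two candidate extreme groups plus an intermediate one.
groups = average_linkage(expr, n_clusters=3)
print(groups)
```

Asking for three clusters rather than two is only one possible choice here; it leaves room for an intermediate group that can be set aside when comparing the extremes.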

ADD REPLY • link 6.2 years ago by Martombo ★ 3.1k

0

Yes, this was more or less my reasoning: to cluster patients based on all genes in both signatures and then set a cut-off on the branches. Would it be reasonable then to perform transcriptome-wide differential gene expression testing and co-expression analysis between two groups on such classified data?

ADD REPLY • link 6.2 years ago by ciemanek ▴ 140

0

Yes, I think that's the best solution you can get. Also, you don't necessarily have to bin the samples into two groups: you can perform a differential expression analysis that looks for genes whose expression correlates with a continuous variable. You can model your data on the gene signature score (for example in DESeq2 or voom).
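To illustrate what testing against a continuous variable means, here is a rough sketch that fits a plain least-squares slope per gene against the signature score. This is a simplified stand-in, not the DESeq2 or voom model, and the scores, gene names, and values are made up.

```python
# Hypothetical signature score per sample (the continuous covariate).
score = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Hypothetical log-expression of two genes across the same five samples.
genes = {
    "geneA": [1.0, 2.1, 2.9, 4.2, 5.0],  # rises with the score
    "geneB": [3.1, 2.9, 3.0, 3.2, 2.8],  # roughly flat
}

def slope_and_r(x, y):
    """Least-squares slope of y on x, and the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

results = {g: slope_and_r(score, values) for g, values in genes.items()}
for g, (slope, r) in results.items():
    print(f"{g}: slope={slope:.2f}, r={r:.2f}")
```

Genes with a slope clearly different from zero are the ones that track the score. The dedicated packages add proper count models, variance moderation, and multiple-testing correction on top of this basic idea.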

ADD REPLY • link 6.2 years ago by Martombo ★ 3.1k

1

Thanks a lot! I will definitely look into that. In general, the mean expression of those signatures is correlated with the level of differentiation, and what interests me is the possible underlying differences between highly and lowly differentiated tumors. That's why my first thought was to zoom in on the extreme groups.

Also, while we're discussing this: do you think co-expression analysis to find gene networks would make sense, and should it be performed on the whole dataset (we have no control group) or rather on the extreme groups, in order to compare them? I wonder whether networks reconstructed from the whole dataset would be biased by tissue-specific expression.

ADD REPLY • link 6.2 years ago by ciemanek ▴ 140

2

The clustering idea is great. If, in addition, you are interested in segregating patients based on the expression of just one gene of interest, then you could literally divide the patients into tertiles, quartiles, quintiles, et cetera, and then compare the top and bottom groups.
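As a small sketch of the quantile split, assuming made-up expression values for a single gene of interest (the patient names and numbers are hypothetical), the top and bottom quartiles can be picked out like this:

```python
from statistics import quantiles

# Hypothetical expression of one gene of interest per patient.
expression = {
    "P1": 2.3, "P2": 7.9, "P3": 5.1, "P4": 1.2,
    "P5": 9.4, "P6": 4.8, "P7": 6.6, "P8": 3.0,
}

# n=4 returns three cut points: first quartile, median, third quartile.
q1, _, q3 = quantiles(list(expression.values()), n=4)

# Patients at or below Q1 and at or above Q3 form the two extreme groups.
bottom = sorted(p for p, v in expression.items() if v <= q1)
top = sorted(p for p, v in expression.items() if v >= q3)
print("bottom quartile:", bottom)
print("top quartile:", top)
```

Changing `n` to 3 or 5 gives tertiles or quintiles in the same way; the first and last cut points then define the extreme groups.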

A co-expression network would also help: it is possible to identify sub-groups (communities or modules) in such networks and then see how these sub-groups relate to your clinical variables. On the issue of tissue specificity, it's up to you to ensure that your samples are from the same tissue and that there is no bias in that sense. A good study design guards against biases like that.
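A toy sketch of the module idea, under strong simplifying assumptions: genes are linked when the absolute Pearson correlation of their expression exceeds an arbitrary threshold, and modules are taken as the connected components of that graph. The gene names, values, and the 0.9 threshold are all made up, and dedicated network packages use more refined procedures.

```python
# Hypothetical expression of five genes across six tumour samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "g2": [1.2, 1.9, 3.2, 3.9, 5.1, 6.2],
    "g3": [6.0, 5.1, 3.9, 3.0, 2.2, 1.0],
    "g4": [2.0, 5.0, 1.0, 4.0, 2.5, 3.0],
    "g5": [2.1, 5.2, 0.8, 4.1, 2.4, 3.1],
}

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Unsigned network: an edge whenever |r| reaches the (arbitrary) threshold.
threshold = 0.9
genes = list(expr)
neighbours = {g: set() for g in genes}
for i, a in enumerate(genes):
    for b in genes[i + 1:]:
        if abs(pearson(expr[a], expr[b])) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

# Modules as connected components, found by depth-first search.
modules, seen = [], set()
for g in genes:
    if g in seen:
        continue
    stack, component = [g], set()
    while stack:
        node = stack.pop()
        if node in component:
            continue
        component.add(node)
        stack.extend(neighbours[node] - component)
    seen |= component
    modules.append(sorted(component))
print(modules)
```

Because the absolute correlation is used, strongly anti-correlated genes end up in the same module. Each module could then be summarised per patient (for example by its mean expression) and compared against the clinical variables.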

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

0

About the tissue specificity: the data I have is tumor data only, with no controls. What I mean is that some of the genes might be co-expressed simply because of the tissue of origin. I wonder how I could account for that. Should I look for control data in databases (the problem is that sample sizes are usually very small), or can it be taken into account during functional analysis?

ADD REPLY • link 6.1 years ago by ciemanek ▴ 140

score 1 · Answer 1 · 2018-03-03

1

6.1 years ago

dariober 14k

categorizing patients to groups

It seems to me that you are looking for statistical methods like logistic regression, decision trees, or support vector machines (and many others; a lot of the machine learning literature is about classification).

ADD COMMENT • link 6.1 years ago by dariober 14k