Question

Doing differential gene expression without knowing sample classes

1

4.8 years ago

zizigolu ★ 4.3k

Hi

I have been given a large RNA-seq data set; one sample looks like this:

ENSG00000258486.2   1151554 1151554 597 79153.32269 78738.12898
ENSG00000265150.1   1089307 1089307 297 150505.7244 149716.2562
ENSG00000202198.1   996127  996128  331 123494.0095 122846.3529

I also have a case ID for each sample, but I don't know what these IDs mean: which samples are normal and which are tumor. There is no one I can ask.

I have to reduce the number of features in the RNA-seq data and extract the most informative genes to integrate with proteomics. In such cases people usually do a differential expression analysis, but I don't know the sample classes, so I can't set up DESeq2 or edgeR.

So, if you were me, how would you deal with this data? How would you extract the most informative features? Is this possible at all without knowing the sample identities?

Thank you for any ideas.

RNA-Seq deseq2 edger • 1.2k views

ADD COMMENT • link 4.1 years ago by zizigolu ★ 4.3k

6

I'd reject the data.

ADD REPLY • link 4.8 years ago by russhh 5.7k

2

Agree with russhh, on principle.

I am asking myself the following:

From where did F obtain this data?
Why is there no information on sample grouping?

If, genuinely, nobody knows the sample groups, then do the PCA bi-plot, as implied by Genomax, and send that back to whoever it is with whom you are working. If you want, also check the component loadings along PC1 and PC2 so that you can see which genes are the main source of variation along these [principal components]. Through this process, you may actually infer the sample groupings.
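A minimal, dependency-free sketch of that idea, for illustration only: the gene names and the expression matrix are hypothetical toy values, and the power-iteration PCA is a stand-in for what R's `prcomp` or a similar routine would do on real data. It log-transforms the values, centres each gene, extracts PC1 and PC2, and lists the genes with the largest absolute loadings on PC1.

```python
import math
import random

# Hypothetical toy matrix: rows are samples, columns are genes.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
expr = [
    [10, 12, 500, 30, 100],
    [12, 9, 480, 35, 110],
    [8, 11, 520, 28, 95],
    [900, 700, 510, 31, 105],
    [1000, 650, 490, 33, 98],
    [850, 720, 505, 29, 102],
]

def log_center(matrix):
    """log2(x + 1)-transform the values, then centre each gene (column)."""
    logged = [[math.log2(v + 1.0) for v in row] for row in matrix]
    means = [sum(col) / len(logged) for col in zip(*logged)]
    return [[v - m for v, m in zip(row, means)] for row in logged]

def project(rows, axis):
    """Score of every sample along one axis (dot product per row)."""
    return [sum(x * a for x, a in zip(row, axis)) for row in rows]

def leading_axis(rows, n_iter=500, seed=0):
    """Leading principal axis of centred data, found by power iteration."""
    rng = random.Random(seed)
    axis = [rng.gauss(0.0, 1.0) for _ in rows[0]]
    for _ in range(n_iter):
        scores = project(rows, axis)
        # Multiply by X^T X without forming the gene x gene matrix.
        axis = [sum(s * row[j] for s, row in zip(scores, rows))
                for j in range(len(axis))]
        norm = math.sqrt(sum(a * a for a in axis))
        if norm == 0.0:
            break
        axis = [a / norm for a in axis]
    return axis

def pca(rows, n_components=2):
    """Return (scores, loadings), one list per component, using deflation."""
    residual = [row[:] for row in rows]
    all_scores, all_loadings = [], []
    for _ in range(n_components):
        axis = leading_axis(residual)
        scores = project(residual, axis)
        # Remove the part of the data explained by this component.
        residual = [[x - s * a for x, a in zip(row, axis)]
                    for row, s in zip(residual, scores)]
        all_scores.append(scores)
        all_loadings.append(axis)
    return all_scores, all_loadings

scores, loadings = pca(log_center(expr))
pc1_scores, pc1_loadings = scores[0], loadings[0]

# Genes driving PC1: largest absolute loadings first.
top_genes = sorted(genes, key=lambda g: abs(pc1_loadings[genes.index(g)]),
                   reverse=True)[:2]
print("PC1 scores:", [round(s, 2) for s in pc1_scores])
print("Top PC1 genes:", top_genes)
```

If the samples fall into two clouds along PC1, the sign of the PC1 score is a candidate grouping, and the high-loading genes are candidates for the informative features. Whether those groups are tumor versus normal, or a batch effect, still has to be checked against biology.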

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k

0

The problem is that the collaborator (the data owner) replies very slowly; I have already been waiting a month for an answer. That is why I should either extract informative features from this unlabelled RNA-seq data or find another RNA-seq data set online that provides differentially expressed genes between carcinoma and matched normal samples.

ADD REPLY • link 4.8 years ago by zizigolu ★ 4.3k

0

So, the collaborator is the one who is disorganised and who messed up.

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k

2

Um, can't you ask the person who gave you the data what the IDs are?

ADD REPLY • link 4.8 years ago by grant.hovhannisyan ★ 2.6k

score 4 · Answer 1 · 2019-07-16

4

4.8 years ago

WouterDeCoster 47k

Either use a clustering-based approach (unbiased) to separate the samples into biological groups (or into technical batch-effect groups), or use some biological evidence (biased), e.g. expression of a marker gene, a tumor suppressor gene, ...
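Both routes can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the toy matrix, the gene names, the choice of `geneA` as a marker, and average-linkage clustering on log2 values as the clustering method. Real data would normally go through a dedicated package.

```python
import math
import statistics
from itertools import combinations

# Hypothetical toy matrix: rows are samples, columns are genes.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
expr = [
    [10, 12, 500, 30, 100],
    [12, 9, 480, 35, 110],
    [8, 11, 520, 28, 95],
    [900, 700, 510, 31, 105],
    [1000, 650, 490, 33, 98],
    [850, 720, 505, 29, 102],
]
logged = [[math.log2(v + 1.0) for v in row] for row in expr]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(profiles, n_clusters=2):
    """Average-linkage hierarchical clustering; returns lists of sample indices."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters (a < b, so deleting b is safe).
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Unbiased route: let the expression profiles define the groups.
clusters = agglomerate(logged, n_clusters=2)

# Biased route: split on one marker gene (geneA is a hypothetical choice).
marker = genes.index("geneA")
cutoff = statistics.median(row[marker] for row in logged)
marker_high = [i for i, row in enumerate(logged) if row[marker] > cutoff]

print("Clusters:", clusters)
print("Samples with high marker expression:", marker_high)
```

When the two routes agree, that is some reassurance that the split is real, but the labels it produces are inferred, and any differential expression run on them is partly circular because the same data defined the groups.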

ADD COMMENT • link 4.8 years ago by WouterDeCoster 47k

score 3 · Answer 2 · 2019-07-16

Why should this be any different from how you would do a usual DE analysis? What you need to know is which samples are replicates (if any) and how they are to be grouped (unless you have just one of everything, which would be difficult to deal with).

Start with some PCA type analysis to see if you can identify groups.