Question: Doing differential gene expression without knowing sample classes
11 months ago by
A3.8k wrote:


I have been given a large set of RNA-seq data; one sample looks like this:

ENSG00000258486.2   1151554 1151554 597 79153.32269 78738.12898
ENSG00000265150.1   1089307 1089307 297 150505.7244 149716.2562
ENSG00000202198.1   996127  996128  331 123494.0095 122846.3529

I also have a case ID for each sample, BUT I don't know what these IDs mean (which is normal, which is tumor), and there is no one to ask.

I have to reduce the number of features in the RNA-seq data and extract the most informative genes for integration with proteomics. In such cases people usually do differential expression, but I don't know the class of each sample, so I cannot set up a DESeq2 or edgeR comparison.

So, if you were me, how would you deal with this data? How would you extract the most informative features? Is this possible at all without knowing the sample identities?

Thank you for any ideas.

edger rna-seq deseq2 • 372 views
modified 4 months ago • written 11 months ago by A3.8k

I'd reject the data.

written 11 months ago by russhh5.4k

Agree with russhh, on principle.

I am asking myself the following:

  1. from where did F obtain this data?
  2. why is there no information on sample grouping?

If, genuinely, nobody knows the sample groups, then do the PCA bi-plot, as implied by Genomax, and send that back to whoever it is with whom you are working. If you want, also check the component loadings along PC1 and PC2 so that you can see which genes are the main source of variation along these [principal components]. Through this process, you may actually infer the sample groupings.
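As a rough illustration of the PCA-and-loadings idea above, here is a minimal sketch in Python with NumPy. It is not taken from the thread: the function names, the log2(x + 1) transform, and the synthetic count matrix are all assumptions made for the example, and a real analysis would normally start from properly normalised counts.

```python
import numpy as np

def pca_scores_and_loadings(counts, n_components=2):
    """PCA on a genes x samples matrix of non-negative expression values.

    Hypothetical helper for illustration; log2(x + 1) is an assumed,
    simple variance-taming transform, not a full normalisation.
    """
    log_expr = np.log2(counts + 1.0)
    # Samples as rows, genes as columns; centre each gene at zero.
    X = log_expr.T
    X = X - X.mean(axis=0)
    # Singular value decomposition: X = U * S * Vt
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # samples x components
    loadings = Vt[:n_components].T                   # genes x components
    explained = (S ** 2) / np.sum(S ** 2)            # variance fraction per PC
    return scores, loadings, explained[:n_components]

def top_loading_genes(loadings, gene_ids, component=0, n=10):
    """Return the n genes with the largest absolute loading on one component."""
    order = np.argsort(-np.abs(loadings[:, component]))[:n]
    return [gene_ids[i] for i in order]

# Synthetic demo data (made up): 200 genes, 12 samples, and the first
# 10 genes are 8x higher in the last 6 samples.
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 12
counts = rng.poisson(100, size=(n_genes, n_samples)).astype(float)
counts[:10, 6:] *= 8
gene_ids = [f"gene_{i}" for i in range(n_genes)]

scores, loadings, explained = pca_scores_and_loadings(counts)
top = top_loading_genes(loadings, gene_ids, component=0, n=10)
print("PC1 scores per sample:", np.round(scores[:, 0], 2))
print("Top PC1 genes:", top)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the sample plot; if the samples fall into two clouds along PC1, the genes returned by `top_loading_genes` are the ones driving that split.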

modified 11 months ago • written 11 months ago by Kevin Blighe61k

The problem is that the collaborator (the data owner) replies very slowly; I have already been waiting a month for an answer. That is why I should either extract informative features from this unlabelled RNA-seq data or find another RNA-seq dataset online that provides differentially expressed genes between carcinoma and matched normal samples.

modified 11 months ago • written 11 months ago by A3.8k

So, the collaborator is the one who is disorganised and who messed up.

written 11 months ago by Kevin Blighe61k

Em, can't you ask the person who gave you the data what the IDs are?

written 11 months ago by grant.hovhannisyan2.0k
11 months ago by
WouterDeCoster44k wrote:

Either use a clustering-based approach (unbiased) to separate the samples into biological groups (or into technical batch-effect groups), or use some biological evidence (biased), e.g. expression of a marker gene, a tumor suppressor gene, etc.
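Both routes can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the helper names are invented here, the clustering is a bare-bones two-group k-means with a deterministic start, the marker split uses the median (which presumes roughly balanced groups), and `gene_0` stands in for a real marker gene on made-up data.

```python
import numpy as np

def two_group_kmeans(log_expr, n_iter=50):
    """Unbiased route: split samples into two clusters (genes x samples input)."""
    X = log_expr.T  # samples x genes
    # Deterministic start: sample 0 and the sample farthest from it.
    d0 = np.linalg.norm(X - X[0], axis=1)
    centroids = np.stack([X[0], X[np.argmax(d0)]]).astype(float)
    labels = None
    for _ in range(n_iter):
        # Distance from every sample to both centroids.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            if np.any(labels == k):  # keep the old centroid if a cluster empties
                centroids[k] = X[labels == k].mean(axis=0)
    return labels

def split_by_marker(log_expr, gene_ids, marker_id):
    """Biased route: label samples 1 if the marker is above its median, else 0."""
    row = log_expr[gene_ids.index(marker_id)]
    return (row > np.median(row)).astype(int)

# Synthetic demo data (made up): the first 10 genes are 8x higher
# in the last 6 of 12 samples.
rng = np.random.default_rng(0)
counts = rng.poisson(100, size=(200, 12)).astype(float)
counts[:10, 6:] *= 8
gene_ids = [f"gene_{i}" for i in range(200)]
log_expr = np.log2(counts + 1.0)

cluster_labels = two_group_kmeans(log_expr)
marker_labels = split_by_marker(log_expr, gene_ids, "gene_0")  # hypothetical marker
print("k-means groups:", cluster_labels)
print("marker groups: ", marker_labels)
```

If the two labelings agree, that is some reassurance that the split is biological; if they disagree, the clusters may reflect batch effects instead, as noted above.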

written 11 months ago by WouterDeCoster44k
11 months ago by
United States
genomax85k wrote:

Why should this be any different from how you would do a usual DE analysis? What you need to know is which samples are replicates (if any) and how they are to be grouped (unless you have just one of everything, which would be difficult to deal with).

Start with some PCA type analysis to see if you can identify groups.

modified 11 months ago • written 11 months ago by genomax85k