Question

Gene expression data set scaling

0

6.7 years ago

1769mkc ★ 1.2k

I have RNA-seq data and I'm trying to make a heatmap, but certain values are quite low: the lower end of the range extends a long way down, while the upper end is reasonable. So I'm trying to scale the data. My question: if I scale the data, will it preserve the true biological meaning? When I plot the scaled data against the unscaled data, I see quite a difference.

Any suggestions would be highly appreciated.

RNA-Seq R • 4.9k views

ADD COMMENT • link updated 6.7 years ago by Devon Ryan 104k • written 6.7 years ago by 1769mkc ★ 1.2k

0

I am not sure I understand your question correctly, but a colleague once scaled the color range of a heatmap using a log scale. He plotted a histogram of the FPKM values to decide on the color range.

P.S. I am not sure if the biological sense is retained, but I presume it should still make sense.
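
A minimal base R sketch of that idea, for illustration only. The `fpkm` matrix is simulated, and both the pseudocount of 1 and the 1%/99% quantile cutoffs are assumptions, not something the colleague is known to have used.

```r
# Hypothetical FPKM matrix: 100 genes (rows) x 5 samples (columns).
set.seed(1)
fpkm <- matrix(rexp(500, rate = 0.05), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("sample", 1:5)))

# log2 transform with a pseudocount of 1 so that zeros stay finite.
log_fpkm <- log2(fpkm + 1)

# Histogram of all transformed values (computed without drawing) to
# inspect how the values are distributed.
h <- hist(as.vector(log_fpkm), breaks = 30, plot = FALSE)

# One possible choice: restrict the color range to the central 98% of values.
color_range <- quantile(as.vector(log_fpkm), probs = c(0.01, 0.99))
```

Calling `plot(h)` draws the histogram, and `color_range` could then be used as the lower and upper limits of the heatmap color scale.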

ADD REPLY • link 6.7 years ago by vinayjrao ▴ 250

0

Well, I have values on the order of -50 and -60. I certainly don't want to put those in the heatmap, but at the same time I want to retain those differences. I'm using pheatmap, which doesn't have the heatmap.2-style option where you can plot a histogram or density.

ADD REPLY • link 6.7 years ago by 1769mkc ★ 1.2k

0

As I understand it, RPKM or FPKM values are relative on a linear scale. For example, if your values range from -50 to 500, you could shift them so that -50 becomes 0. Your values on the heatmap would then range from 0 to 550, but I'm not sure whether that would be considered data manipulation. I would wait for someone else to reply on that.
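
A tiny sketch of that shift in base R, using made-up values. It only shows the arithmetic and says nothing about whether the shift is appropriate for real data.

```r
# Hypothetical values spanning -50 to 500.
x <- c(-50, 0, 125, 500)

# Subtract the minimum so the smallest value becomes 0.
shifted <- x - min(x)

# The range moves to 0..550, while pairwise differences are unchanged.
new_range <- range(shifted)
same_differences <- all(diff(shifted) == diff(x))
```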

ADD REPLY • link 6.7 years ago by vinayjrao ▴ 250

0

That's true, but it may be noise as well, so I'm not sure.

ADD REPLY • link 6.7 years ago by 1769mkc ★ 1.2k

0

Yes, that certainly is a possibility you can't rule out. I would, however, like to know how you got read counts on a negative scale, and what that means.

ADD REPLY • link 6.7 years ago by vinayjrao ▴ 250

1

Well, those are all FPKM values. I have data for about 5 cell types, so I think those genes are most likely being highly down-regulated, unless it's noise.

ADD REPLY • link 6.7 years ago by 1769mkc ★ 1.2k

score 3 · Accepted Answer · 2017-07-28

3

6.7 years ago

Devon Ryan 104k

Presuming your genes are in rows and your samples are in columns, scaling rows will preserve the biology within genes. If you do clustering, then that will change, but that's typically less of an issue.
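
As an illustration of row-wise scaling, here is a minimal base R sketch with a made-up matrix `expr` (genes in rows, samples in columns). It assumes "scaling" means centering each gene to mean 0 and dividing by its standard deviation (a z-score), which is one common choice but not the only one.

```r
# Hypothetical expression matrix: 3 genes (rows) x 5 samples (columns).
expr <- rbind(
  geneA = c(1, 2, 3, 4, 5),
  geneB = c(100, 200, 300, 400, 500),
  geneC = c(50, 40, 30, 20, 10)
)
colnames(expr) <- paste0("sample", 1:5)

# scale() centers and scales columns, so transpose to put genes in
# columns, scale, then transpose back to genes in rows.
row_scaled <- t(scale(t(expr)))

# After scaling, every gene has mean 0 and standard deviation 1.
gene_means <- rowMeans(row_scaled)
gene_sds <- apply(row_scaled, 1, sd)
```

In this toy example geneA and geneB end up with identical scaled values even though their magnitudes differ 100-fold: the pattern across samples within each gene is kept, while the differences in absolute level between genes are removed. pheatmap can apply the same transformation itself via its `scale = "row"` argument.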

ADD COMMENT • link 6.7 years ago by Devon Ryan 104k

0

Yes, my genes are in rows and my samples are in columns. How do I scale the columns? Is it scale(df) or something else?

ADD REPLY • link 6.7 years ago by 1769mkc ★ 1.2k

0

How you want to scale things is completely up to you.

ADD REPLY • link 6.7 years ago by Devon Ryan 104k

0

Hi Devon! I read your comment and I am now a bit unsure about the type of scaling I need to perform on my data. If I have genes in rows and samples in columns, and the intention is to cluster the samples, should scaling be done on columns or on rows? I would appreciate it if you could explain this to me. Thanks.

ADD REPLY • link 5.9 years ago by h.moosavi57 • 0

0

Generally you want to scale things such that highly-expressed genes aren't driving the clustering, which would mean by rows. However, you can also do things like vst() in DESeq2 to put things on a more useful scale to begin with.
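
To illustrate how a highly expressed gene can drive sample clustering, here is a small base R sketch on simulated data. Everything in it is hypothetical: the two groups, the gene counts, the values, and the use of Euclidean distance with complete-linkage `hclust()`. It does not use `vst()`.

```r
set.seed(7)

# Hypothetical design: two groups (A, B) with three samples each.
group <- rep(c("A", "B"), each = 3)

# 20 low-expressed genes carry the group signal (B is shifted up by 2).
low <- matrix(rnorm(20 * 6, mean = 5, sd = 0.2), nrow = 20)
low[, group == "B"] <- low[, group == "B"] + 2

# One highly expressed gene varies in a way unrelated to the groups.
high <- c(1000, 3000, 1100, 2900, 1050, 3100)

expr <- rbind(low, high)
colnames(expr) <- paste0(group, 1:3)  # A1 A2 A3 B1 B2 B3

# Row-wise (gene-wise) z-scores.
scaled <- t(scale(t(expr)))

# Cluster samples: dist() works on rows, so transpose to samples in rows.
raw_clusters <- cutree(hclust(dist(t(expr))), k = 2)
scaled_clusters <- cutree(hclust(dist(t(scaled))), k = 2)
```

Without scaling, the two clusters follow the single highly expressed gene and mix the groups. After row scaling, the samples split by group A versus B.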

ADD REPLY • link 5.9 years ago by Devon Ryan 104k

0

Thanks for the answer, Devon! However, when I do gene-wise scaling, all genes will have sd = 1. Thus, if I am not mistaken, the significance of genes will be lost. So I am a little unclear about what criteria classification algorithms will use to classify patients into potential cancer subgroups.
By the way, my gene expression data is derived from microarrays and is on a log2 scale.

ADD REPLY • link 5.9 years ago by h.moosavi57 • 0

1

Then you're not clustering, you're classifying, which is completely different. You should post such things as a new question.

ADD REPLY • link 5.9 years ago by Devon Ryan 104k

0

Hi @Devon, I normalized my read counts using vst() and I want to do k-means clustering of my samples. Based on your comments, do you mean there is no need to scale my data after vst normalization? I would appreciate your help!
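
For reference, a sketch of the mechanics of k-means on samples in base R. The matrix `vst_mat` is simulated and only stands in for a vst-transformed matrix (genes in rows, samples in columns); the number of clusters and `nstart` are arbitrary assumptions. It does not settle whether extra scaling is needed after vst.

```r
set.seed(123)

# Hypothetical stand-in for a vst-transformed matrix: 50 genes x 6 samples.
vst_mat <- matrix(rnorm(50 * 6, mean = 8, sd = 0.3), nrow = 50,
                  dimnames = list(paste0("gene", 1:50), paste0("s", 1:6)))

# Simulate a second group: samples s4..s6 are higher in the first 25 genes.
vst_mat[1:25, 4:6] <- vst_mat[1:25, 4:6] + 3

# kmeans() clusters rows, so transpose to put samples in rows.
km <- kmeans(t(vst_mat), centers = 2, nstart = 10)

# Named vector giving the cluster label of each sample.
sample_clusters <- km$cluster
```

The cluster labels (1 or 2) are arbitrary; only which samples share a label is meaningful.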

ADD REPLY • link 3.9 years ago by Raheleh ▴ 260