Question

What are the ideal normalization methods for TCGA gene expression or DNA methylation datasets?

4.5 years ago

Chaimaa ▴ 260

Hello everyone,

What are the ideal, or at least simple, normalization methods for Level 3 TCGA gene expression or DNA methylation datasets, given that they are already preprocessed?

I would appreciate any comments or help with this issue!

gene next-gen • 1.6k views

ADD COMMENT • link 4.5 years ago by Chaimaa ▴ 260

Please provide some details on how these files were preprocessed. Are these raw data that you have or are they transformed like TPM or something?

ADD REPLY • link 4.5 years ago by ATpoint 84k

OK, I will provide some of the rows and columns from these data here. Note that I got these data from the Broad Institute GDAC Firehose.

ADD REPLY • link 4.5 years ago by Chaimaa ▴ 260

There is no need to provide the data itself, just inform about what they are.

ADD REPLY • link 4.5 years ago by ATpoint 84k

@ATpoint Below I have explained what I got and what I have tried.

ADD REPLY • link 4.5 years ago by Chaimaa ▴ 260

@ATpoint Hi, here is what my data look like: rows represent samples, columns represent genes, and the cell values are the expression and methylation values as I got them from the Broad website.

I think these values are not normalized enough to be used for further analysis. Which normalization steps should I apply?

I also tried applying regression analysis with cross-validation directly to these data and found no significant genes under 10-fold cross-validation, which is why I assumed the data are not OK.

Thanks in advance. https://i.postimg.cc/rwTZsvXL/DATA.jpg

ADD REPLY • link updated 4.5 years ago by ATpoint 84k • written 4.5 years ago by Chaimaa ▴ 260

Are these raw data that you have or are they transformed like TPM or something?

ADD REPLY • link 4.5 years ago by ATpoint 84k

@ATpoint Yes, this is the raw data I have. No, there is no TPM transformation; this is all I have.

ADD REPLY • link 4.5 years ago by Chaimaa ▴ 260

These are floating-point numbers, and some are even negative, so they are not the raw counts that common differential analysis tools expect. You will have to find out exactly what these values are. Check whether the website you got them from has a methods section. If that is not possible, email the contact person for the information. This is important: technically you can put any numeric values into a pipeline, but the result is likely to be nonsense if the underlying assumptions are violated. I am reasonably sure TCGA provides a sound explanation of its data preprocessing.
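The check described above can be sketched in a few lines of Python. This is only an illustration: the function name `looks_like_raw_counts` and the two small matrices are made up for the example and are not taken from the Firehose files.

```python
import math

def looks_like_raw_counts(matrix, tol=1e-8):
    """Return True if every non-missing value is a non-negative integer.

    `matrix` is a list of rows (lists of numbers). Missing values
    (None or NaN) are skipped. An all-missing matrix returns False.
    """
    saw_value = False
    for row in matrix:
        for v in row:
            if v is None or (isinstance(v, float) and math.isnan(v)):
                continue  # ignore missing entries
            saw_value = True
            # Raw counts cannot be negative or fractional.
            if v < 0 or abs(v - round(v)) > tol:
                return False
    return saw_value

# Hypothetical toy data: one count-like matrix, one already-transformed matrix.
counts = [[0, 12, 530], [7, 0, 91]]
processed = [[-1.25, 0.40, 2.10], [0.03, -0.77, 1.95]]

print(looks_like_raw_counts(counts))
print(looks_like_raw_counts(processed))
```

A matrix that fails this check has already been transformed in some way, so count-based tools should not be applied to it until the preprocessing is known.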

ADD REPLY • link 4.5 years ago by ATpoint 84k

@ATpoint Hi, when I downloaded these data from Broad Firehose, I chose only the preprocessed files. For gene expression I took the file mRNA_Preprocess_Median (MD5), and for DNA methylation I took the file Methylation_Preprocess (MD5). I then integrated them for the same set of samples, exactly as shown in the picture above. The values are expression values from the gene expression file and beta values from the DNA methylation file, respectively.
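For the methylation part, one commonly used transformation of beta values is the M-value, M = log2(beta / (1 - beta)), which maps the bounded 0 to 1 scale onto an unbounded one. The sketch below assumes the values really are beta values in [0, 1]; the function name `beta_to_m` and the clipping constant `eps` are choices made for this example, and whether the transformation is appropriate depends on what Broad's documentation says about the preprocessing.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value in [0, 1] to an M-value.

    M = log2(beta / (1 - beta)). Values are clipped to [eps, 1 - eps]
    (an assumption of this sketch) so that beta = 0 or 1 stays finite.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta values must lie in [0, 1]")
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

# Hypothetical beta values for illustration only.
for beta in (0.2, 0.5, 0.8):
    print(beta, round(beta_to_m(beta), 3))
```

A beta of 0.5 maps to an M-value of 0, and values above or below 0.5 map to positive or negative M-values symmetrically.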

ADD REPLY • link 4.5 years ago by Chaimaa ▴ 260

Are there any normalization steps that can be applied directly to these data? If anyone has expertise in this, please guide me through it.

ADD REPLY • link 4.5 years ago by Chaimaa ▴ 260

I will stop responding until you post what these data are. Without this information, guessing at what might be done is a waste of time, since any answer would be arbitrary. I do not want to sound harsh, but there is no point in continuing without the information I have now requested multiple times. Good luck with your analysis.

ADD REPLY • link 4.5 years ago by ATpoint 84k

@ATpoint OK, I will share the original files here. Is that OK?

ADD REPLY • link 4.5 years ago by Chaimaa ▴ 260

You don't need to share the actual files. Describe where and how you got them (or provide the original link(s) to the Broad site). The Broad Firehose site should have a description of how the data were generated and processed.

ADD REPLY • link 4.5 years ago by GenoMax 144k

OK @genomax, this is the website where I accessed and downloaded my data: http://firebrowse.org/?cohort=COADREAD&download_dialog=true

ADD REPLY • link 4.5 years ago by Chaimaa ▴ 260

Link to the documentation Broad provides for data processing.

ADD REPLY • link 4.5 years ago by GenoMax 144k

This is what the gene expression file looks like after the download: https://postimg.cc/nsymVY8Q and this is what the DNA methylation file looks like: https://postimg.cc/MfWmxbRm

ADD REPLY • link 4.5 years ago by Chaimaa ▴ 260