4.9 years ago
Chaimaa
Hello guys,
What is an ideal or simple normalization method for Level 3 TCGA gene expression or DNA methylation datasets, given that they are already preprocessed?
I would appreciate any comments or help on this issue!
Please provide some details on how these files were preprocessed. Is this raw data, or has it been transformed (to TPM, for example)?
OK, I will post some of the rows and columns from these data here. Note that I got these data from the Broad Institute GDAC Firehose.
There is no need to provide the data itself, just inform about what they are.
@ATpoint, below I have explained what I got and what I have tried.
@ATpoint, hi, here is what my data look like: rows represent samples, columns represent genes, and the cell values are the expression and methylation values as I got them from the Broad website.
So I think these values are not normalized enough to be used for further analysis? Which normalization steps should I apply?
I also tried applying regression analysis and cross-validation directly to these data and found no significant genes under 10-fold CV, which is why I assumed the data are not OK.
Thanks in advance. https://i.postimg.cc/rwTZsvXL/DATA.jpg
@ATpoint yes, this is the raw data I have. No, there is no TPM transformation; this is all I have.
These are float numbers, some of them negative, so they are not the raw counts that the common differential analysis tools expect. You will have to find out what exactly these values are. Check whether the website you got them from has a methods section. If that is not possible, email the contact person to get the information. This is important because you can technically put any numeric values into a pipeline, but the result is likely to be nonsense if the underlying assumptions are violated. I am reasonably sure TCGA provides a sound explanation of its data preprocessing.
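The check described in the reply above (are the values whole, non-negative numbers?) can be sketched in a few lines of Python. This is only a minimal illustration: the function name `describe_values` and the example numbers are made up here and are not taken from the TCGA files.

```python
import math

def describe_values(values):
    """Summarize numbers to judge whether they could be raw read counts."""
    # Ignore missing values encoded as NaN
    finite = [float(v) for v in values if not math.isnan(float(v))]
    if not finite:
        raise ValueError("no non-missing values to inspect")
    has_negative = any(v < 0 for v in finite)
    all_integer = all(v.is_integer() for v in finite)
    return {
        "min": min(finite),
        "max": max(finite),
        "has_negative": has_negative,
        "all_integer": all_integer,
        # Raw counts are non-negative whole numbers
        "could_be_raw_counts": all_integer and not has_negative,
    }

# Made-up example values (hypothetical, not from the Firehose download)
example = [-1.25, 0.0, 3.4, 2.0]
summary = describe_values(example)
print(summary)
```

If `could_be_raw_counts` is false, the values have already been transformed in some way, and the provider's documentation is the only reliable source for how.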
@ATpoint Hi, when I downloaded these data from Broad Firehose, I chose only the preprocessed files. For gene expression I took this file (mRNA_Preprocess_Median (MD5)) and for DNA methylation I took this file (Methylation_Preprocess (MD5)), then I integrated them for the same set of samples, exactly as shown in the picture above. Those values are expression values from the gene expression file and beta values from the DNA methylation file, respectively.
Are there any normalization steps that can be applied directly to these data? If anyone is an expert on this, please guide me through it.
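Since the post above says the methylation columns hold beta values, one quick sanity check is that beta values lie between 0 and 1, whereas the expression values shown include negatives. A minimal sketch, assuming each column is available as a plain list of floats; the function name and the numbers are hypothetical:

```python
import math

def looks_like_beta_values(values):
    """Return True if every non-missing value lies in the interval [0, 1]."""
    finite = [v for v in values if v is not None and not math.isnan(v)]
    return bool(finite) and all(0.0 <= v <= 1.0 for v in finite)

# Made-up columns for illustration only
methylation_column = [0.02, 0.87, 0.45, float("nan")]
expression_column = [-1.3, 0.4, 2.7]

print(looks_like_beta_values(methylation_column))
print(looks_like_beta_values(expression_column))
```

A column that passes this check is only consistent with beta values; it does not confirm how the file was processed.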
I will stop responding until you post what these data are. Without this information it is a waste of time to guess at what might be done, since any choice would be arbitrary. I do not want to sound harsh, but there is no point in continuing without the information I have now requested multiple times. Good luck with your analysis.
@ATpoint OK, I will share the original files here. Is that OK?
You don't need to share the actual files. Describe where/how you got them (or provide the original link(s) to the Broad site). The Broad Firehose site should have a description of how the data were generated/processed.
OK @genomax, I accessed and downloaded my data from this website: http://firebrowse.org/?cohort=COADREAD&download_dialog=true
Link to the documentation Broad provides for data processing.
This is what the gene expression file looks like after the download: https://postimg.cc/nsymVY8Q and this is what the DNA methylation file looks like: https://postimg.cc/MfWmxbRm