Hi,
I am not sure if this repository is the right place to ask my question. I am trying to reproduce the results of a package named padma. The package ships a mini-sample of multi-omics data taken from TCGA-LUAD, covering four omics types: mRNA, methylation, miRNA and CNA.
They used preprocessed data that is no longer available. I obtained TCGA data from the GDC portal and followed their preprocessing steps. However, I noticed a difference in the CNA values: in their data, the values are floating-point numbers ranging between -1 and 1, while my CNV data contains only positive integers.
As far as I understand, CNA and CNV refer to the same concept (correct me if I'm wrong). Should I preprocess the CNV data to convert it to the proper format? I couldn't find any specific explanation for this in their paper.
I would appreciate any input.
The GDC has both raw floating-point CNV data and integer CNV values produced by more advanced modeling. If you only want the floating-point numbers (called segment means), you can select "DNACopy" as the workflow in the GDC repository. However, these segment means are not bounded by (-1, +1).
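To illustrate why segment means are not bounded: a minimal sketch, assuming the segment mean is defined as log2(copy number / 2), which is how the GDC describes its copy number segment files. The function names are hypothetical and only for illustration.

```python
import math

def segment_mean_to_copy_number(segment_mean):
    """Invert segment_mean = log2(copy_number / 2) (assumed definition)."""
    return 2 * 2 ** segment_mean

def copy_number_to_segment_mean(copy_number):
    """Apply segment_mean = log2(copy_number / 2) (assumed definition)."""
    return math.log2(copy_number / 2)

# Two copies (diploid) map to 0, one copy to -1, four copies to +1.
# Anything above four copies gives a segment mean greater than 1.
for copies in (1, 2, 4, 8):
    print(copies, copy_number_to_segment_mean(copies))
```

Under that assumption, values inside (-1, +1) correspond to estimated copy numbers between one and four, so higher-level amplifications fall outside that range.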
Thanks @Zhenyu Zhang for your response. I have just looked at the files you mentioned. Do you have a recommended package or code snippet for summarizing these segment values at the gene level? In the mini-sample in the padma package, the CNA data is given at the gene level for each sample.
I am editing my reply because I found a repository named FIREBROWSE that provides gene-level CNV segment data, and in my tests the results are pretty close to what is expected from padma. Thanks Zhenyu, you guided me through this.
If you are looking for integer-valued gene-level CNV, the GDC also has those. If you are looking for floating-point numbers, you need to do your own gene region vs segment overlapping.
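The gene region vs segment overlap could be done roughly as follows. This is only a sketch under stated assumptions, not the method used by padma, FIREBROWSE or the GDC: segments and genes are plain tuples with 1-based inclusive coordinates on the same genome build, the function name `gene_level_segment_means` is hypothetical, and a gene spanning several segments gets the overlap-length-weighted average of their segment means. Parsing the segment files and the gene annotation is left out.

```python
from collections import defaultdict

def gene_level_segment_means(segments, genes):
    """Summarize segment means per gene for one sample.

    segments: iterable of (chrom, start, end, segment_mean)
    genes:    iterable of (gene_name, chrom, start, end)
    Coordinates are assumed to be 1-based and inclusive.
    Returns {gene_name: weighted mean, or None if no segment overlaps}.
    """
    # Group segments by chromosome so each gene only scans its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, seg_mean in segments:
        by_chrom[chrom].append((start, end, seg_mean))

    result = {}
    for name, chrom, g_start, g_end in genes:
        weighted_sum = 0.0
        covered = 0
        for s_start, s_end, seg_mean in by_chrom.get(chrom, []):
            # Number of bases shared by the gene and the segment.
            overlap = min(g_end, s_end) - max(g_start, s_start) + 1
            if overlap > 0:
                weighted_sum += seg_mean * overlap
                covered += overlap
        # Genes not covered by any segment are reported as None.
        result[name] = weighted_sum / covered if covered else None
    return result

# Toy data, invented for illustration only.
segments = [("1", 1, 1000, 0.5), ("1", 1001, 2000, -0.5), ("2", 1, 5000, 0.1)]
genes = [("GENE_A", "1", 501, 1500), ("GENE_B", "2", 100, 200), ("GENE_C", "3", 1, 10)]
gene_values = gene_level_segment_means(segments, genes)
print(gene_values)
```

The nested loop is fine for a sketch; for whole-genome data across many samples, an interval-overlap tool or library would be faster. Other summaries, such as taking the segment with the largest overlap, are equally plausible choices.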
Regarding firebrowse, you can find old hg19 TCGA data here. I don't know the exact data you are looking for, but I suspect these were run through GISTIC, which is more complicated.