The cBioPortal file format documentation, in its section on continuous copy number data, states that the GISTIC2 output file <prefix>_all_data_by_genes.txt can be used directly as the cBioPortal data file (after changing the column names). cBioPortal expects this data to be in LOG2 format.
I have a file all_data_by_genes.txt (NOTE: not <prefix>_all_data_by_genes.txt) generated by a run of GISTIC2 against an amalgamated segment (*.seg) file. However, when I try to use it according to the documentation, cBioPortal errors out, saying that there are negative numbers in the data fields of the file (and there are). This makes me assume the file is not actually LOG2 data.
Does anyone know ...
- What is the data type/format of the data in this file?
- Should I be using a different output file instead of all_data_by_genes.txt?
- If I SHOULD be using all_data_by_genes.txt, do I need to convert the data?
Thanks! Mike
As it turns out, the problem was that the output file all_data_by_genes.txt often has negative values in the Gene ID column (the one renamed to Entrez_Gene_Id for cBioPortal). This breaks the import into cBioPortal. The GISTIC2 developers have confirmed that these negative values are normal behavior and can be ignored. The solution is to clean the file and change all negative values in the Gene ID column to "Na".
From the developer...
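For anyone hitting the same error, here is a minimal sketch of the cleanup step described above. It is my own illustration, not code from the GISTIC2 developers or from cBioPortal. The function name clean_gene_ids is hypothetical, and the column header is assumed to be "Gene ID" as in this post; check the header line of your own file, since the name may differ between GISTIC2 versions. The sketch also assumes a plain tab-separated file with a single header row.

```python
def clean_gene_ids(lines, id_column="Gene ID", placeholder="Na"):
    """Yield tab-separated lines with negative IDs in `id_column` replaced.

    `lines` is any iterable of text lines (for example an open file).
    `id_column` and `placeholder` are assumptions taken from the post above.
    """
    lines = iter(lines)

    # The first line is the header; locate the ID column by name.
    header = next(lines).rstrip("\r\n").split("\t")
    idx = header.index(id_column)  # raises ValueError if the column is missing
    yield "\t".join(header) + "\n"

    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Only the ID column is touched; negative copy number values
        # in the sample columns are left exactly as they are.
        if idx < len(fields) and fields[idx].strip().startswith("-"):
            fields[idx] = placeholder
        yield "\t".join(fields) + "\n"
```

To use it, open all_data_by_genes.txt for reading and a new file for writing, then pass the input handle to clean_gene_ids and write the yielded lines with writelines. The column renaming required by the cBioPortal documentation is a separate step and is not handled here.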