This is my first time working with cancer genomes, and I have some doubts. I found this study, which provides a lot of different sequencing data for the cell line HCC1395, and I would like to use it to assess a new tool we are developing for detecting copy number alterations (CNAs). The problem is that I lack a ground truth. The study provides a gold-standard dataset for SNVs and short indels, but no information about CNAs. I need real (not simulated) tumor/normal sample pairs from the same patient, and this is the only source I have found so far that freely provides a large amount of well-documented sequencing data.
Since the HCC1395 cell line has been studied before, I found other studies that report the CNAs they detected in it. My question is: can I use the CNAs reported by those studies on the same cell line as a ground truth, and compare them with what our tool finds in the data I have?
I don't have much biological background, and my doubts arise mainly because, if I understood correctly, these cells are usually grown independently in each lab for each study. So I am not sure whether the results are comparable, or whether different mutations could have arisen in the cells used by different studies.
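If CNA calls published by another study on HCC1395 are used as a reference, one common way to compare them with a tool's output is base-pair-level concordance: how many bases called as, say, "gain" by the tool are also "gain" in the reference (precision), and how many reference "gain" bases the tool recovers (recall). The sketch below is a minimal illustration under stated assumptions, not an established benchmark: segments are hypothetical `(chrom, start, end, state)` tuples with 0-based half-open coordinates, segments within one call set do not overlap each other, and both call sets have already been harmonized to the same genome build and the same state labels (studies may use different ploidy baselines, so this step matters).

```python
from collections import defaultdict


def overlap_bp(a_start, a_end, b_start, b_end):
    """Length of the intersection of two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def concordance(calls, reference, state):
    """Base-pair precision and recall of `state` segments in `calls` vs `reference`.

    Each segment is a (chrom, start, end, state) tuple, 0-based half-open.
    Assumes segments within each list do not overlap one another;
    otherwise shared bases would be counted more than once.
    The state labels (e.g. "gain", "loss") are hypothetical and must match
    between the two call sets.
    """
    # Index reference segments of the requested state by chromosome.
    ref_by_chrom = defaultdict(list)
    for chrom, start, end, st in reference:
        if st == state:
            ref_by_chrom[chrom].append((start, end))

    ref_bp = sum(end - start for segs in ref_by_chrom.values() for start, end in segs)
    called_bp = 0
    shared_bp = 0
    for chrom, start, end, st in calls:
        if st != state:
            continue
        called_bp += end - start
        # Naive O(n*m) scan; fine for a few thousand segments per chromosome.
        for r_start, r_end in ref_by_chrom.get(chrom, []):
            shared_bp += overlap_bp(start, end, r_start, r_end)

    precision = shared_bp / called_bp if called_bp else 0.0
    recall = shared_bp / ref_bp if ref_bp else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    reference = [("chr1", 0, 1000, "gain"), ("chr1", 1200, 2000, "gain")]
    calls = [("chr1", 500, 1500, "gain")]
    p, r = concordance(calls, reference, "gain")
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.44
```

Because the reference itself comes from a different culture of the cell line and a different pipeline, any disagreement reflects both tool error and genuine biological or methodological differences, so these numbers are best read as agreement with that study rather than accuracy against a true ground truth.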
CCLE (via DepMap) maintains a database of mutations identified in each cell line, under "Omics Characterizations": https://depmap.org/portal/cell_line/ACH-000946?tab=mutation. Clones of these cell lines will have at least the documented mutations, and potentially others.
Cell lines are, of course, not a tumor/normal reference; AFAIK there is no "ground truth" for somatic copy number alterations in humans, and it's hard enough to find human tumor/normal sequencing data that doesn't require dbGaP access.
Hello, thank you very much for your reply! I already knew about the DepMap database, so if you say that at least those mutations must be present, that's already a good start. Do you know whether the mutations reported in DepMap are the same as those reported in COSMIC (https://cancer.sanger.ac.uk/cell_lines/sample/overview?id=749712)? In case it's useful to you, a lot of human tumor/normal sequencing data for the cell lines HCC1395/HCC1395BL are provided completely free here: https://sites.google.com/view/seqc2/home/sequencing?authuser=0.
Also, I don't necessarily need human tumor/normal samples, but I couldn't find any other openly available matched samples (apart from the ones I mentioned). Do you know of any other resources of matched samples for other organisms?
There is a somatic SV study on HCC1395: https://doi.org/10.1186/s13059-022-02816-6.
There is also a CNA preprint from a few years ago that was never published: https://doi.org/10.1101/2021.02.18.431906