Question

How many samples are needed to find significant targets using GISTIC?

1

4.0 years ago

joyk2a ▴ 30

Hi, I am analyzing around thirty tumor samples. Do you think this is a large enough cohort to identify targets with GISTIC? Most of the papers I have read report more than 100 samples, so I am wondering whether I can find valuable targets in a small cohort. It would be great if those of you who have used GISTIC or a similar analysis program could share your experience. Thanks a lot!

gene • 1.6k views

ADD COMMENT • link 4.0 years ago by joyk2a ▴ 30

score 1 · Answer 1 · 2020-04-05

1

4.0 years ago

Kevin Blighe 87k

It is a little low, which may mean that few regions reach statistical significance. However, I would encourage you to try it. You could justify relaxing the p-value threshold (i.e., making it less stringent) for the identified recurrent regions, keeping in mind that the relatively low sample n is always a limitation.

Kevin

ADD COMMENT • link 4.0 years ago by Kevin Blighe 87k

1

Thanks, Kevin! I will try to do it.

ADD REPLY • link 4.0 years ago by joyk2a ▴ 30

0

Kevin,

How does GISTIC depend on number of samples being analyzed? I have a lot of samples, but I had to split them into batches to run CopywriteR, which is computationally expensive otherwise. I ran GISTIC2.0 on the output from each of these batches.

It is not possible to combine CopywriteR output across batches, so I cannot run GISTIC2.0 on some sort of combined input. Will this be a problem?

ADD REPLY • link 3.8 years ago by Ram 43k

0

GISTIC is a bit different from a standard copy number analysis tool. It takes, as input, the already-derived per-sample CN segments, and then processes all samples combined in order to 'score' copy number events across the entire cohort. In a way, it's doing the same as GAIA, i.e., looking for recurrently-aberrated regions. So, with a low number of samples, it would be difficult for any region to obtain a reliable score. I have used GAIA more than GISTIC, though.

I'm not too familiar with CopywriteR, to be honest.
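
To make the cohort-level idea above concrete, here is a minimal Python sketch. It is not GISTIC's or GAIA's actual algorithm: the toy segments, the bin size, the amplification threshold, and the `recurrence_scores` helper are all made up for illustration. It only shows how a per-region score that combines frequency across samples with amplitude depends on the number of samples.

```python
from collections import defaultdict

# Hypothetical toy segments: (sample, chrom, start, end, log2_ratio).
SEGMENTS = [
    ("S1", "chr1", 0, 100, 0.8),
    ("S1", "chr1", 100, 200, 0.0),
    ("S2", "chr1", 0, 50, 0.0),
    ("S2", "chr1", 50, 150, 0.6),
    ("S2", "chr1", 150, 200, 0.0),
    ("S3", "chr1", 0, 200, 0.0),
]


def recurrence_scores(segments, bin_size=50, amp_threshold=0.3):
    """Return {(chrom, bin_start): (frequency, score)} for amplified bins.

    frequency = fraction of samples amplified in the bin
    score     = frequency * mean amplitude of the amplified samples
    (an assumed, simplified stand-in for a cohort-level score)
    """
    samples = sorted({seg[0] for seg in segments})
    amplitudes = defaultdict(dict)  # bin -> {sample: log2 ratio}
    for sample, chrom, start, end, log2_ratio in segments:
        if log2_ratio < amp_threshold:
            continue  # not counted as an amplification
        first_bin = start // bin_size
        last_bin = (end + bin_size - 1) // bin_size
        for b in range(first_bin, last_bin):
            key = (chrom, b * bin_size)
            previous = amplitudes[key].get(sample, 0.0)
            amplitudes[key][sample] = max(previous, log2_ratio)

    scores = {}
    for key, per_sample in amplitudes.items():
        frequency = len(per_sample) / len(samples)
        mean_amplitude = sum(per_sample.values()) / len(per_sample)
        scores[key] = (frequency, frequency * mean_amplitude)
    return scores


scores = recurrence_scores(SEGMENTS)
for region in sorted(scores):
    frequency, score = scores[region]
    print(region, round(frequency, 2), round(score, 2))
```

With only three samples, the frequency of any region can only be 0, 1/3, 2/3, or 1, so one extra or missing sample moves the score a lot. That is the intuition behind small cohorts giving unreliable cohort-level scores.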

ADD REPLY • link 3.8 years ago by Kevin Blighe 87k

0

Looks like I'm going to have to re-analyze my data. CopywriteR is an R package that generates per-sample CN segments from BAM files. It is computationally expensive (>100GB RAM for 3 samples), so I had to split my samples into batches of 3 each. I'm guessing that cripples GISTIC2.0, because n=3 is nothing in the context of statistical significance.

EDIT: It looks like sample size does not affect gene level scoring in GISTIC2, which is what I'm after. I don't have to re-run all my samples, which is great news for me!

ADD REPLY • link 3.8 years ago by Ram 43k

0

Hi Kevin, I have a similar question, but in my study I have only 15 tumor samples. Do you think I should use GISTIC, or is there another way to assess significance? Also, how can I apply the thresholds for gain or loss explained on the COSMIC website? If you have any ideas, I would be very grateful.