Question: GISTIC segment overlap error
6 months ago, Wenhu_Cao wrote:

Hi everyone,

I am a little confused by this error reported by GISTIC:

GISTIC 2.0 input error detected:
169 segment overlaps detected in file '/path/***.seg.txt'.
First overlap detected between segments at lines 26227 and 26377.

I checked the two lines of the first overlap in R. Here they are:

line 26227:

Sample       Chromosome   Start      End Num_Probes Segment_Mean
  <chr>             <int>   <int>    <int>      <int>        <dbl>
1 TCGA-G4-6317          1 3218610 14832648       6157       -0.160

line 26377:

Sample       Chromosome   Start      End Num_Probes Segment_Mean
  <chr>             <int>   <int>    <int>      <int>        <dbl>
1 TCGA-G4-6317          1 3218610 49468025      24550       -0.099

I got the TCGA segmentation file from Firehose. The only changes I made were removing NAs and shortening the Sample names (keeping only the first 12 characters). Does anyone know how to deal with this?

Thanks very much!

Tags: snp cnv scna gistic

Comment by Wenhu_Cao, 6 months ago:

I think I may have found a possible cause: collapsing the barcode to the patient level (first 12 characters) gives different sample types from the same patient a single shared name. I will test this idea and let you know the results.
6 months ago, Wenhu_Cao wrote:

Thanks to the time difference between me and most Biostars users, I had time to solve this problem on my own.

The cause now seems trivial, but I will note it for completeness. I took the first 12 characters of Tumor-Sample-Barcode to get a patient-only barcode for further analysis. The problem is that TCGA did not always collect exactly one tumor sample per patient. In my dataset, for example, I found '01' (Primary Solid Tumor), '02' (Recurrent Solid Tumor) and '06' (Metastatic) for a single patient. These codes are the 14th-15th characters of Tumor-Sample-Barcode (details here: TCGA barcode and TCGA code tables).

As a result, truncating the barcode gives segments from different samples of one patient the same ID. GISTIC then sees them as overlapping segments within a single sample and complains, and the shared ID would also introduce bias into the downstream analysis.
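For anyone hitting the same thing, here is a minimal Python sketch of how the collision could be checked and avoided. It is only an illustration: the barcodes and segment values are made up, the helper names (`patient_id`, `sample_type`, `find_collisions`, `keep_sample_type`) are hypothetical, and keeping only '01' samples is one possible choice, not a recommendation from GISTIC or TCGA.

```python
from collections import defaultdict

def patient_id(barcode):
    """First 12 characters of a TCGA barcode (patient level)."""
    return barcode[:12]

def sample_type(barcode):
    """Characters 14-15 (1-based), e.g. '01' for Primary Solid Tumor."""
    return barcode[13:15]

def find_collisions(barcodes):
    """Return patients whose barcodes cover more than one sample type."""
    types = defaultdict(set)
    for bc in barcodes:
        types[patient_id(bc)].add(sample_type(bc))
    return {p: sorted(t) for p, t in types.items() if len(t) > 1}

def keep_sample_type(rows, wanted="01"):
    """Keep rows of one sample type, then shorten the ID to patient level."""
    return [(patient_id(bc), *rest) for bc, *rest in rows
            if sample_type(bc) == wanted]

# Made-up rows: (barcode, chromosome, start, end)
rows = [
    ("TCGA-XX-0001-01A", 1, 100, 500),
    ("TCGA-XX-0001-06A", 1, 100, 900),   # same patient, metastatic sample
    ("TCGA-XX-0002-01A", 1, 100, 700),
]

collisions = find_collisions(bc for bc, *_ in rows)
primary_only = keep_sample_type(rows, wanted="01")
print(collisions)
print(primary_only)
```

Another option is to keep a longer prefix of the barcode (for example the first 15 characters) so that each sample type keeps its own ID.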

OK, that's my experience. I am putting it here as a reminder for myself!

3 months ago, ysh wrote:

I think the reason is that the CNV segment at line 26227 is contained within the segment at line 26377: they come from the same sample and the same chromosome, but their start and end coordinates overlap. You should resolve this overlap. Note also that 169 overlaps were detected in your segmentation file. Good luck!
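A rough way to locate such overlaps before running GISTIC is sketched below in Python. This is not GISTIC's own check: the function name `find_overlaps` is hypothetical, and it assumes inclusive coordinates, so two segments overlap when one starts at or before the other ends. The first two segments are the ones from the question; the third is made up to show a non-overlapping case.

```python
from collections import defaultdict

def find_overlaps(segments):
    """Return (sample, chrom, earlier, later) for each segment that
    overlaps an earlier segment of the same sample and chromosome."""
    groups = defaultdict(list)
    for sample, chrom, start, end in segments:
        groups[(sample, chrom)].append((start, end))

    overlaps = []
    for (sample, chrom), segs in groups.items():
        segs.sort()                      # order by start, then end
        prev = segs[0]                   # segment reaching furthest right so far
        for cur in segs[1:]:
            if cur[0] <= prev[1]:        # assumed inclusive coordinates
                overlaps.append((sample, chrom, prev, cur))
            if cur[1] > prev[1]:
                prev = cur
    return overlaps

segments = [
    ("TCGA-G4-6317", 1, 3218610, 14832648),   # line 26227 in the question
    ("TCGA-G4-6317", 1, 3218610, 49468025),   # line 26377 in the question
    ("TCGA-G4-6317", 1, 49468026, 50000000),  # made-up, does not overlap
]

overlaps = find_overlaps(segments)
for hit in overlaps:
    print(hit)
```

If the overlaps come from several samples sharing one ID, as described above, fixing the sample names is likely a better remedy than editing the segments themselves.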
