Question: GISTIC segment overlap error
Wenhu_Cao50 wrote:

Hi everyone,

I am a little confused by this error reported by GISTIC:

GISTIC 2.0 input error detected:
169 segment overlaps detected in file '/path/***.seg.txt'.
First overlap detected between segments at lines 26227 and 26377.

I looked up the two lines of the first overlap in R. Here they are:

line 26227:

  Sample       Chromosome   Start      End Num_Probes Segment_Mean
  <chr>             <int>   <int>    <int>      <int>        <dbl>
1 TCGA-G4-6317          1 3218610 14832648       6157       -0.160

line 26377:

  Sample       Chromosome   Start      End Num_Probes Segment_Mean
  <chr>             <int>   <int>    <int>      <int>        <dbl>
1 TCGA-G4-6317          1 3218610 49468025      24550       -0.099
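
The two rows above share a sample ID, a chromosome and a start position, which is presumably what GISTIC flags as an overlap. A minimal sketch of such a check in plain Python follows. The function name `find_overlaps` and the tuple layout are made up for illustration, and the overlap rule (the next start is at or before the furthest end seen so far) is an assumption, not GISTIC's own code.

```python
from collections import defaultdict


def find_overlaps(segments):
    """Return overlapping segment pairs that share a sample and chromosome.

    Each segment is a (sample, chromosome, start, end) tuple.
    Assumed rule: a segment overlaps if it starts at or before the
    furthest end position seen so far in the same sample/chromosome.
    """
    groups = defaultdict(list)
    for sample, chrom, start, end in segments:
        groups[(sample, chrom)].append((start, end))

    overlaps = []
    for key, spans in groups.items():
        spans.sort()
        furthest = spans[0]  # span with the largest end seen so far
        for cur in spans[1:]:
            if cur[0] <= furthest[1]:
                overlaps.append((key, furthest, cur))
            if cur[1] > furthest[1]:
                furthest = cur
    return overlaps


# The two rows reported in the question.
segments = [
    ("TCGA-G4-6317", 1, 3218610, 14832648),
    ("TCGA-G4-6317", 1, 3218610, 49468025),
]
for key, first, second in find_overlaps(segments):
    print(key, first, second)
```

Running a check like this on the whole file before calling GISTIC should list every offending pair, not only the first one.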

I got the TCGA segmentation file from Firehose. The only changes I made were removing rows with NAs and shortening the Sample names to their first 12 characters. Does anyone know how to deal with this?

Thanks very much!

snp cnv scna gistic • 137 views
ADD COMMENTlink modified 11 weeks ago • written 11 weeks ago by Wenhu_Cao50

I think I may have found a possible cause: collapsing barcodes to patients (first 12 chars) gives different sample types from the same patient a single shared name. I will try this idea and let you know the results.
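
One way to test this idea is to group the full barcodes by their first 12 characters and see which patients carry more than one sample type. The sketch below is only an illustration: the function name and the example barcodes are made up, and it assumes the sample type sits at characters 14-15 of the barcode.

```python
from collections import defaultdict


def patients_with_multiple_sample_types(barcodes):
    """Map each patient ID to its sample-type codes, keeping only
    patients that have more than one distinct code.

    Assumes TCGA-style barcodes: patient ID = first 12 characters,
    sample type = characters 14-15 (Python slice [13:15]).
    """
    types_by_patient = defaultdict(set)
    for barcode in barcodes:
        types_by_patient[barcode[:12]].add(barcode[13:15])
    return {
        patient: sorted(types)
        for patient, types in types_by_patient.items()
        if len(types) > 1
    }


# Hypothetical barcodes, not taken from the real dataset.
barcodes = [
    "TCGA-G4-6317-01A",
    "TCGA-G4-6317-02A",
    "TCGA-AA-0001-01A",
]
print(patients_with_multiple_sample_types(barcodes))
```

Any patient returned here would end up with segments from several samples under one ID after truncation.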

ADD REPLYlink written 11 weeks ago by Wenhu_Cao50
Wenhu_Cao50 wrote:

Thanks to the time difference between me and most Biostars users, I had time to solve this problem on my own.

The cause now seems trivial, but I will note it for completeness. I took the first 12 characters of the Tumor-Sample-Barcode to get a patient-only barcode for further analysis. The problem is that TCGA did not always collect exactly one tumor sample per patient. In my dataset, for example, a single patient has '01' (Primary Solid Tumor), '02' (Recurrent Solid Tumor) and '06' (Metastatic) samples. The numbers are the 14th-15th characters of the Tumor-Sample-Barcode (details here: TCGA barcode and TCGA code tables).

Truncating the barcode therefore gives segments from different samples of one patient the same patient barcode as their ID. That makes GISTIC complain about overlapping segments, and it would also introduce bias into the downstream analysis.
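
One possible workaround, sketched here under assumptions, is to keep only one sample type per patient before shortening the IDs. The function name `keep_sample_type`, the row layout and the full barcodes below are hypothetical. In particular, which sample type each of the two segments from the question really belongs to is not known, so the '01' and '06' assignment is invented for the demo.

```python
def keep_sample_type(rows, sample_type="01"):
    """Keep rows of one sample type and shorten their ID to the patient barcode.

    Each row is (barcode, chromosome, start, end, num_probes, segment_mean).
    Assumes the sample type is at characters 14-15 of the barcode.
    """
    kept = []
    for barcode, *rest in rows:
        if barcode[13:15] == sample_type:
            kept.append((barcode[:12], *rest))
    return kept


# Hypothetical full barcodes; the sample-type assignment is made up.
rows = [
    ("TCGA-G4-6317-01A", 1, 3218610, 14832648, 6157, -0.160),
    ("TCGA-G4-6317-06A", 1, 3218610, 49468025, 24550, -0.099),
]
for row in keep_sample_type(rows):
    print(row)
```

An alternative is to keep the first 15 characters as the ID so that each sample type stays separate. Note that filtering by sample type alone would not help if a patient has several vials or aliquots of the same type.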

OK, that's my experience. I am putting it here as a reminder for myself!

ADD COMMENTlink modified 11 weeks ago • written 11 weeks ago by Wenhu_Cao50