How to obtain a segmentation file from Control-FREEC output to use with GISTIC
4.1 years ago
dganiewich ▴ 130

Hello everyone,

I would like to perform a GISTIC analysis on a set of WES samples that I have analyzed with Control-FREEC. This program does not output the regular segmentation file (ID, Chrom, Start, End, num markers, log ratio) that GISTIC needs, nor does it output the markers file.

1) Has anyone done this before, and could you explain to me how to obtain the required input files for GISTIC?

2) I thought of running the ratio output file through DNAcopy (segmenting it again). Does this sound reasonable even though it is WES data?

Thank you very much for your help! Best, Daiana

P.S. Sorry if my English is not too good; it is not my native language XD

control-freec gistic • 3.4k views
4.1 years ago

P.S. Sorry if my English is not too good; it is not my native language XD

Don't worry, your English is excellent.

------------------

The main input that you require for GISTIC is the segmentation file, which should have:

  • (1) Sample (sample name)
  • (2) Chromosome (chromosome number)
  • (3) Start Position (segment start position, in bases)
  • (4) End Position (segment end position, in bases)
  • (5) Num markers (number of markers in segment)
  • (6) Seg.CN (log2(copy number) - 1)
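As a rough illustration of that six-column layout, here is a minimal Python sketch that writes tab-delimited rows. The header spellings are simply copied from the list above and the function name `write_seg` is hypothetical; check the GISTIC documentation for the exact header it expects. The example values are made up.

```python
import csv
import io

# Column names follow the list above; the exact header spelling is an assumption.
SEG_COLUMNS = ["Sample", "Chromosome", "Start Position",
               "End Position", "Num markers", "Seg.CN"]

def write_seg(rows, handle):
    """Write six-column, tab-delimited segment rows to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(SEG_COLUMNS)
    for row in rows:
        if len(row) != len(SEG_COLUMNS):
            raise ValueError(f"expected {len(SEG_COLUMNS)} fields, got {len(row)}")
        writer.writerow(row)

# Made-up example values, for illustration only.
example_rows = [("sample1", "1", 10000, 250000, 42, 0.585)]
buffer = io.StringIO()
write_seg(example_rows, buffer)
seg_text = buffer.getvalue()
```

In practice you would pass a file opened with `open(path, "w", newline="")` instead of the in-memory buffer.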

To go directly from Control-FREEC to GISTIC 2.0, I believe the best output file to use is the '*_ratio.bed' file, which can be produced by the freec2bed.pl script (see bottom of THIS page, under the section entitled 'Translate Control-FREEC's output into Bed or Circos formats').

However, you will have to convert the copy number column in the BED output via:

log2(x) - 1
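A minimal Python sketch of that conversion, in case it helps. The function name and the `floor` value are my own assumptions: log2(0) is undefined, so a copy number of 0 (homozygous deletion) needs some placeholder, and -5.0 here is arbitrary. A diploid copy number of 2 maps to 0.

```python
import math

def copy_number_to_seg_cn(copy_number, floor=-5.0):
    """Convert an absolute copy number x to log2(x) - 1.

    log2(0) is undefined, so a copy number of 0 is mapped to `floor`,
    an arbitrary placeholder chosen for this sketch. Values that would
    fall below `floor` are clamped to it as well.
    """
    if copy_number < 0:
        raise ValueError("copy number must be non-negative")
    if copy_number == 0:
        return floor
    return max(floor, math.log2(copy_number) - 1.0)
```

Whether to clamp deep deletions or drop them is a judgment call; pick whatever suits your downstream thresholds.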

The problem will be to determine a value for the 'Num markers' column for the GISTIC input file. From Control-FREEC, the reads per interval are stored in the *.cpn files, I believe, and these could be used as 'pseudo-markers'. Finding a way to overlap these with the BED file will be extra work for you, though - it could be done via complex BEDTools commands, or within R using GenomicRanges.
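The overlap itself is just counting how many pseudo-marker positions fall inside each segment. Below is a small pure-Python sketch of that counting step (not BEDTools or GenomicRanges). It assumes you have already parsed the window positions into (chromosome, position) pairs; parsing the *.cpn file is deliberately left out, since its layout should be checked against your Control-FREEC version. The function name and all coordinates are made up.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

def count_markers(segments, marker_positions):
    """Count marker positions falling inside each segment (inclusive ends).

    segments: iterable of (chrom, start, end)
    marker_positions: iterable of (chrom, position)
    Returns a list of counts in the same order as `segments`.
    """
    # Group positions by chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in marker_positions:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    counts = []
    for chrom, start, end in segments:
        positions = by_chrom.get(chrom, [])
        counts.append(bisect_right(positions, end) - bisect_left(positions, start))
    return counts

# Made-up coordinates, for illustration only.
segments = [("1", 100, 500), ("1", 501, 900), ("2", 1, 1000)]
markers = [("1", 100), ("1", 300), ("1", 500), ("1", 700), ("2", 50)]
marker_counts = count_markers(segments, markers)
```

The counts can then be placed in the 'Num markers' column of each segment row. Make sure chromosome names are spelled the same way in both inputs (e.g. "1" versus "chr1"), otherwise every count will be zero.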


To conclude, it is not impossible to use Control-FREEC with GISTIC; however, it may be easier to use DNAcopy with the aligned BAM file and just avoid the use of Control-FREEC. It is your choice.

See you soon

Kevin

NB - for GISTIC versions >2.0.23, no markers file is required.


Thank you, Kevin!! I will try to generate the BED file and do the overlap with GenomicRanges (and will definitely put the code here if I succeed). I didn't like the idea of using DNAcopy because, from what I have read, its specificity is very low for exome data, unlike Control-FREEC, which is more reliable. Anyway, if I cannot do the first, I might use DNAcopy, as GISTIC is my priority in this analysis. Thank you!! Best, Daiana


Dear Daiana, did you manage to create the GISTIC segmentation file starting from the Control-FREEC output? If so, would you be so kind as to share the GenomicRanges/BEDTools commands you used to combine the pseudo-marker info contained in the *.cpn files with the BED files? Thanks a lot, A
