How to read an Affymetrix CYCHP file
10.0 years ago

Howdy y'all,

I'm having a hard time interpreting the cychp format.

Here's the Affymetrix documentation, but it doesn't tell me what the symbols in the actual file mean: http://media.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/cychp.html

So has anyone worked with this .cychp file before? Is there an in-house or third-party program that allows me to retrieve the data?

Many thanks

affymetrix SNP cychp cnv binary-file • 5.7k views

This is not really a data exchange format but a specialized binary format, so it seems unlikely that other people have developed software for it.

9.9 years ago
Christian ★ 3.0k

You first need to convert the cychp file into a readable text format with Affymetrix Power Tools (APT); then you can parse it.

  1. Convert to text

    apt-1.15.2-x86_64-intel-linux/bin/apt-chp-to-txt --chp-files samples.chpfile --out-dir .
    

    The file 'samples.chpfile' needs to be created first with a text editor and should contain the filenames of your .cychp files that you want to convert.

  2. Parse text file to obtain log R ratios and allele peaks

    use strict;
    use warnings FATAL => qw( all );
    
    use Getopt::Long;
    
    # Input: the text file written by apt-chp-to-txt, read from STDIN or from
    # file names given after the options.
    my ($lrr_file, $baf_file, $sample_name);
    GetOptions
    (
     "lrr-file=s" => \$lrr_file,
     "baf-file=s" => \$baf_file,
     "sample-name=s" => \$sample_name,
    );
    
    die "ERROR: --lrr-file not specified\n" if (!$lrr_file);
    die "ERROR: --baf-file not specified\n" if (!$baf_file);
    die "ERROR: --sample-name not specified\n" if (!$sample_name);
    
    open(my $lrr_fh, ">", $lrr_file) or die "ERROR: Could not write to file $lrr_file: $!\n";
    open(my $baf_fh, ">", $baf_file) or die "ERROR: Could not write to file $baf_file: $!\n";
    
    print $lrr_fh "\tchrs\tpos\tLRR_$sample_name\n";
    print $baf_fh "\tchrs\tpos\tBAF_$sample_name\n";
    
    # Skip ahead to the header line of the log R ratio table.
    while(<>)
    {
     last if (/^ProbeSetName\tChromosome\tPosition\tLog2Ratio\tWeightedLog2Ratio\tSmoothSignal/);
    }
    
    # Copy probe set, chromosome, position and Log2Ratio until the next '#' line.
    while(<>)
    {
     last if (/^#/);
     chomp;
     my ($ProbeSetName, $Chromosome, $Position, $Log2Ratio) = split(/\t/);
     print $lrr_fh "$ProbeSetName\t$Chromosome\t$Position\t$Log2Ratio\n";
    }
    
    # Skip ahead to the header line of the allele peaks table.
    while(<>)
    {
     last if (/^ProbeSetName\tChromosome\tPosition\tAllelePeaks0\tAllelePeaks1/);
    }
    
    while(<>)
    {
     last if (/^#/);
     chomp;
     my ($ProbeSetName, $Chromosome, $Position, $AllelePeaks0, $AllelePeaks1) = split(/\t/);
     # Ad-hoc heuristic: rescale AllelePeaks0 linearly so that 250 maps to 0
     # and 750 maps to 1, then clamp the result to the range [0, 1].
     my $baf = ($AllelePeaks0 - 250)/(750-250);
     $baf = 0 if ($baf < 0);
     $baf = 1 if ($baf > 1);
     print $baf_fh "$ProbeSetName\t$Chromosome\t$Position\t$baf\n";
    }
    
    while(<>) {}; # read pipe to the end to avoid SIGPIPE error status 141
    
    close($lrr_fh);
    close($baf_fh);
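
For step 1, the list file can also be generated rather than typed by hand. The following is a minimal sketch, not part of the original answer: the function name `make_chp_list` is hypothetical, and the `chp_files` header line is an assumption based on the usual convention for APT list files, so check `apt-chp-to-txt --help` for your APT version before relying on it.

```shell
# Write a list of all .cychp files in a directory, one per line.
# Assumption: APT expects the first line of the list to be "chp_files".
make_chp_list() {
    dir="$1"   # directory containing the .cychp files
    out="$2"   # list file to create, e.g. samples.chpfile
    {
        printf 'chp_files\n'
        for f in "$dir"/*.cychp; do
            # Skip the unexpanded pattern when no file matches.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    } > "$out"
}
```

Calling `make_chp_list /path/to/cychp_dir samples.chpfile` would then produce the file that is passed to `--chp-files` in step 1.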
    

Dear Christian,

Would you please explain the logic of the following line?

my $baf = ($AllelePeaks0 - 250)/(750-250);

Thanks

This is just an ad-hoc heuristic I came up with to transform allele peaks into BAF values with range between 0 and 1.
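
Read literally, the line is a linear rescaling followed by clamping: an allele peak value of 250 maps to 0, 750 maps to 1, and anything outside that interval is clipped to the nearest bound. A small sketch of the same arithmetic in shell with awk (the function name `rescale_peak` is hypothetical; the constants 250 and 750 are taken from the script above):

```shell
# Rescale an AllelePeaks0 value to a BAF-like number in [0, 1].
rescale_peak() {
    awk -v p="$1" 'BEGIN {
        b = (p - 250) / (750 - 250)   # linear map: 250 -> 0, 750 -> 1
        if (b < 0) b = 0              # clamp values below 250
        if (b > 1) b = 1              # clamp values above 750
        print b
    }'
}
```

For example, `rescale_peak 500` gives 0.5, while `rescale_peak 100` and `rescale_peak 900` are clamped to 0 and 1 respectively.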