Question: What are the output values for PICNIC when called from the binary build?
David Quigley (San Francisco) wrote, 2.9 years ago:

I am using PICNIC for copy number analysis in cell lines from Affymetrix 6.0 data (Greenman et al., Biostatistics 2010). PICNIC generates an output file documented in this PDF file. The documentation and Matlab source (specifically HMM_RunB.m) indicate the output file should have 17 columns. I do not have Matlab, so I am calling PICNIC from the binary build that does not require Matlab. This build generates an output file that has 16 columns, not 17, and I am having a hard time figuring out what these columns are. Does anyone know with certainty the column headers for this build? They are not written into the output file itself, which is less helpful than it could be. Sample output for two rows:

3414949,1,1485718,0.801,2.22e-16,2,1,0,1,1,-1.77e-12,2,0.5,2,1,1
3186957,1,1488015,0.588,0.337,2,0,1,0,2,-8.08e-13,0.999,0.999,7.342e-27,0.0005,0.999

The headers listed in the documentation are:

  1. SNP Identifier
  2. Raw Intensity Ratio
  3. Allelic Angle
  4. Actual Copy Number
  5. Segmented Total Copy Number
  6. Segmented Minor Copy Number
  7. Middle Fitted Angle Height (above 0.5)
  8. Outer Fitted Angle Height (above 0.5)
  9. LOH index
  10. No. A copies (genotype)
  11. No. B copies (genotype)
  12. State Change Probability
  13. Genotyping Confidence
  14. Genotyping Confidence Conditional Upon State Classification
  15. Heterozygous Probability
  16. Allele A LOH probability
  17. Allele B LOH probability
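To confirm the mismatch on a whole output file rather than by eye, a minimal sketch like the one below counts the comma-separated fields per row. The temporary file and the variable names are placeholders of mine; the two rows are the sample rows quoted above. It only reports how many columns are present and makes no claim about which documented header is missing.

```shell
# Write the two sample rows quoted above to a throwaway file.
out_file=$(mktemp)
cat > "$out_file" <<'EOF'
3414949,1,1485718,0.801,2.22e-16,2,1,0,1,1,-1.77e-12,2,0.5,2,1,1
3186957,1,1488015,0.588,0.337,2,0,1,0,2,-8.08e-13,0.999,0.999,7.342e-27,0.0005,0.999
EOF

documented=17   # number of headers listed in the documentation

# For each distinct field count, print the count and how many rows have it.
column_counts=$(awk -F',' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$out_file")
echo "$column_counts"          # prints: 16 2
echo "documented columns: $documented"
rm -f "$out_file"
```

On a real run, replacing the temporary file with the path of the PICNIC output would show whether every row has the same number of fields.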

EDIT: Added a typical call in response to a comment posted below. There is surely a cleaner way to extract the PI, ploidy, and alpha, but this worked. Note the trailing slashes in directory names in the call to HMM, which I found to be required. PICNIC has undocumented expectations about the file names for input: it expects the original CEL file name to start with "CGP_", and gives uninformative output names if that prefix is not present.

/PICNIC_DIR/preprocessing CELFILE.feature_intensity \
  /PICNIC_DIR/info/ \
  /PICNIC_OUTPUT_DIR/raw/ \
  /PICNIC_OUTPUT_DIR/output/ \
  /PICNIC_OUTPUT_DIR/

# PI, ploidy, and alpha are fields 1-3 of the ploidy CSV written by preprocessing
PLOIDY_CSV=/PICNIC_OUTPUT_DIR/output2/CELFILE_feature.TXT/ploidy_CELFILE_feature.TXT.csv
IN_PI=$(cut -d ',' -f 1 "$PLOIDY_CSV")
PLOIDY=$(cut -d ',' -f 2 "$PLOIDY_CSV")
ALPHA=$(cut -d ',' -f 3 "$PLOIDY_CSV")

/PICNIC_DIR/HMM \
  CELFILE.feature_intensity \
  /PICNIC_DIR/info/ \
  /PICNIC_OUTPUT_DIR/output/ \
  /PICNIC_OUTPUT_DIR/ 8 $IN_PI $PLOIDY $ALPHA
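Two details of the call above can be sketched a little more defensively: reading the three values in one pass, and making sure directory arguments end in a slash. This is only a sketch. It assumes the ploidy CSV holds a single line whose first three comma-separated fields are PI, ploidy, and alpha (as the cut calls above imply); the helper name with_slash, the temporary file, and the numbers in it are made up for the demonstration and are not PICNIC output.

```shell
# Append a trailing slash to a directory name unless it already has one.
with_slash() { printf '%s/\n' "${1%/}"; }

# Stand-in for the ploidy CSV; the values are invented for the demo.
ploidy_csv=$(mktemp)
printf '0.35,2.1,0.9\n' > "$ploidy_csv"

# Read the first three comma-separated fields of the first line in one pass.
IFS=',' read -r IN_PI PLOIDY ALPHA _rest < "$ploidy_csv"

# Stop early if any value is missing, rather than passing blanks to HMM.
if [ -z "$IN_PI" ] || [ -z "$PLOIDY" ] || [ -z "$ALPHA" ]; then
  echo "could not read PI, ploidy, and alpha from $ploidy_csv" >&2
  exit 1
fi

echo "PI=$IN_PI ploidy=$PLOIDY alpha=$ALPHA"
with_slash /PICNIC_OUTPUT_DIR/output     # prints: /PICNIC_OUTPUT_DIR/output/
rm -f "$ploidy_csv"
```

The values would then be passed to HMM exactly as in the call above, with each directory argument wrapped in with_slash.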


Apologies for what may be a misuse of the comments section, but would you mind sharing the command you used for your run? I have tried running the Linux executable version of PICNIC as well, but could not get past this error message: "Undefined function or variable "pr_pi". Error in ==> preprocessing at 107". My input was "sh run_preprocessing.sh rootDir/Matlab_Compiler_Runtime/v710/ 080122_SNP6.0_184B5_B01.feature_intensity ../info/ ../outdir/raw/ ../outdir/output ../outdir/ 'CELL_LINE' 0.0", with a contamination estimate of 0, as indicated in the manual for cell line samples.

There is a grave lack of online discussion on this tool despite its purported utility...

Comment written 2.9 years ago by a1249m0

I added my call. The source for preprocessing.m indicates it's tripping over the line

ploidy=find_ploidy(seg_info_update,sample_type,pr_pi);

pr_pi is supposed to be set at the top of the script, but if you read the code, it is never assigned when sample_type is 'CELL_LINE' and more than five parameters are passed: that branch only checks whether there are too many parameters. It's a bug in the PICNIC code:

elseif nargin > 5
    if (strcmp(sample_type, 'PRIMARY') )
        if (nargin == 6)
            disp('not enough parameters');
            usage();
            exit(0);
        else
            pr_pi=str2num(in_pi);
        end;
    elseif (strcmp(sample_type, 'CELL_LINE') )
        if (nargin > 7)
            disp('too many parameters');
            usage();
            exit(0);
        end;
    else
        pr_pi=0;
    end;    
end;
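To make the branch logic easier to follow, here is a small shell model of the excerpt above. It is my own illustration, not PICNIC code: the function name pr_pi_status is hypothetical, and the case of five or fewer arguments is reported as not covered because that part of preprocessing.m is not quoted here. It also assumes the runtime directory is consumed by run_preprocessing.sh, so the call in the comment above reaches preprocessing with seven arguments.

```shell
# Model of the quoted excerpt: report what happens to pr_pi for a given
# argument count ($1, i.e. nargin) and sample type ($2).
pr_pi_status() {
  if [ "$1" -le 5 ]; then
    echo "not covered by the excerpt"
    return
  fi
  case $2 in
    PRIMARY)
      if [ "$1" -eq 6 ]; then
        echo "exit: not enough parameters"
      else
        echo "set from in_pi"
      fi ;;
    CELL_LINE)
      if [ "$1" -gt 7 ]; then
        echo "exit: too many parameters"
      else
        echo "never set"      # no assignment on this path
      fi ;;
    *)
      echo "set to 0" ;;
  esac
}

pr_pi_status 7 CELL_LINE   # prints: never set
pr_pi_status 7 PRIMARY     # prints: set from in_pi
pr_pi_status 6 OTHER       # prints: set to 0
```

The first case corresponds to a call with 'CELL_LINE' followed by a contamination estimate, which is why find_ploidy later sees an undefined pr_pi.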

 

Comment written 2.9 years ago by David Quigley

I had a suspicion that it was a bug! Thank you, and thanks very much for the CGP-prepend tip, it works now. I am going through this output with a colleague who has run the Matlab version of PICNIC previously; if he has any thoughts on the 16-column output (which is what I got as well) I will post them here.

 

Comment written 2.9 years ago by a1249m0
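For anyone applying the "CGP_" prefix tip from this thread, one way to do it without touching the original file is to copy it under a prefixed name. The sketch below is only an illustration: the helper cgp_name and the file name sample01.CEL are hypothetical, and the thread does not say at which point in the workflow PICNIC checks the name, so the copy is assumed to be made before any other processing.

```shell
# Print the base name of a file with "CGP_" prepended, unless it is
# already there.
cgp_name() {
  base=$(basename "$1")
  case $base in
    CGP_*) printf '%s\n' "$base" ;;
    *)     printf 'CGP_%s\n' "$base" ;;
  esac
}

# Demo in a throwaway directory with a made-up CEL file name.
work_dir=$(mktemp -d)
touch "$work_dir/sample01.CEL"

# Copy rather than rename, so the original file is left untouched.
cp "$work_dir/sample01.CEL" "$work_dir/$(cgp_name "$work_dir/sample01.CEL")"
ls "$work_dir"
```

Copying keeps the original name available for other tools; a rename or symlink would work the same way if disk space matters.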