Question

sample swapping check with verifyBamID -- FREEMIX CHIPMIX

7.4 years ago

haiying.kong ▴ 360

I ran verifyBamID on our data to check for sample swaps and contamination.

Our data: whole-exome sequencing (WES) data from paired normal-tumor samples for 95 patients.

I first called genotypes for all patients from the WES data of the 95 normal samples, and then ran verifyBamID on each tumor sample against those genotypes. The command I used was:

verifyBamID --bam ${GATK_BQSR_dir}/${Tumor}.recal.bam --vcf ${GermlineMutations} --out ${verifyBamID_dir}/${batch}_${Tumor} --best --ignoreRG

The .bestSM output I got is below (sorry, it is difficult to read):

SEQ_ID  RG   CHIP_ID  #SNPS    #READS    AVG_DP  FREEMIX  FREELK1     FREELK0     FREE_RH  FREE_RA  CHIPMIX  CHIPLK1     CHIPLK0     CHIP_RH  CHIP_RA  DPREF  RDPHET  RDPALT
T9      ALL  B134340  6967873  14171034  2.03    0.04996  3595962.37  3635330.39  NA       NA       0.97661  3618996.71  4708154.17  NA       NA       7.531  2.1034  0.7001
T28     ALL  B231     6967873  15977931  2.29    0.04260  3905908.38  3945090.62  NA       NA       0.43404  3687074.01  4376275.50  NA       NA       3.674  2.2191  0.7342
T3      ALL  B230     6967873  16995487  2.44    0.05497  4096602.97  4154965.61  NA       NA       0.53847  3940187.43  4585371.56  NA       NA       5.192  2.2639  0.9876
T41     ALL  B578     6967873  16892777  2.42    0.05380  4576180.58  4625879.26  NA       NA       0.37675  4374536.53  5061819.99  NA       NA       4.600  2.2189  0.9553
T34     ALL  B148     6967873  14778134  2.12    0.03513  3621392.88  3649936.94  NA       NA       0.39406  3439631.00  3994788.97  NA       NA       4.364  2.3758  0.9488
T37     ALL  B146     6967873  18465313  2.65    0.03608  4465654.97  4503486.46  NA       NA       0.51553  4328126.49  5016103.08  NA       NA       5.035  2.3685  0.9142
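
Since the full table is wide, a small script can pull out just the columns under discussion. The following is a minimal Python sketch, not part of verifyBamID: the function name `parse_bestsm` and the inline `BESTSM_TEXT` are hypothetical, the two data rows are copied from the output above, and the code assumes the .bestSM columns are whitespace-delimited.

```python
# Two rows copied from the .bestSM output in this post (illustrative input only).
BESTSM_TEXT = """\
SEQ_ID RG CHIP_ID #SNPS #READS AVG_DP FREEMIX FREELK1 FREELK0 FREE_RH FREE_RA CHIPMIX CHIPLK1 CHIPLK0 CHIP_RH CHIP_RA DPREF RDPHET RDPALT
T9 ALL B134340 6967873 14171034 2.03 0.04996 3595962.37 3635330.39 NA NA 0.97661 3618996.71 4708154.17 NA NA 7.531 2.1034 0.7001

T28 ALL B231 6967873 15977931 2.29 0.04260 3905908.38 3945090.62 NA NA 0.43404 3687074.01 4376275.50 NA NA 3.674 2.2191 0.7342
"""


def parse_bestsm(text):
    """Parse whitespace-delimited .bestSM text into a list of dicts (hypothetical helper)."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "SEQ_ID":
            header = fields  # tolerate header lines repeated between files
            continue
        if header is None:
            raise ValueError("data row found before any header line")
        rows.append(dict(zip(header, fields)))
    return rows


records = parse_bestsm(BESTSM_TEXT)
for rec in records:
    # Print only the sample, its best-matching genotype ID, and the two mixture estimates.
    print(rec["SEQ_ID"], rec["CHIP_ID"], float(rec["FREEMIX"]), float(rec["CHIPMIX"]))
```

Concatenating the per-sample .bestSM files and passing the text to this function gives one record per tumor sample, which makes it easier to compare FREEMIX and CHIPMIX side by side.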

The first sample in the list (T9) is matched to the wrong normal sample. To be clear, the normal sample that the software matched to this tumor sample cannot be a real sample swap: the two samples come from different research institutes, so there was no chance for them to be swapped.

How well does the matched normal sample found by the software actually match the tumor sample? Is there a score that quantifies this?

Is there an explanation for why B134340 is identified as the best match for T9? Can we explain it by T9 having a high CHIPMIX score, close to 1, meaning that T9 is highly contaminated?

The CHIPMIX scores for the other samples are also high, roughly 0.4 to 0.5. Why?

What is the difference between CHIPMIX and FREEMIX? In our data, we are using only whole-exome sequencing data. The genotypes we use to find the best match with verifyBamID were also called from WES. In our case, what do these scores mean?

wes • 2.8k views

updated 7.4 years ago by GenoMax 141k • written 7.4 years ago by haiying.kong ▴ 360

I read the verifyBamID documentation more carefully.

According to the documentation, although B134340 is reported as the best match for T9, it is not really the swapped sample, because CHIPMIX is close to 1. The best match is possibly the swapped sample only if its CHIPMIX is close to 0.
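
The rule as I understand it can be sketched in Python as follows. This is only an illustration of the reading above, not verifyBamID's own logic: the function name `interpret_chipmix` and the cutoffs `near_zero=0.1` and `near_one=0.9` are assumptions I picked to stand in for "close to 0" and "close to 1", and they are not thresholds taken from the documentation.

```python
# Labels for the three cases described in the post.
POSSIBLE_SOURCE = "best match is possibly the true (swapped) sample"
NOT_SOURCE = "best match is not really the swapped sample"
UNCLEAR = "intermediate CHIPMIX, interpretation unclear"


def interpret_chipmix(chipmix, near_zero=0.1, near_one=0.9):
    """Classify a --best match by its CHIPMIX value.

    near_zero and near_one are hypothetical cutoffs chosen for illustration.
    """
    if chipmix <= near_zero:
        return POSSIBLE_SOURCE
    if chipmix >= near_one:
        return NOT_SOURCE
    return UNCLEAR


# CHIPMIX values copied from the .bestSM output in the question.
chipmix_by_sample = {
    "T9": 0.97661,
    "T28": 0.43404,
    "T3": 0.53847,
    "T41": 0.37675,
    "T34": 0.39406,
    "T37": 0.51553,
}

for sample, chipmix in chipmix_by_sample.items():
    print(sample, chipmix, interpret_chipmix(chipmix))
```

With these assumed cutoffs, T9 falls in the "not really the swapped sample" case and the other five samples fall in the intermediate case, which is exactly the part I cannot interpret.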

But what does it mean that CHIPMIX for all the other samples is so large, roughly 0.4 to 0.5?

7.4 years ago by haiying.kong ▴ 360