Hello all,
I am working on germline CNV calling for a whole-exome cohort of infertile patients, and I wanted to test CNVkit for this.
Some patients serve as positive controls: they carry an experimentally validated homozygous deletion in WDR66 spanning exons 20/22 to 22/22.
However, with the pipeline below, the segmentation step does not produce a segment consistent with this deletion (I tested both the hmm and hmm-germline methods).
The interval list comes from the Ensembl canonical transcripts (intervals delimited by exon boundaries).
PIPELINE used:
cnvkit.py access ~/hs38DH.fa -x ~/sv_blacklist.bed -o ~/STEP1_ACCESS/acces-excludes.hg38.bed
cnvkit.py target ~/Interval_GRCH38_97_200921_STEP0.bed --split -o ~/STEP2_TARGET_ANTITARGET/my_targets.bed
cnvkit.py antitarget ~/STEP2_TARGET_ANTITARGET/my_targets.bed -g ~/STEP1_ACCESS/acces-excludes.hg38.bed -o ~/STEP2_TARGET_ANTITARGET/my_antitargets.bed
# Collect the BAM files (assumes no whitespace in the paths)
test=$(find ~/BAM_FOLDER/ -name '*.bam')
for bam in $test
do
# Sample name = file name without its directory and .bam suffix
name=$(basename "$bam" .bam)
cnvkit.py coverage "$bam" -p 20 ~/STEP2_TARGET_ANTITARGET/my_targets.bed -o ~/STEP3_COVERAGE/"$name".targetcoverage.cnn
cnvkit.py coverage "$bam" -p 20 ~/STEP2_TARGET_ANTITARGET/my_antitargets.bed -o ~/STEP3_COVERAGE/"$name".antitargetcoverage.cnn
done
cnvkit.py reference -o ~/STEP4_REFERENCE/FlatReference.cnn -f /data/septiera/REF_GENOME/hs38DH.fa -t ~/STEP2_TARGET_ANTITARGET/my_targets.bed -a ~/STEP2_TARGET_ANTITARGET/my_antitargets.bed
cnvkit.py fix ~/STEP3_COVERAGE/POSControlSample.targetcoverage.cnn ~/STEP3_COVERAGE/POSControlSample.antitargetcoverage.cnn ~/STEP4_REFERENCE/FlatReference.cnn -o ~/STEP5_FIX/POSControlSample.cnr
cnvkit.py segment ~/STEP5_FIX/POSControlSample.cnr -m hmm -p 10 -o ~/STEP6_SEGMENT/POSControlSample_hmm_wdrop.cns
cnvkit.py segmetrics ~/STEP5_FIX/POSControlSample.cnr -s ~/STEP6_SEGMENT/POSControlSample_hmm_wdrop.cns --ci -o ~/STEP7_CALL/POSControlSample_hmm_wdrop_ci.cns
cnvkit.py call ~/STEP7_CALL/POSControlSample_hmm_wdrop_ci.cns --filter ci -y -o ~/STEP7_CALL/POSControlSample_call_ci.cns
POSControlSample.cnr result example:
chromosome start end gene log2 depth weight
chr12 121918592 121918695 - -6.34236 3.82524 0.0001 #EXON1
chr12 121921286 121921683 - 4.49197 141.151 0.0001 #other gene
chr12 121923622 121923990 - 4.08991 124.815 0.0001 #EXON2
chr12 121931746 121931886 - 2.21378 107.421 0.0001 #EXON3
chr12 121934247 121934356 - -1.77108 69.9817 0.0001
chr12 121942534 121942645 - 0.540772 138.306 0.0001
chr12 121942895 121942975 - -0.904748 106.25 0.0001
chr12 121948984 121949061 - 3.90008 48.9091 0.0001
chr12 121951480 121951530 - 1.33628 69.5 0.0001
chr12 121954120 121954334 - 2.44978 76.4579 0.0001
chr12 121957074 121957268 - 3.89977 164.613 0.0001
chr12 121958272 121958522 - 3.93186 170.108 0.0001
chr12 121958943 121959094 - 2.87873 135.053 0.0001
chr12 121960585 121960758 - 0.838862 59.8035 0.0001
chr12 121961978 121962162 - 0.077442 30.2228 0.0001
chr12 121966955 121967069 - 0.574232 83.2719 0.0001
chr12 121968006 121968169 - -0.284698 122.264 0.0001
chr12 121975244 121975334 - -3.15683 27.9889 0.0001
chr12 121975542 121975685 - 1.88913 93.4965 0.0001
chr12 121976185 121999216 Antitarget -0.857707 0.588164 0.896216
chr12 121999716 121999944 - -23.0153 0 0.0001 #EXON20
chr12 122001497 122001598 - -6.13142 1.61386 0.0001 #EXON21
chr12 122003652 122003919 - 3.10711 61.7678 0.0001 #EXON22
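As a quick bin-level cross-check that does not depend on segmentation, the .cnr can be scanned for targets with a very low log2 ratio. This is only a sketch: the `low_bins` helper and the -5 cutoff are hypothetical choices for illustration, not CNVkit features, and the demo rows are copied from the excerpt above.

```shell
# Hypothetical helper (not part of CNVkit): print bins of a .cnr file whose
# log2 ratio is below a cutoff. Assumes a header line and the column order
# chromosome, start, end, gene, log2, depth, weight.
low_bins() {
    cnr_file=$1
    cutoff=${2:--5}   # arbitrary illustrative cutoff, tune for your data
    awk -v cutoff="$cutoff" \
        'NR > 1 && ($5 + 0) < (cutoff + 0) { print $1, $2, $3, $5 }' "$cnr_file"
}

# Demo on a few rows taken from the .cnr excerpt above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
chromosome start end gene log2 depth weight
chr12 121975542 121975685 - 1.88913 93.4965 0.0001
chr12 121976185 121999216 Antitarget -0.857707 0.588164 0.896216
chr12 121999716 121999944 - -23.0153 0 0.0001
chr12 122001497 122001598 - -6.13142 1.61386 0.0001
chr12 122003652 122003919 - 3.10711 61.7678 0.0001
EOF
low_bins "$sample"
rm -f "$sample"
```

On the real file this would be `low_bins ~/STEP5_FIX/POSControlSample.cnr`. In the demo rows, only the bins annotated as exons 20 and 21 fall below the cutoff, which suggests the deletion signal is present at the bin level before segmentation.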
POSControlSample_hmm_wdrop.cns result example:
chromosome start end gene log2 depth probes weight
chr12 121917290 121942645 - -4.14401 58.6302 10 0.001 #WDR66 SEGMENT 1 EXONS 1/22-6/22
chr12 121942895 122018922 - -0.783351 0.707287 20 1.74422 #WDR66 SEGMENT 2 EXONS 7/22-22/22
POSControlSample_call_ci.cns result example:
chromosome start end gene log2 cn depth probes weight
chr12 121917290 121942645 - -4.14401 0 58.6302 10 0.001 #EXONS 1/22-6/22 HOMODEL (error)
chr12 121942895 122216869 - -0.48987 1 0.932991 117 5.27485 #EXONS 7/22-22/22 HETDEL (error)
Is my pipeline appropriate for this type of analysis? How can I improve it?
Thanks in advance!!
How many patients do you have? In my experience, it works better to build a pooled reference from your own patients' coverage files than to use a flat reference.
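A minimal sketch of that suggestion, reusing the directory layout from the question. The `CNVKIT` variable, the two function names, `PooledReference.cnn`, and the `.pooled.cnr` output name are assumptions made for illustration. Passing the per-sample coverage files to `cnvkit.py reference` together with `-f` is how CNVkit builds a pooled reference.

```shell
# Sketch only: paths follow the question; names marked "hypothetical" are mine.
CNVKIT=${CNVKIT:-cnvkit.py}
COV_DIR=${COV_DIR:-$HOME/STEP3_COVERAGE}
REF_FASTA=${REF_FASTA:-$HOME/hs38DH.fa}
POOLED_REF=${POOLED_REF:-$HOME/STEP4_REFERENCE/PooledReference.cnn}  # hypothetical name

# Pool all target and antitarget coverage files into one reference.
build_pooled_reference() {
    "$CNVKIT" reference "$COV_DIR"/*coverage.cnn -f "$REF_FASTA" -o "$POOLED_REF"
}

# Re-normalise one sample against the pooled reference.
fix_sample() {
    sample=$1
    out_cnr=$2
    "$CNVKIT" fix "$COV_DIR/$sample.targetcoverage.cnn" \
        "$COV_DIR/$sample.antitargetcoverage.cnn" \
        "$POOLED_REF" -o "$out_cnr"
}

# Usage (not run here):
#   build_pooled_reference
#   fix_sample POSControlSample ~/STEP5_FIX/POSControlSample.pooled.cnr
```

With a pooled reference, each bin is compared with the cohort's typical coverage for that bin rather than with a uniform expectation, which should reduce the large log2 swings between neighbouring exons seen in the .cnr above. It may also be worth leaving known deletion carriers out of the pool when testing them, if the cohort is large enough to allow it.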