I have several whole-exome sequencing (WES) samples sequenced in both paired-end (PE) and single-end (SE) modes: 2 patient samples and 5 normal population samples, all prepared with the same library prep kit and sequenced on the same platform, NextSeq 550. We built a normal reference from either the paired-end data (Ref-PE) or the single-end data (Ref-SE).
For the patient sample sequenced as PE, Ref-PE gave good results: we could successfully identify the CNV in the patient, which was confirmed by an aCGH experiment.
However, when we analyzed the same patient sample sequenced as SE, using either Ref-SE or Ref-PE, the variation was huge in both cases. We got many noisy segments with low “probes” counts and low “weight” values. In contrast, the true CNV segments have probes > 150 and weight > 100.
Therefore, I’d like to ask several questions:
Are SE data not compatible with CNVkit? Or are there any parameters we could set in CNVkit when analyzing SE data?
As mentioned above, may I set a cut-off for high-confidence segments of probes > 150 and weight > 100?
Is the BAF analysis suitable for germline samples, or is it for somatic CNVs only?
We used “access-5kb-mappable.hg19.bed” provided by CNVkit to generate my_antitargets.bed. However, looking into my_normal_reference.cnn, all the “rmask” values for the antitarget regions are “0”. This is the same for both Ref-PE and Ref-SE. Is that normal?
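To make the proposed cut-off concrete, here is a minimal sketch of the filter I have in mind. It assumes the segment (.cns) file is tab-separated with "probes" and "weight" columns; the function name `filter_segments` and the two example rows are made up for illustration and are not real results.

```python
import csv
import io

# Hypothetical example rows, NOT real data; the column layout is an
# assumption about the tab-separated .cns segment file.
EXAMPLE_CNS = (
    "chromosome\tstart\tend\tgene\tlog2\tdepth\tprobes\tweight\n"
    "chr1\t100000\t900000\tGENE_A\t-0.95\t48.2\t212\t156.3\n"
    "chr2\t500000\t520000\tGENE_B\t0.61\t35.7\t4\t2.1\n"
)


def filter_segments(cns_text, min_probes=150, min_weight=100.0):
    """Keep segments whose probes AND weight both exceed the cut-offs."""
    reader = csv.DictReader(io.StringIO(cns_text), delimiter="\t")
    kept = []
    for row in reader:
        if int(row["probes"]) > min_probes and float(row["weight"]) > min_weight:
            kept.append(row)
    return kept


if __name__ == "__main__":
    for seg in filter_segments(EXAMPLE_CNS):
        print(seg["chromosome"], seg["start"], seg["end"], seg["log2"])
```

For a real file, the text could be read with `open("sample.cns").read()` (the file name is a placeholder). Whether thresholds this strict are appropriate is exactly what I am unsure about.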
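For reference, this is roughly how the “rmask” column could be checked. It is a sketch that assumes the reference .cnn is tab-separated with "gene" and "rmask" columns, and that antitarget bins are labelled "Antitarget" (or "Background" in older files) in the "gene" column. The function name `rmask_summary` and the example rows are hypothetical.

```python
import csv
import io

# Hypothetical example rows, NOT real data; the column layout and the
# antitarget labels are assumptions about the reference .cnn file.
EXAMPLE_CNN = (
    "chromosome\tstart\tend\tgene\tlog2\tdepth\tgc\trmask\tspread\n"
    "chr1\t10000\t10500\tGENE_A\t0.02\t51.0\t0.45\t0.10\t0.12\n"
    "chr1\t20000\t120000\tAntitarget\t-0.01\t0.8\t0.41\t0.0\t0.30\n"
    "chr1\t120000\t220000\tAntitarget\t0.03\t0.9\t0.39\t0.0\t0.28\n"
)


def rmask_summary(cnn_text, antitarget_labels=("Antitarget", "Background")):
    """Return (number of antitarget bins, number with non-zero rmask)."""
    reader = csv.DictReader(io.StringIO(cnn_text), delimiter="\t")
    values = [
        float(row["rmask"]) for row in reader if row["gene"] in antitarget_labels
    ]
    nonzero = sum(1 for v in values if v != 0.0)
    return len(values), nonzero


if __name__ == "__main__":
    total, nonzero = rmask_summary(EXAMPLE_CNN)
    print(f"{nonzero} of {total} antitarget bins have non-zero rmask")
```

In our references, a check like this finds no antitarget bin with a non-zero “rmask”.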
I look forward to your advice. Thank you!