I have to analyse capture sequencing data from a 170-gene panel in one cancer type. We have 75 tumors sequenced individually and 45 matched germline samples of good quality, and I need to perform variant calling on these samples.
My planned pipeline so far is:
- sort the BAM files
- remove low-quality reads (quality < 20)
- remove PCR/optical duplicates
- GATK base quality score recalibration (BQSR)
- GATK indel realignment (IndelRealigner)
- run three somatic variant callers (VarScan2, Mutect2, Strelka2) and keep the variants called by at least two of them (--min-base-score 20)
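A minimal sketch of the "at least two of three callers" step, assuming each caller's VCF has already been normalised (left-aligned, multiallelic sites split, e.g. with `bcftools norm`) and parsed into sets of (chrom, pos, ref, alt) keys. `consensus_calls` is a hypothetical helper and the calls in the example are made up:

```python
from collections import Counter
from typing import Dict, Set, Tuple

# A variant key: (chromosome, 1-based position, REF allele, ALT allele).
# Assumption: calls are already normalised, otherwise the same indel can
# get different keys from different callers and will not be matched.
Variant = Tuple[str, int, str, str]


def consensus_calls(calls_by_caller: Dict[str, Set[Variant]],
                    min_callers: int = 2) -> Set[Variant]:
    """Return the variants reported by at least `min_callers` callers."""
    counts: Counter = Counter()
    for calls in calls_by_caller.values():
        # Each caller's calls are a set, so a caller counts once per variant.
        counts.update(calls)
    return {v for v, n in counts.items() if n >= min_callers}


if __name__ == "__main__":
    # Made-up calls, for illustration only.
    calls = {
        "VarScan2": {("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A")},
        "Mutect2": {("chr1", 1000, "C", "T"), ("chr3", 3000, "A", "G")},
        "Strelka2": {("chr1", 1000, "C", "T"), ("chr3", 3000, "A", "G")},
    }
    for v in sorted(consensus_calls(calls)):
        print(v)
```

Matching on exact keys is the simplest choice; indels that the callers represent differently will only agree after normalisation.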
My problem is that the germline data generated by the lab are "unusual": each capture pooled equimolar germline DNA from three different patients. In other words, I have 75 individually captured tumors and 15 germline captures, each combining the DNA of three patients (so a heterozygous germline variant carried by one of the three patients has an expected allele frequency of 1/6, ~16.7%).
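The 1/6 figure follows from each pooled capture holding six haploid genomes (three diploid patients): a heterozygous variant in one patient contributes 1 of 6 alleles, a homozygous one 2 of 6. A minimal sketch of that arithmetic, plus the binomial chance of seeing at least k ALT reads at a given depth, assuming perfectly equimolar pooling and ignoring sequencing error and capture or mapping bias; both function names are hypothetical:

```python
from math import comb


def expected_pool_vaf(n_samples: int, genotype_alt_copies: int = 1) -> float:
    """Expected ALT allele fraction of a germline variant carried by ONE
    sample in an equimolar pool of `n_samples` diploid DNAs.

    genotype_alt_copies: 1 for heterozygous, 2 for homozygous ALT.
    """
    total_alleles = 2 * n_samples  # diploid genomes in the pool
    return genotype_alt_copies / total_alleles


def prob_at_least_k_alt(depth: int, vaf: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, vaf): the chance of observing at
    least k ALT reads at a site (no error model, no capture bias)."""
    return 1.0 - sum(comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
                     for i in range(k))


if __name__ == "__main__":
    het = expected_pool_vaf(3)        # 1/6 for a heterozygous carrier
    hom = expected_pool_vaf(3, 2)     # 1/3 for a homozygous carrier
    print(f"het VAF = {het:.4f}, hom VAF = {hom:.4f}")
    print(f"P(>=5 ALT reads at 100x, het) = {prob_at_least_k_alt(100, het, 5):.4f}")
```

Running `prob_at_least_k_alt` over the depths actually reached in the pooled captures gives a rough idea of how reliably a germline variant from one of the three patients should show up in its pool.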
As I see it, I have three options:
- Run everything in tumor-only mode, then manually inspect the germline reads afterwards to exclude tumor variants present at ~17% in the matched three-patient germline pool
- For the tumors whose germline was sequenced, run a paired analysis of each tumor against its matched three-patient germline pool, somehow incorporating the expected ~17% allele frequency
- Build a Panel of Normals (PoN) from the 15 pooled captures (45 germline DNAs) and use it for all tumor samples (if so, are 40 samples sufficient?)
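For the first two options, one way to "include the ~17% information" is a simple binomial likelihood ratio on the matched pool's reads at each tumor variant: is the ALT count better explained by a heterozygous carrier in the pool (p = 1/6) or by sequencing error alone? A minimal sketch; `error_rate` and `min_log_lr` are illustrative, uncalibrated assumptions, and the function names are hypothetical:

```python
from math import log


def binom_loglik(alt: int, depth: int, p: float) -> float:
    """Log-likelihood of `alt` ALT reads out of `depth` under Binomial(depth, p),
    without the binomial coefficient (it cancels in the likelihood ratio)."""
    return alt * log(p) + (depth - alt) * log(1.0 - p)


def looks_germline_in_pool(alt: int, depth: int,
                           pool_vaf: float = 1 / 6,
                           error_rate: float = 0.01,
                           min_log_lr: float = 2.0) -> bool:
    """Flag a tumor variant as probably germline if the pooled normal's reads
    fit a heterozygous carrier (ALT fraction `pool_vaf`) better than
    sequencing error (ALT fraction `error_rate`) by at least `min_log_lr`."""
    if depth == 0:
        raise ValueError("no pooled-normal coverage at this site; cannot decide")
    log_lr = (binom_loglik(alt, depth, pool_vaf)
              - binom_loglik(alt, depth, error_rate))
    return log_lr >= min_log_lr


if __name__ == "__main__":
    # Pooled-normal ALT/depth counts at three hypothetical tumor variant sites.
    for alt, depth in [(17, 100), (2, 100), (0, 100)]:
        print(alt, depth, looks_germline_in_pool(alt, depth))
```

Note that with pooled normals the ALT reads may come from either of the two other patients, so this filter is conservative: a true somatic variant that coincides with a germline variant of a pool-mate would also be flagged.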
My question is: which approach would you consider best, offering the most power while remaining acceptable for publication? Would you suggest another approach?
Thank you for reading this post. Alexandre Eeckhoutte