Question

Best practice for serial variant calling

4.8 years ago

graeme.thorn ▴ 100

What is the best procedure for calling variants in serial plasma samples from the same patients when only a single normal sample is available?

For instance, running the samtools mpileup / VarScan2 pipeline on the normal sample followed by the serial samples gives genotype calls of 0/0 where I'd expect a variant to be called, such as here:

chr1    1471992 .   T   C   .   PASS    ADP=14;WT=2;HET=2;HOM=0;NC=0    GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/1:23:23:23:16:7:30.43%:4.5803E-3:50:38:11:5:2:5   0/1:21:13:13:7:6:46.15%:7.4534E-3:34:35:2:5:1:5 0/0:3:10:10:6:4:40%:4.3344E-2:35:30:2:4:0:4 0/0:6:13:13:9:4:30.77%:4.7826E-2:30:30:5:4:1:3

in the fourth sample column (the 3rd serial plasma sample), even though its read statistics are very similar to those in the second column (the 1st serial sample), where the genotype has been called as 0/1.

Is this the best way to call variants across multiple samples, or is it better to call normal/p1, normal/p2, normal/p3, etc. separately and then merge the variant sets at the end?
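To make the comparison concrete, here is a minimal sketch that splits the record above into per-sample fields and recomputes the variant allele fraction from RD and AD. The function name `summarise_samples` is hypothetical, and the record is assumed to be tab-separated as in a normal VCF.

```python
# The record quoted above, rebuilt with tab separators (assumed VCF layout).
RECORD = "\t".join([
    "chr1", "1471992", ".", "T", "C", ".", "PASS",
    "ADP=14;WT=2;HET=2;HOM=0;NC=0",
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR",
    "0/1:23:23:23:16:7:30.43%:4.5803E-3:50:38:11:5:2:5",
    "0/1:21:13:13:7:6:46.15%:7.4534E-3:34:35:2:5:1:5",
    "0/0:3:10:10:6:4:40%:4.3344E-2:35:30:2:4:0:4",
    "0/0:6:13:13:9:4:30.77%:4.7826E-2:30:30:5:4:1:3",
])


def summarise_samples(vcf_line):
    """Return one dict per sample column with GT, RD, AD, VAF and PVAL."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column names
    rows = []
    for column, sample in enumerate(fields[9:], start=1):
        values = dict(zip(keys, sample.split(":")))
        rd, ad = int(values["RD"]), int(values["AD"])
        rows.append({
            "column": column,
            "GT": values["GT"],
            "RD": rd,
            "AD": ad,
            "VAF": round(ad / (rd + ad), 4),  # variant allele fraction
            "PVAL": float(values["PVAL"]),
        })
    return rows


rows = summarise_samples(RECORD)
for row in rows:
    print(row)
```

On this record the two 0/0 columns both have PVAL above 0.01, while the two 0/1 columns are below it, so the p-value setting used in the VarScan run may be worth checking. Whether that threshold is what drives the 0/0 calls here is a guess, not something confirmed in this thread.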

variants • 879 views

ADD COMMENT • link 4.8 years ago by graeme.thorn ▴ 100

Not sure about "best practice", but I generally run all the variant calling per sample or per pair in parallel, convert the results to .tsv with GATK VariantsToTable, add sample labels to each .tsv, and then concatenate the .tsv files into a single table for review. If you have tumor-normal pairs, be sure to use variant callers that support them; I use MuTect2 and LoFreq Somatic for that right now, but there are plenty of others. If you are asking about the technical aspects of running them in parallel, you would want either something basic like GNU parallel or a workflow manager like Snakemake or Nextflow.
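The label-and-concatenate step described above could look roughly like this. It is a sketch only: the function name `concat_with_labels`, the `SAMPLE` column name, and the example column set (CHROM, POS, REF, ALT) are assumptions, and it assumes every per-sample table has the same columns.

```python
import csv
import io


def concat_with_labels(tables):
    """Combine per-sample TSV text into one TSV with a leading SAMPLE column.

    `tables` maps a sample label to TSV text that starts with a header row.
    All tables are assumed to share the same columns.
    """
    out = io.StringIO()
    writer = None
    for label, text in tables.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            if writer is None:
                # Create the writer lazily so the header comes from real data.
                writer = csv.DictWriter(
                    out,
                    fieldnames=["SAMPLE"] + list(reader.fieldnames),
                    delimiter="\t",
                    lineterminator="\n",
                )
                writer.writeheader()
            writer.writerow({"SAMPLE": label, **row})
    return out.getvalue()


# Hypothetical per-sample tables; in practice these would be read from files.
example_tables = {
    "P1": "CHROM\tPOS\tREF\tALT\nchr1\t1471992\tT\tC\n",
    "P2": "CHROM\tPOS\tREF\tALT\nchr1\t1471992\tT\tC\n",
}
combined = concat_with_labels(example_tables)
print(combined, end="")
```

Keeping the sample label as its own column means the combined table can be sorted or filtered by position to see which samples carry a given variant.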

ADD REPLY • link 4.8 years ago by steve ★ 3.5k

My question was more about whether to run everything through VarScan (or an equivalent) at once, which leads to calls I think are incorrect, like the one above, or to run the (single) normal against each serial plasma sample in pairs (N v P1, N v P2, N v P3, N v P4, etc.) and then join the variants together as you suggest.
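As a rough sketch of the paired option, the snippet below builds the N v P1, N v P2, ... pairs and then joins per-pair call sets by variant. The function names and the example calls are hypothetical; the actual calling step for each pair is left out.

```python
from collections import defaultdict


def pair_with_normal(normal, plasma_samples):
    """Pair the single normal with each serial plasma sample."""
    return [(normal, plasma) for plasma in plasma_samples]


def merge_variant_sets(calls_by_sample):
    """Map each (chrom, pos, ref, alt) to the samples in which it was called."""
    merged = defaultdict(list)
    for sample, variants in calls_by_sample.items():
        for variant in variants:
            merged[variant].append(sample)
    return dict(merged)


pairs = pair_with_normal("N", ["P1", "P2", "P3", "P4"])

# Hypothetical per-pair results, keyed by plasma sample label.
calls = {
    "P1": {("chr1", 1471992, "T", "C")},
    "P2": {("chr1", 1471992, "T", "C"), ("chr2", 100, "G", "A")},
    "P3": set(),
}
merged = merge_variant_sets(calls)
for variant, samples in merged.items():
    print(variant, samples)
```

A variant that is missing from one time point in the merged view was simply not called in that pair, which is a different thing from a confident 0/0 genotype.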

ADD REPLY • link 4.8 years ago by graeme.thorn ▴ 100

If your plasma samples were collected independently, then I think you would definitely want to run the variant calling independently for each tumor-normal pair. They would be considered separate biological samples.

ADD REPLY • link 4.8 years ago by steve ★ 3.5k
