Question

How to benchmark a human dataset? (Is it needed?)

1

6.2 years ago

pinn ▴ 210

Hi, I'm working on an exome dataset. I generated variant calls from different combinations of aligners and variant callers. Can I benchmark my exome dataset against Genome in a Bottle (HG001, HG002, HG003, HG004 and HG005) to find the true positives (TP) and false positives (FP)? Is this the right way? Does benchmarking mean you can take any query dataset and do a comparative analysis against the high-confidence calls (HG001 ... HG005)? Can anyone explain? I'm very confused about benchmarking.

Thanks for your comments.

genome next-gen • 2.2k views

updated 2.4 years ago by Jeremy Leipzig 22k • written 6.2 years ago by pinn ▴ 210

1

Based on your past questions, IMO you’re obsessing over TP/FP a lot. Run preliminary QC, extract variants of significance/interest and invest effort in in-depth QCing them.

6.2 years ago by Ram 43k

score 4 · Accepted Answer · 2021-11-01

2.5 years ago

Jeremy Leipzig 22k

You can now benchmark your own variant calls against HG002/HG003/HG004 on Truwl using 4 popular regions (more coming) and see how your results stack up against the top PrecisionFDA Truth Challenge competitors.

https://medium.com/truwl/accessible-and-uniform-benchmarking-22f598616ef5
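To make the TP/FP idea from the question concrete, here is a minimal Python sketch of what a benchmark computes. It is not how Truwl or the PrecisionFDA challenge score submissions. It assumes the query calls come from the same GIAB sample as the truth set (for example, your own exome run of HG002), that variants are already normalized, and that an exact (chrom, pos, ref, alt) match is good enough. Real benchmarking tools such as hap.py do haplotype-aware matching instead. The function names and toy data are hypothetical.

```python
def in_confident_region(chrom, pos, regions):
    """True if a 1-based VCF position lies in a BED-style interval.

    regions maps chromosome -> list of (start, end) with 0-based start and
    exclusive end, so a 1-based pos is inside when start < pos <= end.
    """
    return any(start < pos <= end for start, end in regions.get(chrom, []))


def benchmark(query, truth, regions):
    """Compare query calls to truth calls inside the confident regions.

    Simplifying assumption: variants match only if (chrom, pos, ref, alt)
    are identical. Genotypes and alternate representations are ignored.
    """
    q = {v for v in query if in_confident_region(v[0], v[1], regions)}
    t = {v for v in truth if in_confident_region(v[0], v[1], regions)}

    tp = len(q & t)  # called and present in the truth set
    fp = len(q - t)  # called but absent from the truth set
    fn = len(t - q)  # in the truth set but not called

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy data (made up): one confident region, three truth and three query calls.
regions = {"chr1": [(100, 200)]}
truth = [("chr1", 150, "A", "G"), ("chr1", 180, "C", "T"), ("chr1", 500, "G", "A")]
query = [("chr1", 150, "A", "G"), ("chr1", 160, "T", "C"), ("chr1", 500, "G", "A")]

result = benchmark(query, truth, regions)
print(result)
```

For exome data, the regions passed in would typically be the intersection of the GIAB confident regions with the capture kit's target BED, so that missing calls outside the targets are not counted as false negatives.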

2.4 years ago by Jeremy Leipzig 22k

2

I've waited a long time for this. Thanks for developing it.

2.4 years ago by pinn ▴ 210

0

Jeremy Leipzig, how about benchmarking SVs/CNVs?

2.4 years ago by jacobhsu • 0

1

Yes, we have GATK-SV running and are currently working with a contact at Stanford to choose a subset of metrics to display in the benchmarking table. I would love to hear about your needs if you are working with CNVs/SVs, so please sign up for a free trial and we'll reach out.

2.4 years ago by Jeremy Leipzig 22k