Benefit of platinum genomes in WGS experiment
7.6 years ago
Ram 36k

I am looking to analyze an experiment that deals with samples from related people. The sequencing was outsourced and the experiment's variant calling step seems to involve the 17 platinum genomes.

I would like to gain a deeper understanding of the benefit of using the platinum genomes in my experiment. In other words, what have I gained by including this data in my experiment? How best can I leverage these results downstream?

Thank you in advance for your inputs!

WGS platinum genomes • 2.3k views

What do you mean by: "The sequencing was outsourced and the experiment's variant calling step seems to involve the 17 platinum genomes."?


We had a different institution do the sequencing and bioinformatics for us, because it was cheaper than in-house. The variant calling part of their workflow involves our samples + the 17 platinum genomes.


Zev may be asking what you mean by "the 17 platinum genomes". I have never heard of that before. Google suggests it may be some related human genomes that Illumina sequenced to 30x-200x; is that correct?

As for gaining things... well, what is your experiment? Using deeply-sequenced related individuals can be very useful for calibrating variant-calling software; I used that approach in the past to demonstrate that a certain company's indel calls were completely bogus, because they were impossible to explain by heredity. 30x is not deep sequencing by any standard, and it seems like various different library types were used for different members, so each person has different biases... this is very bad experimental design for calibrating software. It's almost certainly better than completely uncalibrated software, but not something I would advertise.

Anyway, I can't think of any use for a single set of random related genomes other than for calibration, and this set does not seem to be a very good choice for that, assuming we are talking about the same thing.
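To make the "impossible to explain by heredity" check concrete, here is a minimal Python sketch of a Mendelian consistency test on a single trio. It assumes diploid genotypes given as pairs of allele indices (0 = reference, 1 = first alternate, as in a VCF GT field) and ignores de novo mutations, sex chromosomes, and missing calls. The function names are made up for illustration and are not from any particular tool.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's diploid genotype can be formed by
    taking one allele from the mother and one from the father."""
    child_sorted = tuple(sorted(child))
    for m_allele, f_allele in product(mother, father):
        if tuple(sorted((m_allele, f_allele))) == child_sorted:
            return True
    return False


def mendelian_error_rate(trio_calls):
    """Fraction of sites whose child genotype cannot be explained by the parents.

    trio_calls is an iterable of (child, mother, father) genotype tuples,
    one entry per site (hypothetical input layout).
    """
    checked = 0
    errors = 0
    for child, mother, father in trio_calls:
        checked += 1
        if not is_mendelian_consistent(child, mother, father):
            errors += 1
    return errors / checked if checked else 0.0
```

A caller that produces a high error rate on a well-sequenced pedigree is probably making bad calls, since true de novo mutations are far too rare to account for more than a tiny fraction of sites.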


I hadn't heard of the platinum genomes before I saw these experiment results either. Sure, I had seen occurrences of "NA12878" in the wild quite a few times, but never thought it'd be a part of some set of high quality genomes.

Yes, these 17 are Illumina's CEPH pedigree genomes that have been extensively studied for variant calling.

I wonder why Illumina would use anything but the best of protocols to create a "platinum" data set. Anyway, I guess I'll ask the vendor why they used the set and if we would benefit from it.

7.6 years ago

See here

http://www.illumina.com/platinumgenomes/

Illumina have basically sequenced this 17-member pedigree with their best sequencing protocols at very high depth, called variants with a large number of state-of-the-art callers (GATK, Cortex, Freebayes, Isaac, etc.), and then used the pedigree as an independent method of detecting errors. This allows them to make a combined callset across callers that is more inclusive than a simple intersection, and gives a big truth set of calls ranging from SNPs to indels of many sizes.

I have no idea how this was used by the pipeline you mention, but maybe they applied the same pipeline to NA12878 and looked to see how many of the calls in this conservative truth set were recovered.
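If that guess is right, the comparison against the truth set would look roughly like the Python sketch below. This is only an illustration, not what the vendor actually ran: it assumes variants have already been reduced to (chrom, pos, ref, alt) tuples with a consistent representation, and the function name `concordance` is hypothetical. A real benchmark would also normalize indels and restrict the comparison to the truth set's confident regions.

```python
def concordance(called, truth):
    """Compare a pipeline's calls to a truth set.

    Both arguments are iterables of (chrom, pos, ref, alt) tuples
    (assumed input layout). Returns counts plus recall and precision.
    """
    called = set(called)
    truth = set(truth)
    tp = len(called & truth)   # truth variants the pipeline found
    fp = len(called - truth)   # pipeline calls absent from the truth set
    fn = len(truth - called)   # truth variants the pipeline missed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "recall": recall, "precision": precision}
```

Because the truth set is conservative, a call missing from it is not necessarily wrong, so recall is the easier number to interpret here.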


Thank you for the answer. Yes, I did search online quite a bit on the platinum genomes and read up on the protocol used by Illumina, but like you say, I have no idea either why this was used in the pipeline. Maybe only the guys who ran the pipeline can shed light on this. I'll keep you guys posted on their reasons.
