Question

What are the 1000 Genomes indel calls?

6.0 years ago

2012secondseason ▴ 40

Hi. While studying the RealignerTargetCreator tool, I came across something puzzling in the description of one of its arguments:

--known / -known: Input VCF file with known indels. Any number of VCF files representing known SNPs and/or indels. Could be e.g. dbSNP and/or official 1000 Genomes indel calls. SNPs in these files will be ignored unless the --mismatchFraction argument is used.

Regarding this argument, what are the "1000 Genomes indel calls"? Are they derived from one person's genome? I don't understand why I would need one person's indel records when using this tool.

Thanks.

1000Genome realignertargetcreater gatk

score 3 · Accepted Answer · 2018-04-13

6.0 years ago

h.mon 35k

The 1000 Genomes indel calls were made from two trios, not from a single individual. The Broad also provides a set of known indels combining 1000 Genomes and Mills, probably derived from "An initial map of insertion and deletion (INDEL) variation in the human genome", though I have no idea how many individuals were genotyped in that study.

The importance of known sites is explained in the Broad's online documentation article "What should I use as known variants/sites for running tool X?".
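To make the quoted --known behaviour concrete (indel records in a known-sites VCF are used, SNP records are ignored unless --mismatchFraction is set), here is a minimal Python sketch. It is not GATK's code. It assumes an uncompressed, tab-separated VCF and treats any ALT allele whose length differs from REF as an indel; the `use_snps` flag and the function names are hypothetical stand-ins chosen for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Minimal sketch, NOT GATK's implementation. It mimics the documented
# behaviour of --known: indel records are kept, SNP records are ignored
# unless a --mismatchFraction-like option (hypothetical `use_snps`) is on.


def classify_record(ref: str, alts: List[str]) -> str:
    """Return 'indel' if any ALT allele differs in length from REF.

    Equal-length substitutions (SNPs, and also MNPs) return 'snp'.
    """
    if any(len(alt) != len(ref) for alt in alts):
        return "indel"
    return "snp"


def known_indels(
    vcf_lines: Iterable[str], use_snps: bool = False
) -> Iterator[Tuple[str, int, str, List[str]]]:
    """Yield (chrom, pos, ref, alts) for the records that would be kept.

    Assumes plain, tab-separated VCF text (not bgzipped).
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        alts = alt.split(",")  # multi-allelic sites list ALTs separated by commas
        if use_snps or classify_record(ref, alts) == "indel":
            yield chrom, int(pos), ref, alts


# Tiny made-up example: one insertion, one SNP, one deletion.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tAT\t.\tPASS\t.",
    "1\t2000\t.\tC\tG\t.\tPASS\t.",
    "1\t3000\t.\tGTC\tG\t.\tPASS\t.",
]

print(list(known_indels(EXAMPLE_VCF)))
# [('1', 1000, 'A', ['AT']), ('1', 3000, 'GTC', ['G'])]
```

On a real uncompressed file you could pass an open file handle instead of the list, e.g. `known_indels(open("known_indels.vcf"))`, where the filename is only a placeholder. The SNP at position 2000 is dropped unless `use_snps=True`.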

Nice explanation. Thank you very much!

Reply by 2012secondseason, 6.0 years ago

h.mon Hi, the 1000 Genomes + Mills hyperlink leads to an article in which the link to the resource is broken, and the updated GATK resource bundle only contains hg38 files. I have BAM files mapped to an hg37 reference (I'm not sure whether it came from the GATK bundle). Could you please point me to a site where I can download all the GATK bundle resources for hg37?

Reply by enigmargs, 3.1 years ago