Question

Split VCF File Using GATK

2


11.9 years ago

Rubal7 ▴ 830

Hi all,

Just a quick question, input is appreciated:

Why does the GATK SelectVariants tool require a reference genome to split a VCF file by individual? Surely this is essentially just a file-parsing problem?

Cheers!

genome gatk vcf parsing • 4.2k views

ADD COMMENT • link updated 24 months ago by jihosac954 • 0 • written 11.9 years ago by Rubal7 ▴ 830

2


Can't you just use vcftools to split the file?

ADD REPLY • link 11.9 years ago by Giovanni M Dall'Olio 28k

0


The vcftools command is called vcf-subset; its -c option takes the list of samples (columns) to keep. I found it a bit slow, so I wrote a quick Perl script that just splits each row as in a tab-separated file and selects the right columns.
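
For anyone who wants to try the same idea, here is a minimal sketch of that column-selection approach, written in Python rather than Perl. It is not the script mentioned above; the function name `subset_vcf_samples` and the sample names in the usage note are made up for illustration. It assumes an uncompressed, tab-separated VCF with the usual nine fixed columns before the sample columns.

```python
from typing import Iterable, Iterator, List

# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT come before the sample columns.
FIXED_COLUMNS = 9


def subset_vcf_samples(lines: Iterable[str], samples: List[str]) -> Iterator[str]:
    """Yield VCF lines keeping only the requested sample columns.

    Hypothetical helper: plain column selection, nothing more.
    """
    keep = None  # column indices to keep, known once the #CHROM line is seen
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            header_samples = fields[FIXED_COLUMNS:]
            missing = [s for s in samples if s not in header_samples]
            if missing:
                raise ValueError(f"samples not found in header: {missing}")
            keep = list(range(FIXED_COLUMNS)) + [
                FIXED_COLUMNS + header_samples.index(s) for s in samples
            ]
        elif keep is None:
            raise ValueError("data line found before the #CHROM header line")
        yield "\t".join(fields[i] for i in keep)
```

Usage would look something like `for row in subset_vcf_samples(open("input.vcf"), ["NA00001"]): print(row)`. Note that this only drops columns: INFO values such as AC and AN are not recomputed, and sites that are no longer variable in the remaining samples are not removed.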

ADD REPLY • link 11.9 years ago by Michael 54k

0


Thanks, yes, vcf-subset works great.

ADD REPLY • link 11.9 years ago by Rubal7 ▴ 830

1


It probably requires one so that it knows the maximum position on each chromosome.

ADD REPLY • link 11.9 years ago by Zev.Kronenberg 12k

score 0 · Answer 1 · 2012-06-05

I believe that this has to do with the central dogma of GATK:

"All datasets (reads, alignments, quality scores, variants, dbSNP information, gene tracks, interval lists - everything) must be sorted in order of one of the canonical references sequences."

The motivation for this is nicely explained in their FAQ: http://www.broadinstitute.org/gsa/wiki/index.php/Frequently_Asked_Questions#What_is_the_Central_Dogma_of_the_GATK.3F
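
To make the ordering rule in that quote concrete, here is a small Python sketch that checks whether records follow the contig order of a reference. It is only an illustration of the rule, not how GATK implements it; the function name `is_sorted_by_reference` and the contig names are hypothetical. The point is that the "correct" order comes from the reference itself, which need not match alphabetical order.

```python
from typing import Iterable, List, Tuple


def is_sorted_by_reference(
    records: Iterable[Tuple[str, int]], contig_order: List[str]
) -> bool:
    """Return True if (contig, position) records follow the reference order.

    contig_order is the list of contig names in the order they appear
    in the reference (an assumption: the caller reads it from the
    reference's index or sequence dictionary).
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    previous = (-1, -1)
    for chrom, pos in records:
        if chrom not in rank:
            return False  # contig is not part of this reference
        key = (rank[chrom], pos)
        if key < previous:
            return False  # record appears before its predecessor
        previous = key
    return True
```

With records on chr1, chr2, then chr10, the check passes for a reference ordered chr1, chr2, chr10 but fails for one ordered chr1, chr10, chr2, so the same file can be sorted relative to one reference and unsorted relative to another.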

score 0 · Answer 2 · 2022-05-03

0


24 months ago

jihosac954 • 0

If you are having trouble splitting large, poorly managed VCF files, you can use VCF Split Software to split them by date, year, folder, and size.

Visit at : https://www.wholeclear.com/split/vcard/

ADD COMMENT • link 24 months ago by jihosac954 • 0