bcftools now also has a plugin that splits a multi-sample VCF/BCF file by sample in a single pass over the file.
It does not seem to be very fast, but the output looks correct, and there are options to customize how the split is done.
You do need to install bcftools with the plugins:
https://samtools.github.io/bcftools/howtos/plugins.html
Split plugin documentation:
http://samtools.github.io/bcftools/bcftools.html#plugin
split: split VCF by sample, creating single- or multi-sample VCFs
Example command line:
bcftools plugin split input.bcf -Oz -o ./
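In practice it can help to write the per-sample files into their own directory and index them right away. The following is only a minimal sketch: `input.bcf` and `split_out` are placeholder names, and it assumes that with `-Oz` the plugin names each output `<sample>.vcf.gz`, which you should check against your bcftools version.

```shell
#!/bin/sh
# Placeholder names (assumptions): adjust to your own input and output paths.
in=input.bcf
out=split_out

# Make sure the output directory exists before the split.
mkdir -p "$out"

if command -v bcftools >/dev/null 2>&1 && [ -f "$in" ]; then
    # One pass over the input, one bgzip-compressed VCF per sample.
    bcftools +split "$in" -Oz -o "$out"

    # Index each output so it can be queried by region later.
    # Assumes the outputs end in .vcf.gz when -Oz is used.
    for f in "$out"/*.vcf.gz; do
        [ -e "$f" ] || continue
        bcftools index -t "$f"
    done
else
    echo "bcftools or $in not found; skipping the split" >&2
fi
```

The guard only keeps the script from failing on a machine without bcftools or without the input file; drop it if you prefer a hard error.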
The help output of the plugin:
About:   Split VCF by sample, creating single- or multi-sample VCFs.
Usage:   bcftools +split [Options]
Plugin options:
    -e, --exclude EXPR          exclude sites for which the expression is true (applied on the outputs)
    -G, --groups-file FILE      similar to -S, but the samples are split by group:
                                  # Create two output files (third column) with the second sample appearing
                                  # in both. The second column is for optional renaming of the samples, use
                                  # dash "-" to keep sample names unchanged
                                  sample1   -           file1
                                  sample2   -           file1,file2
                                  sample3   new-name3   file2
    -i, --include EXPR          include only sites for which the expression is true (applied on the outputs)
    -k, --keep-tags LIST        list of tags to keep. By default all tags are preserved
    -o, --output DIR            write output to the directory DIR
    -O, --output-type b|u|z|v   b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
    -r, --regions REGION        restrict to comma-separated list of regions
    -R, --regions-file FILE     restrict to regions listed in a file
    -S, --samples-file FILE     list of samples to keep with up to three columns, one line per output file:
                                  # Create two output files, the first sample is the basename
                                  # of the new file
                                  sample1
                                  sample2,sample3
                                  # Optional second column to rename the samples
                                  sample1           new-name2
                                  sample2,sample3   new-name2,new-name3
                                  # Optional third column to provide output file base name, use dash "-"
                                  # to keep sample names unchanged
                                  sample1           new-name1   output1
                                  sample2,sample3   -           output2
    -t, --targets REGION        similar to -r but streams rather than index-jumps
    -T, --targets-file FILE     similar to -R but streams rather than index-jumps
        --hts-opts LIST         low-level options to pass to HTSlib, e.g. block_size=32768
Examples:
    # Split a VCF file
    bcftools +split input.bcf -Ob -o dir
    # Exclude sites with missing or hom-ref genotypes
    bcftools +split input.bcf -Ob -o dir -i'GT="alt"'
    # Keep all INFO tags but only GT and PL in FORMAT
    bcftools +split input.bcf -Ob -o dir -k INFO,FMT/GT,PL
    # Keep all FORMAT tags but drop all INFO tags
    bcftools +split input.bcf -Ob -o dir -k FMT
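To make the `-G` option from the help a bit more concrete, here is a rough sketch that writes the groups file from the help's own example and passes it to the plugin. The sample and file names are the illustrative ones from the help text, `groups.txt` is a name I made up, and I am assuming that tab-separated columns are accepted.

```shell
#!/bin/sh
# Groups file with three columns: sample, new name ("-" keeps the name),
# and a comma-separated list of output files the sample should go into.
# The names are the illustrative ones from the help, not real samples.
printf '%s\t%s\t%s\n' \
    sample1 -         file1 \
    sample2 -         file1,file2 \
    sample3 new-name3 file2 > groups.txt

mkdir -p dir

# Run the split only if bcftools and the placeholder input are present.
if command -v bcftools >/dev/null 2>&1 && [ -f input.bcf ]; then
    bcftools +split input.bcf -Ob -o dir -G groups.txt
fi
```

Going by the help text, this should give two outputs: file1 with sample1 and sample2, and file2 with sample2 and sample3 (renamed to new-name3).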
written 3 months ago by William, modified 3 months ago