I have a VCF file with 23 chromosomes and other unwanted contigs. I want to extract chromosomes 1 to 5 into a single VCF file, keeping the header lines as well. How can I do this in the most efficient way? Thanks
Keep in mind that the posted solution only works for single-digit chromosomes (chr1, chr2, chr3, ...), not for chr10-22 and chrX. Using chr[1-22] will not work either: a bracket expression matches a single character, so [1-22] is equivalent to [12], and double-digit chromosomes have to be matched explicitly.
If you want all regular chromosomes (1-22 and X) and want to discard U, random contigs and the like from a VCF, use:
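One possible way to do this, sketched with awk and not necessarily the command originally posted here. It assumes contigs are named with a chr prefix (chr1 ... chr22, chrX), as in the examples above; the function name filter_main_chroms and the file names are placeholders.

```shell
# Keep header lines (starting with #) plus records whose CHROM column
# is exactly chr1..chr22 or chrX. Assumes "chr"-prefixed contig names.
filter_main_chroms() {
  awk -F'\t' '/^#/ || $1 ~ /^chr([1-9]|1[0-9]|2[0-2]|X)$/' "$1"
}

# Usage (placeholder file names):
#   filter_main_chroms input.vcf > main_chroms.vcf
```

Matching the whole first column, anchored with ^ and $, keeps names such as chr1_gl000191_random out even though they start with chr1.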
SITE FILTERING OPTIONS
These options are used to include or exclude certain sites from any analysis being performed by the program.
--chr <chromosome>
--not-chr <chromosome>
Includes or excludes sites with identifiers matching <chromosome>. **These options may be used multiple times to include or exclude more than one chromosome.**
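As a rough sketch of how the repeated option could be used for chromosomes 1 to 5: the helper names chr_args and extract_chroms are made up for illustration, and the contig names are assumed to match those in the VCF exactly (chr1 versus 1).

```shell
# Print one "--chr <name>" pair for each contig name given as an argument.
chr_args() {
  for c in "$@"; do
    printf '%s %s ' --chr "$c"
  done
}

# Hypothetical wrapper: input VCF, output prefix, then the contigs to keep.
extract_chroms() {
  in_vcf=$1
  out_prefix=$2
  shift 2
  # Unquoted $(chr_args ...) is intentional: it must split into separate
  # arguments. This assumes contig names contain no whitespace.
  vcftools --vcf "$in_vcf" $(chr_args "$@") \
    --recode --recode-INFO-all --out "$out_prefix"
}

# Usage (placeholder names):
#   extract_chroms input.vcf chr1to5 chr1 chr2 chr3 chr4 chr5
# vcftools writes the filtered file to chr1to5.recode.vcf
```

--recode tells vcftools to write a new VCF and --recode-INFO-all keeps the INFO fields in it.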
This will preserve the header, of course. In addition, the code posted above in the comments will also keep the header, because it matches lines starting with # as well as chr[1-5]: the pattern includes an "or" that grabs lines starting with # or with chr1, chr2, chr3, etc.
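For illustration, a pattern of the kind described could look like the following. This is a sketch, not the exact code from the comments; keep_chr1_to_5 and the file names are placeholders.

```shell
# Keep lines starting with # (header) or with chr1..chr5 followed by
# whitespace. The trailing [[:space:]] stops chr1 from also matching
# chr10 or chr1_random contigs.
keep_chr1_to_5() {
  grep -E '^(#|chr[1-5][[:space:]])' "$1"
}

# Usage (placeholder file names):
#   keep_chr1_to_5 input.vcf > chr1to5.vcf
```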