I have a few hundred single-sample vcf files that I need to consolidate into a single vcf file.
My understanding is that plink is one of the many tools that one can use for doing this. I would prefer to use plink if possible, even if it is not the best one, because I am already using plink for other operations, and I want to keep my toolchain as small as possible.
Unfortunately, I am having a hard time locating the relevant details for this task in the plink documentation.
Could someone kindly either post the plink command-line to perform such consolidation, or else a link to the relevant section(s) in the plink documentation?
EDIT: I am using plink version 1.90b, for backward-compatibility with the rest of the project I am working on.
(If the question above was clear enough to you, you can skip the rest of this post.)
Each single-sample vcf file consists of a comment/header section (each line beginning with ##), followed by a single row of tab-separated column headers (which begins with a single #), followed by ~1.7M tab-separated rows of metadata and data.
The comment/header section is identical across all these files.
The rows in all these files correspond to the set of SNPs and other variants that are probed by the same genotyping chip. In other words, the rows of all these files are consistent with each other.
The rows section of each of these files consists of 10 tab-separated columns, the first 9 of which hold metadata and are identical across all the files. Only the 10th column (including its column header) contains sample-specific data, and therefore differs across the files.
Accordingly, the first 9 column headers are identical across these files, while the 10th column header is a sample-specific identifier, and hence unique to each file.
Therefore, "consolidation" here means producing a file in which the leading header/comment section and the first 9 columns are identical to those of any of the single-sample vcf files, and the remaining (tab-separated) columns correspond to the 10th columns of all the single-sample vcf files. Conceptually, this is a relatively simple operation. The problem is to do it efficiently.
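To make the intended operation concrete, here is a minimal Python sketch of the naive column-paste described above. It is only an illustration of the desired result, not a plink solution, and the function name `consolidate_vcfs` is made up for this example. It assumes exactly what is stated above: identical `##` header lines, identical first 9 columns, and the same rows in the same order in every file.

```python
from contextlib import ExitStack


def consolidate_vcfs(paths, out_path):
    """Paste the 10th column of each single-sample VCF side by side.

    Assumption: all inputs share the same '##' header lines, the same
    first nine columns, and the same rows in the same order.
    """
    with ExitStack() as stack:
        readers = [stack.enter_context(open(p)) for p in paths]
        out = stack.enter_context(open(out_path, "w"))
        # Read one line from every file at a time, so memory use stays flat.
        for lines in zip(*readers):
            first = lines[0].rstrip("\n")
            if first.startswith("##"):
                # Header/comment lines are identical: copy from the first file.
                out.write(first + "\n")
                continue
            fields = first.split("\t")
            if len(fields) != 10:
                raise ValueError("expected 10 tab-separated columns")
            prefix = fields[:9]
            samples = [fields[9]]
            for line in lines[1:]:
                other = line.rstrip("\n").split("\t")
                if other[:9] != prefix:
                    raise ValueError("rows are not aligned across files")
                samples.append(other[9])
            # The '#CHROM ...' header row and the data rows are handled alike.
            out.write("\t".join(prefix + samples) + "\n")
```

Note that this sketch keeps every input file open at once, so with a few hundred files it may run into the operating system's limit on open file descriptors, and it stops silently at the shortest file if the inputs differ in length.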