Conventions for storing provenance (original URL, processing steps) in a VCF file
5.6 years ago
hyanwong

I'm creating a set of VCFs, one for each human chromosome, by taking a single VCF from e.g. ftp://ftp.ensembl.org/pub/release-91/variation/vcf/homo_sapiens/homo_sapiens.vcf.gz and running

bcftools view homo_sapiens.vcf.gz --regions 1 -Oz -o homo_sapiens_chr1.vcf.gz
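The per-chromosome split can be scripted so that the exact command line used for each output is captured as a string, ready to be recorded later. This is a minimal sketch, not the poster's actual workflow. The chromosome list (including "MT") and the `_chrN` output naming are assumptions, so check them against the file's `##contig` lines. Note that `--regions` needs the input to be indexed (a `.tbi` or `.csi` next to the `.vcf.gz`).

```python
import shlex
import subprocess

# Assumed chromosome names for the Ensembl human VCF ("1".."22", "X", "Y", "MT");
# verify against the ##contig lines of the actual file.
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def split_command(src: str, chrom: str) -> list[str]:
    """Build the bcftools command that writes one chromosome as a bgzipped VCF."""
    # Hypothetical naming scheme: homo_sapiens.vcf.gz -> homo_sapiens_chr1.vcf.gz
    out = src.replace(".vcf.gz", f"_chr{chrom}.vcf.gz")
    return ["bcftools", "view", src, "--regions", chrom, "-Oz", "-o", out]


def split_all(src: str, run: bool = False) -> list[str]:
    """Return the exact command line used per chromosome, optionally executing it."""
    commands = []
    for chrom in CHROMOSOMES:
        cmd = split_command(src, chrom)
        commands.append(shlex.join(cmd))  # shell-quoted string of what was (or will be) run
        if run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if bcftools fails
    return commands
```

Calling `split_all("homo_sapiens.vcf.gz")` without `run=True` only builds the command strings, which is handy for checking them before anything is executed.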

I would like to store in each new VCF file the fact that this is the command I ran, and also that the original homo_sapiens.vcf.gz file was downloaded from that Ensembl URL on a given date. I assume I should store this information in the ##source line of the VCF, but is there any convention for how it should be stored? For example, is there a structured (e.g. JSON) schema for saving this sort of provenance information?
