Hi! Has anyone looked into the VCF files provided by Dante Labs? I'm looking at the .raw.vcf and .filtered.vcf files. For the reference sequence, all I can see are 2 contigs: GL000225.1 and GL000192.1. All variants belong to one of these.
There's no conventional CHROM column.
I don't know how those coordinates map to chromosomes in the GRCh37 or GRCh38 assemblies.
GL000192 corresponds to chr1_gl000192_random, and GL000225 corresponds to chrUn_gl000225.
Moreover, there's no VCF header with meta-information. All I can see is a single top row of colon-separated numbers that are meaningless to me (nothing like the conventional VCF header).
- What's the use of those VCFs?
- What variants do they contain?
- What version of VCF is it, and where can I get the standard meta-information?
- Is there a map of those coordinates to any core assembly chromosomes?
- Am I right that they cover only a small part of the genome rather than all genomic variants? (The GL000225.1 contig is 211,173 bp and GL000192.1 is 547,496 bp, which obviously doesn't add up to the whole genome.)
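To put the contig sizes from the last question in perspective, here is a quick sanity check. It assumes a total human genome length of roughly 3.1 billion bp, which is my own approximate figure and not something taken from the files:

```python
# Contig sizes as quoted in the question above.
contig_sizes = {"GL000225.1": 211_173, "GL000192.1": 547_496}

# ASSUMPTION: approximate total length of a human reference assembly, in bp.
APPROX_GENOME_BP = 3_100_000_000

total_bp = sum(contig_sizes.values())
fraction = total_bp / APPROX_GENOME_BP

print(f"{total_bp:,} bp, about {fraction:.4%} of the genome")
```

So the two contigs together cover well under 1 Mbp, a tiny fraction of the genome.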
UPD: I found my mistake, sorry. The file was incomplete: only its tail had been uploaded to JupyterHub. In my local Jupyter the file loaded completely, and it does have the header and the CHROM column.
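In case someone else runs into the same thing, here is a minimal sketch of how I would check for a truncated upload before trusting the contents. It only relies on two parts of the VCF text format: meta-information lines start with `##` (the first one being `##fileformat=...`), and the column header line starts with `#CHROM`. The function name `inspect_vcf` is my own, not part of any library:

```python
import gzip


def inspect_vcf(path):
    """Return (fileformat, has_column_header, contig_counts) for a VCF file.

    fileformat is the value of the ##fileformat line, or None if it is missing.
    contig_counts maps each value of the first column to its record count.
    """
    # Plain-text and gzip-compressed files are both read in text mode.
    opener = gzip.open if path.endswith(".gz") else open
    fileformat = None
    has_column_header = False
    contig_counts = {}

    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            if line.startswith("##"):
                # Meta-information line; only the format version is kept.
                if line.startswith("##fileformat="):
                    fileformat = line.split("=", 1)[1]
            elif line.startswith("#CHROM"):
                has_column_header = True
            else:
                # Data record: the first tab-separated field is the contig.
                chrom = line.split("\t", 1)[0]
                contig_counts[chrom] = contig_counts.get(chrom, 0) + 1

    return fileformat, has_column_header, contig_counts
```

Calling something like `inspect_vcf("sample.filtered.vcf")` (a hypothetical file name) on a truncated copy would return `None` and `False` for the first two values, which is what I should have checked first. Seeing only unplaced contigs such as GL000225.1 and GL000192.1 in the counts is another hint that only the tail of the file is present.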