I recently ran Promethease on the filtered VCF files from Dante Labs and was fairly disappointed: it only recognises the variant sites listed in the file, rather than assuming the reference genotype at every other position (which, I understand, may not be correct where the reads were low quality).
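To make that distinction concrete, here is a minimal Python sketch of why a variant-only VCF is ambiguous: a position missing from the file could be homozygous reference, or it could simply be uncovered or low quality. It assumes a hypothetical list of "callable" regions (for example, regions with adequate depth and base quality). The names infer_genotype and callable_regions are illustrative only and are not part of Promethease or any other tool.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical inputs (assumptions for illustration):
#   variants: (chrom, pos) -> called genotype string, e.g. "AG"
#   callable regions: (chrom, start, end) with start <= pos < end, 1-based
Variants = Dict[Tuple[str, int], str]
Regions = List[Tuple[str, int, int]]


def infer_genotype(chrom: str, pos: int, ref_base: str,
                   variants: Variants, callable_regions: Regions) -> Optional[str]:
    """Return a genotype for a site, or None if it cannot be inferred.

    A site listed in the variant file uses its called genotype. An unlisted
    site is assumed homozygous reference only if it lies in a callable region;
    otherwise it is a no-call (None), not "reference".
    """
    if (chrom, pos) in variants:
        return variants[(chrom, pos)]
    for c, start, end in callable_regions:
        if c == chrom and start <= pos < end:
            # Assumes a diploid site, so the reference base appears twice.
            return ref_base * 2
    return None
```

With variants = {("chr1", 1000): "AG"} and callable regions [("chr1", 500, 2000)], position 1000 returns "AG", position 1500 with reference C returns "CC", and position 5000 returns None because nothing is known about it.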
I'm finding something similar at Codegen.eu: using WGSExtract to produce microarray-style data doesn't include all the useful SNPs.
I looked at the Promethease instructions at https://snpedia.com/index.php/VCF, which suggest creating a gVCF since they no longer accept BAM files, but mine is 241 GB compressed, well over their 4 GB limit. It does include all the alt contigs, though, as it was generated against the full GRCh38 reference that Dante use. Looking in IGV, the mpileup -g output also seems to be missing the genotype calls, reporting only allele frequency data.
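As background on why a gVCF helps here: besides variant records, a gVCF contains reference blocks, records whose INFO field carries an END= key and whose ALT is a symbolic allele (<NON_REF> in GATK output, <*> in bcftools output), meaning every position from POS to END was called homozygous reference. Separately, mpileup on its own only computes genotype likelihoods (the PL field); a caller such as bcftools call then chooses the genotype. The Python sketch below parses simplified, tab-separated, single-sample VCF lines and uses hypothetical function names. It illustrates both ideas: looking a position up inside a reference block, and picking the genotype with the lowest PL. It takes the raw most-likely genotype with no priors, which is simpler than what bcftools call actually does.

```python
from typing import List, Optional


def parse_info(info: str) -> dict:
    """Parse a VCF INFO column such as 'END=1999;DP=30' into a dict."""
    out = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
        elif item and item != ".":
            out[item] = True  # flag without a value
    return out


def genotype_from_pl(pl: List[int]) -> str:
    """Pick the diploid genotype with the lowest phred-scaled likelihood.

    The VCF spec orders diploid genotypes j/k (j <= k) at index k*(k+1)/2 + j,
    so a biallelic site is ordered 0/0, 0/1, 1/1. No priors are applied.
    """
    best = min(range(len(pl)), key=lambda i: pl[i])
    k = 0
    while (k + 1) * (k + 2) // 2 <= best:
        k += 1
    j = best - k * (k + 1) // 2
    return f"{j}/{k}"


def lookup_gvcf(lines: List[str], chrom: str, pos: int) -> Optional[str]:
    """Return the GT covering chrom:pos, or None if no record covers it.

    Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE.
    A record covers POS..END inclusive when INFO has END=, otherwise only POS.
    If GT is missing but PL is present (and all-numeric), GT is derived from PL.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        rec_chrom, rec_pos = f[0], int(f[1])
        end = int(parse_info(f[7]).get("END", rec_pos))
        if rec_chrom == chrom and rec_pos <= pos <= end:
            fmt = dict(zip(f[8].split(":"), f[9].split(":")))
            gt = fmt.get("GT")
            if gt in (None, ".", "./."):
                if "PL" in fmt:
                    return genotype_from_pl([int(x) for x in fmt["PL"].split(",")])
                return None
            return gt
    return None
```

For example, a reference block starting at chr1:1000 with END=1999 answers "0/0" for position 1500, while a position outside every record returns None, which is exactly the information a variant-only VCF cannot provide.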
I'm looking to create two files that I could upload to these services:
1 - A VCF or gVCF that also includes values and quality information for all the human chromosomes and the mtDNA reference, with the genotype at each position. Ideally it would be under the 4 GB limit (which might require further gVCF compression and stripping the alt contigs).
2 - A raw-style txt file for Codegen.eu, built from the same data, with the columns "# rsid chromosome position genotype" (e.g. "rs4477212 1 82154 AA") for all known rsIDs that meet a reasonable quality threshold in the data, including non-variant sites. I have 30x whole-genome and 130x exome data for this.
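For item 1, here is a minimal Python sketch of two ways the file could shrink: dropping alt, decoy and unplaced contigs while keeping chr1 to chr22, X, Y and the mitochondrial genome, and merging runs of adjacent homozygous-reference sites into blocks when their GQ falls in the same band. Coarser bands give fewer, longer blocks and a smaller file, at the cost of less precise quality information. The band boundaries and function names are illustrative assumptions, not any tool's defaults; in practice the variant caller would write the blocks, and this only shows the logic.

```python
import bisect
from typing import Iterable, List, Tuple

# Primary assembly names, accepted with or without a "chr" prefix.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "M", "MT"}

# Illustrative GQ band lower bounds (an assumption for this sketch).
GQ_BANDS = [0, 20, 40, 60, 99]

Site = Tuple[str, int, int]          # (chrom, pos, GQ) of a hom-ref site
Block = Tuple[str, int, int, int]    # (chrom, start, end, minimum GQ)


def is_primary(chrom: str) -> bool:
    """True for chromosomes 1-22, X, Y and mitochondria; False for alt/unplaced contigs."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name in PRIMARY


def band(gq: int) -> int:
    """Index of the GQ band that a GQ value falls into."""
    return bisect.bisect_right(GQ_BANDS, gq) - 1


def merge_ref_sites(sites: Iterable[Site]) -> List[Block]:
    """Merge position-sorted hom-ref sites into blocks, skipping non-primary contigs.

    Adjacent sites on the same contig whose GQ is in the same band are merged;
    each block keeps the minimum GQ it covers as a conservative summary.
    """
    blocks: List[Block] = []
    for chrom, pos, gq in sites:
        if not is_primary(chrom):
            continue
        if blocks:
            b_chrom, start, end, min_gq = blocks[-1]
            if b_chrom == chrom and pos == end + 1 and band(gq) == band(min_gq):
                blocks[-1] = (chrom, start, pos, min(min_gq, gq))
                continue
        blocks.append((chrom, pos, pos, gq))
    return blocks
```

As a usage example, sites at chr1:100 (GQ 35) and chr1:101 (GQ 38) merge into one block, chr1:102 (GQ 70) starts a new block because its band differs, and any site on a contig such as chr1_KI270706v1_random is dropped.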
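For item 2, the conversion from a genotyped VCF to the raw text layout can be sketched in Python as below. It assumes a single-sample VCF in which the ID column already holds rsIDs (for example after annotation against dbSNP) and in which every known site has its own record, including 0/0 sites; the min_gq=20 threshold and the tab separator are arbitrary assumptions, and to_raw_lines is a hypothetical name. Indels and no-calls are skipped, and allele order simply follows the GT field. One further caveat: 23andMe-style files commonly use GRCh37 positions, whereas Dante's data is on GRCh38, so if Codegen.eu expects build 37 the positions would need lifting over, which this sketch does not do.

```python
from typing import Iterable, Iterator


def to_raw_lines(vcf_lines: Iterable[str], min_gq: int = 20) -> Iterator[str]:
    """Yield 'rsid<TAB>chrom<TAB>pos<TAB>genotype' lines from single-sample VCF lines.

    Records without an rsID, below min_gq, with a missing GQ, with a no-call,
    or with non-SNV alleles (indels, symbolic alleles such as <*>) are skipped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        chrom, pos, ids, ref, alt = f[0], f[1], f[2], f[3], f[4]
        rsids = [i for i in ids.split(";") if i.startswith("rs")]
        if not rsids:
            continue
        fmt = dict(zip(f[8].split(":"), f[9].split(":")))
        gq = fmt.get("GQ", ".")
        if gq == "." or int(gq) < min_gq:
            continue
        alleles = [ref] + alt.split(",")
        gt = fmt.get("GT", ".").replace("|", "/").split("/")
        if "." in gt:
            continue  # no-call
        bases = [alleles[int(a)] for a in gt]
        if any(len(b) != 1 or b not in "ACGT" for b in bases):
            continue  # skip indels and symbolic alleles
        # Strip a "chr" prefix; mapping M to MT is an assumed naming convention.
        chrom = chrom[3:] if chrom.startswith("chr") else chrom
        chrom = "MT" if chrom == "M" else chrom
        genotype = "".join(bases)
        for rsid in rsids:
            yield f"{rsid}\t{chrom}\t{pos}\t{genotype}"
```

For instance, a hypothetical record "chr1 82154 rs4477212 A <*> . . . GT:GQ 0/0:45" (tab-separated) would produce the tab-separated line "rs4477212 1 82154 AA", matching the layout in the example above, because a 0/0 call only uses the REF base.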
Any help with a sequence of commands I could use for both of these would be appreciated.
Thanks
Tim