Hi
I am trying to generate a VCF from SNP array data (GenomeStudio). So far I can export to PLINK format from GenomeStudio and create a VCF with the following plink2 commands:
plink2 \
--pedmap gs-out-plink \
--make-pgen \
--sort-vars \
--merge-x \
--out test-1
then:
plink2 \
--threads 10 \
--pfile test-1 \
--fa ${ref} \
--snps-only just-acgt \
--export vcf \
--output-chr chrM \
--out out-test-1
I would then like to validate the resulting VCF with GATK:
java -jar gatk ValidateVariants -R ${ref} -V plink_vcf.vcf.gz
However, I get the following error (chromosome and position renamed):
The REF allele is incorrect for the record at position chrq:56789 fasta says C vs VCF says G
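To make clear what kind of mismatch I think this is, here is a minimal Python sketch that classifies a single REF mismatch. The function name `classify_mismatch` and its labels are hypothetical and not part of GATK or plink2. The error message does not show the ALT allele, so the `A` in the usage line is only assumed for illustration.

```python
# Hypothetical helper for illustration only; not part of GATK or plink2.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_mismatch(fasta_base, vcf_ref, vcf_alt):
    """Describe how a biallelic SNP's REF/ALT relate to the reference base."""
    fasta_base = fasta_base.upper()
    vcf_ref = vcf_ref.upper()
    vcf_alt = vcf_alt.upper()

    if vcf_ref == fasta_base:
        return "match"
    # A/T and C/G SNPs look the same on both strands, so a swap and a
    # strand flip cannot be told apart from the alleles alone.
    if COMPLEMENT.get(vcf_ref) == vcf_alt:
        return "ambiguous_palindromic"
    if vcf_alt == fasta_base:
        return "ref_alt_swapped"
    if COMPLEMENT.get(vcf_ref) == fasta_base:
        return "strand_flipped"
    if COMPLEMENT.get(vcf_alt) == fasta_base:
        return "strand_flipped_and_swapped"
    return "unresolved"


# fasta says C, VCF says G; the ALT allele "A" is an assumption.
print(classify_mismatch("C", "G", "A"))
```

With these inputs the record is classified as a strand flip, because the complement of G is C.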
I believe this is down to Illumina's TOP/BOT strand nomenclature. The manifest file gives the following (shortened example with altered names):
Name         IlmnStrand  SNP    GenomeBuild  Chr  MapInfo    RefStrand
rs123456789  TOP         [A/G]  38           15   987654321  -
Is there any software, or some other way, to convert these so that the VCF has the correct reference and alternate alleles?
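What I have in mind is roughly the following Python sketch. It rests on an assumption I have not verified: that the alleles in the SNP column are given on the strand whose orientation relative to the reference is recorded in RefStrand, so a "-" means the alleles need complementing. The function name `forward_alleles` is hypothetical. Deciding which of the two alleles is REF would still require looking up the base in the fasta.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def forward_alleles(snp_field, ref_strand):
    """Return the two alleles of a manifest SNP field like '[A/G]' on the + strand.

    Assumption: RefStrand describes the strand on which the SNP column
    alleles are reported. Indel entries such as '[D/I]' are not handled.
    """
    match = re.fullmatch(r"\[([ACGT])/([ACGT])\]", snp_field.strip())
    if match is None:
        raise ValueError(f"unsupported SNP field: {snp_field!r}")
    alleles = match.groups()

    if ref_strand == "+":
        return alleles
    if ref_strand == "-":
        # Complement each allele to express it on the forward strand.
        return tuple(COMPLEMENT[allele] for allele in alleles)
    raise ValueError(f"unexpected RefStrand value: {ref_strand!r}")


# Manifest example from above: [A/G] with RefStrand "-"
print(forward_alleles("[A/G]", "-"))
```

For the manifest row above this gives T and C as the forward-strand alleles.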
Thanks in advance!