I have a SNP dataset in PLINK format containing 419,102 SNPs.
I am trying to run them through ANNOVAR so I can figure out which types of functional elements they fall in across the genome. These are the commands I ran:
plink --bfile input --recode vcf-iid --out Ancestral_419k
convert2annovar.pl -format vcf4old Ancestral_419k.vcf -outfile Ancestral_419k.avinput
The resulting VCF file has all 419,102 SNPs (plus 28 header lines, 419,130 lines in total).
The ANNOVAR log file states the following:
NOTICE: Read 419130 lines and wrote 417600 different variants at 418216 genomic positions (418216 SNPs and 0 indels)
NOTICE: Among 418216 different variants at 418216 positions, 111601 are heterozygotes, 305999 are homozygotes
NOTICE: Among 418216 SNPs, 340143 are transitions, 78073 are transversions (ratio=4.36)
The avinput file has 418,216 SNPs, so 886 of the 419,102 SNPs in the VCF are being dropped during the conversion, and I am not sure why. Does anyone have an idea what is going on?
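One way to narrow this down might be to list the VCF records whose position never shows up in the avinput file. Below is a minimal Python sketch of that comparison. The function names are hypothetical, not part of PLINK or ANNOVAR. It assumes the first two avinput columns are chromosome and start position, that every variant is a SNP (so the start equals the VCF POS), and that chromosome names are written identically in both files.

```python
def vcf_records(lines):
    """Yield (chrom, pos, record) for each non-header VCF line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        record = line.rstrip("\n")
        fields = record.split("\t")
        yield fields[0], int(fields[1]), record


def avinput_positions(lines):
    """Collect the (chrom, start) pairs present in an avinput file."""
    positions = set()
    for line in lines:
        if not line.strip():
            continue
        fields = line.split()
        # Assumption: column 1 is the chromosome, column 2 is the start.
        positions.add((fields[0], int(fields[1])))
    return positions


def dropped_records(vcf_lines, avinput_lines):
    """Return VCF records whose (chrom, pos) is absent from the avinput."""
    kept = avinput_positions(avinput_lines)
    return [
        record
        for chrom, pos, record in vcf_records(vcf_lines)
        if (chrom, pos) not in kept
    ]


def dropped_records_from_files(vcf_path, avinput_path):
    """Convenience wrapper that reads both files from disk."""
    with open(vcf_path) as vcf, open(avinput_path) as avinput:
        return dropped_records(vcf, avinput)
```

Calling `dropped_records_from_files("Ancestral_419k.vcf", "Ancestral_419k.avinput")` should return the records with no counterpart in the avinput file. Looking at their REF, ALT, and genotype columns may show what they have in common.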