Question: Is there a FASTA file containing 1000 Genomes variant information to check reference alleles in VCF files?
25 days ago by
Hi folks!

I've imputed genotype data with the Michigan Imputation Server (MIS), using the 1000 Genomes Phase 1 panel (MIS reported few errors). Both before and after imputation, I wanted to perform a sanity check by running [], to make sure the ref/alt alleles in my data were consistent with the 1000G Phase 1 data. This analysis revealed several inconsistent reference sites when comparing against this fasta file from 1000G Phase 1. On closer inspection, I noticed that the reference alleles for several of the "supposedly inconsistent" SNPs in my VCF were actually consistent with the data in the UCSC Browser, which suggests that I was using the wrong fasta file as the reference. I saw that this person had a similar issue, but I could not find an answer about which fasta file I should use as the reference for this check (or for other tools, like "bcftools norm --check-ref").

I found this link, which says that 1000 Genomes doesn't provide fasta files containing variant information, so the file I used as the reference was not right in the first place. Does anyone have a clue which file I should use instead?

7 days ago by
I've had good results following the guidance in . I'd guess that either of the first two fasta files listed there should work; let me know if that isn't the case.
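
In case it helps with debugging, here is a minimal Python sketch of what a reference-allele check does conceptually: for each VCF record, look up the bases at CHROM:POS in the reference genome FASTA and compare them with the REF column. It is not the implementation of any particular tool. The function names (`parse_fasta`, `check_ref_alleles`) and the toy data are made up for illustration, and it reads everything into memory, so it is only suitable for small test files.

```python
def parse_fasta(text):
    """Return {sequence_name: sequence} from FASTA-formatted text."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def check_ref_alleles(vcf_text, genome):
    """Return (chrom, pos, vcf_ref, fasta_ref) for records whose REF differs.

    fasta_ref is None when the contig name is absent from the FASTA,
    e.g. "chr1" in the VCF versus "1" in the reference.
    """
    mismatches = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        seq = genome.get(chrom)
        if seq is None:
            mismatches.append((chrom, pos, ref, None))
            continue
        # VCF positions are 1-based; Python slices are 0-based.
        fasta_ref = seq[pos - 1:pos - 1 + len(ref)]
        if fasta_ref.upper() != ref.upper():
            mismatches.append((chrom, pos, ref, fasta_ref))
    return mismatches


# Toy data (hypothetical, for illustration only).
TOY_FASTA = ">1 toy contig\nACGTACGT\nTTGGCCAA\n"
TOY_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t3\t.\tG\tA\t.\t.\t.",      # REF matches the FASTA
    "1\t6\t.\tT\tC\t.\t.\t.",      # REF differs from the FASTA
    "1\t9\t.\tTTG\tT\t.\t.\t.",    # multi-base REF, matches
    "chr1\t2\t.\tC\tT\t.\t.\t.",   # contig name not present in the FASTA
])

if __name__ == "__main__":
    for record in check_ref_alleles(TOY_VCF, parse_fasta(TOY_FASTA)):
        print(record)
```

The point of the sketch is that the FASTA only needs to hold the reference genome sequence for the build your VCF coordinates are on; it does not need to contain any variant information. If nearly every site is reported as inconsistent, a build or contig-naming mismatch ("1" versus "chr1") is worth ruling out first. As far as I know, "bcftools norm --check-ref" performs the same kind of comparison against the FASTA supplied with --fasta-ref.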

