Question: Is there a FASTA file containing 1000 Genomes variant information to check reference alleles in VCF files?
25 days ago by
London, United Kingdom
rodd40 wrote:

Hi folks!

I've imputed genotype data on the Michigan Imputation Server (MIS) using the 1000 Genomes Phase 1 reference panel (MIS reported only a few errors). Both before and after imputation, I wanted to run a sanity check with [] to make sure the REF/ALT alleles in my data were consistent with the 1000G Phase 1 data. This check flagged several sites with inconsistent reference alleles when compared against this FASTA file from 1000G Phase 1. On closer inspection, I noticed that the reference alleles of several of these "supposedly inconsistent" SNPs in my VCF actually agreed with the UCSC Genome Browser, which suggests that I was using the wrong FASTA file as the reference for []. I saw that this person had a similar issue, but I could not find an answer about which FASTA file I should use as the reference for [] (or for other tools, such as `bcftools norm --check-ref`).
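For anyone wanting to see what such a REF check boils down to, here is a minimal sketch (not the tool referred to above, and not how `bcftools norm --check-ref` is implemented) that compares each VCF REF allele with the base(s) at that position in a reference FASTA. It assumes small, uncompressed, in-memory inputs, and `normalize_contig` is a hypothetical helper that strips a UCSC-style `chr` prefix so `chr1` and `1` are treated as the same contig. Sites where an ALT allele matches the reference are labelled "swapped", since that pattern often points to flipped REF/ALT alleles.

```python
"""Minimal sketch: compare VCF REF alleles against a reference FASTA.

Assumes small, uncompressed inputs held in memory; real genome-scale data
would need an indexed FASTA reader instead of this toy parser.
"""
from typing import Dict, List, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {contig_name: uppercase sequence}."""
    seqs: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Contig name = first whitespace-delimited token after '>'
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {k: "".join(v) for k, v in seqs.items()}


def normalize_contig(name: str) -> str:
    """Hypothetical helper: drop a UCSC-style 'chr' prefix (chr1 -> 1)."""
    return name[3:] if name.lower().startswith("chr") else name


def check_ref_alleles(fasta: Dict[str, str], vcf_text: str) -> List[Tuple[str, int, str]]:
    """Classify each VCF record as match / swapped / mismatch / missing."""
    genome = {normalize_contig(k): v for k, v in fasta.items()}
    results = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        ref, alt = fields[3].upper(), fields[4].upper()
        seq = genome.get(normalize_contig(chrom))
        start = pos - 1  # VCF POS is 1-based; Python slicing is 0-based
        if seq is None or start < 0 or start + len(ref) > len(seq):
            status = "missing"
        elif seq[start:start + len(ref)] == ref:
            status = "match"
        elif any(a == seq[start:start + len(a)] for a in alt.split(",")):
            status = "swapped"  # an ALT allele equals the reference bases
        else:
            status = "mismatch"  # e.g. strand issue or a different build
        results.append((chrom, pos, status))
    return results


if __name__ == "__main__":
    # Toy contig "1" = ACGTACGTAC (positions 1..10)
    demo_fasta = read_fasta(">1\nACGTACGTAC\n")
    demo_vcf = (
        "#CHROM\tPOS\tID\tREF\tALT\n"
        "1\t2\trs1\tC\tT\n"     # reference base at 2 is C -> match
        "chr1\t3\trs2\tA\tG\n"  # reference base is G, equals ALT -> swapped
        "1\t4\trs3\tC\tA\n"     # reference base is T -> mismatch
        "1\t50\trs4\tA\tG\n"    # beyond the contig end -> missing
    )
    for record in check_ref_alleles(demo_fasta, demo_vcf):
        print(record)
```

If many sites come back as "mismatch" or "swapped" while a genome browser agrees with your VCF, that is a hint that the FASTA (rather than the data) is the problem, which is the situation described above.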

I found this link, which says that 1000 Genomes doesn't provide FASTA files containing variant information, so the file I used as the reference for [] was not the right one in the first place. Any clues, anyone?

modified 7 days ago by chrchang523 • written 25 days ago by rodd40
7 days ago by
United States
chrchang523 wrote:

I've had good results following the guidance in []. I'd guess that either of the first two FASTA files listed there should work; let me know if that isn't the case.
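Whichever FASTA you pick, a quick sanity check (my own suggestion, not something taken from the linked guidance) is to confirm that its contig names and lengths agree with the `##contig` lines in the VCF header; a build mismatch, such as a GRCh37 versus a GRCh38 FASTA, shows up immediately as differing lengths. The sketch below assumes each `##contig` line lists `ID` before `length`, and its fallback between `1` and `chr1` style names is a hypothetical convenience, not a standard rule.

```python
"""Minimal sketch: check that a FASTA's contigs match a VCF header's ##contig lines."""
import re
from typing import Dict, List

# Assumes ID appears before length, as is typical: ##contig=<ID=1,length=249250621>
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+).*?length=(\d+)")


def vcf_contig_lengths(vcf_header: str) -> Dict[str, int]:
    """Collect {contig_id: length} from ##contig meta-information lines."""
    lengths: Dict[str, int] = {}
    for line in vcf_header.splitlines():
        m = CONTIG_RE.match(line)
        if m:
            lengths[m.group(1)] = int(m.group(2))
    return lengths


def fasta_lengths(fasta_text: str) -> Dict[str, int]:
    """Sum sequence-line lengths per FASTA record."""
    lengths: Dict[str, int] = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def compare_builds(vcf_lengths: Dict[str, int], fa_lengths: Dict[str, int]) -> List[str]:
    """Return human-readable problems; an empty list means the contigs agree."""
    problems = []
    for contig, length in sorted(vcf_lengths.items()):
        fa_len = fa_lengths.get(contig)
        if fa_len is None:
            # Hypothetical fallback: try the other naming style (1 <-> chr1)
            other = contig[3:] if contig.startswith("chr") else "chr" + contig
            fa_len = fa_lengths.get(other)
        if fa_len is None:
            problems.append(f"{contig}: absent from FASTA")
        elif fa_len != length:
            problems.append(f"{contig}: VCF length {length} != FASTA length {fa_len}")
    return problems


if __name__ == "__main__":
    header = (
        "##fileformat=VCFv4.2\n"
        "##contig=<ID=1,length=10>\n"
        "##contig=<ID=2,length=8>\n"
        "##contig=<ID=chrX,length=5>\n"
        "#CHROM\tPOS\tID\tREF\tALT\n"
    )
    fasta = ">1\nACGTACGTAC\n>2 some description\nACGT\n"
    for problem in compare_builds(vcf_contig_lengths(header), fasta_lengths(fasta)):
        print(problem)
```

This only catches build or naming mismatches; it says nothing about individual REF alleles, so it complements a per-site check rather than replacing it.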
