Question: VCF file columns - biological explanation
GK1610 (United States), 19 months ago, wrote:

I am working on joint genotype calling of gVCF files from 200 samples. This is my first time working with VCF data. I understand the format, but I am struggling with some basic questions:

What is the reference allele, e.g. in the hg19 files from 1000 Genomes phase 3? Is it the allele seen in most people's sequences?

What is an alternate allele? Is it the minor allele at that variant position?

What is the NON_REF allele in the ALT column?

WouterDeCoster (Belgium), 19 months ago, wrote:

The reference genome was assembled from the sequences of several individuals to produce a single haploid set of chromosomes. The nucleotide at a given position in the reference is not necessarily the most frequent one in the population. The reference is therefore not the genome of any real human, just a coordinate system to compare our reads against. It may even contain haplotypes that do not exist in reality, because it collapses multiple diploid individuals into a single haploid sequence.
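To make the last point concrete, here is a toy sketch. The donors, sequences and stitching plan are all hypothetical and are not how any real assembly was built. It shows how joining segments taken from different diploid individuals into one haploid sequence can produce a combination of alleles that no donor actually carries.

```python
"""Toy illustration: a haploid reference stitched together from several
diploid donors can contain a haplotype that none of the donors carries.
All donors and sequences here are hypothetical."""

# Each donor is diploid: two haplotypes spanning the same three sites.
donors = {
    "donor_A": ("ACG", "ATG"),
    "donor_B": ("TGC", "TCC"),
}

# Hypothetical stitching plan: sites 0-1 come from donor_A's first
# haplotype, site 2 comes from donor_B's first haplotype.
reference = donors["donor_A"][0][:2] + donors["donor_B"][0][2:]

# Every haplotype actually present in the donors.
observed = {hap for pair in donors.values() for hap in pair}

print("reference haplotype:", reference)
print("carried by any donor?", reference in observed)
```

With these inputs the stitched reference is `ACC`, which matches none of the four donor haplotypes.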

For a variant, the reference allele is the base (or bases) of the reference genome at that position. An alternate allele is any allele that does not match the reference allele. It may be the minor allele, but not necessarily, because the reference allele can itself be the minor allele.
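One way to check this on your own data is to count alleles from the GT field of each sample and compare REF with ALT. The sketch below is minimal. It assumes a single, well-formed, tab-separated VCF data line that has a FORMAT column containing GT, and the record it uses is hypothetical. For real files, a dedicated VCF parser such as pysam or cyvcf2 is the safer choice.

```python
"""Minimal sketch: read REF/ALT from one VCF data line and count alleles in
the sample genotypes to see whether ALT is really the minor allele."""
from __future__ import annotations

from collections import Counter

# The eight fixed VCF columns plus FORMAT; one column per sample follows.
COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def allele_counts(line: str) -> tuple[list[str], Counter]:
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(COLUMNS, fields))
    # Allele index 0 is REF; ALT may list several comma-separated alleles (1, 2, ...).
    alleles = [record["REF"]] + record["ALT"].split(",")
    gt_index = record["FORMAT"].split(":").index("GT")
    counts: Counter = Counter()
    for sample in fields[len(COLUMNS):]:
        gt = sample.split(":")[gt_index]
        # "/" separates unphased alleles, "|" phased ones; "." means missing.
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[alleles[int(allele)]] += 1
    return alleles, counts


# Hypothetical record: REF=A, ALT=G, four diploid samples.
line = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t1/1\t0/1\t1/1\t1|1"
alleles, counts = allele_counts(line)
minor = min(alleles, key=lambda a: counts[a])
print(counts)  # Counter({'G': 7, 'A': 1})
print("minor allele:", minor)  # A: here the REF allele is the minor one
```

In this made-up example the alternate allele G is the major allele, which illustrates the point above: "alternate" only means "differs from the reference", not "rare".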

GK1610 wrote:

Thanks! This is awesome!

WouterDeCoster wrote:

Happy to help. If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.

Cheers, Wouter

cpad0112 wrote:

Refer to the G5, G5A, KG and KG-PROD tags in dbSNP (use the dbSNP build for GRCh37, the NCBI genome equivalent to hg19) to learn about the frequency of the reference allele. dbSNP includes allele frequencies from the 1000 Genomes and HapMap projects.
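In a dbSNP VCF, tags like these are stored in the INFO column, either as bare flags or as key=value pairs. The following is a hedged sketch of how to split an INFO string and check which of the tags named in this comment are set. The INFO string is hypothetical. The tag IDs are copied from the comment as written and may differ between dbSNP builds, so check the `##INFO=<ID=...>` header lines of your file for the exact names and meanings.

```python
"""Minimal sketch: split a VCF INFO string into flags and key=value pairs and
report which of the dbSNP tags mentioned in the comment are present."""
from __future__ import annotations

# Tag names as written in the comment; verify them against your file's ##INFO headers.
TAGS_OF_INTEREST = ("G5", "G5A", "KG", "KG-PROD")


def parse_info(info: str) -> dict[str, object]:
    parsed: dict[str, object] = {}
    if info == ".":  # "." means the INFO column is empty
        return parsed
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        # Flags have no "=" and are stored as True; other entries keep their raw string.
        parsed[key] = value if sep else True
    return parsed


# Hypothetical INFO column from a dbSNP-style record.
info = "RS=12345;G5;G5A"
parsed = parse_info(info)
present = [tag for tag in TAGS_OF_INTEREST if parsed.get(tag) is True]
print(present)  # ['G5', 'G5A']
```

Note that flags like these only mark whether a site meets some frequency criterion. They do not give the frequency itself, so the header descriptions are worth reading before drawing conclusions about the reference allele.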

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 19 months ago, wrote:

VCF specification: https://samtools.github.io/hts-specs/VCFv4.2.pdf

GK1610 wrote:

Thanks, but this document doesn't answer my questions.