Question: VCF file columns: biological explanation
GK1610 (United States), 2.5 years ago, wrote:

I am working on joint genotype calling of gVCF files from 200 samples. This is my first time working with VCF data. I understand the format, but I am struggling with some basic questions:

What is a reference allele, e.g. in the hg19 file from 1000 Genomes phase 3? Is it the allele seen in most people's sequences?

What is an alternate allele? Is it the minor allele at that variant position?

What is the NON_REF allele in the ALT column?

WouterDeCoster (Belgium), 2.5 years ago, wrote:

The reference genome is a composite built from several sequenced individuals and assembled into one haploid set of chromosomes. The nucleotide at a given position in the reference is not necessarily the most frequent one in the population. The reference is therefore not the genome of any real person; it is just something to compare our reads against. It may even contain haplotypes that do not exist in reality, because sequence from multiple diploid individuals was collapsed into a single haploid genome.

For a variant, the reference allele (REF column) is the nucleotide of the reference genome at that position. An alternate allele (ALT column) is an allele that does not match the reference allele. It may be the minor allele, but not necessarily, because the reference allele can itself be the minor allele.
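To make the REF/ALT distinction concrete, here is a minimal Python sketch (not from the original thread) that reads REF and ALT from one VCF data line and counts allele copies in the GT field across samples. The helper names `allele_counts` and `minor_allele` and the example record are hypothetical; the code assumes a standard tab-separated record with FORMAT in column 9 and GT among its keys. It only reflects the samples in the file, not a population frequency, but it shows that the alternate allele can be the more common one.

```python
def allele_counts(vcf_line: str) -> dict[str, int]:
    """Return {allele: copies observed} for one VCF data line (hypothetical helper).

    Assumes FORMAT is column 9 and contains GT; missing calls ('.') are skipped.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref = fields[3]                    # REF: base(s) of the reference genome here
    alts = fields[4].split(",")        # ALT: comma-separated non-reference alleles
    alleles = [ref] + alts             # GT index 0 = REF, 1 = first ALT, ...
    gt_idx = fields[8].split(":").index("GT")
    counts = {a: 0 for a in alleles}
    for sample in fields[9:]:
        gt = sample.split(":")[gt_idx]
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        for idx in gt.replace("|", "/").split("/"):
            if idx != ".":
                counts[alleles[int(idx)]] += 1
    return counts


def minor_allele(counts: dict[str, int]) -> str:
    """Return the least frequent allele among those actually observed."""
    observed = {a: n for a, n in counts.items() if n > 0}
    return min(observed, key=observed.get)


# Hypothetical record: 4 diploid samples, REF=A, ALT=G (not real data).
record = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t1/1\t0/1\t1/1\t1/1"
counts = allele_counts(record)
print(counts)                # {'A': 1, 'G': 7}
print(minor_allele(counts))  # A  (here the reference allele is the minor one)
```

Alleles with zero observed copies (for example a symbolic allele listed in ALT) are ignored when picking the minor allele, so the result only compares alleles that actually occur in the genotypes.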


GK1610 replied, 2.5 years ago:

Thanks! This is awesome!

WouterDeCoster replied, 2.5 years ago:

Happy to help. If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.

Cheers, Wouter

cpad0112 replied, 2.5 years ago:

Refer to the G5, G5A, KG and KG-PROD tags in dbSNP (use the dbSNP build that matches GRCh37, the NCBI genome equivalent to hg19) to find the reference allele frequency. dbSNP includes allele frequencies from the 1000 Genomes and HapMap projects.
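If you want to check such tags in a downloaded dbSNP VCF, the INFO column can be split per the VCF spec: entries are separated by `;`, `key=value` entries carry a value, and Flag-type entries appear as a bare key. Below is a minimal Python sketch under the assumption that the tags you care about are stored as Flag-type INFO fields; the exact tag names differ between dbSNP builds, so check the `##INFO` header lines of your file. The helper names `parse_info` and `flags_present` and the example INFO string are hypothetical.

```python
def parse_info(info: str) -> dict[str, object]:
    """Split a VCF INFO column into a dict (hypothetical helper).

    key=value entries map to their string value; flag entries (no '=')
    map to True. A lone '.' means the INFO column is empty.
    """
    if info == ".":
        return {}
    parsed: dict[str, object] = {}
    for entry in info.split(";"):
        if not entry:  # tolerate stray or trailing semicolons
            continue
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


def flags_present(info: str, tags: list[str]) -> list[str]:
    """Return which of the requested flag tags are set in this INFO column."""
    parsed = parse_info(info)
    return [t for t in tags if parsed.get(t) is True]


# Hypothetical dbSNP-style INFO column (not real data).
info = "RS=12345;VC=SNV;G5;G5A"
print(flags_present(info, ["G5", "G5A", "KG"]))  # ['G5', 'G5A']
```

Note that the flags only mark a frequency category; the actual allele frequency values, where present, live in separate key=value INFO fields described in the file's header.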
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 2.5 years ago, wrote:

VCF specification: https://samtools.github.io/hts-specs/VCFv4.2.pdf


GK1610 replied, 2.5 years ago:

Thanks, but this document doesn't answer my questions.