VCF Files: Help on 0/1, 1/1, 0/0 and | vs / (phased & unphased)
7.1 years ago
jnowacki ▴ 100

I have an Excel spreadsheet I'm trying to convert to a VCF file. I've got most fields right, but I can't figure out how to convert homozygous/heterozygous SNPs to one of these:

What is the difference between these?

  • 1/0 vs 0/1? (I'm guessing there is no difference?)
  • 1/1 vs 0/0?
  • 0|1 vs 0/1? (Is the first haplotype-specific and the second not, i.e. no linkage?)

I'm trying to read the manual, but it's Greek to me. Any help?

This is my input: [image not available]

Current version of my handmade VCF: [image not available]


Given that all the information I have is "hom" or "het", what do I enter in that column? Any advice? My biologist PhD colleague and I are having trouble interpreting the VCF manual.

  • homref -> 0/0
  • homvar -> 1/1
  • het -> 0/1
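A minimal Python sketch of that mapping, assuming a biallelic site (one ALT allele) and assuming that the spreadsheet's "hom" means homozygous for the ALT allele. The names `ZYGOSITY_TO_GT` and `zygosity_to_gt` are hypothetical, not part of any VCF library:

```python
# Hypothetical lookup from spreadsheet zygosity labels to VCF GT strings.
# Assumption: biallelic site, and "hom" means homozygous ALT (homvar).
ZYGOSITY_TO_GT = {
    "homref": "0/0",  # two copies of the REF allele
    "het": "0/1",     # one REF and one ALT allele, unphased
    "homvar": "1/1",  # two copies of the ALT allele
    "hom": "1/1",     # assumption: same meaning as homvar
}


def zygosity_to_gt(label: str) -> str:
    """Return the GT value for a label; unknown labels become missing ('./.')."""
    return ZYGOSITY_TO_GT.get(label.strip().lower(), "./.")


for label in ["het", "hom", "Het ", "?"]:
    print(repr(label), "->", zygosity_to_gt(label))
```

Unrecognised labels fall back to "./.", the diploid missing-genotype value, rather than guessing a call.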

Note that in the input you showed above, you also have a variant at chr7:128846328 that is not biallelic. There are three alleles in total: the REF ("GA", which gets GT index 0), ALT1 ("CT", which gets GT index 1), and ALT2 ("CA[obscured]", which gets GT index 2). So, on the assumption that "het" in your column means that the sample has both of the ALTs and not the REF, you would need to use GT = "1/2".
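The indexing rule can be sketched in Python: build the allele list as REF followed by the ALTs in order, then look up each allele the sample carries. The function name `gt_from_alleles` is hypothetical, and the example site (REF G, ALT T,C) is illustrative, not the chr7 record from the question:

```python
def gt_from_alleles(ref, alts, sample_alleles):
    """Build an unphased diploid GT from the alleles a sample carries.

    Index 0 is REF, 1 is the first ALT, 2 is the second ALT, and so on.
    Raises ValueError if the sample carries an allele not in REF/ALT.
    """
    allele_order = [ref] + list(alts)
    # Sort so the lower index is written first (order is irrelevant for '/').
    indices = sorted(allele_order.index(a) for a in sample_alleles)
    return "/".join(str(i) for i in indices)


# Illustrative multi-allelic site: REF G, ALT T,C.
print(gt_from_alleles("G", ["T", "C"], ["T", "C"]))  # 1/2
print(gt_from_alleles("G", ["T", "C"], ["G", "C"]))  # 0/2
print(gt_from_alleles("G", ["T", "C"], ["T", "T"]))  # 1/1
```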


Thank you!

Why is het -> 0/1 and not het -> 1/0?


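They are both heterozygous. With the unphased separator (/), allele order carries no meaning, so 0/1 and 1/0 describe the same genotype; writing the lower index first is just the common convention. With the phased separator (|), order does matter: 0|1 and 1|0 place the ALT allele on different haplotypes.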
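A minimal Python sketch of that rule, assuming diploid GT strings that use a single separator type: unphased genotypes are normalised by sorting their allele indices, while phased genotypes are left untouched because their order is meaningful. `normalize_gt` is a hypothetical helper, not a library function:

```python
def normalize_gt(gt: str) -> str:
    """Sort allele indices of an unphased GT; leave phased GTs unchanged."""
    if "|" in gt:
        return gt  # phased: order says which haplotype carries which allele
    alleles = gt.split("/")
    # Sort numerically (so "10" sorts after "2"); missing '.' alleles go last.
    alleles.sort(key=lambda a: (a == ".", int(a) if a != "." else 0))
    return "/".join(alleles)


print(normalize_gt("1/0"))  # 0/1
print(normalize_gt("2/1"))  # 1/2
print(normalize_gt("1|0"))  # 1|0 (phased, unchanged)
```

After normalisation, 1/0 and 0/1 compare equal, while 1|0 and 0|1 stay distinct.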


@Kevin Blighe - actually, my questions are below your recent response. The post you answered was from 3.5 years ago. Could I kindly ask for your help with the recent one?


Hi @pierre Lindenbaum @Len Trigg @jnowacki,

May I check something with you? I am new to this domain and found this thread useful.

In the GT field of a VCF file, GT[0] always indicates the Ref allele and GT[1] indicates the Alternate allele. Am I right?

Ref   ALT
A     A     # homozygous (0/0)
A     T     # heterozygous (0/1 or 1/0; the order doesn't make any difference). Am I right?

q1) But what is homvar/homalt, which is 1/1? What do the Ref and ALT alleles look like for it? Can you give an example?

q2) Similarly, what is 1/2 or 2/1 called?

q3) Am I right to write Ref and ALT like below for 1/2 or 2/1?

Ref   Alt
A     AC
AC    A

In the GT field of a VCF file, GT[0] always indicates the Ref allele and GT[1] indicates the Alternate allele. Am I right?

Yes: in the GT field, 0 refers to the REF allele and 1 to the first ALT allele (2 to the second ALT, and so on). But be aware that the 'alternate' allele is defined with respect to the reference genome, and there is no ideal reference genome (see A: Alternate nucleotide is more frequent than reference nucleotide. OMG I'm dizzy.).

q1) But what is homvar/homalt, which is 1/1? What do the Ref and ALT alleles look like for it? Can you give an example?

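Where have you seen these terms? homvar and homalt probably refer to the same thing, i.e. 1/1, a homozygous variant call: the sample carries the ALT allele on both chromosomes. For example, with REF = A and ALT = T, 1/1 means genotype TT. Any source quoting these terms should state which allele is REF and which is ALT.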

q2) Similarly, what is 1/2 or 2/1 called?

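It is a heterozygous genotype at a multi-allelic site, where two alternate alleles have been identified and the sample carries one of each (and no REF allele). Multi-allelic sites can occur in multi-sample studies or in disease conditions, like cancer. In a VCF, you'd see this listed like so (showing only CHROM, POS, REF, ALT and the sample's GT):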

chr4   12345   G   T,C   1/2

So this individual has genotype T/C: one copy of each ALT allele and no copy of the REF allele G.
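Decoding a GT value back into sequences is just an index lookup into [REF] + ALTs. A minimal Python sketch, assuming the ALT column is given as the comma-separated string from the VCF line; `gt_to_bases` is a hypothetical helper:

```python
def gt_to_bases(ref: str, alt: str, gt: str) -> str:
    """Translate a GT string into allele sequences using REF and ALT.

    `alt` is the comma-separated ALT column (e.g. "T,C"). Missing alleles
    ('.') stay as '.', and the separator ('/' or '|') is preserved.
    Haploid calls (a single index, no separator) also work.
    """
    alleles = [ref] + alt.split(",")
    sep = "|" if "|" in gt else "/"
    out = []
    for idx in gt.replace("|", "/").split("/"):
        out.append("." if idx == "." else alleles[int(idx)])
    return sep.join(out)


print(gt_to_bases("G", "T,C", "1/2"))  # T/C  (the chr4 example above)
print(gt_to_bases("A", "T", "1/1"))    # T/T  (homozygous ALT)
print(gt_to_bases("A", "T", "0|1"))    # A|T  (phased het)
```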

q3) Am I right to write Ref and ALT like below for 1/2 or 2/1?

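No, that is incorrect. A -> AC is an insertion and AC -> A is a deletion; each is a site with a single ALT allele, so a sample carrying it would be 0/1 or 1/1. A 1/2 genotype needs two ALT alleles listed at the same site (see the answer to q2).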

7.1 years ago

See http://www.internationalgenome.org/wiki/Analysis/vcf4.0

As with the INFO field, there are several common, reserved keywords that are standards across the community:

GT: genotype, encoded as allele values separated by either "/" or "|". The allele values are 0 for the reference allele (what is in the reference sequence), 1 for the first allele listed in ALT, 2 for the second allele listed in ALT and so on. For diploid calls examples could be 0/1 or 1|0 etc. For haploid calls, e.g. on Y, male X, mitochondrion, only one allele value should be given. All samples must have GT call information; if a call cannot be made for a sample at a given locus, "." must be specified for each missing allele in the GT field (for example ./. for a diploid). The meanings of the separators are:

> / : genotype unphased 
> | : genotype phased
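The rules quoted above translate directly into a small parser. This is a minimal Python sketch, assuming a GT value uses only one separator type; `parse_gt` is a hypothetical helper, and reporting haploid calls as unphased is a choice made here, not something the quoted text specifies:

```python
from typing import List, Optional, Tuple


def parse_gt(gt: str) -> Tuple[List[Optional[int]], bool]:
    """Split a VCF GT value into allele indices and a phased flag.

    0 = REF, 1 = first ALT, 2 = second ALT, '.' = missing (returned as None),
    '/' = unphased, '|' = phased. A haploid call has a single allele and no
    separator; it is reported here as unphased (an assumption of this sketch).
    """
    phased = "|" in gt
    parts = gt.replace("|", "/").split("/")
    alleles = [None if p == "." else int(p) for p in parts]
    return alleles, phased


print(parse_gt("0/1"))  # ([0, 1], False)
print(parse_gt("1|0"))  # ([1, 0], True)
print(parse_gt("./."))  # ([None, None], False)
print(parse_gt("1"))    # ([1], False)  haploid, e.g. male X
```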
