Genotyping, genotype calling or SNP calling?
3.7 years ago
James Reeve

Hi, I'm new to bioinformatics and am getting confused about names for things. I want to know the name for when you map reads to a reference genome to find SNPs. Would this be "genotyping", "genotype calling" or "SNP calling"?

More generally, are these terms different? If you know of any other similar terms I would like to hear them.

3.7 years ago

Hey James, this is bioinformatics and there are multiple names for absolutely everything. Even 'bioinformatician' has multiple synonyms, such as:

• Computational biologist
• Systems biologist
• Bio-informaticist
• etc.

# Genotyping

If you use the word 'genotyping', the first thing that will come to people's minds is the use of a microarray to determine the genotype at predetermined loci. Generally, for each locus, two probes target the same position, and the central base of each probe is complementary to one of the expected bases at the position of interest. If the sample is homozygous at the locus, only one probe will bind and fluoresce; if the sample is heterozygous, both probes will bind (one to the maternal and one to the paternal chromosome) and fluoresce. From the fluorescent intensities, we can infer the genotype.

For example, take a look at this position, where we have 2 probes, AGG[T]CAG and AGG[C]CAG:

        AGG[T]CAG      - probe
    AGGTTCC[A]GTCAGAAC - target sequence (maternal chromosome)

        AGG[C]CAG      - probe
    AGGTTCC[G]GTCAGAAC - target sequence (paternal chromosome)


Both probes bind because the individual has the heterozygous genotype, AG, and this will be reflected by both probes giving off a fluorescent signal, thus allowing us to infer the genotype.
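The probe logic above can be sketched in a few lines of Python. This is a toy model, assuming a probe simply "binds" when its bracketed central base is the Watson-Crick complement of the base at the target position; a real array measures continuous fluorescent intensities rather than a yes/no signal. The function names (`central_base`, `probe_binds`, `infer_genotype`) are hypothetical.

```python
# Toy model of allele-specific probe binding (an illustration, not how a real
# array is read out): a probe "binds" a chromosome when the probe's bracketed
# base is the complement of the chromosome's bracketed base.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def central_base(seq: str) -> str:
    """Return the base inside square brackets, e.g. 'AGG[T]CAG' -> 'T'."""
    return seq[seq.index("[") + 1]


def probe_binds(probe: str, chromosome: str) -> bool:
    """True if the probe's central base complements the chromosome's."""
    return COMPLEMENT[central_base(chromosome)] == central_base(probe)


def infer_genotype(probes: list[str], chromosomes: list[str]) -> str:
    """Infer a diploid genotype from which probes bind either chromosome copy."""
    alleles = set()
    for probe in probes:
        for chromosome in chromosomes:
            if probe_binds(probe, chromosome):
                # A bound probe reports the base present on that chromosome.
                alleles.add(central_base(chromosome))
    if not alleles:
        return "no call"  # neither probe bound: nothing to infer
    if len(alleles) == 1:
        return alleles.pop() * 2  # homozygous, e.g. 'AA'
    return "".join(sorted(alleles))  # heterozygous, e.g. 'AG'


probes = ["AGG[T]CAG", "AGG[C]CAG"]
maternal = "AGGTTCC[A]GTCAGAAC"
paternal = "AGGTTCC[G]GTCAGAAC"
print(infer_genotype(probes, [maternal, paternal]))  # AG
print(infer_genotype(probes, [maternal, maternal]))  # AA
```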

# Genotype calling

This relates more to the act of determining the genotype by examining the relevant fluorescent intensities during the process of genotyping (above). There are specific genotype calling algorithms, such as 'birdseed', that analyse the fluorescent intensities and make a genotype call. Below is an example plot of the fluorescent intensities for a single sample across all genotyped positions that use [A/T] probes. The 3 'arms' that you see relate to (from the top, clockwise):

• Homozygous allele 1
• Heterozygous
• Homozygous allele 2

As you can clearly see, there are many positions 'caught' in between these arms, and these are more difficult to call correctly. This figure is from a very old dataset that I have.
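To make the idea of 'arms' concrete, here is a deliberately simplified caller. It is not birdseed, which is far more sophisticated; it assumes two already-normalised, non-negative intensities per position (one per allele), converts them to an angle so that the three arms become three bands, and returns a no-call for anything caught in between or with too little total signal. The band width and minimum-signal cut-offs are made-up values for illustration, and AA/AB/BB simply stand for homozygous allele 1, heterozygous and homozygous allele 2.

```python
import math


def call_genotype(intensity_a: float, intensity_b: float,
                  band: float = 0.15, min_total: float = 0.2) -> str:
    """Toy genotype call from two normalised, non-negative probe intensities.

    intensity_a is the signal for allele 1 and intensity_b for allele 2.
    The cut-offs are illustrative assumptions, not values from any real tool.
    """
    if intensity_a + intensity_b < min_total:
        return "no call"  # too little signal from either probe
    # Angle of the point, rescaled so 0 = pure allele 1 and 1 = pure allele 2.
    theta = (2 / math.pi) * math.atan2(intensity_b, intensity_a)
    if theta <= band:
        return "AA"  # homozygous allele 1 arm
    if abs(theta - 0.5) <= band:
        return "AB"  # heterozygous arm (both probes light up)
    if theta >= 1 - band:
        return "BB"  # homozygous allele 2 arm
    return "no call"  # caught between arms: too ambiguous to call


for a, b in [(1.0, 0.05), (1.0, 1.0), (0.05, 1.0), (1.0, 0.5)]:
    print(a, b, call_genotype(a, b))
```

The four example points print AA, AB, BB and no call respectively. Working with the angle rather than the raw intensities is what turns the three arms into three bands; model-based callers instead learn where each cluster sits from the data.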

# SNP calling

For me this is the same as genotype calling.

---

All of this (above) may be entirely irrelevant to you because it sounds like you have next generation sequencing data. I would not readily use the term SNP when performing re-sequencing (re-sequencing is aligning your next generation sequence reads to an existing reference genome and then performing variant calling).
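For the re-sequencing case, here is a toy sketch of the variant calling step at a single reference position, assuming the reads have already been aligned and the bases covering that position collected into a list. The depth and allele-fraction thresholds are made-up values; real callers such as GATK HaplotypeCaller or bcftools call use statistical models that account for base and mapping quality, so treat this only as an illustration of the idea.

```python
from collections import Counter


def call_site(ref_base, read_bases, min_depth=10, het_min=0.2, hom_min=0.8):
    """Toy genotype call at one reference position from piled-up read bases.

    Returns (alt_base, genotype) with VCF-style genotypes: '0/0' (reference),
    '0/1' (heterozygous), '1/1' (homozygous alternate) or './.' (not callable).
    Thresholds are illustrative assumptions, not values from any real caller.
    """
    depth = len(read_bases)
    if depth < min_depth:
        return None, "./."  # too few reads cover this position to call it
    counts = Counter(base.upper() for base in read_bases)
    # Most frequent base that differs from the reference, if any.
    non_ref = [(b, n) for b, n in counts.most_common() if b != ref_base.upper()]
    if not non_ref:
        return None, "0/0"
    alt_base, alt_count = non_ref[0]
    fraction = alt_count / depth  # share of reads supporting the alternate base
    if fraction >= hom_min:
        return alt_base, "1/1"
    if fraction >= het_min:
        return alt_base, "0/1"
    return None, "0/0"  # alternate reads are too rare: treat as noise


print(call_site("A", ["A"] * 6 + ["G"] * 6))  # ('G', '0/1')
print(call_site("A", ["G"] * 11 + ["A"]))     # ('G', '1/1')
print(call_site("A", ["A"] * 3))              # (None, './.')
```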

Note the following key distinctions:

# Single nucleotide polymorphism (SNP)

A base position whose genotype varies across individuals and populations as part of 'natural' variation.

Note that, in relation to SNPs, we usually also speak of allele frequencies. For example, the same SNP position can have alleles that exhibit different frequencies in different populations. The 1000 Genomes Project and the International HapMap Project did much work in these areas in order to find most SNPs in the human genome. Databases like dbSNP are still catching up and are constantly being updated to account for all of the information produced by these large-scale projects.
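An allele frequency is just the share of chromosomes in a group that carry a given allele. A minimal sketch, assuming biallelic sites and diploid genotypes written in the VCF style ('0/0', '0/1', '1/1'); the two populations below are made-up genotypes, not data from 1000 Genomes or HapMap, and `alt_allele_frequency` is a hypothetical name.

```python
def alt_allele_frequency(genotypes: list[str]) -> float:
    """Alternate allele frequency from biallelic diploid VCF-style genotypes."""
    alleles = []
    for gt in genotypes:
        alleles.extend(gt.replace("|", "/").split("/"))  # '0|1' -> ['0', '1']
    called = [a for a in alleles if a != "."]  # ignore missing alleles
    if not called:
        raise ValueError("no called alleles")
    return sum(a == "1" for a in called) / len(called)


# Made-up genotypes for one SNP in two populations.
population_1 = ["0/0", "0/1", "0/1", "1/1"]
population_2 = ["0/0", "0/0", "0/0", "0/1"]
print(alt_allele_frequency(population_1))  # 0.5
print(alt_allele_frequency(population_2))  # 0.125
```

The same SNP position thus shows an alternate allele frequency of 50% in one group and 12.5% in the other.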

Also note that the importance of SNPs in relation to disease is currently a matter of debate. The SNP genotype at many loci undoubtedly affects chromatin structure and the binding of protein and RNA species at the locus; thus, one could infer a role in disease. I write more about this in my review: Gene editing in the context of an increasingly complex genome.

Finally, be acutely aware that, traditionally, people only considered variants with allele frequencies greater than 1% or 5% in the population of interest to be SNPs, and thus these are sometimes referred to as 'common variants'. If a variant's frequency falls below these thresholds, we typically assign the term 'rare variant'. Some in the research community mistakenly assume that only rare variants contribute to disease phenotypes; see Rare and Common Variants: Twenty arguments.
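The common/rare split boils down to a simple threshold rule. A minimal sketch, assuming the cut-off is applied to the minor allele frequency (the less frequent of the two alleles at a biallelic site) and that a variant counts as common only when it exceeds the cut-off; the 5% default and the 1% alternative mirror the thresholds mentioned above, and conventions at the exact boundary vary between studies.

```python
def classify_variant(alt_freq: float, threshold: float = 0.05) -> str:
    """Label a biallelic variant 'common' or 'rare' from its alternate allele frequency."""
    if not 0.0 <= alt_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    maf = min(alt_freq, 1.0 - alt_freq)  # minor allele frequency
    return "common" if maf > threshold else "rare"


print(classify_variant(0.30))                  # common
print(classify_variant(0.02))                  # rare (below 5%)
print(classify_variant(0.02, threshold=0.01))  # common (above 1%)
print(classify_variant(0.98))                  # rare: the minor allele is at 2%
```

The last case is why the rule uses the minor allele: a variant whose alternate allele sits at 98% is just as 'rare' as one at 2%, only with the alleles swapped.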

# Single nucleotide variant (SNV)

A variant encountered during sequence re-alignment that may or may not be a known SNP.

If you do an exome seq experiment, most of the SNVs that you find will be private (only found in that individual) and will therefore be 'unknown' to the research community, and obviously it would be inappropriate to call these SNPs without further investigation and categorisation.
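A first step in that categorisation is checking each SNV against a catalogue of known variants. A minimal sketch, assuming variants are matched on chromosome, position, reference and alternate base; `known_snps` stands in for a set you would build from a resource such as dbSNP, and the coordinates and helper names below are hypothetical. A match only tells you the variant has been seen before, and a miss does not by itself prove that it is private.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def split_known_novel(snvs, known_snps):
    """Partition SNVs into those present in a known-variant set and the rest."""
    known = [v for v in snvs if v in known_snps]
    novel = [v for v in snvs if v not in known_snps]
    return known, novel


# Hypothetical catalogue and sample calls, for illustration only.
known_snps = {Variant("chr1", 1000, "A", "G")}
sample_snvs = [
    Variant("chr1", 1000, "A", "G"),  # same site and allele: known
    Variant("chr1", 1000, "A", "T"),  # same site, different allele: novel
    Variant("chr2", 5000, "C", "T"),  # site absent from the catalogue: novel
]
known, novel = split_known_novel(sample_snvs, known_snps)
print(len(known), len(novel))  # 1 2
```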


Wouldn't you consider the notation given by current SNP callers [0/1,1/1,2/1...] as genotyping too?


Yes, of course. Encoding can be anything. There are a few different ways of representing genotypes
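The notation in that question is the GT field of a VCF file: each number indexes an allele, with 0 the REF allele and 1, 2, ... the ALT alleles in the order listed, '/' separating unphased alleles, '|' separating phased ones, and '.' marking a missing allele. A minimal sketch of translating it back into bases, assuming you already have the REF and ALT values for the site (the function name `decode_gt` is hypothetical):

```python
from typing import List, Optional, Tuple


def decode_gt(gt: str, ref: str, alts: List[str]) -> Tuple[List[Optional[str]], bool]:
    """Translate a VCF GT value such as '0/1' or '2|1' into allele sequences.

    Returns the alleles (None for a missing '.') and whether the call is phased.
    """
    phased = "|" in gt
    alleles = [ref] + alts  # index 0 = REF, 1 = first ALT, 2 = second ALT, ...
    decoded = []
    for index in gt.replace("|", "/").split("/"):
        decoded.append(None if index == "." else alleles[int(index)])
    return decoded, phased


print(decode_gt("0/1", "A", ["G"]))       # (['A', 'G'], False)
print(decode_gt("2/1", "A", ["G", "T"]))  # (['T', 'G'], False)
print(decode_gt("1|1", "A", ["G"]))       # (['G', 'G'], True)
```

This index-based encoding is why a caller can write '2/1' at a multi-allelic site: the sample carries the second and first ALT alleles and no copy of the REF allele.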