Question: Basics about sequencing/alignment and variant calling
ernestrv0101 (USA) wrote, 20 months ago:

I am new to bioinformatics, and I am not a biologist. I have some basic questions about sequencing/alignment and variant calling. For example, in humans, I understand that DNA mutates all the time, so when sequencing one specific locus, in theory we only expect two variants (diploid), but there can be more because of sequencing errors, indels, dels, etc. (these are discarded during the variant calling process).

Can human DNA contain more than two alleles at a locus because some cells carry a mutation/variant and others do not? In a haploid case, can we expect multiple alleles?

When we obtain the 'consensus' from an alignment of all the reads from a single individual, does it take the most common variant at each position? So, for example, in a diploid organism, a single consensus would lose a lot of information (all the alleles, in case they are heterozygous).

Thanks!


Can human DNA contain more than two alleles at a locus because some cells carry a mutation/variant and others do not? In a haploid case, can we expect multiple alleles?

Yes, but in most cases this is mosaicism (a sub-population of cells carrying the variant), and such variants are supported by only a small fraction of the reads.
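To make the mosaicism point concrete, here is a minimal sketch, assuming we already have the base each read shows at one position (the pileups below are made-up illustrations, not real data). A germline heterozygous variant sits near 50% of the reads, whereas a variant present in only a few cells is supported by a small fraction of reads, which makes it hard to tell apart from sequencing error at ordinary coverage.

```python
from collections import Counter

def allele_fractions(bases):
    """Return each observed base's fraction of the reads covering one position."""
    counts = Counter(bases)
    depth = sum(counts.values())
    return {allele: n / depth for allele, n in counts.items()}

# Hypothetical pileups at one position (one base per read).
heterozygous = "A" * 50 + "G" * 50  # germline het: roughly 50% each
mosaic = "A" * 95 + "G" * 5         # variant carried by a small subpopulation of cells

print(allele_fractions(heterozygous))  # {'A': 0.5, 'G': 0.5}
print(allele_fractions(mosaic))        # {'A': 0.95, 'G': 0.05}
```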

When we obtain the 'consensus' from an alignment of all the reads from a single individual, does it take the most common variant at each position?

In most cases, this information is contained in the SNP database for the organism you are studying.

Reply written 20 months ago by Titus.
d-cameron (Australia) wrote, 20 months ago:

When we obtain the 'consensus' from an alignment of all the reads from a single individual, does it take the most common variant at each position? So, for example, in a diploid organism, a single consensus would lose a lot of information (all the alleles, in case they are heterozygous).

Variant callers do not attempt to build a single consensus sequence; they attempt to report all alleles that are present. For germline diploid samples, a variant caller will report up to two non-reference alleles at a given position (e.g. the reference genome has G at that position and the sample is heterozygous C/G, so C is reported as the non-reference allele).
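As a rough illustration of "report up to two alleles" (not how any real caller works internally), here is a toy sketch assuming we have one observed base per read at a single position. The function name and the `min_fraction` cutoff for treating rare alleles as sequencing errors are hypothetical; real callers such as GATK HaplotypeCaller or bcftools call use probabilistic genotype likelihoods that account for base and mapping quality rather than a fixed threshold.

```python
from collections import Counter

def naive_diploid_call(ref, bases, min_fraction=0.2):
    """Toy diploid genotype call at a single position.

    ref:          reference base at this position
    bases:        one observed base per read covering the position
    min_fraction: hypothetical cutoff; alleles below it are treated as errors
    Returns (genotype, non_reference_alleles), or None if nothing passes.
    """
    counts = Counter(bases)
    depth = sum(counts.values())
    # Keep well-supported alleles, best-supported first, at most two (diploid).
    kept = [a for a, n in counts.most_common() if n / depth >= min_fraction][:2]
    if not kept:
        return None
    if len(kept) == 1:
        genotype = (kept[0], kept[0])   # homozygous
    else:
        genotype = tuple(sorted(kept))  # heterozygous
    non_ref = sorted({a for a in genotype if a != ref})
    return genotype, non_ref

# Made-up pileups, reference base G:
print(naive_diploid_call("G", "G" * 48 + "C" * 49 + "T" * 3))  # (('C', 'G'), ['C'])
print(naive_diploid_call("G", "C" * 40 + "T" * 38 + "G" * 2))  # (('C', 'T'), ['C', 'T'])
```

The first case is the C/G example above (one non-reference allele, with the 3 T reads discarded as noise); the second shows a site where neither allele matches the reference, giving two non-reference alleles.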

Reducing a diploid organism to a single consensus sequence is generally only ever done when creating a reference genome for that organism.
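To show why a single consensus loses information, here is a minimal majority-vote sketch, assuming each pileup column is given as a string of the bases that reads show at that position (the data are made up). At a heterozygous site only the better-supported allele survives.

```python
from collections import Counter

def majority_consensus(columns):
    """Collapse each pileup column (the bases reads show at one position) to its most common base."""
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)

# Made-up pileup of four positions; the third is heterozygous C/G.
columns = ["AAAAAAA", "TTTTTTT", "CCCGGGG", "AAAAAAA"]
print(majority_consensus(columns))  # ATGA  (the C allele has disappeared)
```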

indels, dels, etc. (these are discarded during the variant calling process).

Insertions and deletions are valid alleles and (good) variant callers do not discard them.
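A small sketch of why indels are alleles like any other: written VCF-style as a REF string replaced by an ALT string at a position (following the VCF convention of including one preceding reference base for indels), an insertion or deletion is just an ALT of a different length than REF. The `Variant` class and `apply_variant` helper are hypothetical names for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    """One allele written VCF-style: REF is replaced by ALT starting at pos (1-based)."""
    pos: int
    ref: str
    alt: str

    def kind(self) -> str:
        if len(self.ref) == len(self.alt) == 1:
            return "SNV"
        if len(self.alt) > len(self.ref):
            return "insertion"
        if len(self.alt) < len(self.ref):
            return "deletion"
        return "multi-base substitution"

def apply_variant(reference: str, v: Variant) -> str:
    """Return the reference sequence with this one allele substituted in."""
    start = v.pos - 1
    if reference[start:start + len(v.ref)] != v.ref:
        raise ValueError(f"REF {v.ref!r} does not match reference at position {v.pos}")
    return reference[:start] + v.alt + reference[start + len(v.ref):]

reference = "ACGTAC"
for v in (Variant(5, "A", "G"), Variant(2, "C", "CTT"), Variant(3, "GT", "G")):
    print(v.kind(), apply_variant(reference, v))
# SNV ACGTGC
# insertion ACTTGTAC
# deletion ACGAC
```

Discarding the insertion or deletion here would give a sequence of the wrong length, which is why a caller that drops them misses real differences.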


Thanks for the clarification, d-cameron. So when you said:

is generally only ever done when creating a reference genome for that organism.

do you mean when creating a reference assembly, like:

Reference Assembly - Mapping Reads To A Reference Genome

So in that case, it just takes the most common variant at each position to create a single consensus, and to do that they first run variant calling.

Reply written 20 months ago by ernestrv0101.