Question

A simple question regarding homozygous and heterozygous variation.

9.6 years ago

mangfu100 ▴ 800

Hi.

I have a simple question regarding homozygous and heterozygous variation.

We use a reference sequence such as hg19 from UCSC or NCBI, and thanks to next-generation sequencing we can easily browse the whole human sequence (chr1 to chr22, plus chrX and chrY) at nucleotide resolution.

Here is what I am wondering about; I hope somebody can help.

We know that chromosomes come in pairs, one copy from each parent. Homozygous means having identical copies of a gene at the same locus (e.g. the genotype is AA or aa), whereas heterozygous means having different alleles at that locus, like Aa or aA.

However, as I mentioned above, with a reference sequence we only see one chromosome of each pair. For example, suppose you are using the hg19 reference and want to go to a specific position, say chr2 5500. You click the chromosome icon, select chr2, and go to position 5500. At no point do you have to choose which copy of the pair you want to look at. The reference contains only one copy of each chromosome, NOT a pair. Why does the reference sequence contain only one copy of each pair? Or am I misunderstanding something?

I also have a second question on a related concept.

Why are homozygous deletion events singled out when grouping reads?

To explain, here is part of the paper "CREST maps somatic structural variation in cancer genome with base-pair resolution".

The passage is below, and my second question is about the bold sentence.

Many putative SV breakpoints have soft-clipped reads as well as wild-type reads because SVs usually occur either in a subset of tumors owing to tumor heterogeneity and/or are heterozygous event. Therefore, with the exception for homozygous deletion events, there are usually two groups of reads at a putative breakpoint.

(1) Many putative SV breakpoints have soft-clipped reads as well as wild-type reads because SVs usually occur either in a subset of tumors owing to tumor heterogeneity and/or are heterozygous event.

-> Because the event is heterozygous, one chromosome of the pair is likely to be normal whereas the other carries the structural variation. So the aligned reads can be split into two subgroups: those that map to the normal sequence and those that map across the structural variation. Is that right?

(2) Therefore, with the exception for homozygous deletion events, there are usually two groups of reads at a putative breakpoint.

-> Given the above, I can understand why there are usually two groups of reads: the event is heterozygous. However, I don't understand the first part of the sentence. Why is only the homozygous deletion excepted?

There are various kinds of homozygous variation, from insertions to translocations, and I think all the reads should form a single group for any of them, because the event is homozygous and both chromosomes of the pair are identical. So why do the authors single out homozygous deletions only? Is there some other aspect that I have missed?

sequence alignment next-gen • 4.2k views

updated 2.3 years ago by Ram 43k • written 9.6 years ago by mangfu100 ▴ 800

score 3 · Accepted Answer · 2014-09-25

0. Why is only one copy of each chromosome present in a reference rather than two?

Firstly, some species (especially plants) have more than two copies of each chromosome (google "ploidy"). Secondly, a reference sequence is just a convenient dataset to compare things against and search. We could also create personal genomes for everyone we sequence and give coordinates relative to those, but that'd just cause excess confusion and waste a lot of people's time. Thirdly, imagine trying to determine where a feature is in a reference genome in which every chromosome copy is present and modified to match its actual sequence. You just want context with things like this. Further, remember that references like the human genome are an amalgam of multiple people and don't even represent an average human.

Now having said that, including multiple copies is sometimes useful. Heng Li has been putting in some considerable effort into getting bwa to deal with alternate human haplotypes in a good way and this is likely to pay dividends in terms of variant calling reliability.
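To make the "reference as a coordinate system" point concrete, here is a minimal Python sketch. It assumes VCF-style genotype strings, where allele index 0 means "same as the reference base" and each index in the string stands for one chromosome copy. The `zygosity` helper and the example genotypes are hypothetical, for illustration only.

```python
def zygosity(gt: str) -> str:
    """Classify a VCF-style GT string such as '0/1' or '1|1'.

    The reference stores one base per position; the sample's chromosome
    copies are described relative to it, one allele index per copy.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "heterozygous"
    if alleles[0] == "0":
        return "homozygous reference"
    return "homozygous alternate"


# Hypothetical genotypes at a single reference position (e.g. chr2 5500)
for gt in ["0/0", "0/1", "1|1", "0/0/1/1"]:
    print(gt, "->", zygosity(gt))
```

The last genotype has four alleles, as a tetraploid sample would, which is one reason a fixed "two copies per chromosome" reference would not generalize.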

1. Because the event is heterozygous, one chromosome of the pair is likely to be normal whereas the other carries the structural variation. So the aligned reads can be split into two subgroups: those that map to the normal sequence and those that map across the structural variation. Is that right?

Yes, that's right.

2. So why do the authors single out homozygous deletions only? Is there some other aspect that I have missed?

It's the same for any homozygous event. BTW, you can also see just one group of reads when coverage isn't high enough.
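The read groups discussed above can be illustrated with a small Python sketch that sorts reads at a putative breakpoint by their CIGAR strings. This is not CREST's actual algorithm. It assumes each read is given as a (1-based leftmost aligned position, CIGAR) pair, and it uses a simplified rule: a read counts as soft-clipped at the breakpoint only if its clip sits exactly at that coordinate. The function names and example reads are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def classify_read(pos, cigar, breakpoint):
    """Return 'soft-clipped', 'wild-type' or 'uninformative' for one read."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # ignore hard clips
    if not core:
        return "uninformative"  # e.g. unmapped read with CIGAR '*'
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    end = pos + ref_len - 1  # last reference base covered by the read
    left_clip = core[0][1] == "S"
    right_clip = core[-1][1] == "S"
    # Simplifying assumption: the clip must sit exactly at the breakpoint.
    if (left_clip and pos == breakpoint) or (right_clip and end == breakpoint):
        return "soft-clipped"
    if pos < breakpoint < end:
        return "wild-type"  # aligned straight through the breakpoint
    return "uninformative"


def count_groups(reads, breakpoint):
    """Count reads per group at one breakpoint."""
    return Counter(classify_read(pos, cigar, breakpoint) for pos, cigar in reads)


# Hypothetical reads around a breakpoint at reference position 159
het_reads = [(100, "60M40S"), (120, "100M"), (90, "70M30S")]
hom_reads = [(100, "60M40S"), (90, "70M30S")]
print(count_groups(het_reads, 159))  # both groups present
print(count_groups(hom_reads, 159))  # only soft-clipped reads
```

With the heterozygous example both groups appear. With the homozygous example only the soft-clipped group does, which is also what a heterozygous site can look like if coverage is too low to sample the wild-type copy.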