Question about non-polymorphic site and flanking sequence.
9.4 years ago
mangfu100 ▴ 800

Hi

I have been studying genetic variation, and a couple of ambiguous terms confuse me.

The paragraph below contains the terms I am confused about.

I searched for them on the Internet but could not work out what they mean. In particular, "flanking sequence" is a very hard concept for me. I have often seen the term in the context of fusions, but I cannot fully understand it.

Could you explain these two terms (non-polymorphic site and flanking sequence) to me in simple words?

Genetic variants that are mapped to segmental duplications are most likely sequence alignment errors and should be treated with extreme caution. Sometimes they manifest as SNPs with high fold coverage and probably high confidence score, but they may actually represent two non-polymorphic sites in the genomes that happen to have the same flanking sequence. To identify variants in these regions, use the command below.

sequencing alignment gene • 4.0k views

Are you aware of what segmental duplications are? They are just like genomic repeats, except with far fewer copies per genome. And just as you can't be sure about variants in repeats, you can't be sure about variants in segmental duplications ("should be treated with extreme caution.").

"Flanking sequence" - sequences downstream/upstream from your site/variant.

9.4 years ago

It's probably best to explain this with an example. When we described the discovery of KCNJ18 and the mutations in it that cause a periodic paralytic disorder, we relied in part on the bolded parts of that quote. While we did that using Sanger sequencing, the same principles apply to NGS data.

Suppose you're doing whole exome sequencing of a human sample, presumably looking for mutations. If you aligned against the hg19 reference genome and then looked at the KCNJ12 gene in your sample, you would notice a number of apparently very high-coverage heterozygous mutations. So since the coverage there is really high, you have to wonder if they're real or if this is some sort of artifact. Well, it turns out that long ago (well, very recently in evolutionary time) there was a segmental duplication event in this region (giving rise to KCNJ18). The duplicated region then developed its own variants, some of which became fixed in the population. Because this duplicated region doesn't exist in the hg19 reference, you'll see apparent variants that aren't actually real...simply because the reference lacks the duplication and therefore the reads couldn't align without variants to that region. In other words, you've called SNPs at sites where they don't exist (i.e., they're non-polymorphic, where polymorphic means "able to take on more than one state") due to the region to which the reads align having near-identical sequence (i.e., the flanking or surrounding sequence is the same).
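
To make that concrete, here is a minimal toy sketch in Python. The sequences, the read counts and the `pileup_at` helper are all made up for illustration; this is not real KCNJ12/KCNJ18 data and not a real aligner. Two copies share identical flanking sequence and differ at one site that is fixed within each copy. If only one copy exists in the reference, reads from both copies pile up on it, and the site looks like a heterozygous SNP with doubled coverage.

```python
from collections import Counter

# Hypothetical toy sequences: two paralogous copies with the same flanks
# that differ at a single site, which is fixed within each copy.
FLANK_LEFT = "ACGTTGCA"
FLANK_RIGHT = "GGATCCTA"
copy_in_reference = FLANK_LEFT + "C" + FLANK_RIGHT    # plays the role of KCNJ12
copy_not_in_reference = FLANK_LEFT + "T" + FLANK_RIGHT  # plays the role of KCNJ18


def pileup_at(site, reads):
    """Count the bases seen at one reference position across aligned reads."""
    return Counter(read[site] for read in reads)


# Every individual carries both copies, so reads come from both of them.
# With only one copy in the reference, all reads end up at the same locus.
reads = [copy_in_reference] * 30 + [copy_not_in_reference] * 30

site = len(FLANK_LEFT)  # 0-based position of the differing base
counts = pileup_at(site, reads)
coverage = sum(counts.values())
alt_fraction = counts["T"] / coverage

print(counts, coverage, alt_fraction)
```

Neither copy is polymorphic at that position (one is always C, the other always T), yet the collapsed pileup shows roughly half C and half T at twice the expected depth, which is what a variant caller would report as a confident heterozygous SNP.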

Segmental duplications are a tough thing to deal with in genome assemblies. They don't need to be as evolutionarily fixed as the one that contains KCNJ18, so you can see differences due to this between individuals (or different animal/plant samples). It's times like this when it's good to have sequenced a number of normal samples as well, in which case you'd notice the deviation from Hardy-Weinberg equilibrium.
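
As a rough sketch of that Hardy-Weinberg check (the genotype counts and the `hwe_chi_square` helper below are hypothetical, and a real analysis would more likely use an exact test from an established package), a site where every normal sample is called heterozygous gives a very large chi-square value:

```python
def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Chi-square statistic comparing genotype counts with Hardy-Weinberg expectations."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p                              # alternate allele frequency
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, expected


# Hypothetical counts: 20 normal samples, all called heterozygous at the site,
# as happens when reads from a duplicated copy collapse onto one locus.
chi2, expected = hwe_chi_square(n_ref_hom=0, n_het=20, n_alt_hom=0)
print(chi2, expected)
```

With one degree of freedom, the 5% critical value of the chi-square distribution is about 3.84, so a statistic this far above it points to an excess of heterozygotes that is more plausibly an alignment artifact than a real polymorphism.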


Thanks.

I fully understand the concept now.


I think KCNJ12 in line 2 of the second paragraph should be KCNJ18, not KCNJ12.

Right?


KCNJ12 is correct. hg19 didn't have KCNJ18, though hg38 does. KCNJ18 is in the segmental duplication in this case (there are actually a lot of them in the pericentromeric regions).

9.4 years ago
Denise CS ★ 5.2k

'Flank' can mean the side (to the right or left) of something. Flanking sequence in the paragraph above refers to the sequence on the side (either upstream, downstream or both) of the non-polymorphic sites. When I think of flanking sequence this image is what comes to my mind. The two non-polymorphic sites are the positions in the genomes that did not differ between the individuals analysed.
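
For a concrete picture, here is a small Python sketch that pulls out the flanking sequence around a site. The `flanking` helper and the two toy loci are hypothetical, chosen only to illustrate the idea of two sites that share the same flanks.

```python
def flanking(seq, pos, width):
    """Return (upstream, base, downstream) around the 0-based position pos."""
    upstream = seq[max(0, pos - width):pos]
    downstream = seq[pos + 1:pos + 1 + width]
    return upstream, seq[pos], downstream


# Two hypothetical loci that differ only at the central base.
locus_1 = "ACGTTGCA" + "C" + "GGATCCTA"
locus_2 = "ACGTTGCA" + "T" + "GGATCCTA"

up_1, base_1, down_1 = flanking(locus_1, 8, 5)
up_2, base_2, down_2 = flanking(locus_2, 8, 5)

print(up_1, base_1, down_1)
print(up_2, base_2, down_2)
print(up_1 == up_2 and down_1 == down_2)
```

Here both sites have the flanks TTGCA and GGATC, so a short read covering either one cannot be assigned to its true locus from the flanks alone.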
