Question

Analysis Of Low-Complexity Regions

4

12.6 years ago

Khader Shameer 18k

Following our exome analysis, we found a missense variant in a low-complexity region (LCR) of a protein. No known domains or motifs are found in the protein sequence. Is there any further biological inference that I can draw about LCRs using bioinformatics approaches?

sequence • 6.0k views

ADD COMMENT • link updated 12.6 years ago by Larry_Parnell 16k • written 12.6 years ago by Khader Shameer 18k

0

Have you applied the usual variant effect predictors like SIFT, PolyPhen, etc.?

ADD REPLY • link 12.6 years ago by Sean Davis 26k

0

@Sean: Yes, I did. This particular variant is missense based on PolyPhen.

ADD REPLY • link 12.6 years ago by Khader Shameer 18k

score 2 · Answer 1 · 2011-09-12

2

12.6 years ago

Lyco ★ 2.3k

I assume that you hope to attach some biological meaning to your variant, despite the fact that it is part of a compositionally biased region. As you certainly know (and Larry has mentioned), the majority of such protein regions are not particularly important. Antigenicity might be an issue, but probably only for somatic mutations (?)

One possibility that you might want to investigate is a short linear motif hidden in the compositionally biased region. Many such motifs like to hide out in natively unfolded regions and might well be associated with a compositional bias. What I would recommend is to find a reasonable set of orthologs from species at various ranges of evolutionary distance. Then, subject the sequences to a suitable multiple alignment method (ProbCons or MAFFT/L-INS-i work reasonably well) and see if there is a suspicious conservation of the variant residue or its immediate neighbors. Usually, compositionally biased regions align poorly, but if there is a hidden motif it can often be uncovered by this method. Alternatively, you might check alignment-free motif detection methods such as Gibbs sampling.
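To make the "suspicious conservation" check a bit more concrete, here is a minimal sketch that scores each column of an existing multiple alignment by Shannon entropy. It assumes the alignment has already been produced (e.g. by MAFFT or ProbCons) and loaded as equal-length gapped strings; the function name `column_conservation` and the toy sequences are made up for illustration and are not real orthologs.

```python
import math
from collections import Counter


def column_conservation(aligned_seqs):
    """Score each alignment column: 1.0 = invariant, lower = more variable.

    Gaps ('-') are ignored, so a column with a single non-gap residue
    also scores 1.0; treat sparsely populated columns with caution.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # upper bound for 20 amino acids
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in aligned_seqs if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no information
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy alignment (hypothetical sequences, for illustration only)
toy_alignment = [
    "MSSSPQRLSS",
    "MSSTPQRLGS",
    "MASSPQRLSA",
    "MSGSPQRLTS",
]
toy_scores = column_conservation(toy_alignment)
for position, score in enumerate(toy_scores):
    print(position, round(score, 2))
```

A run of high-scoring columns around the variant position, inside an otherwise poorly scoring region, would be the kind of pattern worth a closer look.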

Nevertheless, chances are slim and it is much more likely that your variant is just due to the fact that compositionally biased regions tend to be more polymorphic than structured domains.

ADD COMMENT • link 12.6 years ago by Lyco ★ 2.3k

0

Thanks, Lyco. The gene/protein is specific to the metazoan lineage. I did an alignment using some of the orthologs and noticed that the region is highly conserved in primates.

ADD REPLY • link 12.6 years ago by Khader Shameer 18k

0

That is a good start. On the other hand, most protein regions are 'highly conserved in primates', just because there hasn't been enough time for accumulating mutations. In my opinion, a lineage-specific high conservation is meaningful only if e.g. the flanking regions have a lower conservation.
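As a rough illustration of the flank comparison, the sketch below takes a list of per-column conservation scores (from whatever scoring method you prefer) and compares the mean inside the region of interest with the mean of the columns on either side. The function name, the `flank` width, and the example scores are assumptions chosen for illustration.

```python
def region_vs_flanks(scores, start, end, flank=10):
    """Return (region_mean, flank_mean) for the half-open column range [start, end).

    The flanks are up to `flank` columns on each side of the region.
    """
    if not 0 <= start < end <= len(scores):
        raise ValueError("region must lie within the alignment")

    region = scores[start:end]
    left = scores[max(0, start - flank):start]
    right = scores[end:end + flank]
    flanks = left + right

    region_mean = sum(region) / len(region)
    flank_mean = sum(flanks) / len(flanks) if flanks else float("nan")
    return region_mean, flank_mean


# Made-up scores: a conserved stretch (columns 3-5) with variable flanks
example_scores = [0.2, 0.3, 0.2, 0.9, 1.0, 0.9, 0.3, 0.2, 0.3]
region_mean, flank_mean = region_vs_flanks(example_scores, 3, 6, flank=3)
print(round(region_mean, 2), round(flank_mean, 2))
```

If the two means are similar, the conservation of the region probably reflects nothing more than the short divergence time of the lineage.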

ADD REPLY • link 12.6 years ago by Lyco ★ 2.3k

0

Thanks for that tip. Here the flanking regions are also conserved, so I am now investigating whether my mutation could be part of a novel domain or a eukaryotic linear motif.

ADD REPLY • link 12.6 years ago by Khader Shameer 18k

score 1 · Answer 2 · 2011-09-09

One thing that comes to mind is a hinge region between segments of secondary structure. So, two tests that one could try are for hydrophobicity/hydrophilicity and for antigenicity. You will have to search for currently available antigenicity assessment tools, as it has been many years since I last did this. (Added in edit: To clarify, the antigenicity test can give you some idea of residue exposure, and if there is an allele-specific change in that prediction, that can be meaningful and even interpreted as a change in structure. Predictions of random coil can also be used in this regard.)
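For the hydrophobicity test, a minimal sketch of an allele-specific comparison could look like the following. It uses the Kyte-Doolittle hydropathy scale with a sliding window; the window size, the function names, and the two peptides are assumptions for illustration, not the actual protein.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean Kyte-Doolittle hydropathy for every full window along seq."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def allele_difference(ref_seq, alt_seq, window=7):
    """Per-window hydropathy change (alt minus ref) for equal-length peptides."""
    if len(ref_seq) != len(alt_seq):
        raise ValueError("sequences must have the same length")
    ref_profile = hydropathy_profile(ref_seq, window)
    alt_profile = hydropathy_profile(alt_seq, window)
    return [alt - ref for ref, alt in zip(ref_profile, alt_profile)]


# Hypothetical serine-rich stretch with a Ser -> Leu substitution
ref_peptide = "SSSSSSSSS"
alt_peptide = "SSSSLSSSS"
deltas = allele_difference(ref_peptide, alt_peptide, window=5)
print([round(d, 2) for d in deltas])
```

A large shift in the windows covering the variant suggests that the substitution changes the local character of the region, which is the kind of allele-specific difference worth following up with exposure or coil predictions.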

I assume you have checked for motifs for both alleles of the variant. You can also check the DNA for motifs, such as an ESE (exonic splicing enhancer). ESEs generally map within exons (of course), usually very near a splice site, though not necessarily. I would also check for microRNA interactions, as the variant allele could produce (or abrogate) an mRNA-miRNA interaction. I've written before on BioStar about Brest's paper on IRGM and Crohn's disease, in which an allele-specific miRNA interaction drives disease risk phenotypes.
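A simple way to check both alleles on the DNA side is to scan the sequence around the variant for a set of motifs and report which hits are gained or lost. The sketch below assumes a single-nucleotide substitution and a user-supplied motif set; the two hexamers and the sequence are placeholders for illustration only, not a curated ESE or miRNA seed collection.

```python
def motif_hits(seq, pos, motifs):
    """Return (motif, start) pairs for exact matches that overlap position pos (0-based)."""
    hits = set()
    for motif in motifs:
        k = len(motif)
        first = max(0, pos - k + 1)
        last = min(pos, len(seq) - k)
        for start in range(first, last + 1):
            if seq[start:start + k] == motif:
                hits.add((motif, start))
    return hits


def allele_motif_changes(ref_seq, pos, alt_base, motifs):
    """Compare motif hits overlapping a single-base substitution at pos."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_hits = motif_hits(ref_seq, pos, motifs)
    alt_hits = motif_hits(alt_seq, pos, motifs)
    return {"lost": ref_hits - alt_hits, "gained": alt_hits - ref_hits}


# Placeholder motifs and sequence (hypothetical, for illustration only)
placeholder_motifs = {"GAAGAA", "GAAGAT"}
reference = "CCGAAGAACCTT"
changes = allele_motif_changes(reference, pos=7, alt_base="T", motifs=placeholder_motifs)
print(changes)
```

The same comparison works for miRNA seed matches if you pass the relevant seed-complementary sequences as the motif set and scan the mRNA strand.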

If I think of other tests to perform, I'll edit my response. In general, you may want to consider the alleles from the DNA side as well as from the protein side.