Question: Why Are Monomorphic Loci Excluded From Analysis?
714 wrote, 6.4 years ago:

As the title suggests, I was wondering why it is a good idea to exclude monomorphic loci from SNP analysis. How would including them affect a PCA plot, for example?

Tags: pca, snp
Fabio Marroni (Italy) wrote, 6.4 years ago:

As I understand it, monomorphic means occurring in only one state (or form), in contrast to polymorphic, which means occurring in more than one form. SNPs are by definition polymorphic. A monomorphic site is one at which all individuals have the same form (genotype). It is a good idea to exclude such sites from the analysis because they carry no information: a site where everyone is identical cannot distinguish any individual from any other. Note that you always implicitly exclude the majority of the roughly 3 billion positions in the human genome, where no variation is found.
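
A minimal sketch of that filter in Python, assuming each locus is stored as a list of two-letter genotype calls (one per individual, `None` for a missing call). The function names `is_monomorphic` and `drop_monomorphic` are hypothetical, not taken from any particular tool. Following the definition above, a locus counts as monomorphic when every non-missing individual has the same genotype.

```python
from typing import Dict, List, Optional

# Assumed coding: two-letter calls such as "CC" or "CT"; None marks a missing call.
Genotype = Optional[str]


def is_monomorphic(genotypes: List[Genotype]) -> bool:
    """True if every non-missing individual has the same genotype at this locus."""
    # Sort the letters so "CT" and "TC" count as the same heterozygous genotype.
    observed = {"".join(sorted(g)) for g in genotypes if g is not None}
    return len(observed) <= 1


def drop_monomorphic(loci: Dict[str, List[Genotype]]) -> Dict[str, List[Genotype]]:
    """Return only the loci whose genotypes vary across individuals."""
    return {name: gts for name, gts in loci.items() if not is_monomorphic(gts)}


if __name__ == "__main__":
    loci = {
        "A": ["CC", "CC", "CC", "CC"],  # monomorphic: every individual is C/C
        "B": ["CC", "CT", "TT", "CC"],  # polymorphic
        "C": ["GG", None, "GG", "GG"],  # monomorphic once the missing call is ignored
        "D": ["AA", "AG", "AA", None],  # polymorphic
    }
    print(sorted(drop_monomorphic(loci)))  # ['B', 'D']
```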


Would there be any harm in keeping monomorphic loci in the dataset, given that they do not seem to contribute to any of the variation we might see?

(Reply by 714, 6.4 years ago)

As Josh already said, keeping them does no harm to the results (they are uninformative), but it wastes computing time.

(Reply by Fabio Marroni, 6.4 years ago)
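
On the PCA question, a small sketch of why the results are unaffected, assuming genotypes coded as 0/1/2 allele dosages in a nested list (rows are individuals, columns are loci); the function names are hypothetical and the eigendecomposition step itself is omitted. PCA on genotypes is typically run on the locus-centered matrix, and a monomorphic locus centers to a column of zeros, so it adds nothing to the individual-by-individual matrix whose eigenvectors give the principal components. One caveat: some PCA implementations also scale each locus by an allele-frequency-based standard deviation such as sqrt(2p(1-p)); when p is estimated directly from the sample, that is zero at a monomorphic locus, so the scaled value is undefined, which is a practical reason to filter such loci first.

```python
import math
from typing import List

# Assumed layout: rows = individuals, columns = loci, values = 0/1/2 allele dosages.
Matrix = List[List[float]]


def center_columns(x: Matrix) -> Matrix:
    """Subtract each locus's mean dosage from that locus (column)."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in x]


def individual_matrix(x: Matrix) -> Matrix:
    """Xc Xc^T: cross-products between individuals of the centered genotypes.
    Its eigenvectors are the PCA scores up to scaling by the singular values."""
    xc = center_columns(x)
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in xc] for ri in xc]


def binomial_sd(column: List[float]) -> float:
    """sqrt(2p(1-p)) with p = sample allele frequency; zero for a monomorphic locus."""
    p = sum(column) / (2 * len(column))
    return math.sqrt(2 * p * (1 - p))


if __name__ == "__main__":
    polymorphic_only = [
        [0, 1, 2],
        [1, 1, 0],
        [2, 0, 1],
        [0, 2, 1],
    ]
    # Same data plus one extra locus at which every individual is homozygous (dosage 2).
    with_monomorphic = [row + [2] for row in polymorphic_only]
    print(individual_matrix(polymorphic_only) == individual_matrix(with_monomorphic))  # True
    print(binomial_sd([2, 2, 2, 2]))  # 0.0
```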
Josh Herr (University of Nebraska) wrote, 6.4 years ago:

You would inflate your SNP numbers and misrepresent your data.

How would you distinguish a sequencing error, a one-off mutation, or a transcription error from a bona fide SNP? SNPs are found across individuals in a population, whereas monomorphic loci represent one individual's nucleotide state and may result from errors at numerous levels. When you see a SNP in multiple individuals, you can infer that it is not a sequencing error or a mutation found in only one individual.
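
Josh's argument amounts to requiring that a variant be observed in more than one individual. A minimal sketch of such a filter, assuming 0/1/2 counts of the alternate allele per individual (`None` for a missing call); the threshold `min_carriers` and the function names are hypothetical choices, not values from any specific pipeline. Counting carriers of whichever allele fewer individuals carry removes loci with no variation (zero carriers) as well as variants seen in a single individual.

```python
from typing import Dict, List, Optional

# Assumed coding: copies of the alternate allele per individual (0/1/2); None = missing.
Dosages = List[Optional[int]]


def rarer_allele_carriers(dosages: Dosages) -> int:
    """Number of individuals carrying the allele that fewer individuals carry."""
    calls = [d for d in dosages if d is not None]
    alt_carriers = sum(1 for d in calls if d > 0)  # at least one alternate copy
    ref_carriers = sum(1 for d in calls if d < 2)  # at least one reference copy
    return min(alt_carriers, ref_carriers)


def keep_shared_variants(loci: Dict[str, Dosages], min_carriers: int = 2) -> Dict[str, Dosages]:
    """Keep loci where both alleles are seen in at least `min_carriers` individuals.
    The default of 2 is a hypothetical threshold chosen for illustration."""
    return {name: d for name, d in loci.items() if rarer_allele_carriers(d) >= min_carriers}


if __name__ == "__main__":
    loci = {
        "no_variation": [0, 0, 0, 0, 0],
        "singleton": [0, 1, 0, 0, 0],  # alternate allele in one individual only
        "shared": [0, 1, 2, 0, None],  # alternate allele in two individuals
    }
    print(sorted(keep_shared_variants(loci)))  # ['shared']
```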


That makes some sense; however, I'm afraid I don't quite understand all of it. For example, if 100 individuals were genotyped at loci A-D and all were homozygous C/C at locus A, why would one exclude locus A from the dataset and, subsequently, from the analysis?

(Reply by 714, 6.4 years ago)

In your example, locus A would be uninformative, and it would be pointless to keep that nucleotide alignment position in the analysis: it would provide you with no information and would also waste compute time (meh, probably negligible). You would want to remove uninformative characters, which include non-variable sites as well as monomorphic sites (one "mutation", not a SNP) and highly variable sites.

(Reply by Josh Herr, 6.4 years ago)
blacktomato27 (United States) wrote, 6.4 years ago:

Hi, how should I take the heterozygous allelic state of the parents into account in a polymorphism analysis? For example:

        SNP1  SNP2  SNP3
  p1    AA    AT    AA
  p2    AA    AA    TT

Here I want to see polymorphism between p1 and p2. These are my expected results:

              SNP1  SNP2  SNP3
  p1 vs p2    mono  ?     poly
Thanks in advance
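
One way to reproduce the expected results above, sketched under an assumed rule (the question does not define one, and this is not an established convention): the parents are "mono" at a SNP if both are homozygous for the same allele, "poly" if both are homozygous for different alleles, and "?" if either parent is heterozygous. Genotypes are assumed to be two-letter strings, and the function names are hypothetical.

```python
from typing import Dict, List


def is_homozygous(genotype: str) -> bool:
    """True for a two-letter call such as "AA"; False for a heterozygote such as "AT"."""
    return len(genotype) == 2 and genotype[0] == genotype[1]


def compare_parents(p1: str, p2: str) -> str:
    """Classify one SNP between two parents using the assumed rule from the lead-in."""
    if not (is_homozygous(p1) and is_homozygous(p2)):
        return "?"  # assumption: any heterozygous parent leaves the SNP unresolved
    return "mono" if p1 == p2 else "poly"


def classify_snps(names: List[str], parent1: List[str], parent2: List[str]) -> Dict[str, str]:
    """Apply compare_parents to each SNP, keyed by SNP name."""
    return {name: compare_parents(a, b) for name, a, b in zip(names, parent1, parent2)}


if __name__ == "__main__":
    names = ["SNP1", "SNP2", "SNP3"]
    p1 = ["AA", "AT", "AA"]
    p2 = ["AA", "AA", "TT"]
    print(classify_snps(names, p1, p2))  # {'SNP1': 'mono', 'SNP2': '?', 'SNP3': 'poly'}
```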


I don't quite understand your question. (This isn't an answer, by the way, so it should be posted as a new question in its own thread.) Are you asking how to differentiate between heterozygosity and SNP polymorphisms?

(Reply by Josh Herr, 6.4 years ago)
Powered by Biostar version 2.3.0