What is the expected number of common/mutual/shared SNVs between two, three, four... unrelated individuals?
9 weeks ago

Hello,

I tried Google but didn't find anything satisfying. I am doing some genetics analyses on a large family and would like to confirm e.g. that the numbers of shared SNVs between two or more samples are reasonable.

The only number I found via Google is that a typical human WGS will have anything between 3 and 20 million SNVs, which isn't very precise...

Thanks in advance!

snv genomics variation snp human • 794 views

See this response by Paolo Maccallini:


Thanks Jeremy, I am very impressed at how Xwitter posts can be embedded so easily...

Anyway, yes indeed, that 4.5M would be the maximum, but what about the average? It should be much lower, right?


We could study this empirically on our 1000 Genomes data. PM me and we'll discuss a scoring strategy (defining exactly what "shared" means).


Interesting! I'd love to discuss, but there's no PM function here, and you don't allow PMs on Xwitter. You can email me at joelwallenius at gmail, for example. :)

9 weeks ago
dthorbur ★ 1.9k

The wide range you saw in the human example just highlights that this isn't really something you can make precise generalisations about.

There are a lot of factors to take into account:

  1. How related are the samples?
  2. How much standing SNV variation is segregating in these populations?
  3. What is the populations' mutation rate?
  4. Recombination rates?
  5. Population stratification?
  6. Population connectivity and immigration?
  7. etc...

Also, what are you defining as shared SNVs? Is this a pairwise comparison, and if so, do all non-reference alleles count as shared even if they are fixed in your populations? Or are you limiting the analysis to only polymorphic sites with at least 2 variants segregating per locus?


For the sake of argument, the individuals are completely unrelated, i.e. related only at the noise level. So the answers to 1-7 would all be something like "average", "noise level" or "within one population". Then, once I have an estimate of the expected number of shared SNVs for unrelated individuals, I'd know the number should be higher for cousins or siblings.
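To make the "expected shared SNVs" idea concrete, here is a minimal sketch under strong simplifying assumptions that are mine, not from the thread: sites are independent, individuals are unrelated and drawn from one population in Hardy-Weinberg equilibrium, and an SNV counts as "carried" if the individual has at least one non-reference allele. The function names and the toy frequencies are hypothetical. It mainly shows that the expectation depends on the whole allele frequency spectrum, not on a single per-genome SNV count.

```python
def carrier_prob(p):
    """Probability that one individual carries at least one alt allele
    at a site with alt allele frequency p, assuming Hardy-Weinberg."""
    return 1.0 - (1.0 - p) ** 2


def expected_shared(freqs, k):
    """Expected number of sites where all k unrelated individuals carry
    the alt allele, given per-site alt allele frequencies `freqs`."""
    return sum(carrier_prob(p) ** k for p in freqs)


# Made-up allele frequencies, for illustration only (not real data).
toy_freqs = [0.5, 0.1, 0.01, 1.0]

for k in (1, 2, 3, 4):
    print(k, round(expected_shared(toy_freqs, k), 4))
```

With real data, `freqs` would come from a population reference panel. Sites where the alt allele is fixed (p = 1) are shared by everyone, while rare variants contribute almost nothing once k grows.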

It is a pairwise comparison, performed repeatedly: first you'd find the SNVs shared by individuals 1 and 2, then those also shared with individual 3, etc. (the number can only stay the same or decrease with every additional individual). An SNV is included on only one condition: that it differs from hg38. I tried some basic compound probability calculations but dropped them, as it can't possibly be that simple. :/
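The iterative comparison described above can be sketched as a running set intersection. This is only an illustration: the `(chrom, pos, ref, alt)` tuple key, the `shared_counts` name and the toy variants are assumptions, and a real analysis would first load and normalise calls from VCF files.

```python
def shared_counts(samples):
    """Return the number of variants shared by the first 1, 2, 3, ...
    samples, where each sample is a collection of variant keys."""
    counts = []
    shared = None
    for variants in samples:
        # Start with the first sample, then keep only variants
        # that are also present in each additional sample.
        shared = set(variants) if shared is None else shared & set(variants)
        counts.append(len(shared))
    return counts


# Toy variant sets keyed by (chrom, pos, ref, alt); made-up values.
toy_samples = [
    {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")},
    {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
]

print(shared_counts(toy_samples))  # [3, 2, 1]
```

The counts never increase as individuals are added, which matches the expectation stated above; how fast they fall depends on how common the variants are.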
