Question

Is It Possible To Infer Population Genetics Parameters Like Ne Using De-Novo Sequencing Data Of Pooled Samples?

10

14.7 years ago

Lhl ▴ 760

Hi there,

I have used 454 GS FLX to sequence (de novo) two plant ecotypes (two divergent populations, each adapted to its own habitat) by pooling 16 individuals from each ecotype. So far, I have finished the assembly and the SNP and indel detection. I have also calculated population parameters such as Watterson's theta (θ = 4Neμ) and pi (which is expected to equal theta under neutral equilibrium). However, I am not sure whether it is possible to infer other parameters such as Ne (effective population size) or the divergence time of the two ecotypes.
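
For reference, the two estimators mentioned above can be sketched in a few lines of Python. This is only a minimal illustration on a toy alignment of individually resolved haplotypes (the sequences and function names are made up for the example). It does not include the corrections for pooling and sequencing error that pooled reads would need.

```python
from itertools import combinations

def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the Watterson correction factor."""
    return sum(1.0 / i for i in range(1, n))

def watterson_theta(seqs):
    """Watterson's theta per site from aligned, equal-length haplotypes."""
    n, length = len(seqs), len(seqs[0])
    # A site is segregating if more than one allele is seen in the sample
    seg_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    return seg_sites / harmonic(n) / length

def nucleotide_diversity(seqs):
    """Pi per site: mean number of pairwise differences over all pairs."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / length

# Toy alignment (hypothetical data): 4 haplotypes, 10 sites
haplotypes = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGAACGTAC"]
theta_w = watterson_theta(haplotypes)
pi = nucleotide_diversity(haplotypes)
print(f"theta_W = {theta_w:.4f}, pi = {pi:.4f}")
```

Under neutral equilibrium the two values should be similar; a large gap between them is what statistics such as Tajima's D are built on.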

By the way, I would like to know how you identify SNP outliers in your data, if you have done or are doing the same thing. Is it better to use an Fst-based approach or Fisher's exact test?
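
To make the two options concrete, here is a rough per-SNP sketch of both statistics, assuming allele counts (or frequencies) are already available for one biallelic SNP in each pool. The function names and the counts are hypothetical, and treating pooled read counts as independent allele draws is itself an assumption, since reads are sampled from a finite pool of chromosomes.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def fst(p1, p2):
    """Simple Fst = (H_T - H_S) / H_T for one biallelic SNP, equal pool weights."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                          # total heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2      # mean within-pool
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Hypothetical read counts: (reference, alternative) in each pool
pool1, pool2 = (18, 2), (3, 17)
p_value = fisher_exact_two_sided(*pool1, *pool2)
fst_value = fst(pool1[0] / sum(pool1), pool2[0] / sum(pool2))
print(f"Fisher p = {p_value:.3g}, Fst = {fst_value:.3f}")
```

Either statistic would then be compared against its genome-wide distribution to flag outliers, rather than read in isolation.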

Elzed

sequencing population analysis • 11k views

ADD COMMENT • link updated 11.9 years ago by Giovanni M Dall'Olio 28k • written 14.7 years ago by Lhl ▴ 760

0

What system did you use to detect SNPs and INDELs in your dataset?

ADD REPLY • link 13.9 years ago by Erik Garrison ★ 2.4k

0

Sorry for the late reply. By system, do you mean software? I tried Mosaik and BWA-SW + Samtools for alignment and SNP calling.

ADD REPLY • link 13.7 years ago by Lhl ▴ 760

Ram · Answer 1 · 2010-11-16

4

14.7 years ago

Casey Bergman 18k

Yes, there has been some recent work on this problem; see Futschik & Schlötterer (2010), Genetics.

EDIT: see the associated code base at PoPoolation (hat tip to RaghuM's answer on this related thread).

ADD COMMENT • link updated 5.8 years ago by Ram 45k • written 14.7 years ago by Casey Bergman 18k

4

As you probably are aware, under the standard neutral model you can infer Ne from theta if you assume a mutation rate. You'd have to dig deeper or contact the authors about more complex demographic scenarios. You may want to post your question to evoldir (http://evol.mcmaster.ca/evoldir.html) for a more community-specific response to this question.
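
As a back-of-the-envelope sketch of that calculation: with θ = 4Neμ, Ne is simply θ / (4μ). The values below are placeholders, not estimates for any particular species, and the function name is just for illustration.

```python
def effective_population_size(theta_per_site, mu_per_site_per_generation):
    """Ne from theta = 4 * Ne * mu (diploid, standard neutral model)."""
    return theta_per_site / (4.0 * mu_per_site_per_generation)

# Hypothetical values, for illustration only
theta = 0.008   # per-site diversity estimate (e.g. Watterson's theta)
mu = 7e-9       # ASSUMED per-site, per-generation mutation rate
ne = effective_population_size(theta, mu)
print(f"Ne is roughly {ne:,.0f}")
```

The result scales inversely with the assumed mutation rate, so any uncertainty in μ carries straight over to Ne.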

ADD REPLY • link 14.7 years ago by Casey Bergman 18k

0

Yes, thanks. I read the paper. But do you have any ideas about inferring demographic history, like Ne?

ADD REPLY • link 14.7 years ago by Lhl ▴ 760

0

Thanks a lot, I will try that.

ADD REPLY • link 14.7 years ago by Lhl ▴ 760

0

Thanks Casey, that's a very cool community.

ADD REPLY • link 14.7 years ago by Lhl ▴ 760

0

And does that mean I have to identify regions that are evolving neutrally? Could I define a neutral region simply as one where theta is close to 0?

ADD REPLY • link 14.7 years ago by Lhl ▴ 760

0

And does that mean I have to identify regions that are evolving neutrally? Could I define a neutral region simply as one where theta is close to pi?

ADD REPLY • link 14.7 years ago by Lhl ▴ 760

Ram · Answer 2 · 2013-08-14

3

11.9 years ago

Giovanni M Dall'Olio 28k

The software PSMC can infer how the effective population size of a species has changed over time, using only one single diploid sequence.

PSMC download
PSMC publication (Li, Durbin 2012)
example of PSMC usage (to estimate the history of effective population size in primates)
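
PSMC reports its results in scaled units. Assuming the scaling described in the PSMC README (N0 = θ0 / (4μs), time = 2·N0·t_k generations, population size = N0·λ_k, with s the bin size used when building the input), the conversion to real units can be sketched as below. The function name is hypothetical, and the default mutation rate, bin size and generation time are assumptions that would need replacing for any non-human species.

```python
def scale_psmc(theta0, t_k, lambda_k, mu=2.5e-8, bin_size=100, gen_time=25.0):
    """Convert scaled PSMC output to years and effective population sizes.

    theta0   : scaled mutation rate per bin reported by PSMC
    t_k      : scaled interval times (units of 2*N0 generations)
    lambda_k : relative population sizes (units of N0)
    mu, bin_size, gen_time : ASSUMED values; replace for your species
    """
    n0 = theta0 / (4.0 * mu * bin_size)
    years = [2.0 * n0 * t * gen_time for t in t_k]
    ne = [n0 * lam for lam in lambda_k]
    return n0, years, ne

# Hypothetical scaled values, for illustration only
n0, years, ne = scale_psmc(0.1, [0.5, 1.0], [2.0, 0.5])
for y, size in zip(years, ne):
    print(f"{y:,.0f} years ago: Ne about {size:,.0f}")
```

Because both axes depend on μ and the generation time, changing either assumption rescales the whole curve without changing its shape.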

[Figure: estimated history of effective population size in human populations, taken from Li and Durbin 2012]

ADD COMMENT • link 11.9 years ago by Giovanni M Dall'Olio 28k

0

I am sorry if I am interrupting the discussion above.

How can I scale down the Y axis (effective population size)?

The scale on my PSMC plot is too large, so the changes in effective population size over time cannot be made out.

ADD REPLY • link updated 5.9 years ago by Ram 45k • written 10.1 years ago by nadiahtohoku • 0

score 2 · Answer 3 · 2010-11-16

2

14.7 years ago

David W 4.9k

Have you considered the Extended Bayesian Skyline (Heled and Drummond 2008, tutorial here)?

Presuming you have aligned sequences, you should be able to infer changes in population size. Unless you have an estimate of the mutation rate for some of your genes, you won't be able to express it in 'real' numbers, but that's not always the goal anyway.

ADD COMMENT • link 14.7 years ago by David W 4.9k

1

Just be aware that the skyline assumes no recombination. To counteract this, we should have a sufficient number of loci, I think.

ADD REPLY • link 14.7 years ago by lh3 33k

1

I don't know about ms (isn't that a simulation program?). To do the Bayesian analysis you'll need to give each 'partition' in your data a substitution model (so non-coding seqs probably don't need the full GTR, for instance).

One of the problems with using massive multi-locus datasets in this sort of analysis is deciding what a partition is. It's tempting to set each locus as one partition, but that can be a PITA computationally and probably over-fits the data. (I don't have the solution to that problem, by the way, just a warning ;)

ADD REPLY • link 14.7 years ago by David W 4.9k

0

And should I discriminate between coding and non-coding regions when using this software?

ADD REPLY • link 14.7 years ago by Lhl ▴ 760

0

Thanks. That is a good point. However, should I discriminate between coding and non-coding regions when processing my datasets?

ADD REPLY • link 14.7 years ago by Lhl ▴ 760

0

And do you think it is possible to use ms to solve the same problem?

ADD REPLY • link 14.7 years ago by Lhl ▴ 760

0

Thanks David. ms is a coalescent simulation program created by Richard R. Hudson at the University of Chicago. It is available at https://webshare.uchicago.edu/xythoswfs/webui/users/rhudson1/Public/ms.folder?action=frameset&subaction=print&uniq=yzld0b&stk=2B23BE1D462EA92

ADD REPLY • link 14.7 years ago by Lhl ▴ 760

score 1 · Answer 4 · 2011-02-03

1

14.5 years ago

Paolo Gratton ▴ 10

Hello!

I have stepped into this thread, which is really interesting. It seems to me that nobody has mentioned what looks to me like a very important matter. Elzed's data are from 16 pooled individuals, without tagging, right? Is it possible to retrieve the true frequency of each haplotype in each sample? If it is not, how is it possible to use coalescent-based algorithms like BEAST (EBSP)?

I hope this thread is still active, since I guess I am not grasping something and I would really like to know what.

Paolo

ADD COMMENT • link 14.5 years ago by Paolo Gratton ▴ 10

0

Thanks for your interest in this topic, Paolo, and sorry for the late reply; I have been travelling abroad. I have two pools, each consisting of 16 individuals. Each pool has a unique tag. Futschik and Schlötterer (2010) proposed a method to estimate population genetics parameters from such data. http://www.genetics.org/cgi/content/full/186/1/207

I would like to continue our discussion on this topic.

ADD REPLY • link 14.4 years ago by Lhl ▴ 760