Question

Large sample sizes in Lositan

9.2 years ago

Gary Longo • 0

Hi Tiago and/or other Lositan experts,

I have a data set of 15,982 markers and 62 individuals across 4 populations. I downloaded the large sample version of Lositan and successfully loaded the Genepop file into the program on my Mac. However, when I run the program, it crashes while the cyan message bar reads "simulation pass to determine initial neutral set". The Java console printed this error:

Java Web Start 10.75.2.13
Using JRE version 1.7.0_75-b13 Java HotSpot(TM) 64-Bit Server VM
User home directory = /Users/garylongo
----------------------------------------------------
c:   clear console window
f:   finalize objects on finalization queue
g:   garbage collect
h:   display this help message
m:   print memory usage
o:   trigger logging
p:   reload proxy configuration
q:   hide console
r:   reload policy configuration
s:   dump system and deployment properties
t:   dump thread list
v:   dump thread stack
0-5: set trace level to <n>
----------------------------------------------------
Missing Application-Name manifest attribute for: http://popgen.net/soft/lositan/code2/lib/selwb.jar
Missing Permissions manifest attribute in main jar: http://popgen.net/soft/lositan/code2/lib/selwb.jar
Mac OS X /Users/garylongo /
15982
0.170016 34
Unhandled exception in thread started by <bound method SplitFDist.monitor of <Bio.PopGen.FDist.Async.SplitFDist object at 0x19a>>
Traceback (most recent call last):
  File "/Users/garylongo/.lositan/Bio/PopGen/FDist/Async.py", line 128, in monitor
    self.report_fun(fst)
  File "/Users/garylongo/.lositan/Main.py", line 548, in report
    selLoci = getSelLoci(pv)
  File "/Users/garylongo/.lositan/Main.py", line 418, in getSelLoci
    p = getP(pv[currPos])
IndexError: index out of range: 0

After some trial and error reducing the data set size, I got the program to run to completion once I had cut the data set down to 5,002 markers. I see that the program should be able to handle 40,000 markers on a Mac and 100,000 on PCs. I'm assuming this limit is a function of sample size and not strictly of the number of loci. Is this true? I have seen other posts that suggest reducing the size of the data set or running separate analyses, but I would like to run the complete data set in a single run in order to analyze all my data and to avoid skewing Fst calculations across separate runs. Any suggestions would be greatly appreciated.
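For anyone repeating the trial-and-error reduction described above, here is a minimal sketch of how a Genepop file could be cut down to its first N loci. It assumes the standard Genepop layout (a title line, then locus names either one per line or comma-separated, then `Pop` blocks with one `name , genotypes` line per individual). The function name `truncate_genepop` is hypothetical and is not part of Lositan or Biopython.

```python
def truncate_genepop(lines, n_keep):
    """Return Genepop lines restricted to the first n_keep loci.

    Assumes: line 1 is the title, locus names follow until the first
    "Pop" line, and each individual line is "name , g1 g2 g3 ...".
    """
    title, body = lines[0], lines[1:]

    # Locus names run from the second line up to the first "Pop" line.
    first_pop = next(
        i for i, ln in enumerate(body) if ln.strip().lower() == "pop"
    )
    loci = [
        name.strip()
        for ln in body[:first_pop]
        for name in ln.split(",")
        if name.strip()
    ]

    out = [title] + loci[:n_keep]
    for ln in body[first_pop:]:
        if ln.strip().lower() == "pop":
            out.append("Pop")
        elif ln.strip():
            # Genotypes never contain commas, so split on the last comma.
            name, _, genotypes = ln.rpartition(",")
            kept = genotypes.split()[:n_keep]
            out.append(name.strip() + " , " + " ".join(kept))
    return out


# Tiny in-memory example: 3 loci, 2 populations, 1 individual each.
example = [
    "demo title",
    "L1", "L2", "L3",
    "Pop",
    "ind_a , 0101 0102 0202",
    "Pop",
    "ind_b , 0202 0101 0102",
]
reduced = truncate_genepop(example, 2)
```

To use it on a real file, read the file into a list of lines, call the function with the number of loci to keep, and write the result back out joined by newlines. Note that this only helps locate the size at which the run fails; it does not address the concern about splitting the analysis.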

Thanks for your time!

Cheers,
Gary

Lositan • 2.6k views

updated 2.0 years ago by Ram 43k • written 9.2 years ago by Gary Longo • 0

I don't have experience with Lositan, but I have a simple and effective Fst implementation (Weir and Cockerham) for two populations. It is written in C++ and works directly from a VCF file.

https://github.com/jewmanchue/vcflib/wiki/Association-testing-with-GPAT
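For readers who want to see what a Weir and Cockerham style Fst looks like at a single locus, below is a minimal Python sketch of the 1984 estimator (theta) for one biallelic marker, computed from genotype counts. This is an illustration only and is not the code behind the link above; the function name `wc_fst` and the input layout (one `(n_AA, n_Aa, n_aa)` tuple per population) are assumptions made for the example.

```python
def wc_fst(pops):
    """Weir & Cockerham (1984) theta for one biallelic locus.

    pops: list of (n_AA, n_Aa, n_aa) genotype counts, one per population.
    Returns NaN when the locus is monomorphic in the pooled sample.
    """
    r = len(pops)
    n = [sum(g) for g in pops]                       # individuals per population
    p = [(2 * g[0] + g[1]) / (2 * ni) for g, ni in zip(pops, n)]  # freq of A
    h = [g[1] / ni for g, ni in zip(pops, n)]        # observed heterozygosity

    n_tot = sum(n)
    n_bar = n_tot / r
    n_c = (n_tot - sum(ni * ni for ni in n) / n_tot) / (r - 1)

    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / n_tot
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / n_tot

    core = p_bar * (1 - p_bar) - (r - 1) / r * s2
    a = (n_bar / n_c) * (s2 - (core - h_bar / 4) / (n_bar - 1))   # among populations
    b = (n_bar / (n_bar - 1)) * (core - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2                                    # within individuals

    denom = a + b + c
    return a / denom if denom else float("nan")


# Two populations fixed for alternative alleles, 10 individuals each.
fixed = wc_fst([(10, 0, 0), (0, 0, 10)])
# Two populations with identical genotype counts.
same = wc_fst([(3, 4, 3), (3, 4, 3)])
```

A multi-locus estimate is usually formed by summing the `a` terms and the `a + b + c` terms across loci before dividing, rather than averaging per-locus ratios; the sketch above returns only the per-locus value.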

updated 2.0 years ago by Ram 43k • written 9.2 years ago by Zev.Kronenberg 12k