How to define taxa subsets in PHG (new version)
9 months ago
dovi ▴ 60

Hi everyone,

I am trying to understand the PHG pipeline, but I am having a hard time seeing where, in the new version, I can define the taxa subset that I want to use to impute my skim sequences.

For example, in the paper where the PHG was presented ( ), they used two datasets: one with the 24 founders, and another with a diversity panel (which included the 24 founders). My understanding of the workflow is as follows: all haplotypes (24 founders + diversity panel) are imported into the database, and then, in the downstream steps, one chooses which taxa to use to impute the skim sequences (all of them, just the 24 founders, or any defined set of names).

In the older versions of PHG (< 0.0.20), I thought this could be done by giving the list of names in the "taxa" parameter of the config file (as seen in ). However, in the new PHG version (> 0.0.20), I no longer see a way to give the "taxa" list in the configuration file. I therefore wonder how to define the taxa subset used for imputation in the latest version, or whether I have simply misunderstood the concept.



