GetHapIdsForTaxonPlugin method question
4 months ago
kathrynm • 0


I am trying to extract SNPs for the parental assemblies from my DB.

Here is the docker command I am using:

sudo docker run --name hapPaths_assemblies --rm \
        -v /home/kathrynmichel/KaepplerDeLeon-T330-Mount/Assemblies/PHG/Nov2020_B73v5/:/phg/ \
        -t maizegenetics/phg:latest \
        /tassel-5-standalone/ -debug -Xmx100G \
        -GetHapIdsForTaxonPlugin -configFile /phg/config.txt \
        -taxaList /phg/assemblyTaxa.txt \
        -outputDir /phg/hapPaths/ -methods CONSENSUS_mxDiv0.0001 -endPlugin >logs/getHapPaths_mummer4_7.log

The assemblyTaxa.txt file contains two extra progeny lines for the sake of testing.

When I run the method as either mummer4 or CONSENSUS_mxDiv0.0001, I get this output:

Memory Settings: -Xms512m -Xmx100G
Tassel Pipeline Arguments: -debug -GetHapIdsForTaxonPlugin -configFile /phg/config.txt -taxaList /phg/assemblyTaxa.txt -outputDir /phg/hapPaths/ -methods CONSENSUS_mxDiv0.0001 -endPlugin
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.64  Date: July 9, 2020
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 91022 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_212
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 24
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -GetHapIdsForTaxonPlugin, -configFile, /phg/config.txt, -taxaList, /phg/assemblyTaxa.txt, -outputDir, /phg/hapPaths/, -methods, CONSENSUS_mxDiv0.0001, -endPlugin, -runfork1]
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin: time: Nov 23, 2020 19:11:12
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 
GetHapIdsForTaxonPlugin Parameters
configFile: /phg/config.txt
taxaList: [B73_Assembly,B84_assembly,LH145_assembly,NKH8431_assembly,PHB47_assembly,PHJ40_assembly,W10004_0084,W10004_0010]
outputDir: /phg/hapPaths/
methods: CONSENSUS_mxDiv0.0001

[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = /phg/SSphgDB_B73v5 host: user: kathryn type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:/phg/SSphgDB_B73v5
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:  

[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: query statement: select reference_ranges.ref_range_id, chrom, range_start, range_end, from reference_ranges  INNER JOIN ref_range_ref_range_method on ref_range_ref_range_method.ref_range_id=reference_ranges.ref_range_id  INNER JOIN methods on ref_range_ref_range_method.method_id = methods.method_id  AND methods.method_type = 7 ORDER BY reference_ranges.ref_range_id
methods size: 1
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: number of reference ranges: 71354
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - referenceRangesAsMap: time: 1.042750554 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: query statement: SELECT gamete_haplotypes.gamete_grp_id, genotypes.line_name FROM gamete_haplotypes INNER JOIN gametes ON gamete_haplotypes.gameteid = gametes.gameteid INNER JOIN genotypes on gametes.genoid = genotypes.genoid ORDER BY gamete_haplotypes.gamete_grp_id;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: number of taxa lists: 64
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - taxaListMap: time: 0.005584256 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: haplotype method: CONSENSUS_mxDiv0.0001 range group method: null
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: query statement: SELECT haplotypes_id, gamete_grp_id, haplotypes.ref_range_id, sequence, seq_hash, seq_len FROM haplotypes WHERE method_id = 5;
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of nodes: 206856
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - addNodes: number of reference ranges: 71354
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createHaplotypeNodes: time: 9.964170025 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph edges: created when requested  number of nodes: 206856  number of reference ranges: 71354
[pool-1-thread-1] DEBUG net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin - Filter graph on taxaList.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createEdges: creating edges from nodes.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.CreateGraphUtils - createEdges: time: 3.1329E-5 secs.
[pool-1-thread-1] INFO net.maizegenetics.pangenome.api.HaplotypeGraph - Created graph number of edges: 0  number of nodes: 0  number of reference ranges: 0
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin - GetHapIdsForTaxonPlugin: finished in 11.79106549 seconds
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin: time: Nov 23, 2020 19:11:24
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin: time: Nov 23, 2020 19:11:24: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.pangenome.hapCalling.GetHapIdsForTaxonPlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.

Thoughts? Zack mentioned that ImportHaplotypePathFilePlugin + PathsToVCF are the next steps after this. Is there a newer way to accomplish these steps? Thank you!

phg • 189 views

For future reference, from Peter:

"The taxa list file has to be one taxon per line, no commas. The comma-separated list is correct on the command line, but the file has to be one taxon per line no comma." Also, save it as a .txt file.

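A minimal sketch of converting a comma-separated taxa list into the one-taxon-per-line layout described in the quote above. The function names and file names are hypothetical; only the format requirement comes from the quote.

```python
from pathlib import Path


def split_taxa(text):
    """Split taxa separated by commas and/or newlines into a clean list."""
    names = text.replace(",", "\n").splitlines()
    return [name.strip() for name in names if name.strip()]


def rewrite_taxa_file(src, dest):
    """Write the taxa found in src to dest, one taxon per line, no commas."""
    taxa = split_taxa(Path(src).read_text())
    Path(dest).write_text("\n".join(taxa) + "\n")
```

For example, `rewrite_taxa_file("assemblyTaxa_commas.txt", "assemblyTaxa.txt")` (both names are placeholders) would produce a file suitable for `-taxaList`.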
4 months ago
lcj34 ▴ 80

Hi Kathryn - Is the issue that you got no data as a result of the GetHapIdsForTaxonPlugin? The logging shows data in the graph prior to filtering for taxa, but it looks like there was nothing left in the graph after filtering for the specified taxa.

Can you verify your db has haplotypes created with the method you specified? I'm guessing method mummer4 would match with your *_assembly methods.

Does your PHG genotypes table contain line_name values that match those specified in your taxa list file (perhaps a case-sensitivity issue)? The taxa listed in your log are: B73_Assembly,B84_assembly,LH145_assembly,NKH8431_assembly,PHB47_assembly,PHJ40_assembly,W10004_0084,W10004_0010
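
One way to run that check against a SQLite PHG database is sketched below. It assumes only the `genotypes` table and its `line_name` column, both of which appear in the logged queries; the function names and the database path in the usage note are hypothetical.

```python
import sqlite3


def missing_taxa(conn, taxa):
    """Return taxa with no exact (case-sensitive) match in genotypes.line_name."""
    rows = conn.execute("SELECT line_name FROM genotypes").fetchall()
    known = {row[0] for row in rows}
    return [taxon for taxon in taxa if taxon not in known]


def case_mismatches(conn, taxa):
    """Map each unmatched taxon to a line_name that differs only by case."""
    rows = conn.execute("SELECT line_name FROM genotypes").fetchall()
    by_lower = {row[0].lower(): row[0] for row in rows}
    return {
        taxon: by_lower[taxon.lower()]
        for taxon in missing_taxa(conn, taxa)
        if taxon.lower() in by_lower
    }
```

To use it, open the database with `sqlite3.connect("/path/to/SSphgDB_B73v5")` (placeholder path) and pass in the names from the taxa list file. An empty result from `missing_taxa` would suggest the names are not the problem.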

With which method name were the W10004_0084 and W10004_0010 lines added?


Correct, I'm not getting any data from GetHapIdsForTaxonPlugin.

I have looked at the number of haplotypes through the R api for this consensus method. The population progeny were added with this consensus method and I have successfully generated a vcf for the progeny population from fastqs.
According to the line_names from the genotypes table, the spelling/cases are correct.

I just now tried using only the Assembly names with either mummer4 or the consensus method, and got the same issue of no nodes. I also tried a separate list of three different progeny lines with consensus and had no luck.

4 months ago
pjb39 ▴ 60

GetHapIdsForTaxonPlugin should have returned some Haplotype ids as the above answer suggests. I do not know why that did not work. If what you want is a VCF rather than haplotype id lists, then you need to use PathsToVCF. To export a VCF for assembly (or WGS) haplotypes, you have to build a HaplotypeGraph (using HaplotypeGraphBuilderPlugin) then provide that as input to PathsToVCF. To export SNPs from imputed paths, use ImportHaplotypePathFilePlugin + PathsToVCF. Until recently, you could not include both in a single VCF export. The latest PHG build includes the AddPathsToGraphPlugin. So another possible route is to combine HaplotypeGraphBuilderPlugin + AddPathsToGraphPlugin + PathsToVCFPlugin. If you need the specific commands to run, then I would need to know which of those options you want to use.


I would like a VCF for the 6 parental lines, which were loaded in as assemblies (including B73- I previously used AddRefRangeAsAssembly). If you could write the commands, that would be great!


Adding to this issue: in my original call to ImputePipelinePlugin, I did not specify refRegionGroup, only pathHaplotypeMethod=CONSENSUS_mxDiv0.0001. I got 17 million SNPs, which is too many for our machine to handle.

I tried to run ImputePipelinePlugin again with pathHaplotypeMethod=CONSENSUS_mxDiv0.0001,refRegionGroup (using the same read method and path method) but got similar messages about 0 nodes and ranges. Am I missing something bigger here?

Edit: After changing the pathMethod to something unique, I got an error that the filtered graph has no nodes.


The following command should work:

sudo docker run --name hapPaths_assemblies --rm \
        -v /home/kathrynmichel/KaepplerDeLeon-T330-Mount/Assemblies/PHG/Nov2020_B73v5/:/phg/ \
        -t maizegenetics/phg:latest \
        /tassel-5-standalone/ -debug -Xmx100G -configParameters /phg/config.txt \
        -HaplotypeGraphBuilderPlugin -methods CONSENSUS_mxDiv0.0001 -includeVariantContexts -endPlugin \
        -FilterGraphPlugin -taxaList /phg/assemblyTaxa.txt -endPlugin \
        -PathsToVCFPlugin -outputFile /phg/hapPaths/pathsVcf.vcf -endPlugin

To limit the export to the refRegionGroup reference ranges, change -methods CONSENSUS_mxDiv0.0001 to -methods CONSENSUS_mxDiv0.0001,refRegionGroup. Alternatively, you can add HaplotypeGraphBuilderPlugin.methods=CONSENSUS_mxDiv0.0001,refRegionGroup to the config file. The FilterGraphPlugin is only needed if you want to limit the VCF to specific taxa. Without it, all of the assembly taxa will be returned. I would expect that using only refRegionGroup will give you fewer than 2 million SNPs, but that is just a guess.


Getting the paths for the assemblies failed; I'll send you the log file.

Somewhere along the way, my reference ranges became named "genic" instead of "refRegionGroup", so it is working now. Thanks!


If the only change you made was changing pathHaplotypeMethod from CONSENSUS_mxDiv0.0001 to CONSENSUS_mxDiv0.0001,refRegionGroup that suggests that refRegionGroup is not a valid referenceRange group name. If that does not make sense, you can send me the log file. Something else in the log may indicate what is happening. You can check the method names in your database using rPHG (or an SQL query if you are comfortable doing that).
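
For the SQL route, a minimal sketch using Python's built-in sqlite3 module is shown here. It relies only on the `methods` table name, which appears in the logged queries, and selects every column so that no additional column names are assumed; the function name and database path are hypothetical.

```python
import sqlite3


def list_methods(conn):
    """Return every row of the methods table as a dict keyed by column name."""
    cursor = conn.execute("SELECT * FROM methods")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Calling `list_methods(sqlite3.connect("/path/to/SSphgDB_B73v5"))` (placeholder path) and printing each row should show which reference range group names exist, so the value after the comma in pathHaplotypeMethod can be matched against them.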

