Load genome intervals parameters

Question

Deleted:PHG Load haplotype and create consensus

1

Entering edit mode

2.5 years ago

지용 ▴ 20

Here, presented my PHG scripts, config, wgs_keyfile.

1. Create valid intervals

docker run --name test_assemblies --rm -v /DATA/jysong/PHG/ver1.0_phg/:/phg/ -t maizegenetics/phg:1.0 /tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -configParameters /phg/Masterconfig.txt -CreateValidIntervalsFilePlugin -intervalsFile /phg/inputDir/reference/glyma.Wm82.gnm4.ann1.T8TQ.gene_models_main.bed -referenceFasta /phg/inputDir/reference/glyma.Wm82.gnm4.4PTR.genome_main.fixed.fna.gz -mergeOverlaps true -generatedFile /phg/validBedFile.bed -endPlugin &> Log/1.Create_validinterval.txt &

2. Create initial DB

docker run --name create_initial_db --rm -v /DATA/jysong/PHG/ver1.0_phg/:/phg/ -t maizegenetics/phg:1.0 /tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -configParameters /phg/Masterconfig.txt -MakeInitialPHGDBPipelinePlugin -endPlugin &> Log/2.Create_InitialDB.txt &

3. check plugin update

docker run --name create_directory --rm -v /DATA/jysong/PHG/ver1.0_phg/:/phg/ -t maizegenetics/phg:1.0 /tassel-5-standalone/run_pipeline.pl -debug -configParameters /phg/Masterconfig.txt -CheckDBVersionPlugin -outputDir /phg/outputDir -endPlugin -LiquibaseUpdatePlugin -outputDir /phg/outputDir -endPlugin &> Log/3.Check_Plugin.txt &

4. Load haplotype (In docker)

./CreateHaplotypesFromFastq.groovy -config phg/Masterconfig.txt &> phg/Log/4.CreateHAplotypeFromFastq.txt &

5. Create consensus (In docker)

/tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -configParameters Masterconfig.txt -HaplotypeGraphBuilderPlugin -configFile Masterconfig.txt -methods GATK_PIPELINE1:CONSENSUS -includeVariantContexts true -endPlugin -RunHapConsensusPipelinePlugin -referenceFasta /inputDir/reference/glyma.Wm82.gnm4.4PTR.genome_main.fixed.fna.gz -dbConfigFile Masterconfig.txt -collapseMethod CONSENSUS -collapseMethodDetails G.max_test -rankingFile rankingFile.txt -clusteringMode kmer_assembly -isTestMethod true -endPlugin &> phg/Log/5.Concensus.txt &

Config file

### config file.

Anything marked with UNASSIGNED needs to be set for at least one of the steps

If it is marked as OPTIONAL, it will only need to be set if you want to run specific steps.

host=localHost user=sqlite password=sqlite DB=/phg/Gmax430 DBtype=sqlite

Load genome intervals parameters

referenceFasta=/phg/inputDir/reference/glyma.Wm82.gnm4.4PTR.genome_main.fna.gz anchors=/phg/validBedFile.bed genomeData=/phg/inputDir/reference/load_genome_data.txt refServerPath=localHost;/DATA/jysong/ver1.0_phg/ref

liquibase results output directory, general output directory

outputDir=/phg/outputDir liquibaseOutdir=/phg/outputDir

Align WGS fastq files to reference genome parameters

File Directories

gvcfFileDir=/phg/inputDir/loadDB/gvcf/ tempFileDir=/phg/inputDir/loadDB/temp/ filteredBamDir=/phg/inputDir/loadDB/bam/filteredBAMs/ dedupedBamDir=/phg/inputDir/loadDB/bam/DedupBAMs/

TASSEL parameters

Xmx=100G tasselLocation=/tassel-5-standalone/run_pipeline.pl

PHG CreateHaplotypes Parameters

wgsKeyFile=/phg/wgs_KeyFile.txt LoadHaplotypesFromGVCFPlugin.gvcfDir=/phg/inputDir/loadDB/gvcf/ LoadHaplotypesFromGVCFPlugin.referenceFasta=/phg/inputDir/reference/glyma.Wm82.gnm4.4PTR.genome_main.fna.gz LoadHaplotypesFromGVCFPlugin.haplotypeMethodName=GATK_PIPELINE LoadHaplotypesFromGVCFPlugin.haplotypeMethodDescription=GATK_PIPELINE extendedWindowSize = 1000 mapQ = 48

GATK and Sentieon Parameters

gatkPath = /gatk/gatk numThreads=35 sentieon_license sentieonPath=/sentieon/bin/sentieon

CreateConsensi parameters

haplotypeMethod = GARK_PIPELINE consensusMethod = CONSENSUS mxDiv = 0.005 seqErr = 0.02 minSites = 20 minTaxa = 2 maxThreads = 60

rankingFile = null

clusteringMode = upgma

Graph Building Parameters

includeVariants = true

FilterGVCF Parameters. Adding any of these will add more filters.#exclusionString=UNASSIGNED

DP_poisson_min=0.0

DP_poisson_max=1.0

DP_min=100

DP_max=UNASSIGNED

GQ_min=10

GQ_max=UNASSIGNED

QUAL_min=30

QUAL_max=UNASSIGNED

filterHets=UNASSIGNED

Imputation Pipeline parameters for VCF files

--- Used by liquibase to check DB version ---

liquibaseOutdir=/phg/outputDir

--- Used for indexing SNP positions ---

pangenomeHaplotypeMethod is the database method or methods for the haplotypes to which SNPs will be indexed

the index file lists the SNP allele to haplotype mapping and is used for mapping reads

pangenomeHaplotypeMethod=CONSENSUS pangenomeDir=/phg/outputDir/pangenome indexFile=/phg/outputDir/vcfIndexFile vcfFile=/phg/inputDir/imputation/vcf/SoyHapMap.SNP.GT.fixed.vcf.414accession.KASP.gz

--- Used for mapping reads

readMethod is the method name for storing the resulting read mappings

countAlleleDepths=true means allele depths will be used for haplotype counts, which is almost always a good choice

inputType=vcf keyFile=/phg/readMapping_key_file.txt readMethod=GBD_readMethod vcfDir=/phg/inputDir/loadDB/gvcf/ countAlleleDepths=true

--- Used for path finding

pathHaplotypeMethod determines which haplotypes will be consider for path finding

pathHaplotypeMethod should be the same as pangenomeHaplotypeMethod, but could be a subset

pathMethod is the method name used for storing the paths

pathHaplotypeMethod=CONSENSUS pathMethod=GBD_pathMethod maxNodes=1000 maxReads=10000 minReads=1 minTaxa=20 minTransitionProb=0.001 numThreads=3 probCorrect=0.99 removeEqual=true splitNodes=true splitProb=0.99 usebf=false maxParents = 1000000 minCoverage = 1.0

parentOutputFile = OPTIONAL

--- used by haploid path finding only

usebf - if true use Forward-Backward algorithm, other Viterbi

usebf=false minP=0.8

--- used by diploid path finding only

maxHap=11 maxReadsKB=100 algorithmType=efficient

--- Used to output a vcf file for pathMethod

outVcfFile=/phg/outputDir/Result.vcf

~ Optional Parameters ~

readMethodDescription=OPTIONAL

pathMethodDescription=OPTIONAL

bfInfoFile=OPTIONAL

~ providing a value for outputDir will write read mappings to file rather than the PHG db ~

outputDir=/phg/

wgs_keyfile enter image description here

The output file of the filtered gvcf file did not show any sequences. And in create consensus step, sever;/path/to/file error occurred.

enter image description here

After changing the colon of gvcfserverpath to a semicolon, executing CreateFromGVCF.groovy and proceeding with the create Consensus step, the same error occurred. Is there something wrong with the config file or keyfile?

haplotype Consensus PHG • 333 views

ADD COMMENT • link updated 2.5 years ago by Pierre Lindenbaum 166k • written 2.5 years ago by 지용 ▴ 20