Impute variants with the PHG. Pipeline and VCF final output for each sample
8 weeks ago
Miguel ▴ 10

Hi, I am trying to use an existing PHG database to impute variants.

Input: 2 fastq files, each from a different sample.

I have 3 questions:

1) In STEP 3, the manual provides examples of executing workflows. Which steps should I run to get a gvcf or vcf for each of the low-coverage samples I have?

2) I already ran some steps and got a VCF file named after the outVcfFile variable in the config file, but I see a single column even though the input key file lists 2 independent samples in 2 fastq files. How can I get a VCF file for each sample, or a genotype column for each sample?

3) Is it required to run step 3B and/or 3C between step 3A and 3E?

I am following the information described here: https://bitbucket.org/bucklerlab/practicalhaplotypegraph/wiki/Home.md. It suggests running the steps in this order:

What I have used so far to get a run without errors is:

#STEP 1A makeDefaultDirectory
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -MakeDefaultDirectoryPlugin -workingDir /phg/ -endPlugin > 1_A.log

#STEP 0.A required but not in the manual
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -configParameters /phg/config0_A.txt -CheckDBVersionPlugin -outputDir /phg/ -endPlugin > 0_0.log

#STEP 2.5 Update PHG database schema
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -debug -Xmx1G -configParameters /phg/config2_5.txt -LiquibaseUpdatePlugin -outputDir /phg/outputDir -endPlugin > 2_5.log

#STEP 3A Create a pangenome Fasta File then stop
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -Xmx80G -debug -configParameters /phg/config_3.txt -ImputePipelinePlugin -imputeTarget pangenome -endPlugin > 3_A.log

#STEP 3E Export imputed VCF from fastq files - homozygous
singularity exec -B $PATH:/phg/ phg_latest.sif /tassel-5-standalone/run_pipeline.pl -Xmx80G -debug -configParameters /phg/config_3.txt -ImputePipelinePlugin -imputeTarget pathToVCF -endPlugin > 3_E.log


Thanks, Miguel

PHG • 212 views
8 weeks ago
pjb39 ▴ 60

You only need to run 3E to generate a VCF file. That will also run all necessary intermediate steps including 3A as long as you have a configuration file with all of the required parameters filled out. If you have already run some of the other steps, those will be skipped and not run again. In case you missed it, there is a link to a sample config file in the section "Writing a config file" near the top of the web page.


I did notice the section "Writing a config file".

Here is the config file I was using for the step 3 commands mentioned in my original question:

host=localHost
user=sqlite
DB=/phg/phg_v5Assemblies_20200608.db
DBtype=sqlite
liquibaseOutdir=/phg/outputDir
pangenomeHaplotypeMethod=mummer4
pangenomeDir=/phg/outputDir/pangenome
indexKmerLength=21
indexWindowSize=11
indexNumberBases=90G
inputType=fastq
keyFile=/phg/key.inputfromfq.txt
fastqDir=/phg/inputDir/imputation/fastq/
samDir=/phg/inputDir/imputation/sam/
lowMemMode=true
maxRefRangeErr=0.25
outputSecondaryStats=false
maxSecondary=50
fParameter=f1000,5000
minimapLocation=minimap2
pathHaplotypeMethod=mummer4
pathMethod=TEST1
maxNodes=1000
minTaxa=1
minTransitionProb=0.001
probCorrect=0.99
removeEqual=true
splitNodes=true
splitProb=0.99
usebf=false
usebf=false
minP=0.8
maxHap=11
algorithmType=efficient
outVcfFile=Test1_out


Am I missing something in the config file, or in the logic of what I am expecting as output? Since this is an imputation on independent samples, shouldn't I get a list of imputed SNPs for each individual instead of a single list of SNPs?
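
One cheap sanity check before rerunning: the listing above contains usebf twice, and a key=value file like this is easy to scan for repeated or mistyped keys. Below is a minimal Python sketch of such a check. The function name parse_config is made up for illustration, and treating lines that start with "#" as comments is an assumption. It says nothing about how the PHG itself handles repeated keys.

```python
from collections import Counter


def parse_config(text):
    """Parse key=value lines; return (settings, sorted list of repeated keys)."""
    settings = {}
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' carry no setting.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        settings[key] = value.strip()  # in this sketch the last occurrence wins
        counts[key] += 1
    repeated = sorted(k for k, n in counts.items() if n > 1)
    return settings, repeated


# A few lines copied from the config above.
example = """\
DBtype=sqlite
usebf=false
usebf=false
minP=0.8
outVcfFile=Test1_out
"""

settings, repeated = parse_config(example)
print("repeated keys:", repeated)
print("outVcfFile:", settings["outVcfFile"])
```

A repeated key with identical values is probably harmless, but the same scan would also catch two conflicting values for one parameter.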


I ran step 3E, but the output VCF does not have any called genotypes for the samples: the relevant sample columns contain ".", and all rows in the file follow this pattern. Is something missing in the config file or in the way I executed step 3E?

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       15      .       A       G       .       .       .
1       16      .       T       A,C,G   .       .       .
1       17      .       A       C,G     .       .       .
1       18      .       A       C       .       .       .
1       24      .       G       T       .       .       .
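
For anyone checking a similar output: in the VCF format the first eight columns (#CHROM to INFO) are fixed, column 9 is FORMAT, and per-sample genotype columns start at column 10. As pasted, the header above stops at INFO, so it is worth confirming whether the file has sample columns at all. The following is a rough Python sketch of that check. It assumes a tab-separated, uncompressed VCF, and the sample names sampleA and sampleB are hypothetical.

```python
def vcf_samples(lines):
    """Return the sample names listed on the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Columns 1-8 are fixed, column 9 is FORMAT, samples follow.
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def missing_fraction(lines, sample_index):
    """Fraction of records in which the sample's GT field is missing."""
    total = missing = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        if len(fields) <= 9 + sample_index:
            missing += 1  # record has no column for this sample
            continue
        # When GT is present it is the first colon-separated subfield.
        gt = fields[9 + sample_index].split(":")[0]
        alleles = gt.replace("|", "/").split("/")
        if all(a == "." for a in alleles):
            missing += 1
    return missing / total if total else 0.0


header8 = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"

# Header as pasted above: no FORMAT column, no samples.
sites_only = [header8 + "\n", "1\t15\t.\tA\tG\t.\t.\t.\n"]
print(vcf_samples(sites_only))

# Hypothetical two-sample file for comparison.
two_samples = [
    header8 + "\tFORMAT\tsampleA\tsampleB\n",
    "1\t15\t.\tA\tG\t.\t.\t.\tGT\t1/1\t./.\n",
    "1\t18\t.\tA\tC\t.\t.\t.\tGT\t0/0\t1/1\n",
]
print(vcf_samples(two_samples))
print(missing_fraction(two_samples, 0), missing_fraction(two_samples, 1))
```

If the first call returns an empty list for the real file, the output is a sites-only VCF and the problem is upstream of genotype calling. If samples are listed but the missing fraction is 1.0, the paths were written without calls for that sample.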