Empty imputed VCF file in diploidPathToVCF
19 months ago
dovi ▴ 60

Hi everyone,

I cannot get the imputed VCF file: the file is created but is empty (it contains only the header). I ran the ImputePipelinePlugin with the "diploidPathToVCF" method. Because the paths seemed to have been created while the output was still empty, I then tried PathsToVCFPlugin directly, as below:

tassel-5-standalone/run_pipeline.pl -Xmx4G \
   -debug \
   -configParameters params_config.txt \
   -HaplotypeGraphBuilderPlugin -configFile params_config.txt \
      -methods CONSENSUS_maxDiv0.0005 \
      -includeVariantContexts true \
      -includeSequences false \
   -endPlugin \
   -ImportDiploidPathPlugin \
      -pathMethodName PATH_METHOD_maxDiv0.0005 \
   -endPlugin \
   -PathsToVCFPlugin \
      -outputFile test.vcf \
      -referenceFasta ref/my_reference.fasta \
   -endPlugin

The last lines of the output were:

[...]
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin: time: Oct 11, 2022 14:55:47
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 
ImportDiploidPathPlugin Parameters
pathMethodName: PATH_METHOD_maxDiv0.0005
taxa: null

[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - first connection: dbName from config file = TestDB.db host: localHost user: sqlite type: sqlite
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Database URL: jdbc:sqlite:TestDB.db
[pool-1-thread-1] INFO net.maizegenetics.pangenome.db_loading.DBLoadingUtils - Connected to database:  

[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin - importPathsFromDB: query: SELECT line_name, paths_data FROM paths, genotypes, methods WHERE paths.genoid=genotypes.genoid AND methods.method_id=paths.method_id AND methods.name='PATH_METHOD_maxDiv0.0005'
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin - importPathsFromDB: number of path list: 3
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin: time: Oct 11, 2022 14:55:47
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:55:47
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 
PathsToVCFPlugin Parameters
outputFile: test.vcf
refRangeFileVCF: null
referenceFasta: ref/my_reference.fasta
makeDiploid: true
positions: null

Genome FASTA character conversion: ACGTNacgtn to ACGTNacgtn
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin - PathsToVCFPlugin: processData: number of ranges: 43665
[pool-1-thread-1] INFO net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin - PathsToVCFPlugin: processData: number of taxa: 3
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:55:56: progress: 0%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:56:51: progress: 10%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:57:42: progress: 20%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:58:32: progress: 30%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 14:59:23: progress: 40%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:00:14: progress: 50%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:01:4: progress: 60%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:01:54: progress: 70%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:02:45: progress: 80%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:03:35: progress: 90%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:04:26: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.pangenome.hapCalling.PathsToVCFPlugin: time: Oct 11, 2022 15:04:26
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.hapCalling.ImportDiploidPathPlugin: time: Oct 11, 2022 15:04:26: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.pangenome.api.HaplotypeGraphBuilderPlugin: time: Oct 11, 2022 15:04:26: progress: 100%

The PathsToVCFPlugin progress messages suggest that it is gathering and processing the path information, but the output file is empty apart from the header:

##fileformat=VCFv4.2
##FORMAT=<ID=AD,Number=3,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##INFO=<ID=AF,Number=3,Type=Integer,Description="Allele Frequency">
##INFO=<ID=ASM_Chr,Number=1,Type=String,Description="Assembly chromosome">
##INFO=<ID=ASM_End,Number=1,Type=Integer,Description="Assembly end position">
##INFO=<ID=ASM_Start,Number=1,Type=Integer,Description="Assembly start position">
##INFO=<ID=ASM_Strand,Number=1,Type=String,Description="Assembly strand">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Sample1 Sample2 Sample3
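To confirm that the file really contains no records beyond the header, a small shell helper such as the following can be used. This is only a sketch: `vcf_record_count` is a name I made up, not a TASSEL or PHG tool. It relies on the fact that every VCF header line, including the `#CHROM` column line, starts with `#`, and counts the remaining non-empty lines:

```shell
# Hypothetical helper (not part of TASSEL/PHG): print the number of data
# records in a VCF file, i.e. non-empty lines that do not start with '#'.
vcf_record_count() {
  # "n + 0" makes awk print 0 instead of an empty string when nothing matched.
  awk 'NF > 0 && !/^#/ { n++ } END { print n + 0 }' "$1"
}
```

Running `vcf_record_count test.vcf` on the header-only file shown above should print 0; a populated file would report one count per record line.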

Does anyone have an idea why the VCF file would be empty? I have tried three different datasets, and none of them produced a populated VCF file.

I used rPHG to check the paths, and they look OK to me. So could the problem be in writing them out to VCF?

> pathMet <- rPHG::pathsForMethod(
+    configFile = configPath,
+    pathMethod = "PATH_METHOD_maxDiv0.0005"
+ )
>
> dim(pathMet)
[1]     6 37754
>
> pathMet[, 1:10]
          5242   5243   5244   5246   5249   5250   5251   5252   5253   5255
Sample1 192612 188553 191169 188039 189055 188490 189111 191889 189220 189425
Sample1 192613 188553 191169 188039 189055 188490 189112 191890 189221 189426
Sample2 192612 188553 191169 188039     -1 188490 189111 191889 189220 189425
Sample2 192612 188553 191169 188039     -1 188490 189111 191889 189221 189426
Sample3 192612     -1 191169 188039     -1     -1 189111     -1 189220 189425
Sample3 192613     -1 191169 188038     -1     -1 189111     -1 189220 189425

Thank you

PHG • 316 views