Error when running create-maf-vcf at the convertGVCFToHVCFForChrom step - PHGv2
3 months ago

After setting up all my data, I get the following error when running either create-maf-vcf or gvcf2hvcf:

phg create-maf-vcf \
    --bed ref_ranges.bed \
    --reference-file data/reference.fa \
    --maf-dir output/alignment_files/ \
    -o output/vcf_files \
    --db-path vcf_dbs/

[main] INFO net.maizegenetics.phgv2.utils.VariantLoadingUtils 2024-05-16 15:34:00,650: begin Command:conda run -n phgv2-conda tiledbvcf stat --uri vcf_dbs//hvcf_dataset
[main] INFO net.maizegenetics.phgv2.utils.VariantLoadingUtils 2024-05-16 15:34:01,344: Using TileDB datasets created in folder vcf_dbs/.
[main] INFO net.maizegenetics.phgv2.cli.CreateMafVcf 2024-05-16 15:34:01,572: CreateASMHvcfs: calling buildRefGenomeSeq
[main] INFO net.maizegenetics.phgv2.cli.CreateMafVcf 2024-05-16 15:34:29,215: CreateASMHvcfs: processing /genoma/nfs/new_PHG/output/alignment_files/BARKE.maf
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
SLF4J: Class path contains SLF4J bindings targeting slf4j-api versions 1.7.x or earlier.
SLF4J: Ignoring binding found at [jar:file:/opt/phgv2/phg/lib/logback-classic-1.2.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See https://www.slf4j.org/codes.html#ignoredBindings for an explanation.
getVariantContextsfromMAF: Processing a single genome
[main] INFO net.maizegenetics.phgv2.cli.CreateMafVcf 2024-05-16 15:39:41,716: createASMHvcfs: gvcfVariants.size == 1
[main] INFO net.maizegenetics.phgv2.cli.CreateMafVcf 2024-05-16 15:39:47,804: createASMHvcfs: processing sampleName = BARKE
[main] INFO net.maizegenetics.phgv2.utils.VariantLoadingUtils 2024-05-16 15:42:12,537: bgzipping file output/vcf_files/BARKE.g.vcf
[main] INFO net.maizegenetics.phgv2.cli.CreateMafVcf 2024-05-16 15:45:02,838: createASMHvcfs: calling convertGVCFToHVCF for BARKE
[main] INFO net.maizegenetics.phgv2.cli.CreateMafVcf 2024-05-16 15:45:06,786: in convertGVCFToHVCF: sort and call converGVCFToHVCFForChrom
[main] INFO net.maizegenetics.phgv2.cli.CreateMafVcf 2024-05-16 15:45:06,788: in convertGVCFToHVCFForChrom: bedRanges.size = 8371
[main] INFO net.maizegenetics.phgv2.utils.SeqUtils 2024-05-16 15:45:24,340: queryAgc: Running Agc Command: conda run -n phgv2-conda agc getctg vcf_dbs//assemblies.agc chr1H_OX459902.1@BARKE:381044-382128 chr1H_OX459902.1@BARKE:376477-381043 chr1H_OX459902.1...
[main] INFO net.maizegenetics.phgv2.utils.SeqUtils 2024-05-16 15:45:38,931: queryAgc: finished chrom chr1H_OX459902.1
Exception in thread "main" java.lang.NullPointerException
	at net.maizegenetics.phgv2.cli.CreateMafVcf.buildSeq(CreateMafVcf.kt:532)
	at net.maizegenetics.phgv2.cli.CreateMafVcf.addSequencesToMetaData(CreateMafVcf.kt:516)
	at net.maizegenetics.phgv2.cli.CreateMafVcf.convertGVCFToHVCFForChrom(CreateMafVcf.kt:282)
	at net.maizegenetics.phgv2.cli.CreateMafVcf.convertGVCFToHVCF(CreateMafVcf.kt:188)
	at net.maizegenetics.phgv2.cli.CreateMafVcf.createASMHvcfs(CreateMafVcf.kt:110)
	at net.maizegenetics.phgv2.cli.CreateMafVcf.createASMHvcfs$default(CreateMafVcf.kt:77)
	at net.maizegenetics.phgv2.cli.CreateMafVcf.run(CreateMafVcf.kt:635)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:279)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:292)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:41)
	at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:457)
	at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:454)
	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:474)
	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:481)
	at net.maizegenetics.phgv2.cli.PhgKt.main(Phg.kt:58)

Here are some details of my working environment and input files:

phg version 2.2.86.135

cat Reference.fa | grep "^>"

chr1H_LR890096.1
chr2H_LR890097.1
chr3H_LR890098.1
chr4H_LR890099.1
chr5H_LR890100.1
chr6H_LR890101.1
chr7H_LR890102.1
CAJHDD010000001.1
CAJHDD010000002.1
CAJHDD010000003.1
.......................

cat Reference.fa | grep "^>" | wc -l

290
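Since the command takes both --bed ref_ranges.bed and --reference-file, one quick consistency check on the reference side is to confirm that every sequence name in the BED file's first column also appears as a header in the reference FASTA. The bash sketch below does that; missing_bed_chroms is a hypothetical helper, not part of PHGv2, and it assumes a tab-separated BED file and that the sequence name is the first whitespace-delimited token of each FASTA header.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PHGv2): print sequence names that appear in
# column 1 of a BED file but have no matching header in a FASTA file.
# Assumes a tab-separated BED and that the sequence name is the first
# whitespace-delimited token after ">" in each FASTA header.
missing_bed_chroms() {
  local bed="$1" fasta="$2"
  # comm -23 keeps lines unique to the first (BED) list; both inputs are
  # sorted with the same C collation, which comm requires.
  LC_ALL=C comm -23 \
    <(cut -f1 "$bed" | grep -v -E '^(#|track|browser|$)' | LC_ALL=C sort -u) \
    <(grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort -u)
}
```

For example, `missing_bed_chroms ref_ranges.bed data/reference.fa` should print nothing if every reference range lies on a sequence present in the reference file.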

For the first sample it tries to process (BARKE):

cat BARKE.fa | grep "^>" | head -10

chr1H_OX459902.1 sampleName=Barke
chr2H_OX459903.1 sampleName=Barke
chr3H_OX459904.1 sampleName=Barke
chr4H_OX459905.1 sampleName=Barke
chr5H_OX459906.1 sampleName=Barke
chr6H_OX459907.1 sampleName=Barke
chr7H_OX459908.1 sampleName=Barke
contig:ptg000028l sampleName=Barke
contig:ptg000097l sampleName=Barke
contig:ptg000110l sampleName=Barke
............

cat BARKE.fa | grep "^>" | wc -l

430
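Two details in these headers may be worth ruling out, although neither is confirmed as the cause: the FASTA tag reads sampleName=Barke while the log and the agc query use BARKE, and some sequence names contain a colon (e.g. contig:ptg000028l), the same character agc uses to separate contig@sample from the start-end range in the getctg query logged above. The bash sketch below counts both cases in one pass; check_headers is a hypothetical helper, not part of PHGv2, and it assumes headers follow the `>name sampleName=...` layout shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PHGv2): scan FASTA headers and report
#   - how many headers there are,
#   - how many have a sampleName= tag that is missing or differs from the expected name,
#   - how many sequence names contain ':' (agc getctg uses contig@sample:start-end).
# Usage: check_headers EXPECTED_SAMPLE < assembly.fa
check_headers() {
  local expected="$1"
  awk -v expected="$expected" '
    /^>/ {
      total++
      name = substr($1, 2)                  # drop the leading ">"
      tag = ""
      for (i = 2; i <= NF; i++)
        if ($i ~ /^sampleName=/) tag = substr($i, 12)   # text after "sampleName="
      if (tag != expected) mismatch++
      if (name ~ /:/) colon++
    }
    END {
      printf "headers=%d sampleName_mismatch=%d names_with_colon=%d\n", total, mismatch, colon
    }'
}
```

Running `check_headers BARKE < BARKE.fa` would, for the headers listed above, count every sampleName tag as a mismatch (Barke vs BARKE) and flag the contig:... names as containing a colon. Whether either actually matters to buildSeq is an open question.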

The alignment step runs without issues, and so does create-ref-vcf.

Finally, this is what the assemblies.agc file looks like. There does not seem to be anything wrong with it, since we can extract ranges from specific contigs.

agc info vcf_dbs/assemblies.agc

No. samples : 3
k-mer length : 31
Min. match length: 20
Batch size : 50
Reference name : GCA_904849725.1_MorexV3_pseudomolecules.chrnames


Any help would be appreciated. Thank you!

Tags: PHG, pangenome, PHG_v2