Question

Issues generating the annotated.eff.vcf file: "Sanity check: This should never happen!"

3.8 years ago

cae453 • 0

Hello,

I am having a hard time deciphering an error that snpEff raises while annotating a VCF file. I used the command:

java -Xmx8g -jar snpEff.jar Tair10.1 /globalhome/cae453/HPC/sample6.vcf > /globalhome/cae453/HPC/sample6.eff.vcf

The output is:

java.lang.RuntimeException: Sanity check: This should never happen!
        at org.snpeff.interval.Gene.circularClone(Gene.java:195)
        at org.snpeff.interval.Genes.createCircularGenes(Genes.java:53)
        at org.snpeff.snpEffect.SnpEffectPredictor.buildForest(SnpEffectPredictor.java:146)
        at org.snpeff.SnpEff.loadDb(SnpEff.java:617)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:940)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:923)
        at org.snpeff.SnpEff.run(SnpEff.java:1188)
        at org.snpeff.SnpEff.main(SnpEff.java:168)

In the end, the eff.vcf file is empty.
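
The command above redirects only standard output into sample6.eff.vcf, so the stack trace presumably reaches the terminal on standard error while the file stays empty. A minimal Python sketch of one way to keep the two streams in separate files and to confirm that the output has no data lines follows. The function names (build_annotate_command, count_data_lines, annotate) and the log path argument are hypothetical; only the java arguments are taken from the command quoted above.

```python
import subprocess


def build_annotate_command(jar, genome, vcf_in, max_heap="8g"):
    """Assemble the same argument list as the command line quoted above."""
    return ["java", f"-Xmx{max_heap}", "-jar", str(jar), genome, str(vcf_in)]


def count_data_lines(vcf_path):
    """Count lines in a VCF file that are neither blank nor header lines."""
    with open(vcf_path) as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


def annotate(jar, genome, vcf_in, vcf_out, log_path):
    """Run the annotation with stdout written to vcf_out and stderr to log_path.

    Returns the process exit code and the number of data lines written,
    so an empty result can be spotted without opening the file.
    """
    cmd = build_annotate_command(jar, genome, vcf_in)
    with open(vcf_out, "w") as out, open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=out, stderr=log)
    return result.returncode, count_data_lines(vcf_out)
```

With this layout the exception text ends up in the log file next to the (possibly empty) VCF, which makes it easier to attach both when asking for help.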

Before annotating, I built a genome database (Tair10.1) with the following command:

java -jar snpEff.jar build -gtf22 -v Tair10.1

The build apparently completed without errors:

Remove empty chromosomes:

        Marking as 'coding' from CDS information:
        Done: 48147 transcripts marked
#-----------------------------------------------
# Genome name                : 'Arabidopsis_thaliana'
# Genome version             : 'Tair10.1'
# Genome ID                  : 'Tair10.1[0]'
# Has protein coding info    : true
# Has Tr. Support Level info : true
# Genes                      : 38311
# Protein coding genes       : 27444
#-----------------------------------------------
# Transcripts                : 59994
# Avg. transcripts per gene  : 1.57
# TSL transcripts            : 0
#-----------------------------------------------
# Checked transcripts        :
#               AA sequences :      0 ( 0.00% )
#              DNA sequences :      0 ( 0.00% )
#-----------------------------------------------
# Protein coding transcripts : 48147
#              Length errors :     35 ( 0.07% )
#  STOP codons in CDS errors :     33 ( 0.07% )
#         START codon errors :     63 ( 0.13% )
#        STOP codon warnings :     38 ( 0.08% )
#              UTR sequences :  44634 ( 74.40% )
#               Total Errors :    100 ( 0.21% )
#-----------------------------------------------
# Cds                        : 286237
# Exons                      : 324728
# Exons with sequence        : 324728
# Exons without sequence     : 0
# Avg. exons per transcript  : 5.41
# WARNING                    : No mitochondrion chromosome found
#-----------------------------------------------
# Number of chromosomes      : 7
# Chromosomes                : Format 'chromo_name size codon_table'
#               'NC_003070.9'   30427671        Standard
#               'NC_003076.8'   26975502        Standard
#               'NC_003074.8'   23459830        Standard
#               'NC_003071.7'   19698289        Standard
#               'NC_003075.7'   18585056        Standard
#               'NC_037304.1'   367808  Standard
#               'NC_000932.1'   154478  Standard
#-----------------------------------------------
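
The summary above lists the chromosome names the database knows about, so one check that may help narrow things down is comparing them with the CHROM column of the input VCF. The Python sketch below is only a diagnostic under stated assumptions: the function names are hypothetical, the regular expression assumes the chromosome lines look exactly like the ones pasted above, and it does not address the exception itself.

```python
import re

# Chromosome lines copied from the build summary above.
SUMMARY = """
# Chromosomes                : Format 'chromo_name size codon_table'
#               'NC_003070.9'   30427671        Standard
#               'NC_003076.8'   26975502        Standard
#               'NC_003074.8'   23459830        Standard
#               'NC_003071.7'   19698289        Standard
#               'NC_003075.7'   18585056        Standard
#               'NC_037304.1'   367808  Standard
#               'NC_000932.1'   154478  Standard
"""

# Matches lines of the form: #   'name'   length   codon_table
CHROM_LINE = re.compile(r"^#\s+'([^']+)'\s+(\d+)\s+(\S+)\s*$")


def summary_chromosomes(summary_text):
    """Return {chromosome name: length} parsed from a build summary."""
    chroms = {}
    for line in summary_text.splitlines():
        match = CHROM_LINE.match(line)
        if match:
            chroms[match.group(1)] = int(match.group(2))
    return chroms


def vcf_chromosomes(vcf_lines):
    """Return the set of CHROM values found in the data lines of a VCF."""
    names = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_database(summary_text, vcf_lines):
    """List VCF chromosome names that do not appear in the build summary."""
    known = set(summary_chromosomes(summary_text))
    return sorted(vcf_chromosomes(vcf_lines) - known)
```

Passing an open file handle for the VCF as vcf_lines works, since the function only iterates over lines. An empty result means every CHROM value in the VCF matches a database chromosome name.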

I also tried building the database from a GFF file, but that build produced more warnings and errors related to start and stop codons. When I use that database to annotate the VCF file, the output is not empty, but it contains many WARNING_TRANSCRIPT_NO_START_CODON entries.
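
To put a number on "a lot", a short Python sketch like the one below could tally how many variant lines carry each warning or error code. It assumes the codes appear as literal text such as WARNING_TRANSCRIPT_NO_START_CODON in the data lines of the annotated VCF; the function name and the regular expression are illustrative, not part of snpEff.

```python
import re
from collections import Counter

# Any token that starts with WARNING_ or ERROR_ followed by capitals, digits or underscores.
CODE_RE = re.compile(r"\b(?:WARNING|ERROR)_[A-Z0-9_]+")


def count_annotation_codes(vcf_lines):
    """Count, per code, the number of variant lines that mention it.

    Header lines (starting with '#') are skipped, and a code that occurs
    several times on one line is counted once for that line.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        for code in set(CODE_RE.findall(line)):
            counts[code] += 1
    return counts
```

Calling count_annotation_codes on an open handle for the annotated file and then most_common() on the result gives a ranked list, which can be compared against the total number of variant lines.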

At this point I don't know how to fix the error that leaves the annotated VCF file empty. I would really appreciate any help you can provide.

Thank you!

Carlos Erazo

snpEff • 750 views

updated 2.2 years ago by Ram 45k • written 3.8 years ago by cae453 • 0