Hello,
I am having a hard time deciphering an error that occurs when I annotate a VCF file with snpEff. I used the command:
java -Xmx8g -jar snpEff.jar Tair10.1 /globalhome/cae453/HPC/sample6.vcf > /globalhome/cae453/HPC/sample6.eff.vcf
The output is:
java.lang.RuntimeException: Sanity check: This should never happen!
at org.snpeff.interval.Gene.circularClone(Gene.java:195)
at org.snpeff.interval.Genes.createCircularGenes(Genes.java:53)
at org.snpeff.snpEffect.SnpEffectPredictor.buildForest(SnpEffectPredictor.java:146)
at org.snpeff.SnpEff.loadDb(SnpEff.java:617)
at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:940)
at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:923)
at org.snpeff.SnpEff.run(SnpEff.java:1188)
at org.snpeff.SnpEff.main(SnpEff.java:168)
In the end, the sample6.eff.vcf file is empty.
Before running the annotation, I had to build a genome database (Tair10.1) with the following command:
java -jar snpEff.jar build -gtf22 -v Tair10.1
The database apparently doesn't have errors:
Remove empty chromosomes:
Marking as 'coding' from CDS information:
Done: 48147 transcripts marked
#-----------------------------------------------
# Genome name : 'Arabidopsis_thaliana'
# Genome version : 'Tair10.1'
# Genome ID : 'Tair10.1[0]'
# Has protein coding info : true
# Has Tr. Support Level info : true
# Genes : 38311
# Protein coding genes : 27444
#-----------------------------------------------
# Transcripts : 59994
# Avg. transcripts per gene : 1.57
# TSL transcripts : 0
#-----------------------------------------------
# Checked transcripts :
# AA sequences : 0 ( 0.00% )
# DNA sequences : 0 ( 0.00% )
#-----------------------------------------------
# Protein coding transcripts : 48147
# Length errors : 35 ( 0.07% )
# STOP codons in CDS errors : 33 ( 0.07% )
# START codon errors : 63 ( 0.13% )
# STOP codon warnings : 38 ( 0.08% )
# UTR sequences : 44634 ( 74.40% )
# Total Errors : 100 ( 0.21% )
#-----------------------------------------------
# Cds : 286237
# Exons : 324728
# Exons with sequence : 324728
# Exons without sequence : 0
# Avg. exons per transcript : 5.41
# WARNING : No mitochondrion chromosome found
#-----------------------------------------------
# Number of chromosomes : 7
# Chromosomes : Format 'chromo_name size codon_table'
# 'NC_003070.9' 30427671 Standard
# 'NC_003076.8' 26975502 Standard
# 'NC_003074.8' 23459830 Standard
# 'NC_003071.7' 19698289 Standard
# 'NC_003075.7' 18585056 Standard
# 'NC_037304.1' 367808 Standard
# 'NC_000932.1' 154478 Standard
#-----------------------------------------------
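Because the stack trace passes through `createCircularGenes` and the build summary above lists the length of every chromosome, one check that might help narrow this down is whether any feature in the GTF has coordinates that fall outside its chromosome (for example, a gene that spans the origin of a circular organelle sequence). The following is only a minimal Python sketch of that check, not a confirmed diagnosis: the function name `features_past_chromosome_end` is hypothetical, the lengths are copied from the summary above, and whether such features actually trigger this exception is an assumption.

```python
# Chromosome lengths copied from the snpEff build summary above.
CHROM_LENGTHS = {
    "NC_003070.9": 30427671,
    "NC_003076.8": 26975502,
    "NC_003074.8": 23459830,
    "NC_003071.7": 19698289,
    "NC_003075.7": 18585056,
    "NC_037304.1": 367808,
    "NC_000932.1": 154478,
}


def features_past_chromosome_end(gtf_lines, chrom_lengths=CHROM_LENGTHS):
    """Yield (chrom, feature, start, end) for GTF rows whose coordinates
    start before position 1 or end beyond the known chromosome length.

    Hypothetical helper for illustration; it is not part of snpEff.
    """
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF row has 9 tab-separated columns; ignore anything shorter.
        if len(fields) < 9:
            continue
        chrom, feature = fields[0], fields[2]
        start, end = int(fields[3]), int(fields[4])
        length = chrom_lengths.get(chrom)
        # Rows on sequences not listed in the summary cannot be checked.
        if length is None:
            continue
        if start < 1 or end > length:
            yield chrom, feature, start, end
```

It could be run over the GTF used for the build by opening that file and passing the file handle as `gtf_lines`, then printing each tuple that is yielded. The location of the GTF is not shown in this post, so the path has to be supplied by whoever runs it.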
I also tried building the database from a GFF file, but that produces more warning messages and more errors associated with start and stop codons in the built database. When I use that database to annotate the VCF file, the output is not empty, but it contains a lot of WARNING_TRANSCRIPT_NO_START_CODON warnings.
At this moment I don't know how to fix the error that leaves the annotated VCF file empty. I would really appreciate any help that you can provide.
Thank you!
Carlos Erazo