I am trying to build a snpEff database for the rat genome so that I can annotate the .VCF files I generated with the GATK pipeline. The problem: I used the rat genome from NCBI Genome as my reference (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/895/GCF_000001895.5_Rnor_6.0/GCF_000001895.5_Rnor_6.0_genomic.fna.gz), and now I get an error while annotating. I followed the standard procedure described by the Broad Institute to create the VCF files.
ERRORS: Some errors were detected
Error type    Number of errors
***ERROR_CHROMOSOME_NOT_FOUND    5257945***

00:03:13 Creating summary file: snpEff_summary.html
00:03:14 Creating genes file: snpEff_genes.txt
00:03:14 done.
00:03:14 Logging
00:03:15 Checking for updates
I used the built-in databases for annotation. I suspect this is a conflict between the GTF and GFF file formats, since I aligned against the NCBI reference genome. Please help me with this.
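ERROR_CHROMOSOME_NOT_FOUND generally means that the chromosome names in the VCF do not match the names in the snpEff database, so a naming mismatch may be a more likely cause here than the GTF/GFF format. The NCBI RefSeq FASTA names its sequences by accession (NC_..., NW_...), whereas the built-in database may use plain chromosome labels. Below is a minimal Python sketch, not a definitive fix, that lists the names a VCF uses and renames them with the help of an NCBI assembly report (the *_assembly_report.txt file distributed alongside the genome). The function names and file paths are hypothetical. It assumes the report's usual tab-separated layout, with the sequence role in column 2, the assigned molecule in column 3 and the RefSeq accession in column 7. It renames assembled chromosomes only and leaves scaffolds untouched, because the names the database expects for them are not known here.

```python
import gzip
import re


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file."""
    path = str(path)
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def vcf_contigs(vcf_path):
    """Return the set of names found in the CHROM column of a VCF."""
    names = set()
    with open_text(vcf_path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                names.add(line.split("\t", 1)[0])
    return names


def refseq_to_chromosome(report_path):
    """Map RefSeq accessions to chromosome labels (assembled molecules only).

    Assumption: columns are Sequence-Name, Sequence-Role, Assigned-Molecule,
    ..., with the RefSeq accession in the seventh column.
    """
    mapping = {}
    with open_text(report_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#") or len(fields) < 7:
                continue
            role, molecule, refseq = fields[1], fields[2], fields[6]
            if role == "assembled-molecule" and refseq != "na":
                mapping[refseq] = molecule
    return mapping


CONTIG_HEADER = re.compile(r"^(##contig=<ID=)([^,>]+)")


def rename_vcf(vcf_in, vcf_out, mapping):
    """Write a copy of the VCF with mapped names; return the number of
    data records whose CHROM value was changed."""
    renamed = 0
    with open_text(vcf_in) as src, open_text(vcf_out, "wt") as dst:
        for line in src:
            if line.startswith("##contig="):
                # Keep the header consistent with the renamed records.
                line = CONTIG_HEADER.sub(
                    lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)),
                    line,
                )
            elif not line.startswith("#") and "\t" in line:
                chrom, rest = line.split("\t", 1)
                if chrom in mapping:
                    line = mapping[chrom] + "\t" + rest
                    renamed += 1
            dst.write(line)
    return renamed
```

A possible use: call vcf_contigs("my.vcf") first and compare the result with the chromosome names the database reports. If the VCF contains only accession-style names, rename_vcf("my.vcf", "my.renamed.vcf", refseq_to_chromosome("assembly_report.txt")) produces a copy that could be annotated instead. The file names here are placeholders.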
I then thought of building my own database for snpEff, so I downloaded the GFF and genome files from NCBI Genome and tried to build one. The build fails with an error saying it cannot find the .fa files. I tried renaming my .fna file to .fa, but that doesn't work. Please help. The error is as follows:
Exons created for 496 transcripts.
Deleting redundant exons (if needed):
Total transcripts with deleted exons: 0
Collapsing zero length introns (if needed):
. 0
Total collapsed transcripts: 4
Reading sequences :
FASTA file: '/home/snpEff/./data/genomes/gcf_rnor.fa' not found.
FASTA file: '/home/snpEff/./data/gcf_rnor/sequences.fa' not found.
java.lang.RuntimeException: Cannot find reference sequence.
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.readExonSequences(SnpEffPredictorFactory.java:689)
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.readExonSequences(SnpEffPredictorFactoryGff.java:428)
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:342)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.createSnpEffPredictor(SnpEffCmdBuild.java:118)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:362)
    at org.snpeff.SnpEff.run(SnpEff.java:1041)
    at org.snpeff.SnpEff.main(SnpEff.java:159)
java.lang.RuntimeException: Error reading file '/home/snpEff/./data/gcf_rnor/genes.gff'
java.lang.RuntimeException: Cannot find reference sequence.
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:353)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.createSnpEffPredictor(SnpEffCmdBuild.java:118)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:362)
    at org.snpeff.SnpEff.run(SnpEff.java:1041)
    at org.snpeff.SnpEff.main(SnpEff.java:159)
00:00:28 Logging
00:00:29 Checking for updates...
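The log shows that the build looks for the reference at two exact paths, data/genomes/gcf_rnor.fa and data/gcf_rnor/sequences.fa, so renaming the .fna file to .fa would presumably help only if the file also ends up, uncompressed, at one of those locations. The sketch below, in Python, copies (and decompresses, if needed) a FASTA file to the second path and then checks whether the sequence names in the GFF also occur in the FASTA. The helper names are hypothetical, and the directory layout is inferred from the log above rather than from any other source.

```python
import gzip
import shutil
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def stage_reference(snpeff_home, genome, fasta_source):
    """Copy a FASTA file to <snpeff_home>/data/<genome>/sequences.fa,
    decompressing it on the way if the source ends in .gz."""
    target_dir = Path(snpeff_home) / "data" / genome
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "sequences.fa"
    source = Path(fasta_source)
    opener = gzip.open if source.suffix == ".gz" else open
    with opener(source, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def fasta_names(fasta_path):
    """Return the first word of every FASTA header line."""
    names = set()
    with open_text(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gff_seqids(gff_path):
    """Return the sequence IDs used in the first column of a GFF file."""
    ids = set()
    with open_text(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded FASTA section follows; stop here
            if line.startswith("#") or not line.strip():
                continue
            ids.add(line.split("\t", 1)[0])
    return ids


def missing_from_fasta(gff_path, fasta_path):
    """Sequence IDs that the GFF refers to but the FASTA does not contain."""
    return gff_seqids(gff_path) - fasta_names(fasta_path)
```

For the setup in the log this would be called as stage_reference("/home/snpEff", "gcf_rnor", "GCF_000001895.5_Rnor_6.0_genomic.fna.gz"), followed by missing_from_fasta("/home/snpEff/data/gcf_rnor/genes.gff", "/home/snpEff/data/gcf_rnor/sequences.fa"). An empty result suggests the GFF and FASTA agree on sequence names. The genome name must also have a matching entry in snpEff.config before the build is rerun.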
Is there something that requires less coding that I could use to annotate my .vcf files, as long as it is still effective and accurate at reporting SNPs and INDELs?
Sorry for the lengthy post, and pardon my English (I am not a native English speaker). I hope you can help me resolve this error.
Thanks for helping!
David_emir