Question: Fatal error reading file snpEff.config with SnpEff
aguiller.mateos wrote (6 months ago):

Hi, I have recently been introduced to bioinformatics, variant calling and the analysis it requires. I have obtained several VCF files from FreeBayes and I wanted to run them through SnpEff as well, so that I could annotate the variants between two genomes of the genus Shewanella. Unfortunately, I have not been able to do this because I keep getting the same error over and over. Here is the command and its output, including the error:

(base) binso@LAPTOP-P73O7IPS:~/Alignment/VariantCalling$ java -jar ../../miniconda3/pkgs/snpeff-4.3.1t-3/share/snpeff-4.3.1t-3/snpEff.jar Shewanella_putrefaciens_cn_32 ../../miniconda3/pkgs/snpeff-4.3.1t-3/share/snpeff-4.3.1t-3/snpEff.config -v -s VC_SPt_ST2-3D_6.vcf>VC_snpEff.vcf

00:00:00        SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani 
00:00:00        Command: 'ann' 
00:00:00        Reading configuration file 'snpEff.config'. Genome: 'Shewanella_putrefaciens_cn_32'  
00:00:00        Reading config file: /home/binso/Alignment/VariantCalling/snpEff.config
00:00:00        Reading config file: /home/binso/miniconda3/pkgs/snpeff-4.3.1t-3/share/snpeff-4.3.1t-3/snpEff.config
00:00:00        done                                                                                                                                                   
00:00:00        Reading database for genome version 'Shewanella_putrefaciens_cn_32' from file '/home/binso/miniconda3/pkgs/snpeff-4.3.1t-3/share/snpeff-4.3.1t-3/./data/Shewanella_putrefaciens_cn_32/snpEffectPredictor.bin' (this might take a while)
00:00:01        done
00:00:01        Loading Motifs and PWMs
00:00:01        Building interval forest
00:00:02        done.
00:00:02        Genome stats :                                                                                                                                         
        #-----------------------------------------------                                                                                                                      
        # Genome name : 'Shewanella_putrefaciens_cn_32'                                                                                                   
        # Genome version : 'Shewanella_putrefaciens_cn_32'                                                                                                         
        # Genome ID: 'Shewanella_putrefaciens_cn_32[0]'                                                                                                        
        # Has protein coding info: true                                                                                                                                     
        # Has Tr. Support Level info : true                                                                                                                                    
        # Genes : 4331                                                                                                                                     
        # Protein coding gene: 4171                                                                                                                           
        #-----------------------------------------------                                                                                                                        
        # Transcripts: 4331                                                                                                                                     
        # Avg. transcripts per gene: 1.00                                                                                                                                    
        # TSL transcripts: 0                                                                                                                                        
        #-----------------------------------------------                                                                                                                        
        # Checked transcripts:                                                                                                                                          
        # AA sequences :   3972 ( 95.23% )                                                                                                                       
        # DNA sequences :      0 ( 0.00% )                                                                                                                         
        #-----------------------------------------------                                                                                                                        
        # Protein coding transcripts : 4171                                                                                                                             
        #Length errors :    182 ( 4.36% )                                                                                                                         
        #STOP codons in CDS errors :     29 ( 0.70% )                                                                                                                         
        #START codon errors :    535 ( 12.83% )                                                                                                                        
        #STOP codon warnings :     17 ( 0.41% )                                                                                                                        
        #UTR sequences :      0 ( 0.00% )                                                                                                                         
        #Total Errors :    540 ( 12.95% )                                                                                                                        
        # WARNING: No protein coding transcript has UTR                                                                                                     
        #-----------------------------------------------                                                                                                                     
        # Cds : 3972                                                                                                                             
        #Exons: 4331                                                                                                                              
        # Exons with sequence : 4331                                                                                                                                     
        # Exons without sequence     : 0                                                                                                                                        
        # Avg. exons per transcript  : 1.00                                                                                                                                    
        #-----------------------------------------------                                                                                                                      
        # Number of chromosomes      : 1                                                                                                                                        
        # Chromosomes                : Format 'chromo_name size codon_table'                                                                                                    
        # 'Chromosome'    4659220 Standard                                                                                                                       
        #-----------------------------------------------

00:00:02        Predicting variants       
    VcfFileIterator.parseVcfLine(132):  Fatal error reading file '../../miniconda3/pkgs/snpeff-4.3.1t-3/share/snpeff-4.3.1t-3/snpEff.config' (line: 17):
    data.dir = ./data/
    java.lang.RuntimeException: java.lang.RuntimeException: Impropper VCF entry: Not enough fields (missing tab separators?).
    data.dir = ./data/
    at org.snpeff.fileIterator.VcfFileIterator.parseVcfLine(VcfFileIterator.java:133)                                                                                       
    at org.snpeff.fileIterator.VcfFileIterator.readNext(VcfFileIterator.java:184)                                                                                           
    at org.snpeff.fileIterator.VcfFileIterator.readNext(VcfFileIterator.java:57)                                                                                            
    at org.snpeff.fileIterator.FileIterator.hasNext(FileIterator.java:123)                                                                                                  
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.annotateVcf(SnpEffCmdEff.java:467)                                                                                     
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.annotate(SnpEffCmdEff.java:142)                                                                                        
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:1029)                                                                                            
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:984)                                                                                             
    at org.snpeff.SnpEff.run(SnpEff.java:1183)                                                                                                                              
    at org.snpeff.SnpEff.main(SnpEff.java:162)   

Caused by: java.lang.RuntimeException: Impropper VCF entry: Not enough fields (missing tab separators?).
data.dir = ./data/
    at org.snpeff.vcf.VcfEntry.parse(VcfEntry.java:1007)
    at org.snpeff.vcf.VcfEntry.<init>(VcfEntry.java:219)
    at org.snpeff.fileIterator.VcfFileIterator.parseVcfLine(VcfFileIterator.java:130)
    ... 9 more
00:00:02       Logging                                                                                                                                                                    
00:00:03        Checking for updates...
00:00:05        Done.

I'm not sure if this is due to the snpEff.config file or my VCF file. Any possible tip or recommendation would be highly appreciated.
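One possible reading of the log, not confirmed anywhere in this thread: the file SnpEff was parsing when it failed is snpEff.config itself (line 17, data.dir = ./data/), which is what you would expect if the config path had been taken as the input-VCF positional argument. In SnpEff 4.3 the config file is passed with the -c option, and -s (alias -stats) takes the next argument as the file name for the HTML summary, so in the command above the VCF name would be consumed by -s. The minimal Python sketch below builds an argument list in which the genome name and the input VCF are the only positional arguments; the helper names snpeff_ann_command and run_snpeff, and the default summary file name, are illustrative assumptions.

```python
import subprocess
from typing import List


def snpeff_ann_command(jar: str, config: str, genome: str, vcf_in: str,
                       stats_html: str = "snpEff_summary.html") -> List[str]:
    """Build a SnpEff annotation command line (hypothetical helper).

    The config is given via -c and -s gets its own output file name,
    so only the genome and the input VCF remain as positional arguments.
    """
    return [
        "java", "-jar", jar,
        "-c", config,        # config file passed as an option, not positionally
        "-v",                # verbose logging
        "-s", stats_html,    # -s consumes the next argument as the HTML summary file
        genome,              # positional 1: genome/database name
        vcf_in,              # positional 2: input VCF
    ]


def run_snpeff(cmd: List[str], vcf_out: str) -> None:
    """Run the command, writing stdout to vcf_out (like '> VC_snpEff.vcf')."""
    with open(vcf_out, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

For example, run_snpeff(snpeff_ann_command(jar, config, "Shewanella_putrefaciens_cn_32", "VC_SPt_ST2-3D_6.vcf"), "VC_snpEff.vcf"), with jar and config set to the paths from the original command, mirrors the original redirection into VC_snpEff.vcf.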

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (6 months ago):

Caused by: java.lang.RuntimeException: Impropper VCF entry: Not enough fields (missing tab separators?).

It's a problem with your VCF file. Check that it's a proper VCF file (e.g. try to read it with bcftools view).
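Besides bcftools view, one rough way to see which lines would trigger a "Not enough fields (missing tab separators?)" complaint is to count tab-separated columns. The minimal Python sketch below assumes only the VCF rule that every record line has at least 8 tab-separated fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and that header lines start with '#'. It approximates, rather than reproduces, SnpEff's internal check, and the function name check_vcf_lines is hypothetical.

```python
from typing import Iterable, List, Tuple

# The VCF specification defines 8 mandatory tab-separated columns:
# CHROM POS ID REF ALT QUAL FILTER INFO (FORMAT and sample columns are optional).
MIN_VCF_FIELDS = 8


def check_vcf_lines(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (line_number, field_count, line) for each non-header line
    with fewer than MIN_VCF_FIELDS tab-separated fields.

    Hypothetical helper for illustration only; not part of SnpEff or
    bcftools, and it performs no other validation.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            continue  # '##' meta lines and the '#CHROM' header line are not records
        n_fields = len(line.split("\t"))  # a blank line counts as 1 field and is reported
        if n_fields < MIN_VCF_FIELDS:
            problems.append((lineno, n_fields, line))
    return problems


# Usage (file name taken from the thread):
# with open("VC_SPt_ST2-3D_6.vcf") as fh:
#     for lineno, n_fields, line in check_vcf_lines(fh):
#         print(f"line {lineno}: {n_fields} field(s): {line!r}")
```

For instance, the line SnpEff echoed in the error, data.dir = ./data/, contains no tabs, so it counts as a single field and would be reported.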


Thank you for replying. I tried that with the following command, and it seems to work fine:

(base) binso@LAPTOP-P73O7IPS:~/Alignment/VariantCalling$ bcftools view --header-only VC_SPt_ST2-3D_2.vcf

##fileformat=VCFv4.2                                                                     
##FILTER=<ID=PASS,Description="All filters passed">                                                                 
##fileDate=20200414                                                                                                                          
##source=freeBayes v1.3.2-dirty                                                                                              
##reference=12SPt.fasta

The output continues for a while, so I only pasted the first few lines.

(Reply by aguiller.mateos, 6 months ago.)

> It keeps going for a while so I just pasted some of the first lines

Can you please try it _without_ --header-only?

(Reply by Pierre Lindenbaum, 6 months ago.)
(base) binso@LAPTOP-P73O7IPS:~/Alignment/VariantCalling$ bcftools view VC_SPt_ST2-3D_2.vcf

It prints the whole input file like this:

Shewanella_putrefaciens_CN_32   3954896 .       A       G       2518.72 .       AB=0;ABP=0;AC=1;AF=1;AN=1;AO=84;CIGAR=1X;DP=84;DPB=84;DPRA=0;EPP=4.66476;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=41.2738;MQMR=0;NS=1;NUMALT=1;ODDS=579.957;PAIRED=0.357143;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=2996;QR=0;RO=0;RPL=53;RPP=15.5221;RPPR=0;RPR=31;RUN=1;SAF=37;SAP=5.59539;SAR=47;SRF=0;SRP=0;SRR=0;TYPE=snp     GT:DP:AD:RO:QR:AO:QA:GL 1:84:0,84:0:0:84:2996:-256.727,0                                                     
Shewanella_putrefaciens_CN_32   3955123 .       C       T       1794.75 .       AB=0;ABP=0;AC=1;AF=1;AN=1;AO=61;CIGAR=1X;DP=61;DPB=61;DPRA=0;EPP=7.31765;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=41.3607;MQMR=0;NS=1;NUMALT=1;ODDS=413.255;PAIRED=0.442623;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=2132;QR=0;RO=0;RPL=29;RPP=3.33068;RPPR=0;RPR=32;RUN=1;SAF=32;SAP=3.33068;SAR=29;SRF=0;SRP=0;SRR=0;TYPE=snp     GT:DP:AD:RO:QR:AO:QA:GL 1:61:0,61:0:0:61:2132:-182.941,0                                                     
Shewanella_putrefaciens_CN_32   3955181 .       G       T       2069.34 .       AB=0;ABP=0;AC=1;AF=1;AN=1;AO=74;CIGAR=1X;DP=75;DPB=75;DPRA=0;EPP=12.5178;EPPR=5.18177;GTI=0;LEN=1;MEANALT=1;MQM=41.4459;MQMR=42;NS=1;NUMALT=1;ODDS=476.482;PAIRED=0.445946;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=2487;QR=25;RO=1;RPL=28;RPP=12.5178;RPPR=5.18177;RPR=46;RUN=1;SAF=30;SAP=8.76177;SAR=44;SRF=1;SRP=5.18177;SRR=0;TYPE=snp GT:DP:AD:RO:QR:AO:QA:GL 1:75:1,74:1:25:74:2487:-211.598,0
(Reply by aguiller.mateos, 6 months ago.)
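A further detail visible in this output (an observation, not something raised in the thread): the CHROM column of these records is Shewanella_putrefaciens_CN_32, while the genome stats SnpEff printed list a single chromosome named 'Chromosome'. SnpEff matches variants to its database by chromosome name, so if the names differ the variants would likely be flagged as chromosome-not-found rather than annotated, even once the input file is read correctly. A minimal Python sketch, with a hypothetical helper name, that reports CHROM values missing from a given set of database chromosome names:

```python
from typing import Iterable, Set


def unknown_chromosomes(vcf_lines: Iterable[str], db_chromosomes: Set[str]) -> Set[str]:
    """Return CHROM values seen in VCF records that are absent from db_chromosomes.

    Hypothetical helper for illustration only; not part of SnpEff.
    Assumes tab-separated columns, as bcftools writes them.
    """
    seen = set()
    for raw in vcf_lines:
        if raw.startswith("#") or not raw.strip():
            continue  # skip meta/header lines and blank lines
        seen.add(raw.split("\t", 1)[0])  # CHROM is the first column
    return seen - set(db_chromosomes)
```

For the three records shown here, checking against {"Chromosome"} (the only name in the genome stats above) flags Shewanella_putrefaciens_CN_32.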
Powered by Biostar version 2.3.0