Question: snpEff generating lots of WARNING_REF_DOES_NOT_MATCH_GENOME
hawkcharles (United States) wrote 5.5 years ago:

So I'm trying to use snpEff to annotate the effects of variants in an organism without much public data, and I'm getting a lot of WARNING_REF_DOES_NOT_MATCH_GENOME warnings. I've tried both FreeBayes and GATK to call variants and get these warnings from snpEff in either case, despite using the same genome reference.

In the case of FreeBayes, I'm running it on BAM files we made from our own sequencing data, like so:

freebayes --fasta-reference organism_123.fa */*Realigned.bam > variants.fb.vcf

and then snpEff:

java -jar snpEff.jar -v organism.123 variants.fb.vcf > fb.eff.vcf

The database organism.123 was one I generated with snpEff from a .gff file since there isn't a db available publicly.  The .gff is gzipped as data/organism.123/genes.gff.gz and a copy of organism_123.fa was gzipped as data/genomes/organism.123.fa.gz.  I made the db with:

java -jar snpEff.jar build -gff3 -v organism.123

I get thousands of the ref-does-not-match-genome warnings in snpEff's output, along with an order of magnitude more no-start-codon warnings.  The latter could mean my .gff is bad somehow but that couldn't cause the former errors, could it? FreeBayes never even saw that file.  Anything obvious I'm doing wrong here?

It might be informative to see what snpEff thinks the reference is in these cases, but I don't see that in the annotations it produces.
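
One way to rule out the variant caller is to check the VCF's REF alleles directly against the FASTA, independent of snpEff. The following is only a minimal sketch under stated assumptions: check_ref is a hypothetical helper name, both inputs are uncompressed, and the genome is small enough to hold in memory with awk. If it prints nothing, the VCF agrees with the FASTA and the mismatch more likely lies in the snpEff database.

```shell
# check_ref FASTA VCF (hypothetical helper, not part of snpEff):
# print VCF records whose REF allele differs from the FASTA sequence.
# Assumes uncompressed inputs and a genome small enough to fit in memory.
check_ref() {
    awk '
        # First file: the FASTA. Concatenate sequence lines per contig.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                name = substr($1, 2)      # contig name without ">" or description
            } else {
                seq[name] = seq[name] toupper($0)
            }
            next
        }
        # Second file: the VCF. Skip header lines.
        /^#/ { next }
        {
            # VCF POS is 1-based, which matches awk substr indexing.
            expected = substr(seq[$1], $2, length($4))
            if (expected != toupper($4)) {
                printf "%s\t%s\tvcf=%s\tfasta=%s\n", $1, $2, $4, expected
            }
        }
    ' "$1" "$2"
}

# Example call with the file names from this post:
#   check_ref organism_123.fa variants.fb.vcf
```

Each output line gives the contig, the position, the REF allele in the VCF, and the bases found in the FASTA at that position.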

hawkcharles (United States) wrote 5.5 years ago:

Okay, I'm an idiot. I had reflexively used tar cvf, rather than gzip, to compress the .gff and .fa files. tar cvf only bundles files into an uncompressed archive, so each supposedly gzipped file actually began with a tar header, and this was, understandably, confusing snpEff.

And for seeing what snpEff thinks the reference is, I discovered the very useful dump database command:

java -jar snpEff.jar dump organism.123 | less
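
The tar-versus-gzip mix-up described above can be caught before building the database by looking at the first bytes of each file. This is a rough sketch, not anything snpEff provides: detect_archive_type is a hypothetical helper name, and it relies only on the facts that gzip streams begin with the bytes 1f 8b and that tar archives in the ustar or GNU format carry the string "ustar" at byte offset 257.

```shell
# detect_archive_type FILE (hypothetical helper):
# print "gzip", "tar", or "other" based on the file's magic bytes.
detect_archive_type() {
    file_path=$1

    # gzip streams start with the two bytes 1f 8b.
    magic=$(head -c 2 "$file_path" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo gzip
        return 0
    fi

    # ustar/GNU tar archives store the string "ustar" at offset 257.
    tar_tag=$(dd if="$file_path" bs=1 skip=257 count=5 2>/dev/null)
    if [ "$tar_tag" = "ustar" ]; then
        echo tar
        return 0
    fi

    echo other
}

# Example calls with the paths from this thread:
#   detect_archive_type data/organism.123/genes.gff.gz
#   detect_archive_type data/genomes/organism.123.fa.gz
```

Anything other than "gzip" for a file named .gz suggests it should be recreated, for example with gzip -c organism_123.fa > data/genomes/organism.123.fa.gz.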

 


Istvan Albert wrote 5.5 years ago:

Yes, that dump command is very handy; I should have thought of it. It has helped me out on some occasions.

Thanks for letting us know about the solution.