snpEff ERROR_CHROMOSOME_NOT_FOUND GRCh37.75 genome
2.8 years ago
rbioinfo ▴ 40

Dear Biostar community,

I have a targeted resequencing experiment (Illumina) with the goal of detecting mutations in certain genes. To align the reads, I used the GRCh37 genome from NCBI (https://www.ncbi.nlm.nih.gov/genome/guide/human/). I called the variants with bcftools, and up to this step everything was fine. However, when I reached the annotation step, I used a prebuilt snpEff database with the command:

java -Xmx32g -jar snpEff.jar GRCh37.75 variants_norm.vcf > annotated.vcf

It does not produce a properly annotated .vcf file. Instead, the .vcf file is full of "ERROR_CHROMOSOME_NOT_FOUND".

This appears to be one of the most common problems described in the snpEff documentation: https://pcingola.github.io/SnpEff/se_troubleshooting/

The chromosome names in the genome .fasta file look like this:

NC_000001.10 Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly

It seems to me that Ensembl names were used in the pipeline.

Could you please help me: how can I convert the reference genome to a format that snpEff can process, or where can I find a release that suits snpEff variant annotation? I searched for a solution but did not find one.

Thank you in advance

annotation vcf genomics variant snpEff • 1.2k views
2.8 years ago

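snpEff only knows names like chr1, chr2, chr3... and not NCBI accessions such as NC_000001.10. Rename the chromosomes in your VCF with bcftools annotate --rename-chrs (see "Replacing the Chr names and position notions in vcf") and a mapping file, something like https://github.com/dpryan79/ChromosomeMappings/blob/master/GRCh37_NCBI2UCSC.txt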
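To make the renaming concrete, here is a minimal sketch on toy data. It assumes a two-column, tab-separated mapping file (old name, new name), which is the layout bcftools annotate --rename-chrs reads. The file names chr_map.txt, toy.vcf and toy_renamed.vcf are made up for illustration, and the awk step is only a stand-in that shows what the renaming does, not a replacement for bcftools on real data.

```shell
# Toy mapping file: NCBI accession -> chr-style name, tab-separated.
# (Hypothetical two-line excerpt; a real run would use the full mapping file.)
printf 'NC_000001.10\tchr1\nNC_000002.11\tchr2\n' > chr_map.txt

# Toy VCF with one contig header line and two records (made-up variants).
printf '##fileformat=VCFv4.2\n##contig=<ID=NC_000001.10>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nNC_000001.10\t12345\t.\tA\tG\t50\tPASS\t.\nNC_000002.11\t67890\t.\tC\tT\t60\tPASS\t.\n' > toy.vcf

# With bcftools installed, the equivalent on the real file would be along the lines of:
#   bcftools annotate --rename-chrs chr_map.txt variants_norm.vcf -o variants_renamed.vcf

# Plain awk illustration of the same idea:
awk 'BEGIN { FS = OFS = "\t" }
     # First file: load old -> new names into a lookup table.
     NR == FNR { map[$1] = $2; next }
     # Contig header lines: swap the ID if it is in the table.
     /^##contig=<ID=/ {
         id = $0
         sub(/^##contig=<ID=/, "", id)
         sub(/[,>].*$/, "", id)
         if (id in map) {
             rest = substr($0, length("##contig=<ID=") + length(id) + 1)
             $0 = "##contig=<ID=" map[id] rest
         }
         print; next
     }
     # All other header lines pass through unchanged.
     /^#/ { print; next }
     # Records: rename the CHROM column when a mapping exists.
     ($1 in map) { $1 = map[$1] }
     { print }' chr_map.txt toy.vcf > toy_renamed.vcf

# List the chromosome names now present in the records.
grep -v '^#' toy_renamed.vcf | cut -f1 | sort -u
```

Running the last command on your own VCF before and after renaming is a quick way to check that the names match what the snpEff database expects.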


Thank you, it worked.
