Question

snpEff ERROR_CHROMOSOME_NOT_FOUND GRCh37.75 genome


4.0 years ago

rbioinfo ▴ 40

Dear Biostar community,

I have a targeted resequencing experiment (Illumina) whose goal is to detect mutations in certain genes. To align the reads, I used the GRCh37 genome from NCBI (https://www.ncbi.nlm.nih.gov/genome/guide/human/). I called the variants with bcftools, and up to this step everything was fine. However, the problem appeared at the annotation step, where I used a prebuilt snpEff database with the command:

java -Xmx32g -jar snpEff.jar GRCh37.75 variants_norm.vcf > annotated.vcf

This does not produce a properly annotated .vcf file. Instead, the output .vcf file is full of "ERROR_CHROMOSOME_NOT_FOUND".

As far as I can tell, this is one of the most common problems described in the snpEff documentation: https://pcingola.github.io/SnpEff/se_troubleshooting/

The chromosome names in the genome .fasta file look like this:

NC_000001.10 Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly

It seems to me that Ensembl names are used in the snpEff pipeline.

Could you please help me: how can I convert the reference genome to a format that snpEff can process, or where can I find a release that suits the snpEff variant annotation format? I searched for a solution but have not found one.

Thank you in advance

annotation vcf genomics variant snpEff • 1.5k views


score 1 · Answer 1 · 2021-07-18


4.0 years ago

Pierre Lindenbaum 166k

snpEff only knows chr1, chr2, chr3..., not RefSeq accessions such as NC_000001.10. Rename the contigs in your VCF with bcftools annotate --rename-chrs (see "Replacing the Chr names and position notions in vcf") and a mapping file, something like https://github.com/dpryan79/ChromosomeMappings/blob/master/GRCh37_NCBI2UCSC.txt
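
A minimal sketch of the renaming step, assuming an uncompressed VCF and a two-column, tab-separated mapping file (old name, new name) like the one linked above. The function name rename_chroms and the file names mapping.txt and renamed.vcf are made up for illustration. The awk version is only there so the logic is visible and runs without bcftools. With bcftools installed, the command shown in the comment should do the same job.

```shell
# Hypothetical helper: rename contigs in an uncompressed VCF using a
# two-column, tab-separated mapping file (old_name<TAB>new_name).
# With bcftools installed, the equivalent is roughly:
#   bcftools annotate --rename-chrs mapping.txt -o renamed.vcf variants_norm.vcf
rename_chroms() {
    map_file=$1
    vcf_file=$2
    awk -F'\t' -v OFS='\t' '
        # First file: load the old -> new mapping, skipping rows with no target name.
        NR == FNR { if ($2 != "") map[$1] = $2; next }
        # Header: rewrite the ID of ##contig=<ID=...> lines when it is mapped.
        /^##contig=<ID=/ {
            id = $0
            sub(/^##contig=<ID=/, "", id)   # drop the 13-character prefix
            sub(/[,>].*$/, "", id)          # drop everything after the ID
            if (id in map) $0 = "##contig=<ID=" map[id] substr($0, 14 + length(id))
            print; next
        }
        # Other header lines pass through unchanged.
        /^#/ { print; next }
        # Records: replace column 1 (CHROM) when it is mapped.
        { if ($1 in map) $1 = map[$1]; print }
    ' "$map_file" "$vcf_file"
}
```

Usage would look like rename_chroms mapping.txt variants_norm.vcf > renamed.vcf, followed by the original snpEff command on renamed.vcf. Contigs missing from the mapping file are left unchanged, so they would still trigger the same error and are worth checking for afterwards.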