Issues with Chromosome Encoding and VCF Annotation in dbSNP Alpha Release
4 months ago
Fernando • 0

Hello, Biostars Community,

I am working on creating a custom database of variants using the VCF from the latest dbSNP alpha release available at ftp.ncbi.nih.gov/snp/population_frequency/latest_release/. I have encountered a couple of issues that I'm hoping someone might help me resolve.

Firstly, the chromosome encoding uses RefSeq IDs (e.g., NC_000007.12) instead of the typical chromosome notation (e.g., chr1, chr2, etc.). I've managed to map each RefSeq code to its corresponding chromosome. As a first step for simplicity, I've eliminated the unplaced scaffolds (e.g., NT_113901.1 unplaced-scaffold) using the following command:

zcat freq.vcf.gz | grep -E '^#|^NC_' | gzip > freq_only_NC.vcf.gz

Next, I attempted to use bcftools annotate --rename-chrs to change the encoding to the standard chromosome notation:

bcftools annotate --rename-chrs refseq_to_main_chr_mod.txt -o chr_freq_only_NC.vcf.gz -Oz freq_only_NC.vcf.gz

However, I received the following warning:

[W::vcf_parse] Contig 'NC_000007.12' is not defined in the header. (Quick workaround: index the file with tabix.)

Upon trying to create an index for this new VCF, I encountered another error:

tabix -p vcf freq_only_NC.vcf.gz
[E::hts_idx_push] Invalid record on sequence #2: end 1 < begin 2040518
tbx_index_build failed: freq_only_NC.vcf.gz

I am puzzled by this error since the VCFs only list one position and not a range with a beginning and end. Could someone please assist me in understanding and resolving these issues?

Any insights or advice would be greatly appreciated.

bcftools dbSNP tabix vcf • 468 views
4 months ago

NC_000007.12 is not defined in the header

means that the VCF header should contain a contig line with the following syntax:

##contig=<ID=NC_000007.12,..

If it does not, declare the correct chromosomes using bcftools reheader --fai /path/to/reference.fa.fai in.vcf.gz
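To see which contigs are affected before reheadering, one option is to compare the contig IDs declared in the header with the names actually used in the records. The following is only a rough sketch on a made-up toy VCF (the file name, contig lines, and records are all hypothetical); it uses standard tools only, not bcftools.

```shell
#!/bin/sh
set -eu

# Hypothetical toy VCF: one contig declared in the header, two used in records.
tmp=$(mktemp -d)
vcf="$tmp/toy.vcf"
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '##contig=<ID=NC_000001.10>\n' >> "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$vcf"
printf 'NC_000001.10\t100\t.\tA\tG\t.\t.\t.\n' >> "$vcf"
printf 'NC_000007.12\t200\t.\tC\tT\t.\t.\t.\n' >> "$vcf"

# Contig IDs declared in ##contig header lines.
grep '^##contig=<ID=' "$vcf" \
  | sed -E 's/^##contig=<ID=([^,>]+).*/\1/' \
  | sort -u > "$tmp/declared.txt"

# Contig names that appear in column 1 of the records.
grep -v '^#' "$vcf" | cut -f1 | sort -u > "$tmp/used.txt"

# Names used in records but missing from the header.
missing=$(comm -13 "$tmp/declared.txt" "$tmp/used.txt")
echo "undeclared contigs: $missing"
```

On a real compressed file, the same pipeline could read from zcat freq_only_NC.vcf.gz instead of the toy file. Any name it reports would need a matching ##contig line in the header.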


Thanks! In the end, it was a problem with the tabs in the file!
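The reply does not say which file had the tab problem. Assuming it was the VCF body (VCF records must be tab-delimited), a quick check along these lines can flag records that do not split into at least 8 tab-separated columns or whose POS is not an integer. The toy file and its contents are made up for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical toy VCF body: the second record uses spaces instead of tabs.
tmp=$(mktemp -d)
vcf="$tmp/toy.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'NC_000001.10\t100\t.\tA\tG\t.\t.\t.\n' >> "$vcf"
printf 'NC_000001.10 200 . C T . . .\n' >> "$vcf"

# Report line numbers of records with fewer than 8 tab-separated fields
# or a non-numeric POS column; header lines (starting with #) are skipped.
bad_lines=$(awk -F'\t' '!/^#/ && (NF < 8 || $2 !~ /^[0-9]+$/) { print NR }' "$vcf")
echo "malformed record lines: $bad_lines"
```

For the real file, the awk command could be fed from zcat freq_only_NC.vcf.gz. This only checks the field layout; it does not validate the rest of the VCF.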
