Converting dbSNP VCF to work with RefSeq chromosome IDs
11 weeks ago
avelarbio46 ▴ 20

Hello everyone! I've been trying to use GATK with an updated version of the human genome, since the reference files distributed with GATK are about ten years out of date.

I've downloaded NCBI reference GCF_000001405.40.fna, which is GRCh38.p14

For dbSNP, I've downloaded GCF_000001405.40.gz, which is also GRCh38.p14

When extracting the contig names from my reference file, I found:

NC_000001.11 Homo sapiens chromosome 1, GRCh38.p14 Primary Assembly
0 252068378 NT_187361.1 Homo sapiens chromosome 1 unlocalized genomic scaffold, GRCh38.p14 Primary As etc...

Extracting the contig names:

reference contigs = [NC_000001.11, NT_187361.1, NT_187362.1, NT_187363.1, NT_187364.1, NT_187365.1, NT_187366.1, NT_187367.1, NT_187368.1, NT_187369.1, NC_000002.12, NT_187370.1, NT_187371.1, NC_000003.12, NT_167215.1, NC_000004.12, NT_113793.3...

For dbSNP file, I found:

features contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM, chr1_KI270706v1_random, chr1_KI270707v1_random...

This mismatch causes a bunch of errors with GATK and other annotation tools.

I'm not sure which option would be best: renaming the contigs in all the BAMs and the reference file, or renaming the contigs in the dbSNP VCF. I have no idea how to do either of them!

NCBI dbSNP RefSeq • 249 views

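See bcftools annotate --rename-chrs, which renames the contigs of a VCF using a two-column mapping file (old name, new name; one pair per line).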
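The mapping file does not need to be written by hand. Below is a minimal sketch, assuming you also download the NCBI assembly report for GCF_000001405.40 and that it follows the usual tab-separated layout, with the RefSeq accession in the 7th column and the UCSC-style name in the 10th. The function names and file names are hypothetical, so check the column positions against the header of your own copy before relying on it.

```python
def build_rename_map(report_lines, ucsc_to_refseq=True):
    """Return (old_name, new_name) pairs parsed from assembly-report lines.

    Assumed layout (tab-separated, '#' lines are comments):
    column 7 = RefSeq-Accn, column 10 = UCSC-style-name.
    """
    pairs = []
    for line in report_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 10:
            continue  # not a complete data row
        refseq, ucsc = cols[6], cols[9]
        if refseq == "na" or ucsc == "na":
            continue  # no equivalent name on one side
        pairs.append((ucsc, refseq) if ucsc_to_refseq else (refseq, ucsc))
    return pairs


def write_rename_map(pairs, path):
    """Write 'old<TAB>new' lines, the format --rename-chrs reads."""
    with open(path, "w") as out:
        for old, new in pairs:
            out.write(f"{old}\t{new}\n")


# Example usage (hypothetical file names):
# with open("GRCh38.p14_assembly_report.txt") as fh:
#     write_rename_map(build_rename_map(fh), "chr_to_refseq.txt")
```

With the mapping written, something like `bcftools annotate --rename-chrs chr_to_refseq.txt -Oz -o dbsnp.refseq.vcf.gz dbsnp.vcf.gz` should rename the VCF contigs (re-index the output afterwards). Renaming only the VCF is usually less work than renaming the reference and every BAM. Contigs that have no counterpart in the report are left unchanged, so it is worth comparing the contig lists again afterwards.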
