dbSNP154v2 issue when using GATK
3.5 years ago

Hello Guys,

I hope somebody can help me. I am following a pipeline to call SNPs from FASTQ files. I have successfully aligned the reads to hg38 and marked duplicates in my BAM file with Picard. Now I am at the step where I use GATK to call variants. I know GATK requires a dbSNP file to use as a reference of known sites. I downloaded the latest dbSNP release (dbSNP154 v2) from this website: https://ftp.ncbi.nih.gov/snp/latest_release/VCF/. The chromosomes are named differently in that release (RefSeq accessions rather than chr-style names), so I looked at the assembly report, extracted the relevant columns, and renamed the chromosomes using bcftools annotate --rename-chrs. I then had to sort the records in the new file using bcftools sort, because indexing it with tabix was giving me an error. However, after all these steps, when I run GATK on my sample with this dbSNP154 v2 file I get the following error:

htsjdk.samtools.SAMException: Sequence name '' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&+./:;=?@^_|~-]'

If I run GATK with an older version of dbSNP (like 151), it works perfectly fine. Any ideas on how I can run GATK using dbSNP154 v2 for known sites? Thanks!
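
To see which contig names that check would reject, here is a minimal Python sketch. It assumes the pattern is the one quoted in the error with the `*` quantifiers restored (the forum rendering appears to have dropped them). `illegal_contig_names` is a hypothetical helper for illustration, not part of GATK or htsjdk.

```python
import re

# Assumption: same pattern as in the error message, with the `*` characters
# restored inside the second character class and as the trailing quantifier.
LEGAL_NAME = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)


def illegal_contig_names(names):
    """Return the names that do not fully match the sequence-name pattern."""
    return [name for name in names if LEGAL_NAME.fullmatch(name) is None]


if __name__ == "__main__":
    # Example names only; in practice these would come from the VCF header.
    print(illegal_contig_names(["chr1", "", "na", "chr1_KI270706v1_random"]))
```

Under this assumption an empty name is rejected, which is what the '' in the message points to, while a literal "na" would pass the pattern.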

Alex

SNP assembly • 1.0k views

Show us the chromosome names in the VCF.


Hello, I replied below, thanks!


Hello, see below. The names in the 1st column were replaced with the names in the 2nd column in the VCF file. This is only part of the rows that were replaced. I just noticed that the 2nd column contains a bunch of "na" entries, and those are probably what is messing up GATK. Is there a way to eliminate those "na" entries from the VCF file? Thanks!

Alex


RefSeq-Accn     UCSC-style-name
NC_000001.11    chr1
NC_000002.12    chr2
NC_000003.12    chr3
NC_000004.12    chr4
NC_000005.10    chr5
NC_000006.12    chr6
NC_000007.14    chr7
NC_000008.11    chr8
NC_000009.12    chr9
NC_000010.11    chr10
NC_000011.10    chr11
NC_000012.12    chr12
NC_000013.11    chr13
NC_000014.9     chr14
NC_000015.10    chr15
NC_000016.10    chr16
NC_000017.11    chr17
NC_000018.10    chr18
NC_000019.10    chr19
NC_000020.11    chr20
NC_000021.9     chr21
NC_000022.11    chr22
NC_000023.11    chrX
NC_000024.10    chrY
NT_187361.1     chr1_KI270706v1_random
NT_187362.1     chr1_KI270707v1_random
NT_187363.1     chr1_KI270708v1_random
NT_187364.1     chr1_KI270709v1_random
NT_187365.1     chr1_KI270710v1_random
NT_187366.1     chr1_KI270711v1_random
NT_187367.1     chr1_KI270712v1_random
NT_187368.1     chr1_KI270713v1_random
NT_187369.1     chr1_KI270714v1_random
NT_187370.1     chr2_KI270715v1_random
NT_187371.1     chr2_KI270716v1_random
NT_167215.1     chr3_GL000221v1_random
NT_113793.3     chr4_GL000008v2_random
NT_113948.1     chr5_GL000208v1_random
NT_187372.1     chr9_KI270717v1_random
NT_187373.1     chr9_KI270718v1_random
NT_187374.1     chr9_KI270719v1_random
NT_187375.1     chr9_KI270720v1_random
NT_187376.1     chr11_KI270721v1_random
NT_113796.3     chr14_GL000009v2_random
NT_113888.1     chr14_GL000194v1_random
NT_167219.1     chr14_GL000225v1_random
NT_187377.1     chr14_KI270722v1_random
NT_187378.1     chr14_KI270723v1_random
NT_187379.1     chr14_KI270724v1_random
NT_187380.1     chr14_KI270725v1_random
NT_187381.1     chr14_KI270726v1_random
NT_187382.1     chr15_KI270727v1_random
NW_021159989.1  na
NW_015495300.1  chr4_KQ983257v1_fix
NW_021159990.1  na
NW_021159991.1  na
NW_021159992.1  na
NW_021159993.1  na
NW_021159994.1  na
NW_021159995.1  na
NT_187685.1     chr19_KI270931v1_alt
NT_187686.1     chr19_KI270932v1_alt
NT_187687.1     chr19_KI270933v1_alt
NT_113949.2     chr19_GL000209v2_alt
NC_012920.1     chrM
na              chrUn_KI270752v1
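
One possible way to keep the "na" entries out of the rename file is to drop those rows before passing it to bcftools annotate --rename-chrs, which expects one whitespace-separated "old_name new_name" pair per line. The Python sketch below assumes a two-column listing like the one above. `build_rename_map` and `format_rename_file` are hypothetical helper names, and whether the "na" rows are the actual cause of the GATK error is not confirmed here.

```python
def build_rename_map(lines, missing="na"):
    """Collect (old, new) contig-name pairs, skipping rows with a missing name."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or malformed line
        old, new = fields
        if old == "RefSeq-Accn":
            continue  # header row of the listing
        if missing in (old, new):
            continue  # no usable name on one side, so leave that contig alone
        pairs.append((old, new))
    return pairs


def format_rename_file(pairs):
    """Render the pairs as 'old<TAB>new' lines, one pair per line."""
    return "".join(f"{old}\t{new}\n" for old, new in pairs)


if __name__ == "__main__":
    listing = [
        "RefSeq-Accn UCSC-style-name",
        "NC_000001.11 chr1",
        "NW_021159989.1 na",
        "na chrUn_KI270752v1",
        "NC_012920.1 chrM",
    ]
    print(format_rename_file(build_rename_map(listing)), end="")
```

Contigs left out of the map keep their original RefSeq names in the VCF, so records on them would still need to be excluded separately if the reference used for alignment does not contain those names.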


I am wondering how to tell whether the release at https://ftp.ncbi.nih.gov/snp/latest_release/VCF/ is dbSNP153 or dbSNP154? Thanks.
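
One way to check might be to read the build number from the VCF header itself. The Python sketch below assumes the file carries a `##dbSNP_BUILD_ID=` meta line, which should be verified on the downloaded file. `dbsnp_build_id` and `read_header` are hypothetical helper names, and the path in the usage comment is a placeholder.

```python
import gzip


def dbsnp_build_id(header_lines):
    """Return the value of the ##dbSNP_BUILD_ID meta line, or None if absent."""
    for line in header_lines:
        if not line.startswith("##"):
            break  # meta lines end at the #CHROM line
        key, _, value = line[2:].rstrip("\n").partition("=")
        if key == "dbSNP_BUILD_ID":
            return value
    return None


def read_header(path):
    """Yield the ## meta lines from a gzip- or bgzip-compressed VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break
            yield line


# Usage (placeholder path): dbsnp_build_id(read_header("dbsnp.vcf.gz"))
```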
