dbsnp in vcf format compatible with hg38
4
0
6.1 years ago
apocalyps52 ▴ 40

Hello,

I need to annotate the SNPs in a VCF file that I generated against the hg38 reference. I am trying to use SnpSift for the annotation, but I can't find a dbSNP VCF that is compatible with hg38. All the links I found online point to the NCBI dbSNP FTP server, where the reference files are for the GRCh38 build.

Can anybody help me find an hg38 dbSNP VCF file?

snp database hg38 Assembly • 4.2k views
1

They are the same! Since the latest release (GRCh38, Dec. 2013), the hgXX and GRChXX version numbers have been made uniform, so GRCh38 and hg38 are the same genome version.

The previous one was GRCh37, also known as hg19.

3
6.1 years ago
apocalyps52 ▴ 40

Thanks guys for your suggestions. I found the following Perl one-liner, which solved my problem by prefixing every non-header line with "chr":

perl -pe 's/^([^#])/chr$1/' myfile.vcf > newfile.vcf

1
6.1 years ago
agata88 ▴ 850

As far as I know GRCh38 is hg38 :) And GRCh37 is hg19. Best, Agata

1
6.1 years ago
agata88 ▴ 850

If I understand correctly, your files are incompatible: one uses chr1 and the other uses 1. If that is the problem, the easiest fix is to edit the VCF file and replace 1 with chr1, or the other way round. You can do that by importing the file into Excel and using the replace function, or by writing your own script, for example in Python.
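A minimal sketch of such a script, assuming the input uses bare names (1, 2, ..., X, Y, MT) and the target is UCSC-style naming (chr1, ..., chrM). The helper names `to_ucsc`, `rename_line` and `rename_vcf` are made up for illustration. It only covers the primary chromosomes: unplaced and alternate contigs have different names in hg38 (for example chrUn_... or ..._random), so simple prefixing will not match those.

```python
import re

# Hypothetical helpers for illustration; not part of SnpSift or any VCF library.
CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def to_ucsc(name):
    """Map a bare contig name (1, X, MT) to UCSC style (chr1, chrX, chrM)."""
    if name.startswith("chr"):
        return name  # already UCSC style, leave untouched
    if name == "MT":
        return "chrM"  # assumption: the mitochondrial contig is called MT in the input
    return "chr" + name


def rename_line(line):
    """Rename the contig in a ##contig header or a data line; pass other lines through."""
    if line.startswith("##contig="):
        return CONTIG_ID.sub(lambda m: m.group(1) + to_ucsc(m.group(2)), line)
    if line.startswith("#") or not line.strip():
        return line
    chrom, sep, rest = line.partition("\t")  # CHROM is the first tab-separated column
    return to_ucsc(chrom) + sep + rest


def rename_vcf(src_path, dst_path):
    """Stream an uncompressed VCF from src_path to dst_path, renaming contigs."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rename_line(line))
```

Calling `rename_vcf("myfile.vcf", "newfile.vcf")` would then write the renamed copy. Unlike a plain prefix substitution, this also updates the `##contig` header lines, which some downstream tools check against the reference.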

Best,

Agata

2

No, no, no, no importing into Excel to change that. Just use a Unix tool like sed, or use the same data for mapping and for annotation to avoid these problems altogether.

1

Why are you all so "NO" about Excel? It's a tool like any other :)

5

I work in a mainly wet lab group and I've seen terrible things happening in excel which will likely haunt me forever and result in years of therapy.

1

Using Excel instead of sed in this situation is like buying a whole shed's worth of tools when all you need is a screwdriver.

1

In addition, you take your hammer out of that shed after buying it and knock the screw in the wood with the handle.

0

Everybody has Excel (especially students), so why not use it? :) I agree that it may not be the perfect solution in this situation ...but it is still a solution.

0

I used Excel to make changes to my VCF file. After saving, it no longer works as a VCF file.

0

You need to copy the columns yourself into a plain-text file (Notepad) and save it with the .vcf extension :) Don't give up, it will work :) PS. Don't forget to add the lines starting with # from your raw VCF back at the beginning.

0

I know you mean well but this is just bad advice, can't put it any nicer.

0
6.1 years ago
apocalyps52 ▴ 40

Hello,

As I mentioned, my original reference is in UCSC format, so the chromosome names look like "chrX", whereas the GRCh38 files use only the number for each chromosome.

How can I overcome this incompatibility?

Thanks

0

Go back to the original data and do it again with the reference you'd like to use now. Or maybe use the dbSNP file from the GATK bundle: https://software.broadinstitute.org/gatk/download/bundle

0

@Devon Ryan has done the mapping for you, as explained in this post. Just go to his repository to get the mapping from the GRCh38 nomenclature to the hg38 one.
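A rough sketch of how such a mapping could be applied, assuming it is a two-column, tab-separated table (source name, then target name) in which some contigs may have no target. The file layout, the file names in the usage note and the function names `load_mapping` and `rename_records` are assumptions for illustration, not taken from the repository.

```python
def load_mapping(lines):
    """Parse 'old<TAB>new' pairs; skip blank lines and entries with no target name."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] and fields[1]:
            mapping[fields[0]] = fields[1]
    return mapping


def rename_records(vcf_lines, mapping):
    """Yield VCF lines with CHROM renamed; drop records whose contig has no mapping."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged in this sketch
            continue
        chrom, sep, rest = line.partition("\t")
        if chrom in mapping:
            yield mapping[chrom] + sep + rest
```

With hypothetical file names, usage might look like this:

```python
with open("mapping.txt") as m, open("dbsnp.vcf") as src, open("dbsnp.hg38.vcf", "w") as dst:
    dst.writelines(rename_records(src, load_mapping(m)))
```

Dropping unmapped records is a design choice; the `##contig` header lines are left as they are here, so they may need the same renaming if a downstream tool validates them.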