Help with vcf annotation
4 months ago
HoWI • 0

Hi everyone, I intend to add rsIDs to my Dante Labs VCF and later merge it with the 1240k dataset via PLINK. I have done this before on my laptop with an older dbSNP build (138), but the SNP overlap with the 1240k dataset was poor. I wanted to try again with the latest dbSNP build (156), but the uncompressed file is a whopping 165 GB, so I cannot process it on my laptop. I am unfamiliar with usegalaxy, but I still tried to annotate my VCF with bcftools there; the resulting file had no rsIDs.

Can someone please instruct me regarding this?

Also, I am getting this error message:

“INFO/RS value encountered and set to missing at NC_000001.10:6319593”.

SnpSift appears to be tailor-made for this, but it fails with this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3308)
    at org.snpsift.annotate.VcfIndexDataChromo.grow(VcfIndexDataChromo.java:103)
    at org.snpsift.annotate.VcfIndexDataChromo.add(VcfIndexDataChromo.java:46)
    at org.snpsift.annotate.VcfIndex.add(VcfIndex.java:67)
    at org.snpsift.annotate.VcfIndex.loadIntervals(VcfIndex.java:245)
    at org.snpsift.annotate.VcfIndex.index(VcfIndex.java:183)
    at org.snpsift.annotate.DbVcfSorted.open(DbVcfSorted.java:55)
    at org.snpsift.annotate.AnnotateVcfDb.open(AnnotateVcfDb.java:395)
    at org.snpsift.SnpSiftCmdAnnotate.annotateInit(SnpSiftCmdAnnotate.java:190)
    at org.snpsift.SnpSiftCmdAnnotate.annotate(SnpSiftCmdAnnotate.java:70)
    at org.snpsift.SnpSiftCmdAnnotate.run(SnpSiftCmdAnnotate.java:410)
    at org.snpsift.SnpSiftCmdAnnotate.run(SnpSiftCmdAnnotate.java:397)
    at org.snpsift.SnpSift.run(SnpSift.java:588)
    at org.snpsift.SnpSift.main(SnpSift.java:76)
Vcf dbsnp annotation • 614 views
4 months ago

but snp overlap with 1240k dataset was not good.

Are you sure both datasets use the same genome build (hg19 vs. hg38)? Does your VCF use the same chromosome notation as the dbSNP file (chr1 vs. 1, chr1 vs. NC_000001, etc.)?

I wanted to try again with the latest dbsnp file (156) but as the uncompressed file is whopping 165gb in size

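The latest human file I see is 'only' 24G, under https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/

And you shouldn't need to uncompress it: bcftools reads bgzip-compressed VCFs directly, provided the index file sits next to them.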
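
To make the notation check concrete, here is a minimal Python sketch that lists the contig names found in the first column of a VCF, reading a gzip/bgzip-compressed file directly without unpacking it. The function names `open_vcf` and `contig_names` are made up for this illustration, and the `max_records` cut-off is just a convenience so that a multi-gigabyte dbSNP file does not have to be scanned in full.

```python
import gzip


def open_vcf(path):
    """Open a VCF as text, whether it is plain or gzip/bgzip-compressed."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # gzip (and therefore bgzip) files start with the bytes 0x1f 0x8b.
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def contig_names(path, max_records=None):
    """Return the distinct CHROM values in order of first appearance.

    max_records (hypothetical helper parameter) limits how many data
    lines are read; None means read the whole file.
    """
    names = []
    seen = set()
    n_records = 0
    with open_vcf(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            if max_records is not None and n_records >= max_records:
                break
            n_records += 1
            chrom = line.split("\t", 1)[0]
            if chrom not in seen:
                seen.add(chrom)
                names.append(chrom)
    return names
```

Calling `contig_names("my_sample.vcf")` and `contig_names("dbsnp.vcf.gz", max_records=1000)` (both file names are placeholders) and comparing the two lists should show at a glance whether one file uses `1` or `chr1` while the other uses RefSeq accessions such as `NC_000001.10`.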


Thanks. I isolated the first column of both my VCF and the dbSNP file (with the 'cut' tool on usegalaxy), and yes, dbSNP uses a different chromosome notation from my VCF. Can anything be done about this (some replace function on usegalaxy, perhaps)? As for the decompression, it appears to be done automatically by either usegalaxy or bcftools.
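
One possible way to deal with the mismatch, sketched in Python under a few assumptions: the dbSNP contigs are RefSeq accessions of the form `NC_000001.10` (as in the error message above), and the goal is a two-column "old_name new_name" table of the kind that `bcftools annotate --rename-chrs` accepts. The helper names `refseq_to_plain` and `rename_table` are invented for this example, and I have not checked whether the usegalaxy wrapper exposes that bcftools option.

```python
import re

# Human chromosome accessions look like NC_0000NN.V, where NN is 01..24
# and V is a version number that differs between genome builds.
_REFSEQ_CHROM = re.compile(r"^NC_0000(\d{2})\.\d+$")


def refseq_to_plain(accession):
    """Map a human RefSeq chromosome accession to '1'..'22', 'X', 'Y' or 'MT'.

    Returns None for anything else (unplaced scaffolds, patches, ...).
    """
    if accession.split(".")[0] == "NC_012920":
        return "MT"  # mitochondrial reference sequence
    match = _REFSEQ_CHROM.match(accession)
    if match is None:
        return None
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    return {23: "X", 24: "Y"}.get(number)


def rename_table(accessions, prefix=""):
    """Build 'old_name new_name' lines, one per recognised accession.

    prefix is prepended blindly (e.g. 'chr' gives chr1, chrX, chrMT);
    note that UCSC-style files name the mitochondrion chrM, so adjust
    that entry by hand if your VCF follows that convention.
    """
    lines = []
    for accession in accessions:
        plain = refseq_to_plain(accession)
        if plain is not None:
            lines.append(f"{accession} {prefix}{plain}")
    return "\n".join(lines) + "\n"
```

The accessions to feed in can be taken from the first column of the dbSNP file. Writing the returned text to a file gives a table for renaming the dbSNP contigs to match the sample VCF; swapping the two columns gives the reverse mapping, which may be cheaper if it is easier to rename the small sample VCF than the large dbSNP file.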


Thanks a lot. Also this page was very helpful. https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&chromInfoPage=
