How Does UCSC Prepare Data for the snp132 Table?
12.4 years ago
Chronos ▴ 610

Undoubtedly, UCSC's snp132 table is the most convenient when looking up rsIDs for a list of variants. I have tried using NCBI's dbSNP135 resources, but they are either less convenient to use or do not have all the data in one place: b135_SNPChrPosOnRef_37_3.bcp.gz doesn't provide ref/alt/length information, while 00-All.vcf.gz is harder to work with than simple tab-separated files or a database.

Hence the question: how does UCSC prepare that snp132 table from dbSNP data?

I've looked at the source they published, but it doesn't seem to deal with data preparation (this seems to be the case at least for dbSNP).

Without this knowledge (or, rather, a tool-chain), I'm left with these options:

  • use the slightly outdated snp132 (easiest)
  • parse NCBI's 00-All.vcf.gz into a database table (a little more effort than above)

All I'm really missing in the b135_SNPChrPosOnRef_37_3.bcp.gz file are ref and alt (from which I can infer the length).

Alternatively, I'd love to use some command-line utility to convert NCBI's VCF to a simpler BED-like format (had no success doing that with vcf-to-tab from vcftools).
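For illustration, the VCF-to-BED-like conversion described above could be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the function names `vcf_line_to_bed` and `convert` are made up for this example, the `chr` prefix is added on the assumption that the VCF uses bare chromosome names, and indels keep the anchor base that VCF puts in REF/ALT, so the coordinates will not necessarily match what UCSC stores in snp132.

```python
import gzip
import sys


def vcf_line_to_bed(line, chrom_prefix="chr"):
    """Turn one VCF data line into a BED-like tuple; return None for headers/blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    chrom, pos, rsid, ref, alt = fields[:5]
    start = int(pos) - 1    # VCF POS is 1-based; BED start is 0-based
    end = start + len(ref)  # half-open end spanning the REF allele
    # ALT is kept as-is, so multi-allelic sites stay comma-separated in one row.
    return (chrom_prefix + chrom, start, end, rsid, ref, alt)


def convert(vcf_path, out=sys.stdout):
    """Stream a (possibly gzipped) VCF file and write tab-separated rows to `out`."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            row = vcf_line_to_bed(line)
            if row is not None:
                out.write("\t".join(map(str, row)) + "\n")


if __name__ == "__main__":
    # Hypothetical usage: python vcf_to_bed.py 00-All.vcf.gz > dbsnp.bed
    if len(sys.argv) > 1:
        convert(sys.argv[1])
```

The output columns are chrom, start, end, rsID, ref and alt, which should load directly into a database table; the length can be derived from ref and alt afterwards.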

Edit: as it turns out, UCSC's public code repository does contain the snpNcbiToUcsc.c source code, in src/hg/snp/snpLoad.

ucsc vcftools ncbi dbsnp • 3.0k views
12.4 years ago

how does UCSC prepare that snp132 table from dbSNP data?

A large part of the process is described in the UCSC wiki: http://genomewiki.ucsc.edu/index.php/DbSNP_Track_Notes


Thanks, I've updated the question with the [overlooked] path to snpNcbiToUcsc.c
