I have an old data set from 2008 from a set of HumanCNV370-Quads, and I have relatively recently downloaded a set of extended VCFs for the Altai Neanderthal and Denisovan genomes. I want to compare the two data sets. I know the genome coordinates for any given base can shift from assembly to assembly, but will the rsID for a given SNP change if and when the coordinate changes?
I have SNP data from 64 samples at ~330,000 rsIDs (I know there is no mt/Y data; I am pretty sure this is all autosomal). The data is from an old set of HumanCNV370-Quads from 2008. I don't have the genomic coordinates.
I have downloaded two sets of VCF files from the Denisovan 30× and Altai Neanderthal 50× coverage genomes (available here http://cdna.eva.mpg.de/denisova/VCF/hg19_1000g/ and here http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/). These files are in a cumbersome extended VCF format described here (http://www.sciencemag.org/content/suppl/2012/08/29/science.1224344.DC1/Meyer.SM.pdf page 16) and here (http://www.nature.com/nature/journal/v505/n7481/extref/nature12886-s1.pdf page 14). These files have rsIDs labeled for most sites (though, of course, not all sites in these genomes have been assigned rsIDs).
I also have Illumina data from 171 samples (the libraries were enriched for NRY and mtDNA), which I now have as raw, unfiltered VCFs without rsIDs. I am trying to bring them into the mix, but I am going to ignore them for now (I have a thread on them here https://www.biostars.org/p/110272/).
For the 330k sites, I have the alleles for the common chimpanzee from the 1000G (phase1_release_v3/20101123) and the dbSNP build 141 for most of the sites from a friend of a friend. The goal is to use f4 statistics to calculate Neanderthal ancestry estimates.
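For reference, the f4 statistic I have in mind is the average over sites of (a - b)(c - d), where a, b, c, d are the allele frequencies of the same allele in four populations. Below is a minimal Python sketch of that calculation; the function name `f4` and the toy frequencies are made up for illustration, and it ignores missing data and standard errors (which would normally come from a block jackknife).

```python
def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over sites of (a - b) * (c - d).

    Each argument is a list of allele frequencies (0.0 to 1.0) for the
    same allele at the same sites, one list per population.
    """
    n = len(p_a)
    if n == 0 or not (len(p_b) == len(p_c) == len(p_d) == n):
        raise ValueError("all four frequency lists must be non-empty and equal in length")
    total = sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))
    return total / n


# Toy frequencies at three hypothetical sites (not real data).
pop_a = [1.0, 0.0, 0.5]
pop_b = [0.0, 0.0, 0.5]
pop_c = [1.0, 1.0, 0.0]
pop_d = [0.0, 1.0, 0.0]

value = f4(pop_a, pop_b, pop_c, pop_d)
print(value)
```

An ancestry estimate would then be a ratio of two such f4 values computed over the same sites, which is why the sites have to match up across all the data sets.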
Anyway, to my question: would the rsIDs from the SNP chip still correspond to the rsIDs that I find in the extended VCF files? If not, what would I have to do to make them match up?
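To make the question concrete, here is a rough Python sketch of the naive matching I would do: look up each chip rsID in the ID column (third column) of the VCF and report which ones are not found. The function name `match_rsids`, the rsIDs, and the VCF lines are all invented for illustration; it assumes tab-separated records with the standard leading columns and treats a semicolon-separated ID field as several IDs.

```python
import io


def match_rsids(chip_ids, vcf_lines):
    """Return {rsID: (chrom, pos, ref, alt)} for VCF records whose ID is on the chip."""
    wanted = set(chip_ids)
    hits = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete record
        chrom, pos, ids, ref, alt = fields[:5]
        # The ID column may hold several identifiers separated by ';',
        # or '.' when the site has no identifier.
        for rsid in ids.split(";"):
            if rsid in wanted:
                hits[rsid] = (chrom, int(pos), ref, alt)
    return hits


# Toy inputs (hypothetical rsIDs and records, not real data).
chip_ids = ["rs0000001", "rs0000002", "rs0000003"]
vcf_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\trs0000001\tA\tG\t50\t.\t.\n"
    "1\t2000\t.\tC\tT\t50\t.\t.\n"
    "1\t3000\trs0000003;rs0000009\tG\tA\t50\t.\t.\n"
)

hits = match_rsids(chip_ids, io.StringIO(vcf_text))
unmatched = sorted(set(chip_ids) - set(hits))
print(hits)
print(unmatched)
```

The `unmatched` list is the part I am worried about: I don't know whether an ID missing from the VCFs means the site is simply absent, or that the identifier differs between the chip annotation and the VCF annotation.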