Hello,
I have the following problem:
I downloaded the XML files containing all the SNPs in dbSNP from: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/genotype/
However, when I pick a SNP from the XML file, for example:
<SnpLoc genomicAssembly="108:GRCh38.p7" geneId="55869" geneSymbol="HDAC8" chrom="X" start="72440889" locType="2" rsOrientToChrom="fwd" contigAllele="C" contig="NT_011651:18"/>
<ByPop popId="12632" sampleSize="2"> <AlleleFreq allele="C" freq="1"/> <GTypeFreq gtype="C/C" freq="1"/>
</ByPop> </SsInfo>
<SsInfo ssId="1554832747" locSnpId="PHASE3_chrX_1606429" ssOrientToRs="fwd">
  <ByPop popId="16651" sampleSize="1008">
    <AlleleFreq allele="A" freq="0.549"/>
    <AlleleFreq allele="C" freq="0.451"/>
  </ByPop>
  <ByPop popId="16652" sampleSize="1006">
    <AlleleFreq allele="A" freq="0.009"/>
    <AlleleFreq allele="C" freq="0.991"/>
  </ByPop>
  <ByPop popId="16653" sampleSize="1322">
    <AlleleFreq allele="A" freq="0.089"/>
    <AlleleFreq allele="C" freq="0.911"/>
  </ByPop>
  <ByPop popId="16654" sampleSize="694">
    <AlleleFreq allele="A" freq="0.187"/>
    <AlleleFreq allele="C" freq="0.813"/>
  </ByPop>
  <ByPop popId="16655" sampleSize="978">
    <AlleleFreq allele="A" freq="0.197"/>
    <AlleleFreq allele="C" freq="0.803"/>
  </ByPop>
</SsInfo>
<GTypeFreq gtype="C/C" freq="1"/>
The entry I find in dbSNP is this one: https://www.ncbi.nlm.nih.gov/snp/?term=X%3A72440889
The gene symbol, base position and chromosome are the same, but it has a different rsId, different bands (p7 or p12) and different allele frequencies.
Are they really the same SNP? Why do the values differ, especially the frequencies?
Another example from the XML:
<SnpInfo rsId="880002407" observed="C/T"> <SnpLoc genomicAssembly="108:GRCh38.p7" geneId="105377212" geneSymbol="LOC105377212" chrom="X" start="63263620" locType="2" rsOrientToChrom="rev" contigAllele="A" contig="NT_011651:18"/>
If I search for it in dbSNP: https://www.ncbi.nlm.nih.gov/snp/?term=X%3A63263620
I find no items.
How is that possible? Why can I not find the SNP in dbSNP if it is present in the XML file (which comes from dbSNP)? Shouldn't I be able to find every SNP from the XML in dbSNP?
How can I find this SNP (and others) in dbSNP using the data from the XML files?
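
For context, here is a minimal Python sketch of how the position searches in this post are built from the XML attributes. The helper names `attrs` and `position_search_url` are made up for illustration, and the input string copies only the `chrom` and `start` attributes from the second example. It only reproduces the `chrom:start` search term used in the links above; it is not meant as the correct way to look a SNP up.

```python
import re
from urllib.parse import quote

def attrs(line):
    """Collect key="value" pairs from one line of the XML excerpt (hypothetical helper)."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))

def position_search_url(line):
    """Build the dbSNP search URL for the chrom:start position found on the line."""
    a = attrs(line)
    term = f'{a["chrom"]}:{a["start"]}'
    # quote() percent-encodes the colon, giving the X%3A... form used in the links above
    return "https://www.ncbi.nlm.nih.gov/snp/?term=" + quote(term, safe="")

# Only the two attributes needed for the search term are copied here.
line = '<SnpLoc chrom="X" start="63263620"/>'
print(position_search_url(line))
```

Run on the second example, this prints the same URL as the search that returns no items.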
Thank you so much!