Question: Number of SNPs from data from BioMart
4.7 years ago by
jamesxli200710 wrote:

Hi everyone, 

To get the SNPs on chromosome 1 (Homo sapiens), I went to the BioMart page, selected "Ensembl Variation 83" / "Homo sapiens Short Variants (SNPs and indels excluding flagged variants)" as the dataset, and set the filter to chromosome 1. The data file I downloaded contains 1,488,472 records, all with different rs numbers and different chromosome positions. As far as I know, chromosome 1 should have about 130K SNPs. So my question is: did I do something wrong when obtaining the data? How should I reconcile this large discrepancy?

thanks in advance


4.7 years ago by
Emily_Ensembl21k wrote:

I expect your 130k prediction is based on old data. There should be 11.6M short variants on human chr1. Perhaps your prediction is pre-1000 Genomes?

Note that BioMart is not capable of dealing with this volume of data, which is why you got only about 10% of the variants you should have. I suggest downloading the VCF files and using tabix to extract only chr1.
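Once you have the VCF, you can sanity-check the per-chromosome counts yourself. Below is a minimal Python sketch that streams a VCF (plain or gzip/bgzip-compressed) and counts records per chromosome. It does not use the tabix index, so it reads the whole file and will be slower than tabix on a genome-wide download. The function names are my own, and I am assuming the chromosome is named "1" rather than "chr1" in the file, so check the first column of your VCF.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_variants_by_chrom(path):
    """Return a Counter mapping chromosome name -> number of VCF records."""
    counts = Counter()
    with open_vcf(path) as handle:
        for line in handle:
            # Skip meta-information/header lines ("##..." and "#CHROM...") and blanks.
            if line.startswith("#") or not line.strip():
                continue
            # The first tab-separated column of a VCF record is CHROM.
            chrom = line.split("\t", 1)[0]
            counts[chrom] += 1
    return counts
```

For example, with a hypothetical file name, `count_variants_by_chrom("variants.vcf.gz")["1"]` gives the number of records on chromosome 1. Note that this counts VCF lines, so a multi-allelic site written on one line is counted once.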

