Trouble With SNPs.
3
1
13.2 years ago
Roman ▴ 20

I need to download a set of SNPs (from NCBI dbSNP, I presume) for two different groups of people, e.g. Japanese and Chinese, or healthy and diseased individuals.
The number of distinct SNPs must be at least 10,000, and each group must contain at least 100 people.

But I have hit an obstacle: the same SNPs must be genotyped in both groups (because in the next stage of my research I am going to compare the groups and try to find out which SNPs cause the disease I am studying).

So the question is: how do I download SNPs that meet the requirements above?

snp dbsnp • 2.6k views
2
13.2 years ago

The HapMap SNP data come from individuals who are only minimally phenotyped: sex, perhaps age, and parental status (being a parent in a trio tells you they have a healthy reproductive system). In terms of disease phenotypes, you will get essentially nothing from HapMap. Yes, there is lots of great genotype data for studying differences in allele frequencies, but association with disease will be non-existent. dbGaP (the database of Genotypes and Phenotypes, via NIH) or the Wellcome Trust Case Control Consortium data are much better places to look for case-control genotype data. You'll have to register to access the data.

0

Could you explain to me what the connection is between SNPs and the two databases you mentioned in your answer?

0

You should go and look at these databases - dbGaP has data on genotype-phenotype associations. Many of those genotypes are from SNP data. The Wellcome Trust Consortium has done a lot of work associating genetic variants with disease phenotypes and disease risk.

2
13.2 years ago
Mutated_Dater ▴ 290

I don't have anything new to add to the basic gist of Larry's answer, but I will help pad out the details for you a bit.

As I'm sure you know, there are literally millions of SNPs in an organism. For many of these SNPs there is little or no information about their functional consequences. The million-dollar question is "does the SNP have a phenotype?" and most of the time the answer is "we don't know!"

Databases such as dbSNP are repositories of variations, mostly SNPs. They will tell you where the SNP is in the genome and at what frequency its different alleles occur in different populations. However, there isn't much information about the functional consequences of the SNP (I'm sure experts will want to correct me here, but I'm simplifying). Also, you don't know which individuals the variants belong to, and you need 10,000 SNPs genotyped in the same person (or persons), don't you?

Next we have GWAS, where people have tried to find out which diseases, if any, SNPs are linked to. In a GWAS, hundreds to thousands of individuals with and without a specific disease are genotyped for a fixed set of SNPs to see whether there is any statistical association between the SNPs and the disease. The outcome is a p-value for each SNP (or marker) indicating how strong the evidence is that it is associated with the disease.
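To make the "p-value per SNP" idea concrete, here is a minimal sketch of one common single-SNP test: a chi-square test (1 degree of freedom) on a 2x2 table of allele counts in cases versus controls. The function name and the example counts are made up for illustration, and real GWAS pipelines typically use dedicated tools and more careful models (covariates, population structure), so treat this as a toy version only.

```python
import math


def allelic_chi2_pvalue(case_a, case_b, ctrl_a, ctrl_b):
    """P-value of a 1-d.f. chi-square test on a 2x2 allele-count table.

    case_a / case_b: counts of allele A / allele B among cases
    ctrl_a / ctrl_b: the same counts among controls
    (Hypothetical helper for illustration, not part of any library.)
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        # Monomorphic SNP or an empty group: the test is undefined,
        # so report "no evidence of association".
        return 1.0
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # Survival function of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Invented counts: 100 cases and 100 controls, two alleles each.
p = allelic_chi2_pvalue(case_a=120, case_b=80, ctrl_a=80, ctrl_b=120)
print(f"p = {p:.2e}")
```

With 10,000 SNPs tested this way you would also need a multiple-testing correction; a simple Bonferroni threshold would be 0.05 / 10,000 = 5e-6.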

So if I am reading your question correctly (and I'm not entirely sure I am), you need at least 10,000 SNPs from 200 people: 100 with a disease and 100 without. That sounds to me like you need the data from a GWAS. You could look at this catalogue of GWAS studies. What do you think, Larry? From looking at past answers on here, Larry is an authority on SNPs.

0

Exactly. Some GWAS studies may satisfy the case-control aspect of this question, but others won't, as they are of a different design. This is why the Wellcome Trust Case Control Consortium is better.

0

I was not aware of the Wellcome Trust work, so I will look into that myself out of interest. I've just had a look at dbGaP, as I've never had the opportunity to use it before, and it certainly seems a very good source of data for the OP.

0

Thanks for the more in-depth explanation, m_d. I've checked out the GWAS site you refer to above. Yes, there are lots of studies sorted by disease, and each one has a list of SNPs related to the study. But there are only a few SNPs per study, so it looks like an already-solved problem (as I understand it). Where is the source data for all of these GWAS studies? I still can't find it, and I still can't find any data source that meets my requirements. P.S. "As of 01/27/11, this table includes 794 publications and 3942 SNPs." (taken from the GWAS website). Does this mean the GWAS website does not have enough SNPs?

0

What about this database: An Open Access Database of Genome-wide Association Results? They say all studies had at least 50,000 SNPs. The quote from the GWAS website could mean that they report only SNPs with significant associations rather than all SNPs. Null results in this area are surely worth recording, though, so I'm not sure. Perhaps, due to space constraints, they can list only the SNPs with an association, which doesn't help you. You may have to trawl through some papers and look in their supplementary materials for the full data sets.

0

This looks promising. There are data sets on here: http://www.ebi.ac.uk/ega/page.php

0
13.2 years ago

Try the HapMap FTP site: ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/latest/forward/non-redundant

with:

  • CHB: Han Chinese in Beijing, China
  • JPT: Japanese in Tokyo, Japan

However, I'm not sure that you'll find more than 100 individuals in each file.
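Once you have one genotype file per population, checking the requirements from the question (same SNPs in both groups, at least 10,000 SNPs, at least 100 people each) could be sketched as below. This assumes a whitespace-delimited layout with a header line, the SNP identifier in the first column, a fixed number of metadata columns (11 here, which is an assumption you should check against the real header), and then one genotype column per individual. All function names are hypothetical.

```python
import gzip


def read_genotypes(path, n_meta_cols=11):
    """Return (sample_ids, {snp_id: [genotype, ...]}) from one file.

    Assumes a header line, SNP id in column 1, n_meta_cols metadata
    columns in total, then one column per individual.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline().split()
        samples = header[n_meta_cols:]
        genotypes = {}
        for line in handle:
            fields = line.split()
            # Keep only rows that match the header width.
            if len(fields) == len(header):
                genotypes[fields[0]] = fields[n_meta_cols:]
    return samples, genotypes


def shared_snps(geno_a, geno_b):
    """SNP ids genotyped in both groups, in sorted order."""
    return sorted(geno_a.keys() & geno_b.keys())


def meets_requirements(samples_a, samples_b, snps,
                       min_people=100, min_snps=10_000):
    """True if both groups and the shared SNP set are large enough."""
    return (len(samples_a) >= min_people
            and len(samples_b) >= min_people
            and len(snps) >= min_snps)
```

You would call `read_genotypes` once per population file, pass the two genotype dictionaries to `shared_snps`, and then hand the sample lists and the shared SNP list to `meets_requirements`. If the data are split per chromosome, you would repeat this per file and combine the results.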

1

Yes, currently all HapMap populations have over 100 individuals, apart from ASW and MEX, which have ~50 each. I agree that this is a valuable resource for what you are looking for, although if you are aiming for ~10K SNPs, you may find the ~4M HapMap SNPs a bit too large.
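If the full SNP set is too large, one simple option is to draw a reproducible random subset of about 10,000 SNP identifiers from those shared by both groups. The sketch below is only one way to do it; the function name is made up, and purely random sampling ignores things you may care about, such as even coverage across chromosomes or linkage between nearby SNPs.

```python
import random


def pick_snp_subset(snp_ids, k=10_000, seed=0):
    """Return a sorted, reproducible random subset of at most k SNP ids."""
    ids = sorted(set(snp_ids))  # sort first so the seed gives stable results
    if len(ids) <= k:
        return ids
    return sorted(random.Random(seed).sample(ids, k))


# Illustration with invented identifiers rather than real rs numbers.
all_ids = [f"snp{i}" for i in range(40_000)]
subset = pick_snp_subset(all_ids)
print(len(subset))
```

Using a fixed seed means the same subset is selected every time, which matters if both groups must be compared on exactly the same SNPs.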
