I have a few questions about the distribution of disease-associated SNPs on the genome. I apologize in advance if some of them seem trivial to you. I'm a statistician, not a bioinformatician. I've tried to find answers on my own, by searching the web and this forum, but I'm still unsure whether I understood them correctly, and I didn't find everything I was looking for.
Your help would therefore be very much appreciated!
If my sources are correct, each genome has approximately 3.3 million SNPs, of which about 1.3 million are in intragenic regions. Among the latter, 25,000 to 40,000 SNPs are in protein-coding regions.
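To make the orders of magnitude explicit, here is a small sketch that turns the figures quoted above into fractions of all SNPs. The counts are simply the approximate numbers from my sources (not verified), and the variable names are my own.

```python
# Approximate counts quoted in the post (taken from the poster's sources, unverified).
total_snps = 3_300_000
intragenic_snps = 1_300_000
coding_snps_low, coding_snps_high = 25_000, 40_000


def fraction(part, whole):
    """Return part / whole as a plain proportion."""
    return part / whole


intragenic_fraction = fraction(intragenic_snps, total_snps)
coding_fraction_low = fraction(coding_snps_low, total_snps)
coding_fraction_high = fraction(coding_snps_high, total_snps)

print(f"intragenic share of all SNPs: {intragenic_fraction:.1%}")
print(f"coding share of all SNPs: {coding_fraction_low:.2%} to {coding_fraction_high:.2%}")
```

Under these numbers, roughly 40% of SNPs would be intragenic, while only about 1% would fall in protein-coding sequence.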
Q1: If I understood correctly, a gene is not entirely a protein-coding region. What exactly is the difference between a gene and its protein-coding part? And what about exons? Are they synonymous with protein-coding regions, with genes, or with something else?
Q2: Are the boundaries of the protein-coding regions considered known, or do they have to be inferred (and if so, how)?
Q3: Is it reasonable to assume that the proportion of SNPs associated with a given phenotype will be much higher in protein-coding regions than in the rest of the genome? Can we say the same for SNPs in intragenic regions compared with those in intergenic regions?
Q4: Given that the number of SNPs in non-coding regions is much higher than in coding regions, I guess that most GWAS will flag many more SNPs as "significantly associated with the phenotype" in non-coding regions, in absolute numbers. Is this guess correct? But still, in terms of proportions within each of these two regions, a SNP in a coding region is more likely to be associated with the phenotype, is that right?
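To make the distinction in Q4 between absolute counts and proportions concrete, here is a minimal sketch. The two association rates (1% for coding SNPs, 0.05% for non-coding SNPs) are purely hypothetical values I made up for illustration, not estimates from any study. The SNP counts are the approximate figures quoted earlier, using 30,000 as a value inside the 25,000 to 40,000 range.

```python
# SNP counts: approximate figures from the post (30,000 is an arbitrary value
# inside the quoted 25,000-40,000 range for coding SNPs).
total_snps = 3_300_000
coding_snps = 30_000
noncoding_snps = total_snps - coding_snps

# HYPOTHETICAL per-SNP probabilities of being truly associated with the phenotype.
# These are invented for illustration only.
rate_coding = 0.01
rate_noncoding = 0.0005

# Expected number of associated SNPs in each class.
expected_coding = coding_snps * rate_coding
expected_noncoding = noncoding_snps * rate_noncoding

# Enrichment: how much more likely a coding SNP is to be associated.
enrichment = rate_coding / rate_noncoding

# Share of all associated SNPs that lie in coding regions.
coding_share_of_hits = expected_coding / (expected_coding + expected_noncoding)

print(f"expected associated coding SNPs:     {expected_coding:.0f}")
print(f"expected associated non-coding SNPs: {expected_noncoding:.0f}")
print(f"enrichment of coding over non-coding: {enrichment:.0f}x")
print(f"share of hits in coding regions:      {coding_share_of_hits:.1%}")
```

With these made-up rates, coding SNPs are about 20 times more likely to be associated, yet most associated SNPs still lie in non-coding regions simply because there are so many more of them. Whether real rates behave like this is exactly what I am asking.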
Q5: On the common Illumina chips (for instance those with 500k SNPs), are SNPs in protein-coding regions over-represented relative to SNPs in intergenic regions? If so, what is the typical proportion of coding SNPs on these chips? The proportion of SNPs found significant in a GWAS would then also depend on the choices made by the chip designer.
Thank you very much in advance.