I am doing a bioinformatics project based on microarray gene expression data, and there are some basic issues I don't fully understand. I was hoping members of this forum might be able to help me. Please could you address the following points in turn:
Is there a naming convention for Affy probe sets? This is an example page from GEO, and the probe set names seem to follow a convention, but I cannot figure it out. Some names end in 'at' and others end in 'st'. Many names also contain '-5', '-3' or 'M'.
How can probes distinguish between mRNA that has and has not been processed (e.g. by intron splicing)? Is this possible? I expect most researchers want to measure the processed mRNA (see next point).
How do probes in general account for the fact that a gene can have several transcript variants? Does a probe target a sequence common to all isoforms, or are there different probes for the different transcripts? Examples would be helpful. I presume most researchers want to know which specific transcript variants are present in a cell.
How do probes account for sequence variations such as SNPs? A variation within a gene shouldn't affect the level of transcription of that gene (or should it?), but it could affect how well the transcript binds to its probe. Are probe sequences designed to exclude known SNPs?
Probe sets contain a set of overlapping probes for a target sequence. Do you expect a target mRNA sequence to bind equally to each of these probes? Does the statistical analysis take into account the 'average' binding of an mRNA to all of the probes in a probe set to give a picture of the expression level of that mRNA?
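To make that last point concrete, this is roughly what I mean by 'average'. It is only a sketch of my own guess, not what any real analysis package does: the intensities are invented, and `summarize_probe_set` is a name I made up for illustration.

```python
import math
from statistics import median

# Invented raw intensities for the probes of one hypothetical probe set.
# Two probes (950.0 and 30.0) are deliberately made to behave differently
# from the rest, to mimic probes that bind the target unusually well or badly.
probe_intensities = [220.0, 180.0, 950.0, 205.0, 240.0, 198.0,
                     30.0, 215.0, 230.0, 210.0, 190.0]


def summarize_probe_set(intensities):
    """Collapse per-probe intensities into single values on the log2 scale."""
    log_values = [math.log2(x) for x in intensities]
    return {
        "mean_log2": sum(log_values) / len(log_values),
        "median_log2": median(log_values),
    }


summary = summarize_probe_set(probe_intensities)
print("mean of log2 intensities:  ", round(summary["mean_log2"], 3))
print("median of log2 intensities:", round(summary["median_log2"], 3))
```

Here the median stays at the value of the typical probe, while the mean moves when the two odd probes change. So part of my question is whether real methods use something robust like this, or something more sophisticated.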
Thank you for your time