Question: Genomic Location Of Micro Array Probe Sequences
5
7.9 years ago by
Georg Summer140
Maastricht
Georg Summer140 wrote:

I have the individual probe sequences of a microarray. Now I would like to know where in the genome of the associated organism the individual probe sequences map. I am not interested in the genes in these areas, only in the locations of the matches (chrA, bp x to y). I expect a single probe to match several positions in the complete genome.

The Microarray Probe Mapping of Ensembl provides similar functionality, but for now I would like to avoid the hassle of setting Ensembl up.

Can anyone point me to a data source that has this information?

genome microarray • 2.4k views
written 7.9 years ago by Georg Summer140

What array technology are you using? I am asking because some resources already map probes or probesets from specific technologies, and the specific details of your technology (especially probe length) might influence what the optimal tool would be.

written 7.9 years ago by Chris Evelo9.9k

Initially it will be probe sequences of Affy chips, but it might eventually evolve to sequences in general. That is why I do not want to rely too much on manufacturer-supplied data (for additional reasons, see my comment on Michael Dondrup's answer).

written 7.9 years ago by Georg Summer140

If you want to generalize your pipeline to deal with any sequences then I'd recommend setting up blat.

written 7.9 years ago by Gareth Palidwor1.6k
2
7.9 years ago by
Bergen, Norway
Michael Dondrup45k wrote:

You can use a short-read aligner to map the probes; which one depends a bit on the length of the probes. Put the genome sequence and the probe sequences in FASTA files, then you can use BLAT or LASTZ immediately. Mosaik, SHRiMP, or SSAHA2 could also be used. There are also a lot of array annotation packages in Bioconductor. If it is a custom array, you can also perform the matching in R using the Biostrings package and, if one is available for your organism, the corresponding BSgenome package.
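To make the idea of matching concrete, here is a minimal sketch in plain Python. It is not Biostrings, BLAT, or any of the aligners named above, just a stand-in that searches for exact matches of a probe on both strands and reports every location. The function names and the toy genome are invented for illustration, and a real genome would need an indexed aligner rather than a linear scan.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def find_exact_hits(probe, chromosomes):
    """Yield (chrom, start, end, strand) for every exact match of probe.

    chromosomes maps a chromosome name to its sequence string.
    Coordinates are 0-based, half-open, on the forward strand.
    """
    probe = probe.upper()
    targets = [("+", probe)]
    rc = reverse_complement(probe)
    if rc != probe:  # a palindromic probe would otherwise be reported twice
        targets.append(("-", rc))
    for chrom, seq in chromosomes.items():
        seq = seq.upper()
        for strand, pattern in targets:
            pos = seq.find(pattern)
            while pos != -1:
                yield chrom, pos, pos + len(pattern), strand
                pos = seq.find(pattern, pos + 1)  # step by 1 to catch overlaps


# Toy data, invented for illustration only.
genome = {
    "chrA": "TTGACCGTAAGGTTGACCGTAA",
    "chrB": "CCTTACGGTCAACC",
}
hits = sorted(find_exact_hits("GACCGTAA", genome))
for chrom, start, end, strand in hits:
    print(f"{chrom}\t{start}\t{end}\t{strand}")
```

In this toy case the probe matches twice on chrA and once, as its reverse complement, on chrB, which is the kind of multi-location result the question asks about.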

I assume your array manufacturer provides technical support information that contains these mappings, but it is always good to check them, e.g. for probes hitting multiple locations.

written 7.9 years ago by Michael Dondrup45k

Well, I am actually interested in the multiple hit locations. Call it paranoia, but I have a general distrust of microarrays and the manufacturer-supplied support material. Being up to date is not always their strength, so I prefer pipelines that are decoupled from the manufacturer.

written 7.9 years ago by Georg Summer140

The custom cdfs (cf my answer to this question) are not manufacturer produced. But Affymetrix is actually quite open about their software approaches and a lot of developmental libraries are available from them as open source.

written 7.9 years ago by Chris Evelo9.9k
2
7.9 years ago by
Chris Evelo9.9k
Maastricht, The Netherlands
Chris Evelo9.9k wrote:

For Affymetrix arrays like the ones you are using now, your problem with probes that have multiple hits should already be covered by the so-called custom CDFs. That is in fact why they were created; see this publication. The custom probesets are newly selected combinations of individual probes, each selected because it hits its target uniquely.

Since you are in Maastricht, you might want to know that we already have experience with running BLAT on complete sets of Ensembl gene sequences and selecting the unique hits. That is how we selected the probesets used on the NuGO arrays. Part of the procedure is producing a table that lists, for every probe, what it hits. So if you really want to use individual probes (there are quite a few thermodynamic and statistical reasons why that is not necessarily a good idea), that table could be a good start.
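As a rough illustration of what such a per-probe hit table could look like, the sketch below groups hits by probe and separates uniquely mapping probes from multi-hit ones. This is not the actual NuGO or custom CDF procedure; the function names, probe names, and coordinates are all hypothetical.

```python
from collections import defaultdict


def hits_per_probe(hits):
    """Group (probe_id, chrom, start, end) tuples by probe_id."""
    table = defaultdict(list)
    for probe_id, chrom, start, end in hits:
        table[probe_id].append((chrom, start, end))
    return dict(table)


def split_unique(table):
    """Return (unique, multi): probes with exactly one hit vs. more than one."""
    unique = {p: locs[0] for p, locs in table.items() if len(locs) == 1}
    multi = {p: locs for p, locs in table.items() if len(locs) > 1}
    return unique, multi


# Invented example hits; probe names and coordinates are hypothetical.
example_hits = [
    ("probe_1", "chr1", 100, 125),
    ("probe_2", "chr1", 500, 525),
    ("probe_2", "chr7", 9000, 9025),
    ("probe_3", "chrX", 42, 67),
]
table = hits_per_probe(example_hits)
unique, multi = split_unique(table)
print("unique:", sorted(unique))
print("multi: ", sorted(multi))
```

Someone interested in the multi-hit probes themselves would keep the `multi` dictionary rather than discard it.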

modified 7.9 years ago • written 7.9 years ago by Chris Evelo9.9k

In my case, or better, in the ideas I am toying with, I am not so much interested in the probesets and what they cover but in the actual individual probes. I am coming at this problem from quite a different angle than the "usual" microarray analysis.

written 7.9 years ago by Georg Summer140

I edited my answer to better address your specific interest in individual probes. But please be aware that individual probe sequences can give highly variable signals because of, for instance, GC content and the presence of hairpins. Statistical evaluation of probesets is key to Affymetrix analysis.

written 7.9 years ago by Chris Evelo9.9k
2
7.9 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

The UCSC genome table browser has several Affy tables mapped to various organisms:

affyU133Plus2 in hg19 for example

These tables are in PSL format, with an extra leading bin column. The column names and one example row, shown field by field:

bin          585
matches      530
misMatches   4
repMatches   0
nCount       23
qNumInsert   3
qBaseInsert  41
tNumInsert   3
tBaseInsert  898
strand       -
qName        225995_x_at
qSize        637
qStart       5
qEnd         603
tName        chr1
tSize        249250621
tStart       14361
tEnd         15816
blockCount   5
blockSizes   93,144,229,70,21,
qStarts      34,132,278,541,611,
tStarts      14361,14454,14599,14968,15795,
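To get from such a row to "chrA bp x - y" locations, something like the following sketch could be used. It assumes whitespace-separated fields and a leading bin column as in the row above; the function name `parse_psl_line` and its `has_bin` parameter are made up for this example. PSL target coordinates are 0-based and half-open, so each block runs from its tStart up to, but not including, tStart plus blockSize.

```python
PSL_COLUMNS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]


def parse_psl_line(line, has_bin=True):
    """Parse one PSL row into the query name, target span, and target blocks."""
    fields = line.split()
    if has_bin:  # UCSC database tables carry an extra leading bin column
        fields = fields[1:]
    if len(fields) != len(PSL_COLUMNS):
        raise ValueError(f"expected {len(PSL_COLUMNS)} PSL fields, got {len(fields)}")
    rec = dict(zip(PSL_COLUMNS, fields))
    # blockSizes and tStarts are comma-separated lists with a trailing comma
    sizes = [int(x) for x in rec["blockSizes"].rstrip(",").split(",")]
    t_starts = [int(x) for x in rec["tStarts"].rstrip(",").split(",")]
    blocks = [(start, start + size) for start, size in zip(t_starts, sizes)]
    return {
        "qName": rec["qName"],
        "tName": rec["tName"],
        "strand": rec["strand"],
        "tStart": int(rec["tStart"]),
        "tEnd": int(rec["tEnd"]),
        "blocks": blocks,
    }


# The example row from the affyU133Plus2 table shown above.
row = ("585 530 4 0 23 3 41 3 898 - 225995_x_at 637 5 603 chr1 249250621 "
       "14361 15816 5 93,144,229,70,21, 34,132,278,541,611, "
       "14361,14454,14599,14968,15795,")
hit = parse_psl_line(row)
print(hit["qName"], hit["tName"], hit["strand"], hit["tStart"], hit["tEnd"])
for start, end in hit["blocks"]:
    print(f"  {hit['tName']}:{start}-{end}")
```

Note that this table is keyed by probeset consensus (qName 225995_x_at), not by individual 25-mer probe, so it gives locations for the target sequence rather than for each probe.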
written 7.9 years ago by Jeremy Leipzig18k
1
7.9 years ago by
Gareth Palidwor1.6k
Ottawa
Gareth Palidwor1.6k wrote:

I suspect the simplest way to achieve your goal is to use the Ensembl API. It's really not that hard to set up...

I recently did something similar where I wanted to restrict my analysis to MOE430 probesets where all probes mapped to one gene and no other. I can probably dig up the code for you if you choose to use Ensembl.

written 7.9 years ago by Gareth Palidwor1.6k

If it develops in that direction (Ensembl), I'll get back to you. Thanks.

written 7.9 years ago by Georg Summer140