Question: Oligo Design From ESTs
9.5 years ago
Ketil4.0k wrote:

When designing oligos for a microarray from ESTs, it seems to be crucial to choose the correct direction (strand) for the oligos, but I can't seem to find anything in the literature on this, or how to do this. (I've written a small tool to conditionally reverse-complement ESTs using a dynamic programming algorithm that takes into account BlastX hits and poly-A etc, but I'm unsure how important this is for the results.)

Any pointers or opinions?

Correct strand is certainly crucial - wrong strand = no hybridization!

I guess there are 3 approaches:

  • if a researcher generates the ESTs in their lab, presumably they know whether they are 5' or 3' clones
  • if ESTs are obtained from dbEST, sequences are annotated as 5' or 3'
  • otherwise as you say, a software tool to match ESTs to chromosomal sequence is required
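For the third approach, one signal mentioned in the question is the frame of BlastX hits. A minimal sketch of that idea is below. It assumes BLASTX was run with the default 12-column tabular output (`-outfmt 6`) and that minus-frame hits are reported with `qstart` greater than `qend`. The function names and the `min_ratio` threshold are made up for illustration and are not taken from any existing tool.

```python
from collections import defaultdict

def strand_votes(blast_tab_lines):
    """Sum bit scores per query for plus- and minus-frame hits.

    Assumed columns (default -outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    votes = defaultdict(lambda: {"+": 0.0, "-": 0.0})
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        bitscore = float(fields[11])
        # Assumption: reversed query coordinates mean a minus-frame hit.
        strand = "+" if qstart <= qend else "-"
        votes[fields[0]][strand] += bitscore
    return votes

def call_strand(vote, min_ratio=2.0):
    """Return '+', '-' or '?' depending on how clearly one strand wins."""
    plus, minus = vote["+"], vote["-"]
    if plus > 0 and plus >= min_ratio * minus:
        return "+"
    if minus > 0 and minus >= min_ratio * plus:
        return "-"
    return "?"  # no hits, or conflicting evidence

example = [
    "est1\tP1\t80.0\t50\t10\t0\t1\t150\t1\t50\t1e-20\t100.0",
    "est1\tP2\t70.0\t40\t12\t0\t300\t181\t5\t44\t1e-5\t40.0",
    "est2\tP3\t90.0\t60\t6\t0\t400\t221\t10\t69\t1e-30\t120.0",
]
calls = {q: call_strand(v) for q, v in strand_votes(example).items()}
```

ESTs that come back as '?' would need another line of evidence (poly-A, genome mapping) before being put on the array.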

This looks useful: "ESTPiper – a web-based analysis pipeline for expressed sequence tags". It includes a tool for microarray oligonucleotide probe design and the paper discusses other EST analysis tools, some of which also do probe design.

A Google search for "EST strand probe design" also turns up plenty of useful-looking results.

I tried to rely on the strand annotation from dbEST, but soon realized it is often wrong. I think this is due to clones being inserted the wrong way round into the vector, and I guess the rate at which this happens depends on the kit being used. This also means that although the researcher "knows" the orientation, she will often be wrong.
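One way to sanity-check an annotated orientation against the sequence itself is the poly-A signal mentioned in the question. The sketch below is only a heuristic, not the dynamic programming tool described above: it assumes that a run of A's near the 3' end suggests the read is in sense orientation and a run of T's near the 5' end suggests it is reverse-complemented. The function names and the `min_run` and `window` defaults are arbitrary choices for illustration.

```python
import re

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def polya_orientation(seq, min_run=12, window=30):
    """Guess orientation from poly-A / poly-T runs near the read ends.

    Returns '+' (poly-A tail at the 3' end), '-' (poly-T at the 5' end),
    or '?' when neither or both signals are present.
    """
    s = seq.upper()
    tail_a = re.search("A{%d,}" % min_run, s[-window:]) is not None
    head_t = re.search("T{%d,}" % min_run, s[:window]) is not None
    if tail_a and not head_t:
        return "+"
    if head_t and not tail_a:
        return "-"
    return "?"

def orient(seq, **kwargs):
    """Reverse-complement the read only if it looks antisense."""
    if polya_orientation(seq, **kwargs) == "-":
        return reverse_complement(seq)
    return seq

sense = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG" + "A" * 15
antisense = reverse_complement(sense)
```

Reads without a detectable tail return '?', so this can only confirm or contradict the annotation for a subset of the ESTs.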

written 9.5 years ago by Ketil
1) There is the SeqClean utility, which may help you get rid of artifacts:

You may also try constructing an rRNA library and using it with SeqClean.

2) Check your ESTs for common repeats. While some transposons are expressed, I am not sure you want pre-mRNA intronic sequences on the chip.

3) Have you tried assembling all your ESTs?

4) You may also run tblastx with your ESTs against those from related species. There may not be any proteins at NCBI covering the less conserved protein parts from your species of interest.

Thanks for bringing this up. I have had really poor results from the various EST cleaning tools (I have some old notes on this). Do you have any independent resources demonstrating the effectiveness of these tools?

written 9.5 years ago by Ketil

If the question is whether I created an artificial "EST" set with vector and ribosomal sequences thrown in, then added point mutations and indels and flipped some reads (all of which can be seen in real EST data), and then looked at the results, the answer is: not yet. I am mapping 3+ million ESTs from various species to a novel genome. I mapped part of the EST sets without SeqClean pre-processing, so I hope to see the difference. GMAP, which I use for EST mapping, is often overeager to call a match (on a protein level) with UTRs, and possibly with other sequences, so fewer dubious ends should give fewer nonsense matches.
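For what it's worth, the artificial test set described above could be generated along these lines. This is a rough sketch under simple assumptions: uniform per-base substitution and indel rates, an optional vector fragment glued to the 5' end, and a coin flip for orientation. The function name and all rate parameters are invented for illustration and do not model any particular sequencing chemistry.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def mutate_est(seq, rng, sub_rate=0.01, indel_rate=0.005,
               flip_prob=0.5, vector=""):
    """Return (simulated_read, flipped) for one clean transcript fragment.

    'flipped' is the ground truth, so a cleaning or orientation tool
    can be scored against it afterwards.
    """
    out = []
    for base in seq:
        r = rng.random()
        if r < sub_rate:
            # Point mutation: replace with a different base.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        elif r < sub_rate + indel_rate / 2:
            continue  # deletion: drop this base
        elif r < sub_rate + indel_rate:
            out.append(base)
            out.append(rng.choice("ACGT"))  # insertion after this base
        else:
            out.append(base)
    read = vector + "".join(out)  # vector contamination at the 5' end
    flipped = rng.random() < flip_prob
    if flipped:
        read = read.translate(_COMPLEMENT)[::-1]
    return read, flipped

rng = random.Random(42)  # fixed seed so the test set is reproducible
clean = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
reads = [mutate_est(clean, rng, vector="GAATTC") for _ in range(5)]
```

Running the cleaning tools on such reads and comparing against the known truth would give the independent check asked for above, although real artifacts are likely messier than this model.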

written 9.5 years ago by Darked89
