Question

Cross-Hybridization Of 25-Mer Probes.

12.2 years ago

Nelson • 0

Hi everyone,

I have run BLAST with the Affymetrix HG-U133 Plus 2 probe sequences against the NCBI human RefSeq database. I would like to know what match length, in base pairs, should be taken to indicate cross-hybridization between a probe and a RefSeq sequence.

For example, for 50-mer probes, a match of 25 bp or more between a probe and a database sequence is usually considered cross-hybridization. For 25-mer probes, I am considering treating a match of >= 17 bp as cross-hybridization. Please help me settle this.

Thanks in Advance

With regards,
Nelson

affymetrix probeset • 2.6k views

ADD COMMENT • link updated 5 months ago by Ram 43k • written 12.2 years ago by Nelson • 0

FYI, there is a typo in your title. Should be 'cross hybridization'...

ADD REPLY • link 12.2 years ago by Malachi Griffith 19k

score 2 · Answer 1 · 2012-01-31

It seems like this task must have been addressed by this point. For example, as described in this publication:

A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array

Using the BLAST program, we matched probes with documented and postulated human transcripts. This resulted in the redefinition of approximately 37% of the probes on the U133 plus 2.0 array. This updated identification specifically points out where the identification is complicated by cross-hybridization from splice variants or closely related genes.

If nothing else, you can probably find some guidance for your own approach in there...

score 0 · Answer 2 · 2012-02-01

I would use something other than BLAST for this purpose. Good options are BLAT and Bowtie.

Basically, what you need to do is:

1. Obtain the human RefSeq database in a suitable format, e.g. FASTA from the FTP site.
2. Obtain a file of the 25-mer probe sequences.
3. Run either BLAT (which only requires the query and database sequences to be in FASTA format) or Bowtie (which requires more pre-processing of the sequences; read the documentation).
4. Parse the output for alignments with length >= 17 bp; again, BLAT is a good choice because it offers several output formats that are easy to parse.
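
The parsing step can be sketched roughly as follows. This is only a minimal illustration, and it assumes that BLAT was run with its default PSL output and with settings that actually report alignments this short (the defaults may not, so check the documentation). It also assumes that "alignment length" means the number of matching bases (the PSL `matches` plus `repMatches` columns). The function names `parse_psl` and `targets_by_probe` are hypothetical, as is the 17 bp default taken from the question.

```python
from collections import defaultdict

PSL_COLUMNS = 21  # a PSL data row has 21 tab-separated columns


def parse_psl(handle, min_match=17):
    """Yield (probe, target, matched_bases) for rows with enough matching bases."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip the PSL header block and anything that is not a data row.
        if len(fields) < PSL_COLUMNS or not fields[0].isdigit():
            continue
        # Column 1 = matches (non-repeat), column 3 = repMatches.
        matched = int(fields[0]) + int(fields[2])
        if matched >= min_match:
            # Column 10 = query (probe) name, column 14 = target (RefSeq) name.
            yield fields[9], fields[13], matched


def targets_by_probe(handle, min_match=17):
    """Group the RefSeq targets hit by each probe at or above the threshold."""
    hits = defaultdict(set)
    for probe, target, _ in parse_psl(handle, min_match):
        hits[probe].add(target)
    return dict(hits)


# Example use, with a hypothetical file name:
# with open("probes_vs_refseq.psl") as handle:
#     for probe, targets in targets_by_probe(handle).items():
#         if len(targets) > 1:
#             print(probe, sorted(targets))
```

A probe that hits more than one distinct target at or above the threshold is a candidate for cross-hybridization; whether different transcripts of the same gene should count separately is a decision left to the analysis.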