I recently followed [this question](Human Exome Capture Library Coordinates Download) to obtain sequences for the exon-capture probes sold in Agilent kits. The column headers plus one line of the file look like this:
TargetID ProbeID Sequence Replication Strand Coordinates
mRNA|AL390972 A_36_B233385 <a string of 120 ACGTs> NA + chr1:100111836-100111955
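As a sanity check on the Coordinates column, here is a minimal Python sketch that parses the field and compares the span to the probe length. It assumes the field always has the form `chrom:start-end`; the function names are hypothetical and not part of any Agilent tool. For the example row, `end - start + 1` comes out to 120, the same as the sequence length, which suggests the interval is inclusive at both ends.

```python
def parse_coordinates(field):
    """Split a 'chrom:start-end' string into (chrom, start, end).

    Assumption: the Coordinates column always follows this layout.
    """
    chrom, span = field.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def inclusive_length(start, end):
    """Number of bases covered if both start and end are included."""
    return end - start + 1


chrom, start, end = parse_coordinates("chr1:100111836-100111955")
print(chrom, start, end, inclusive_length(start, end))
```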
I want to know which way the probes are aimed: will a given probe amplify regions towards lower genomic coordinates or higher? Since I couldn't find any sort of readme file, I need help figuring out where the 3' and 5' ends are and how that relates to the coordinates. For a simple example, suppose I were to convert this 'drawing' into a file of the sort that Agilent publishes.
coords:                   1  4  7  10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70
sequence:                 AAAAGGGGAAAAGGGGAAAAGGGGAAAAGGGGAAAAGGGGAAAAGGGGAAAAGGGGAAAAGGGGAAAAGG
probe:                             5-AGGGGAAAA-3 ==>
probe on reverse strand:                            <== 3-TTTCCCCTT-5
probe on forward strand:                                                5-AAAAGGGGA-3 ==>
direction of replication: ==> or <==
Should (a few columns of) the file look like this? This way, everything goes 5' to 3', and I've reverse-complemented the probe on the minus strand.
sequence coords strand
AGGGGAAAA chr22:12-20 +
AAAGGGGAA chr22:33-41 -
AAAAGGGGA chr22:49-57 +
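The reverse-complement step mentioned above can be sketched in Python as follows. This is only an illustration, assuming plain uppercase ACGT strings; `reverse_complement` is a hypothetical helper name, not something from the Agilent file or its tooling. It takes the minus-strand probe from the drawing (written there 3' to 5'), reads it 5' to 3', and reverse-complements it to get the plus-strand text used in the minus-strand row.

```python
# Map each base to its complement (assumes uppercase A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# The minus-strand probe as drawn, written 3' -> 5'.
drawn_3_to_5 = "TTTCCCCTT"

# The same probe read in the conventional 5' -> 3' direction.
probe_5_to_3 = drawn_3_to_5[::-1]

# Plus-strand sequence that this probe pairs with, 5' -> 3'.
plus_strand_text = reverse_complement(probe_5_to_3)

print(probe_5_to_3, plus_strand_text)
```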
Thank you for your help.