Does anyone know if and where I can download the files containing the coordinates that the Agilent SureSelect exome capture kits target? I am interested in both the older version and the newer 50 Mb version. It would also be nice to have the TruSeq coordinates too, but the Agilent ones are more important right now. Thanks
I was told recently by Agilent to download the data from here: https://earray.chem.agilent.com/suredesign
Just create an account, log in, and select the "Find Designs" tab, then under that select "Agilent Catalog". There should be a list of the different SureSelect-related BED files and whatnot.
Also, here is the info from the Agilent SureDesign help site on what the various files you download from there actually mean:
The three BED-format track files that SureDesign creates for each custom SureSelect design are described below. You can import these files into a compatible genome browser to graphically view the locations of the tracks in the genome. For detailed information on the tracks and how they can help you analyze your design, see Design analysis using tracks.
[design ID]_Regions.bed - This BED file contains a single track of the target regions of interest that SureDesign used to select the probes. You can use this track to see the exact regions that the program was attempting to cover when selecting the probes.
[design ID]_Covered.bed - This BED file contains a single track of the genomic regions that are covered by one or more probes in the design. The fourth column of the file contains annotation information. You can use this file for assessing coverage metrics.
[design ID]_AllTracks.bed - This multitrack BED file includes the following tracks:
- The Target Regions track is identical to the track in the Regions BED file.
- The Covered probes track is identical to the track in the Covered BED file.
- The Missed Regions track contains any regions from the Target Regions track that are not included in the Covered probes track.
- The Probes track contains the regions of all probes in the design.
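If you only have the Regions and Covered files and want to recompute something like the Missed Regions track yourself, here is a minimal Python sketch. It assumes standard BED conventions (whitespace-separated columns, 0-based half-open coordinates, optional `track`/`browser` header lines). The function names are my own and not part of any Agilent tool, and the result is not guaranteed to match Agilent's own track exactly.

```python
from collections import defaultdict


def parse_bed(lines):
    """Return {chrom: [(start, end), ...]} from BED lines, skipping header lines."""
    regions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def subtract(targets, covered):
    """Return the parts of each target interval not overlapped by any covered interval."""
    covered = sorted(covered)
    missed = []
    for t_start, t_end in sorted(targets):
        pos = t_start  # first base of this target not yet known to be covered
        for c_start, c_end in covered:
            if c_end <= pos:
                continue  # covered interval lies entirely before pos
            if c_start >= t_end:
                break  # sorted by start, so nothing later can overlap
            if c_start > pos:
                missed.append((pos, c_start))
            pos = max(pos, c_end)
            if pos >= t_end:
                break
        if pos < t_end:
            missed.append((pos, t_end))
    return missed


def missed_regions(targets, covered):
    """Apply subtract() chromosome by chromosome."""
    return {chrom: subtract(ivs, covered.get(chrom, []))
            for chrom, ivs in targets.items()}
```

Usage would be something like `missed_regions(parse_bed(open(regions_path)), parse_bed(open(covered_path)))`, where the two paths point at your downloaded `_Regions.bed` and `_Covered.bed` files.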
The three text files for a custom SureSelect design are described below. You can view these files in any text editor program (e.g. NotePad) or spreadsheet program (e.g. Excel). Any tables embedded in the text files are tab-delimited and contain column headers. Lines of text that start with a # character are comment lines.
[design ID]_Targets.txt - This file contains a list of the target identifiers that you entered when creating the design.
[design ID]_Probes.txt - This file is a list of the probes in the design, with specific information about each probe, including its probe ID, sequence, genomic coordinates, and the target it is intended to capture.
Note that a probe may be listed in the Probes text file multiple times if it covers multiple targets. This can occur if the target identifiers you entered map to overlapping regions or are synonyms for the same gene (e.g. HER2 and ERBB2). Although these probes are listed multiple times in the file, they are not replicated in the design.
[design ID]_Report.txt - This file contains summary information on the design, the probes, the targets, and the parameters used to create the design.
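For the text files, a rough Python sketch of reading a tab-delimited table with `#` comment lines, and collapsing probes that are listed more than once, could look like the following. Note the assumptions: it treats the file as a single table whose first non-comment line is the header, and the column name `ProbeID` is hypothetical, so check the actual header in your file.

```python
import csv


def read_table(handle):
    """Read a tab-delimited table, skipping blank lines and '#' comment lines."""
    rows = (line for line in handle
            if line.strip() and not line.startswith("#"))
    return list(csv.DictReader(rows, delimiter="\t"))


def unique_probes(rows, id_column="ProbeID"):
    """Keep the first row seen for each probe ID (id_column is a hypothetical name)."""
    seen = {}
    for row in rows:
        seen.setdefault(row[id_column], row)
    return list(seen.values())
```

This keeps only the first target listed for a duplicated probe, which is fine for counting probes but loses the extra probe-to-target links.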
A manifest would be the best, but you could always build it yourself, since the data comes from:
- coding exons annotated by the GENCODE project (http://www.sanger.ac.uk/gencode/)
- all exons annotated in the consensus CDS (CCDS, March 2009) database, as well as 10 base pairs of flanking sequence
- small non-coding RNAs from miRBase (v.13)
- and Rfam.
(This is for the 50 Mb one.)
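If you do go the build-it-yourself route, the "add 10 bp of flank, then merge whatever overlaps" step might look roughly like this in Python. This is only a sketch under my own assumptions (intervals are 0-based half-open `(start, end)` pairs on one chromosome, and the function name is made up); it will not reproduce Agilent's exact design.

```python
def pad_and_merge(intervals, flank=10):
    """Extend each interval by `flank` bases on both sides, then merge overlaps."""
    # Clamp at 0 so padding never produces a negative start coordinate.
    padded = sorted((max(0, start - flank), end + flank)
                    for start, end in intervals)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

You would run this per chromosome on the combined exon and small RNA intervals. It does not clamp the padded end at the chromosome length, so add that if your intervals sit near chromosome ends.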
You might be able to get it through eArray. eArray is Agilent's microarray and targeted design tool.
You will need to access Agilent's eArray system.
If you are new to eArray, you will first need to register; after that it is free to use.
In the top right corner of the screen, choose the application type "SureSelect Target Enrichment".
From there choose “Libraries” and then “Browse Libraries”.
Choose the catalogue kit that you are interested in and click download.
A list of the available annotation files will then appear; choose the file type you require and click download.
These are the regions the Agilent design was targeted at, created by merging the coding regions of Havana and Ensembl genes (not just CCDS genes), adding miRNAs, and adding the flanks as mentioned. If you need the positions of the final targets, though, please contact Agilent.
I was told by the very helpful Agilent staff that you can log in to eArray to get the BED file. I found the link once but have since lost it again.
eArray is here, but I forget the hoops I had to jump through (after registering) to get the file: https://earray.chem.agilent.com/earray/