Question: Human Exome Capture Library Coordinates Download
12
9.7 years ago by
Biomed4.7k
Bethesda, MD, USA
Biomed4.7k wrote:

Does anyone know if and where I can download the files that contain the coordinates targeted by the Agilent SureSelect exome capture kits? I am interested in both the older version and the newer 50 Mb version. It would also be nice to have the TruSeq coordinates too, but Agilent is more important right now. Thanks.

modified 7.4 years ago by John St. John1.2k • written 9.7 years ago by Biomed4.7k

Biomed: Did you manage to generate or download a file corresponding to the Agilent SureSelect Exome 50 Mb version? I am looking for the same thing.

written 9.5 years ago by Khader Shameer18k

Yes, but not through a publicly available site. How can I contact you to send the file?

written 9.5 years ago by Biomed4.7k

Is it possible for you to send me the files corresponding to Agilent SureSelect Exome? I am looking for them. Alternatively, could you point me to where I can get them?

written 7.8 years ago by faezeh.dorri0
16
7.4 years ago by
John St. John1.2k
San Francisco, CA, Cancer Therapeutics Innovation Group
John St. John1.2k wrote:

I was recently told by Agilent to download the data from here: https://earray.chem.agilent.com/suredesign

Just create an account, log in, and select the "Find Designs" tab, then under that select "Agilent Catalog". There should be a list of the different SureSelect-related BED files and whatnot.

Also here is the info from the Agilent SureDesign help site on what the various files you will download from there actually mean:

BED files:

The three BED-format track files that SureDesign creates for each custom SureSelect design are described below. You can import these files into a compatible genome browser to graphically view the locations of the tracks in the genome. For detailed information on the tracks and how they can help you analyze your design, see Design analysis using tracks.

[design ID]_Regions.bed - This BED file contains a single track of the target regions of interest that SureDesign used to select the probes. You can use this track to see the exact regions that the program was attempting to cover when selecting the probes.

[design ID]_Covered.bed - This BED file contains a single track of the genomic regions that are covered by one or more probes in the design. The fourth column of the file contains annotation information. You can use this file for assessing coverage metrics.

[design ID]_AllTracks.bed - This multitrack BED file includes the following tracks:

  • The Target Regions track is identical to the track in the Regions BED file.
  • The Covered probes track is identical to the track in the Covered BED file.
  • The Missed Regions track contains any regions from the Target Regions track that are not included in the Covered probes track.
  • The Probes track contains the regions of all probes in the design.
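As a rough illustration of working with the multitrack file described above, here is a minimal Python sketch that splits a BED file into its tracks and totals the bases in each. It assumes the tracks are separated by standard UCSC-style `track name=...` header lines; that is an assumption about the AllTracks file, not something the help text confirms. The function names `split_tracks` and `total_bases` are made up for this example.

```python
import re
from collections import defaultdict


def split_tracks(lines):
    """Group BED intervals by the track header line that precedes them.

    Assumes UCSC-style 'track name=...' headers (an assumption about the file).
    """
    tracks = defaultdict(list)
    current = "untitled"  # used if intervals appear before any track line
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "browser")):
            continue
        if line.startswith("track"):
            # The name may be quoted: track name="Target Regions" ...
            m = re.search(r'name=(?:"([^"]*)"|(\S+))', line)
            if m:
                current = m.group(1) if m.group(1) is not None else m.group(2)
            else:
                current = "untitled"
            continue
        chrom, start, end = line.split(None, 3)[:3]
        tracks[current].append((chrom, int(start), int(end)))
    return dict(tracks)


def total_bases(intervals):
    """Sum interval lengths; BED is 0-based, half-open, so length = end - start."""
    return sum(end - start for _, start, end in intervals)
```

Something like `split_tracks(open("design_AllTracks.bed"))` (file name hypothetical) would then give one list of intervals per track. Note that `total_bases` double-counts overlapping intervals, so it is only a fair size estimate for tracks whose intervals do not overlap.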

Text files:

The three text files for a custom SureSelect design are described below. You can view these files in any text editor program (e.g. NotePad) or spreadsheet program (e.g. Excel). Any tables embedded in the text files are tab-delimited and contain column headers. Lines of text that start with a # character are comment lines.

[design ID]_Targets.txt - This file contains a list of the target identifiers that you entered when creating the design.

[design ID]_Probes.txt - This file is a list of the probes in the design, with specific information about each probe, including its probe ID, sequence, genomic coordinates, and the target it is intended to capture.

Note that a probe may be listed in the Probes text file multiple times if it covers multiple targets. This can occur if the target identifiers you entered map to overlapping regions or are synonyms for the same gene (e.g. HER2 and ERBB2). Although these probes are listed multiple times in the file, they are not replicated in the design.

[design ID]_Report.txt - This file contains summary information on the design, the probes, the targets, and the parameters used to create the design.
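Since a probe can be listed several times in the Probes text file, it may help to collapse the file to one row per probe before counting anything. The sketch below relies only on what the help text states (tab-delimited tables with a header row, `#` comment lines). The column name `ProbeID` is a placeholder I made up; check the header of your own file and pass the real name.

```python
import csv


def unique_probes(lines, id_column="ProbeID"):
    """Return one row per probe ID, keeping the first occurrence.

    `id_column` is a hypothetical header name; adjust it to match the file.
    """
    # Drop blank and comment lines so the first remaining line is the header.
    data = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(data, delimiter="\t")
    first_seen = {}
    for row in reader:
        first_seen.setdefault(row[id_column], row)
    return list(first_seen.values())
```

Keeping the first occurrence is an arbitrary choice here. If you need every target a probe maps to, collect the target column into a list per probe ID instead of discarding the later rows.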

modified 7.4 years ago • written 7.4 years ago by John St. John1.2k

Thanks, this worked well for me.

written 6.8 years ago by SteveL80
3
9.7 years ago by
Montreal
Louis Letourneau810 wrote:

A manifest would be best, but you could always build it yourself, since the data comes from:

  • coding exons annotated by the GENCODE project (http://www.sanger.ac.uk/gencode/)
  • all exons annotated in the consensus CDS (CCDS – March 2009) database as well as 10 base pairs of flanking sequence
  • small non-coding RNAs from miRBase (v.13)
  • and Rfam.

From: http://www.genomics.agilent.com/CollectionSubpage.aspx?PageType=Product&SubPageType=ProductData&PageID=2318

(the 50mb one)

You might be able to get it through eArray. eArray is Agilent's microarray and targeted design tool.

modified 9.7 years ago • written 9.7 years ago by Louis Letourneau810

Thanks for pointing to the Gencode project. Would you be able to point to a specific file in the ftp site ftp://ftp.sanger.ac.uk/pub/gencode to use in lieu of the Agilent Sure Select data?

written 9.7 years ago by Biomed4.7k

A good starting point would be their release 5 GTF-formatted file: ftp://ftp.sanger.ac.uk/pub/gencode/release_5/gencode.v5.annotation.gtf.gz

written 9.7 years ago by Louis Letourneau810
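For anyone going the build-it-yourself route from a GENCODE GTF, a minimal Python sketch of pulling out the coding (CDS) features as BED-style intervals with the 10 bp flank mentioned above might look like this. It is only an approximation under stated assumptions: it does not reproduce Agilent's actual design, it ignores the miRBase and Rfam parts, and the function names are made up for the example.

```python
import gzip


def cds_to_bed(gtf_lines, flank=10):
    """Yield (chrom, start, end) BED intervals for CDS features, padded by `flank`."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        # GTF is 1-based inclusive; BED is 0-based half-open.
        # The padded end is not clipped to the chromosome length here.
        yield chrom, max(0, start - 1 - flank), end + flank


def cds_bed_from_file(path, flank=10):
    """Read a plain or gzipped GTF file and yield padded CDS intervals."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from cds_to_bed(handle, flank)
```

The intervals come out unmerged and in file order, so overlapping exons from different transcripts will still overlap and need merging before use as a target list.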
3
8.6 years ago by
Travis2.8k
USA
Travis2.8k wrote:

You will need to access Agilent's eArray system (eArray).

If you are new to eArray, you will first need to register; after that it is free to use.

In the top right corner of the screen, choose the application type "SureSelect Target Enrichment".

From there choose “Libraries” and then “Browse Libraries”.

Choose the catalogue kit that you are interested in and click download.

A list of the available annotation files will then appear; choose the file type you require and click download.

written 8.6 years ago by Travis2.8k
2
9.4 years ago by
Felix50
Felix50 wrote:

Before you try re-creating the entire region list, please use the files at ftp://ftp.sanger.ac.uk/pub/fsk/exome/ e.g. exome_B_NCBI36.bed.

These are the regions the Agilent design was targeted at, created by merging coding regions of Havana and Ensembl genes (not just CCDS genes), adding miRNAs, and adding the flanks as mentioned. If you need the positions of the final targets, though, please contact Agilent.

written 9.4 years ago by Felix50
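To make the "merging coding regions" step above concrete, here is a minimal Python sketch of merging overlapping intervals from several sources into one non-redundant region list. It shows generic interval merging only; I do not know the exact procedure used to build the Sanger files, and `merge_intervals` is a name made up for this example.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting (chrom, start, end) intervals.

    Coordinates are treated as BED-style: 0-based, half-open.
    """
    merged = []
    # Sorting tuples orders by chromosome name, then start, then end.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

For example, passing the concatenation of two interval lists (say, one from Havana and one from Ensembl annotations) returns their union as a sorted, non-overlapping list. Chromosomes are sorted as strings, so chr10 comes before chr2.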

Are you sure that these are the targets they were going for with their probe set? I just got off the phone with Agilent and they wouldn't tell me how to get a .bed file of the gene regions they were specifically targeting, although that would be really useful information to have. What they do provide is a .bed file of the probe coordinates they chose. Also, the link you provided here appears to be broken. Do you know which version of the Agilent SureSelect kit was targeting this annotation? I am looking for info on v2.

modified 7.9 years ago • written 7.9 years ago by John St. John1.2k

Did you find the files? I am looking for the same here.

written 7.8 years ago by faezeh.dorri0
2
9.0 years ago by
Kevin640
Kevin640 wrote:

I was told by the very helpful Agilent staff that you can log in to eArray to get the BED file. I found the link once but have since lost it again.

eArray is here, but I forgot the hoops I needed to jump through (after registering) to get the file: https://earray.chem.agilent.com/earray/

written 9.0 years ago by Kevin640