Question: Trouble filtering CDS regions from Nextera Enrichment design
6.7 years ago by
USA/Irvine/NeoGenomics Laboratories
idedios30 wrote:

I designed a panel in Illumina DesignStudio covering over 200 genes, and each gene needs its non-coding exons trimmed out. Doing this manually would take a couple of weeks, since the design has about 12,000 probes. For the probes I have filtered manually so far, I used IGV to view the UCSC hg19 reference genome and checked whether each probe region has an amino acid sequence.

Is there an easier way to do this, for example a shell script that parses my probe regions file and compares it against the reference genome, without having to view the reference in IGV?

sequencing • 1.5k views
6.7 years ago by
rbagnall1.7k wrote:

Hi Idedios,

You can get a BED file of the coding regions of your genes from the UCSC Table Browser. Select the following options:

group - genes and gene predictions

track - RefSeq genes

table - refGene

region - genome

identifiers - paste list (and paste a list of gene names in the new window)

output format - BED (browser extensible data)

output file - coding_regions.bed

Click 'get output', select 'Coding Exons', then click 'get BED'.

This will give a list of coding exons for each gene in BED format. You could then compare this list to a BED file of your Illumina probes using BEDTools, for example. Use intersectBed to retrieve only the portions of the coding regions that your Illumina probes overlap:

intersectBed -a coding_regions.bed -b illumina_probe_regions.bed > coding_illumina_probe_regions.bed
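
If you would rather get back the whole probes that touch a coding exon (rather than the clipped coding intervals), newer BEDTools releases offer `bedtools intersect -u -a illumina_probe_regions.bed -b coding_regions.bed`, and `-v` in place of `-u` lists the probes with no coding overlap. For a quick check without BEDTools, the same overlap test can be sketched in plain shell and awk. This is only a minimal sketch: the function name `probes_overlapping_cds` is made up for illustration, it assumes tab-separated BED files with 0-based, half-open coordinates, and it compares every probe against every exon on the same chromosome, so it is only meant for small files like this panel.

```shell
# Hypothetical helper (not part of BEDTools): print each probe that
# overlaps at least one coding exon.
#   $1 = probes BED file, $2 = coding exons BED file
# Assumes tab-separated BED: chrom, 0-based start, end, optional extra columns.
probes_overlapping_cds() {
    probes_bed=$1
    coding_bed=$2
    awk 'BEGIN { FS = OFS = "\t" }
         # First file (coding exons): store start/end per chromosome.
         NR == FNR {
             n[$1]++
             s[$1, n[$1]] = $2 + 0
             e[$1, n[$1]] = $3 + 0
             next
         }
         # Second file (probes): print the probe once if any exon on the
         # same chromosome overlaps it (half-open interval test).
         {
             for (i = 1; i <= n[$1]; i++) {
                 if (($2 + 0) < e[$1, i] && ($3 + 0) > s[$1, i]) {
                     print
                     break
                 }
             }
         }' "$coding_bed" "$probes_bed"
}
```

With the file names from above, it would be called as `probes_overlapping_cds illumina_probe_regions.bed coding_regions.bed > coding_probes.bed` (the output file name is just an example). Probes that merely touch the end of an exon without sharing a base are not reported, which matches how BED intervals are normally interpreted.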