Trouble filtering CDS regions from Nextera Enrichment design
9.9 years ago
idedios ▴ 30

I designed a panel in Illumina DesignStudio covering over 200 genes, and the non-coding exons need to be trimmed from each gene. Doing this manually would take a couple of weeks, since the design has about 12,000 probes. For the probes I have filtered manually so far, I used IGV to view the UCSC hg19 reference genome and check whether each probe's region shows an amino acid sequence.

Is there an easier way to do this, for example a shell script that parses my probe regions file and compares it to the reference genome, without having to view the reference in IGV?

sequencing • 2.0k views
9.9 years ago
rbagnall ★ 1.8k

Hi Idedios,

You can get a BED file of the coding regions of your genes from the UCSC Table Browser. Select the following options:

group - genes and gene predictions

track - RefSeq genes

table - refGene

region - genome

identifiers - paste list (and paste a list of gene names in the new window)

output format - BED (browser extensible data)

output file - coding_regions.bed

Click 'get output', then select 'Coding Exons' and click 'get BED'.

This will give a list of coding exons for each gene in BED format. You could then compare this list to a BED file of your Illumina probes using BEDTools, for example. Use intersectBed to retrieve only the coding regions of your Illumina probes:

intersectBed -a coding_regions.bed -b illumina_probe_regions.bed > coding_illumina_probe_regions.bed
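
The command above reports only the overlapping portions of the intervals. If you would rather keep each whole probe that touches any coding exon, here is a minimal sketch in plain awk, with no BEDTools needed. It assumes both files are tab-separated BED with chrom, start and end in the first three columns, and that the exon list fits in memory. The two small input files written at the top are made up purely for illustration and reuse the file names from this answer, so drop those two printf lines when running it on real data.

```shell
# Made-up demo inputs (tab-separated BED: chrom, start, end, name).
printf 'chr1\t100\t200\texon1\nchr1\t500\t600\texon2\n' > coding_regions.bed
printf 'chr1\t150\t250\tprobeA\nchr1\t300\t400\tprobeB\nchr2\t100\t200\tprobeC\n' > illumina_probe_regions.bed

# Pass 1 (first file): store exon starts and ends per chromosome.
# Pass 2 (second file): print each probe once if it overlaps a stored exon.
# BED intervals are 0-based and half-open, so two intervals overlap
# when start1 < end2 and start2 < end1.
awk 'BEGIN { FS = OFS = "\t" }
NR == FNR { n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0; next }
{
    for (i = 1; i <= n[$1]; i++)
        if (($2 + 0) < e[$1, i] && s[$1, i] < ($3 + 0)) { print; break }
}' coding_regions.bed illumina_probe_regions.bed > coding_illumina_probe_regions.bed

cat coding_illumina_probe_regions.bed
```

With the demo inputs only probeA is kept, since probeB falls between the two exons and probeC is on a chromosome with no exons. This is a naive comparison of every probe against every exon on the same chromosome, which should be fine for about 12,000 probes; with BEDTools, swapping the -a and -b files and adding the -u option to intersectBed should give the same kind of whole-probe output.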