Is A Genome Position In An Exon Or Intron?
10.6 years ago
Justin ▴ 460

Say I have a position in the hg19 reference genome, e.g. chr1:56.

How do I know programmatically if it's contained in an intron or an exon or otherwise?

10.6 years ago

Use the following awk script (saved as f.awk) together with the public MySQL server of the UCSC Genome Browser and its knownGene table (schema: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.sql ):

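BEGIN {
    FS = "\t";
    # Default when no transcript overlaps the position.
    result = "GENOMIC";
    # Warning: UCSC positions are 0-based, half-open [start, end).
    POS = int(position);
}

{
    # An exon hit was already found in an earlier transcript.
    if (result == "EXON") next;

    # knownGene columns: $4=txStart, $5=txEnd, $8=exonCount,
    # $9=exonStarts, $10=exonEnds (comma-separated lists).
    txStart = int($4);
    txEnd = int($5);
    # Skip malformed rows and transcripts that do not contain POS.
    if (txStart >= txEnd || POS < txStart || POS >= txEnd) next;

    # Inside a transcript, but not (yet) found in one of its exons.
    # Note: exonStarts/exonEnds also cover UTR exons, so a position
    # in a UTR is reported as EXON.
    result = "INTRON_OR_UTR";

    exonCount = int($8);
    split($9, exonStarts, ",");
    split($10, exonEnds, ",");

    for (i = 1; i <= exonCount; i++) {
        if (POS >= int(exonStarts[i]) && POS < int(exonEnds[i])) {
            result = "EXON";
            break;
        }
    }
}

END {
    printf("%s\n", result);
}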

examples:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg19 -e 'select * from knownGene where chrom="chr3" and NOT(txStart>184035527 or txEnd<=184035527)' | awk -v position=184035527 -f f.awk
EXON

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg19 -e 'select * from knownGene where chrom="chr3" and NOT(txStart>184035441 or txEnd<=184035441)' | awk -v position=184035441 -f f.awk
INTRON_OR_UTR

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg19 -e 'select * from knownGene where chrom="chr3" and NOT(txStart>1 or txEnd<=1)' | awk -v position=1 -f f.awk
GENOMIC
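
For readers who would rather not use awk, the same classification can be sketched in Python. This is only an illustrative port, not part of the original answer: the function name `classify_position` and the two sample rows are made up (they are not real knownGene records), and the rows are assumed to be tab-separated, in knownGene column order, and already filtered to the right chromosome, as the SQL query above does.

```python
def classify_position(rows, pos):
    """Classify a 0-based position against knownGene-style rows.

    Each row is a tab-separated line in knownGene column order:
    name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
    exonCount, exonStarts, exonEnds, ...
    """
    result = "GENOMIC"
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        tx_start, tx_end = int(fields[3]), int(fields[4])
        # Skip transcripts that do not contain the position.
        if not (tx_start <= pos < tx_end):
            continue
        result = "INTRON_OR_UTR"
        # exonStarts/exonEnds are comma-separated with a trailing comma.
        starts = [int(s) for s in fields[8].split(",") if s]
        ends = [int(e) for e in fields[9].split(",") if e]
        # Intervals are 0-based, half-open: start <= pos < end.
        if any(s <= pos < e for s, e in zip(starts, ends)):
            return "EXON"
    return result


# Hypothetical rows (not real knownGene records), one chromosome only.
SAMPLE_ROWS = [
    "tx1\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,",
    "tx2\tchr1\t-\t1000\t1200\t1000\t1200\t1\t1000,\t1200,",
]

# A 1-based position such as chr1:150 is 149 in UCSC coordinates.
print(classify_position(SAMPLE_ROWS, 150 - 1))
```

As in the awk script, a position inside a UTR exon comes back as EXON, because the exon lists in knownGene include UTRs.
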
10.6 years ago

You can also use the UCSC Genome Browser (Table Browser) or Ensembl to get the positions of introns and exons for a set of genes, which is convenient if your data set is large.

In the UCSC Table Browser, set the following options:

clade: Mammal
genome: Human
assembly: Feb. 2009 (GRCh37/hg19)
group: Genes and Gene Prediction Tracks
track: RefSeq Genes
table: refGene
region: genome
output format: custom track

Then click "get output". On the following page, choose to create one BED record per exon (or per intron), and compare your position against the resulting intervals.
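
Once the exon intervals have been saved from the Table Browser, checking a position against them could look like the sketch below. It assumes BED-style lines with one record per exon, whose first three columns are chrom, 0-based start and end. The function names `load_exons` and `in_exon` and the sample lines are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def load_exons(bed_lines):
    """Build {chrom: sorted list of merged (start, end)} from BED lines."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        # Skip blanks and custom-track header lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        # Merge overlapping exons coming from different transcripts.
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_exon(exons, chrom, pos):
    """Return True if 0-based pos lies in any exon interval on chrom."""
    intervals = exons.get(chrom, [])
    # Index of the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


# Hypothetical BED lines, not real RefSeq exons.
SAMPLE_BED = [
    'track name="exons" description="hypothetical example"',
    "chr1\t100\t200\texon1",
    "chr1\t300\t500\texon2",
    "chr1\t150\t250\texon1b",
]

exons = load_exons(SAMPLE_BED)
print(in_exon(exons, "chr1", 120))
```

A position that is not in any exon may still be intronic or intergenic. Loading the per-intron export the same way and testing against it separates the two cases.
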
