Question: Is A Genome Position In An Exon Or Intron?
Justin (United States), 5.0 years ago, wrote:

Say I have a position in the hg19 reference genome, e.g. chr1:56.

How can I determine programmatically whether it lies in an intron, in an exon, or in neither?

Tags: exon, intron
(modified 5.0 years ago by Pierre Lindenbaum)
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 5.0 years ago, wrote:

Use the following awk script (saved as f.awk in the examples below) together with the public MySQL server of the UCSC Genome Browser and the knownGene table ( http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.sql ):

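BEGIN {
    FS = "\t";
    result = "GENOMIC";
    # warning: UCSC tables use 0-based, half-open coordinates, so pass a
    # 0-based position (a 1-based browser position such as chr1:56 is 55)
    POS = int(position);
}

{
    # an earlier transcript already placed POS in an exon
    if (result == "EXON") next;

    txStart = int($4);
    txEnd   = int($5);
    # skip empty transcripts and transcripts that do not contain POS
    if (txStart >= txEnd || POS < txStart || POS >= txEnd) next;
    result = "INTRON_OR_UTR";

    exonCount = int($8);
    # exonStarts/exonEnds are comma-separated lists (with a trailing comma)
    split($9, exonStarts, ",");
    split($10, exonEnds, ",");

    for (i = 1; i <= exonCount; i++) {
        if (POS >= int(exonStarts[i]) && POS < int(exonEnds[i])) {
            result = "EXON";
            break;
        }
    }
}

END {
    printf("%s\n", result);
}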

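Examples (each query selects the chr3 knownGene transcripts whose [txStart, txEnd) interval contains the position and pipes them to f.awk):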

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg19 -e 'select * from knownGene where chrom="chr3" and NOT(txStart>184035527 or txEnd<=184035527)' | awk -v position=184035527 -f f.awk
EXON

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg19 -e 'select * from knownGene where chrom="chr3" and NOT(txStart>184035441 or txEnd<=184035441)' | awk -v position=184035441 -f f.awk
INTRON_OR_UTR

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg19 -e 'select * from knownGene where chrom="chr3" and NOT(txStart>1 or txEnd<=1)' | awk -v position=1 -f f.awk
GENOMIC
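For readers who prefer Python, here is a minimal sketch of the same per-transcript logic as f.awk. It assumes the rows arrive as tab-separated knownGene lines (for example, the output of the `mysql -N` commands above), with txStart/txEnd in columns 4-5 and exonStarts/exonEnds in columns 9-10. The function names `to_ucsc` and `classify` are hypothetical, and the sample transcript is invented for illustration.

```python
# Minimal sketch (not the original author's code): the f.awk logic in Python.
# Assumes tab-separated knownGene rows: txStart/txEnd in columns 4-5,
# exonStarts/exonEnds in columns 9-10 (UCSC 0-based, half-open).


def to_ucsc(pos_1based: int) -> int:
    """Hypothetical helper: 1-based browser position -> UCSC 0-based."""
    return pos_1based - 1


def classify(rows, pos0: int) -> str:
    """Return "EXON", "INTRON_OR_UTR" or "GENOMIC" for a 0-based position."""
    result = "GENOMIC"
    for line in rows:
        f = line.rstrip("\n").split("\t")
        tx_start, tx_end = int(f[3]), int(f[4])
        # skip empty transcripts and those not containing pos0
        if tx_start >= tx_end or not (tx_start <= pos0 < tx_end):
            continue
        result = "INTRON_OR_UTR"
        # exon lists end with a trailing comma, so drop empty fields
        starts = [int(x) for x in f[8].split(",") if x]
        ends = [int(x) for x in f[9].split(",") if x]
        if any(s <= pos0 < e for s, e in zip(starts, ends)):
            return "EXON"
    return result


if __name__ == "__main__":
    # invented transcript on chr3 with exons [100,200) and [300,500)
    row = "\t".join(["tx1", "chr3", "+", "100", "500", "100", "500",
                     "2", "100,300,", "200,500,", "", ""])
    for pos in (150, 250, 50):
        print(pos, classify([row], pos))
    # prints: 150 EXON / 250 INTRON_OR_UTR / 50 GENOMIC (one per line)
```

As in the awk version, returning as soon as one transcript has an exon hit means a position covered by an exon of any overlapping transcript is reported as EXON.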
ancient_learner (India), 5.0 years ago, wrote:

If you have many positions, you can use the UCSC Genome Browser (Table Browser) or Ensembl to get the positions of the introns and exons for a set of genes.

In the UCSC Table Browser, set the following options:

clade: mammal
genome: human
assembly: Feb. 2009 (GRCh37/hg19)
group: genes and gene prediction tracks
track: RefSeq
table: RefSeq
region: genome
output format: custom track

Then click "get output". From there, intersect your positions with the returned exon and intron coordinates.
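When there are many positions, querying once per position is slow. A minimal sketch, assuming the exon coordinates have been exported as BED lines (chrom, start, end, ...; 0-based, half-open), is to merge overlapping exons per chromosome once and then look each position up with a binary search. The helper names `load_intervals` and `contains` are hypothetical, and the sample intervals are invented.

```python
# Minimal sketch: classify many positions against exported exon intervals.
# Assumes BED input (chrom, start, end, ...; 0-based, half-open).
import bisect
from collections import defaultdict


def load_intervals(bed_lines):
    """Hypothetical helper: merge BED intervals per chromosome.

    Returns {chrom: (starts, ends)} with disjoint, sorted intervals.
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        # skip blank lines and track/browser/comment headers
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:  # overlaps or touches the previous interval
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def contains(merged, chrom, pos0):
    """True if the 0-based position pos0 lies in a merged interval on chrom."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    # index of the last interval starting at or before pos0
    i = bisect.bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


if __name__ == "__main__":
    exons = load_intervals(["track name=exons",
                            "chr1\t100\t200\tA", "chr1\t150\t250\tB",
                            "chr1\t400\t500\tC"])
    print(contains(exons, "chr1", 220), contains(exons, "chr1", 300))
    # prints: True False
```

Loading transcript spans the same way and checking a position against them after the exon set would separate exonic, intronic and intergenic positions; merging up front keeps each lookup to a single binary search.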
