Question: Bed File With Introns Only
7.8 years ago by
United States
Pfs490 wrote:

How can I make a BED (or other format) file with introns only, starting with the GTF (or similar) file?

Thanks in advance.

ucsc bed intron browser • 14k views

also see responses to this question:

written 7.8 years ago by brentp23k
7.8 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith17k wrote:

The following are detailed instructions for getting a BED file of all introns from the UCSC Table Browser. Most of the options below are set by default, so the number of steps required is not as bad as it seems.

  1. Go to the UCSC table browser.
  2. Select desired species and assembly
  3. Select group: Genes and Gene Prediction Tracks
  4. Select track: UCSC Genes (or Refseq, Ensembl, etc.)
  5. Select table: knownGene
  6. Select region: genome (or you can test on a single chromosome or smaller region)
  7. Select output format: BED - browser extensible data
  8. Enter output file: UCSC_Introns.tsv
  9. Select file type returned: gzip compressed
  10. Hit the 'get output' button
  11. A second page of options relating to the BED file will appear.
  12. Under 'create one BED record per:'. Select 'Introns plus'
  13. Add desired flank for introns being returned, or leave as 0 to get just the introns
  14. Hit the 'get BED' button

You will get output that looks like this for every UCSC gene:

chr3    124449474    124453939    uc003ehl.3_intron_0_0_chr3_124449475_f    0    +
chr3    124454093    124456414    uc003ehl.3_intron_1_0_chr3_124454094_f    0    +
chr3    124457086    124458870    uc003ehl.3_intron_2_0_chr3_124457087_f    0    +
chr3    124459046    124460998    uc003ehl.3_intron_3_0_chr3_124459047_f    0    +
chr3    124461113    124462761    uc003ehl.3_intron_4_0_chr3_124461114_f    0    +

As a sanity check you can go back to the UCSC genome browser, select add custom tracks, paste in some of your BED data (such as the block above corresponding to the human gene UMPS on hg19), hit 'submit', and then go to genome browser. The result should look something like this:

[Image: screenshot of the resulting custom track in the UCSC Genome Browser]
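
A quick command-line check can complement the visual one. The sketch below assumes the Table Browser name layout seen in the five lines above, where the second-to-last underscore-separated token looks like the 1-based intron start; the function name check_intron_bed is made up for illustration. Because BED starts are 0-based, start + 1 should match that token, and end minus start gives the intron length.

```shell
# Hypothetical helper: prints name, intron length, and ok/mismatch per record.
# Assumption: the second-to-last "_"-separated token of the name is the
# 1-based start coordinate (inferred from the sample output only).
check_intron_bed() {
  awk 'BEGIN { OFS = "\t" }
  {
    n = split($4, part, "_")
    # BED starts are 0-based, so start + 1 should equal the token in the name
    verdict = (part[n - 1] + 0 == $2 + 1) ? "ok" : "mismatch"
    # BED intervals are half-open, so the length is simply end - start
    print $4, $3 - $2, verdict
  }' "$@"
}
```

Calling check_intron_bed UCSC_Introns.tsv on the uncompressed download would list one line per intron; for the first record above it should report a length of 4465 and "ok".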


This doesn't answer how to convert a given GTF file.

written 3.2 years ago by SmallChess490

This is very useful. For some reason, this worked for UCSC and Refseq genes but not for Ensembl. Any suggestions? Thanks!

written 2.7 years ago by mmitra30
6.8 years ago by
biorepine1.4k wrote:
  1. Convert the GTF to BED using this script

  2. Convert the BED to either exons or introns using this script
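
The linked scripts are not reproduced here. As a rough sketch of the same two-step idea (not those scripts), the function below, with the made-up name gtf_to_introns, pulls exon records out of a GTF, converts them to BED coordinates, and prints the gaps between consecutive exons of each transcript. It assumes a tab-separated GTF whose ninth column carries a transcript_id "..." attribute and whose exons do not overlap within a transcript, and it uses only POSIX awk and sort.

```shell
# Hypothetical helper: GTF exons -> per-transcript introns in BED6.
gtf_to_introns() {
  # Step 1: exon lines of the GTF -> BED6 named by transcript_id
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
  $3 == "exon" {
    # pull the value out of: transcript_id "VALUE"
    if (match($9, /transcript_id "[^"]+"/)) {
      tid = substr($9, RSTART + 15, RLENGTH - 16)
      # GTF is 1-based inclusive; BED is 0-based half-open
      print $1, $4 - 1, $5, tid, 0, $7
    }
  }' "$@" |
  # Step 2: order exons within each transcript, then emit the gaps
  LC_ALL=C sort -t "$(printf '\t')" -k4,4 -k1,1 -k2,2n |
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
  # same transcript, same chromosome, and a real gap: that gap is an intron
  $4 == id && $1 == chr && $2 + 0 > end { print $1, end, $2, id, 0, $6 }
  { id = $4; chr = $1; end = $3 + 0 }'
}
```

It could be run as gtf_to_introns genes.gtf > introns.bed, where genes.gtf is a placeholder file name. Each output line is named after its transcript, so introns shared by several transcripts appear once per transcript.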

4.3 years ago by
Great Boston Area
Xianjun250 wrote:

Here is a simple code example that converts BED12 to introns, 5' UTRs, 3' UTRs, CDS, etc.

If you want meta-introns (i.e. overlapping introns from one gene merged into a single intron), you can use the code snippet below:

cat exons.meta.bed | sort -k4,4 -k2,2n | awk '{OFS="\t"; if($4!=id) {if(e!="") print chr,s,e,id,1,str; chr=$1;s=$3;id=$4;str=$6;e="";} else {e=$2;print chr,s,e,id,1,str;s=$3;e="";}}END{if(e!="") print chr,s,e,id,1,str;}' > introns.meta.bed

where exons.meta.bed is in BED6 format with the gene ID (e.g. ENSGxxxx) as the name.
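
The one-liner above only yields sensible intervals if the exons of a gene no longer overlap. One way to get there, sketched under the assumption that the input is tab-separated BED6 with the gene ID in column 4, is to sort each gene's exons and fuse any that overlap or touch. merge_gene_exons is a hypothetical helper name, and the score column is set to 1 only to mirror the output convention of the one-liner.

```shell
# Hypothetical helper: merge overlapping or touching exons within each gene.
merge_gene_exons() {
  LC_ALL=C sort -t "$(printf '\t')" -k4,4 -k1,1 -k2,2n "$@" |
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
  # emit() prints the interval currently being built, if there is one
  function emit() { if (id != "") print chr, s, e, id, 1, str }
  {
    if ($4 == id && $1 == chr && $2 + 0 <= e) {
      # overlaps or touches the current interval: extend its end if needed
      if ($3 + 0 > e) e = $3 + 0
    } else {
      # new gene, new chromosome, or a gap: close the old interval
      emit()
      chr = $1; s = $2 + 0; e = $3 + 0; id = $4; str = $6
    }
  }
  END { emit() }'
}
```

For example, merge_gene_exons exons.bed > exons.meta.bed (exons.bed being a placeholder name) would produce an input suitable for the one-liner above.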


7.8 years ago by
Chuangye80 wrote:

If you know the organism, use the Table Browser utility of the UCSC Genome Browser.


I looked at it, but I can download a BED file with the exon information. Are you suggesting that I perform some kind of set-complement operation, where I remove the exon segments from the gene segment? I assume it would work, but I was hoping for a ready-made solution. Thanks!

written 7.8 years ago by Pfs490

In the UCSC Genome browser's table browser, if you select any gene type track, you should use the "Introns plus X bases" option on the form which follows clicking "Get output".

written 7.8 years ago by Eric Fournier1.4k
3.8 years ago by
####190 wrote:

Thank you for the answer
