Question: How to add gene annotation to a UCSC assembly hub?
Asked 4.3 years ago by Ian (University of Manchester, UK):

I am making my first UCSC assembly hub to display, within the browser, a genome that has not been annotated by UCSC. Everything works except that I cannot work out how to add the gene annotation, which is currently in GFF3 format. I am aware that track hubs only accept the "big" file formats, so presumably a bigBed version of the annotation is required. Does anyone know of a handy method for converting GFF3 to BED/bigBed? I think BED12 is required to retain the distinction between CDS, UTR and introns...

Thank you!

P.S. I have Googled this! The Biostars thread "Convert .Gff3 File To 12-Column .Bed File" helps, but I would be interested to know whether there have been any developments since then.

EDIT:
GTF/GFF2 annotation can be used instead of GFF3 (see my answer below)!
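For reference, here is a small sketch (not from the thread) of how a BED12 record keeps the CDS, UTR and intron distinction mentioned above. `decode_bed12` is a hypothetical helper name and the transcript is invented for illustration; the awk program treats each block (columns 10-12) as an exon, each gap between consecutive blocks as an intron, and thickStart..thickEnd (columns 7-8) as the CDS.

```shell
# Hypothetical helper (illustrative name): decode BED12 records read on stdin.
# Each block is an exon, each gap between consecutive blocks is an intron, and
# thickStart..thickEnd is the CDS; exonic bases outside it are UTR.
# Coordinates are 0-based, half-open, as in all BED files.
decode_bed12() {
  awk -F'\t' '{
    split($11, size, ",")   # blockSizes; the trailing comma adds an empty field
    split($12, rel, ",")    # blockStarts, relative to chromStart (column 2)
    printf "CDS\t%s\t%d\t%d\n", $1, $7, $8
    for (i = 1; i <= $10; i++) {
      s = $2 + rel[i]
      e = s + size[i]
      if (i > 1) printf "intron\t%s\t%d\t%d\n", $1, prev_end, s
      printf "exon\t%s\t%d\t%d\n", $1, s, e
      prev_end = e
    }
  }'
}

# Invented two-exon, plus-strand transcript on chr1: exons 1000-1200 and
# 1500-1800, CDS 1100-1700 (so 1000-1100 and 1700-1800 are UTR).
printf 'chr1\t1000\t1800\ttx1\t0\t+\t1100\t1700\t0\t2\t200,300,\t0,500,\n' | decode_bed12
```

For the invented record this prints the CDS (1100-1700), then exon 1000-1200, intron 1200-1500 and exon 1500-1800. For a non-coding transcript, thickStart and thickEnd are conventionally equal, so no CDS is drawn.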

Answered 4.3 years ago by Ian (University of Manchester, UK):

In the end I contacted the UCSC Genome Browser team directly. I got a helpful and detailed reply, which I have edited to make it clearer how the necessary programs can be obtained. These commands were run on 64-bit Linux. IMPORTANT NOTE: my question specified GFF3 as the starting format for the annotation, but it turned out to be much easier to start from GTF/GFF2.

Fetch the programs:

wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/faToTwoBit
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitInfo
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/gtfToGenePred
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/extractGtf.pl
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/genePredToBed
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/genePredCheck
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ixIxx
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed
chmod +x faToTwoBit twoBitInfo gtfToGenePred genePredCheck genePredToBed bedToBigBed ixIxx
export PATH="$PWD:$PATH"  # lets the Method commands below call these programs by name
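Before running the Method, it can help to confirm that every program is actually callable. This is a small sketch, not part of the UCSC reply; `check_tools` is a hypothetical helper name, and it assumes the download directory has been added to PATH.

```shell
# Hypothetical helper: report each named program that is not found on PATH.
# Returns 0 only if every program was found.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The programs used in the Method steps (assumes their directory is on PATH).
check_tools faToTwoBit twoBitInfo gtfToGenePred genePredCheck genePredToBed bedToBigBed ixIxx \
  || echo "Fetch or chmod the missing programs before continuing." >&2
```

Each missing program is listed on stderr, so a failed download or a forgotten chmod shows up before any conversion step runs.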

Download the Perl scripts from the UCSC (kent) Git repository:
http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree;f=src/hg/utils/automation

extractGtf.pl
ensemblInfo.pl

Method:

# Create twoBit version of genome
faToTwoBit genome.fa genome.2bit

# Get chromosome lengths from the twoBit genome (sorted largest first)
twoBitInfo genome.2bit stdout | sort -k2rn > genome.chrom.sizes
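If the chrom.sizes file looks suspicious, it can be cross-checked against the FASTA itself. The sketch below is an alternative computation, not part of the UCSC reply; `fasta_chrom_sizes` is a hypothetical helper, and it assumes the first word of each FASTA header is the sequence name that ends up in the 2bit file.

```shell
# Cross-check sketch (not part of the UCSC reply): compute "name<TAB>length"
# straight from a FASTA file, taking the first word of each header as the name,
# then sort by length, largest first, like the twoBitInfo step.
fasta_chrom_sizes() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len
      name = substr($1, 2)   # drop the leading ">"
      len = 0
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (name != "") print name "\t" len }
  ' "$1" | sort -k2,2rn
}
```

In bash, `diff <(fasta_chrom_sizes genome.fa | sort) <(sort genome.chrom.sizes)` compares the two; sorting both by name avoids spurious differences when two sequences have the same length.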

# Convert GTF annotation to genePred format
gtfToGenePred -infoOut=infoOut.txt -genePredExt genome.gtf genome.gp

# Check the genePred output is valid
genePredCheck genome.gp
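If gtfToGenePred or genePredCheck complains, one thing worth checking (a suggestion, not part of the UCSC reply) is that every exon, CDS, start_codon and stop_codon line carries both gene_id and transcript_id: the GTF format requires both, and features are grouped into transcripts by transcript_id. `gtf_missing_ids` is a hypothetical helper name.

```shell
# Sketch: count exon/CDS/start_codon/stop_codon lines in a GTF whose attribute
# column (column 9) lacks a gene_id or transcript_id attribute. Other feature
# types (for example Ensembl "gene" lines, which carry no transcript_id) are skipped.
gtf_missing_ids() {
  awk -F'\t' '
    /^#/ { next }
    $3 == "exon" || $3 == "CDS" || $3 == "start_codon" || $3 == "stop_codon" {
      if ($9 !~ /gene_id / || $9 !~ /transcript_id /) bad++
    }
    END { print bad + 0 }
  ' "$1"
}
```

`gtf_missing_ids genome.gtf` printing anything other than 0 points at lines that will not be assembled into transcripts as expected.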

# Convert genePred format to BED format
genePredToBed genome.gp stdout | sort -k1,1 -k2,2n > genome.bed
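bedToBigBed expects the BED to be sorted as above and every chromosome name in it to be present in the chrom.sizes file. A sketch of both checks, run before the bigBed step (`check_bed_for_bigbed` is a hypothetical helper name, not a UCSC tool):

```shell
# Sketch of two pre-checks for bedToBigBed (hypothetical helper name):
#   1. the BED is in the order produced by sort -k1,1 -k2,2n;
#   2. every chromosome in the BED appears in the chrom.sizes file.
# Prints a message for each problem and returns non-zero if any is found.
check_bed_for_bigbed() {
  local bed=$1 sizes=$2
  if ! sort -c -k1,1 -k2,2n "$bed" 2>/dev/null; then
    echo "not sorted: $bed" >&2
    return 1
  fi
  awk -F'\t' '
    NR == FNR { known[$1] = 1; next }        # first file: chrom.sizes
    !($1 in known) && !seen[$1]++ {           # second file: the BED
      print "unknown chrom: " $1
      bad = 1
    }
    END { exit bad }
  ' "$sizes" "$bed"
}
```

Usage: `check_bed_for_bigbed genome.bed genome.chrom.sizes`. An "unknown chrom" line usually means the GTF and the FASTA use different naming schemes (for example `chr1` versus `1`).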

# Convert BED to bigBed
# -extraIndex=name indexes the name field, which the position/search box needs
bedToBigBed -type=bed12 -extraIndex=name genome.bed genome.chrom.sizes genome.bb

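# Build the ixIxx input: one line per transcript with the ID from column 1 of infoOut.txt, a tab, then columns 2, 3, 8, 9 and 10 joined by commas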
grep -v "^#" infoOut.txt | awk '{printf "%s\t%s,%s,%s,%s,%s\n", $1,$2,$3,$8,$9,$10}' > genome.nameIndex.txt
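To see what the index input looks like, the grep/awk step above can be wrapped in a hypothetical helper (`make_name_index`) and fed an invented one-transcript infoOut.txt. Each output line is a search key (column 1) followed by a tab and the comma-separated values of columns 2, 3, 8, 9 and 10.

```shell
# Hypothetical helper: the same grep | awk as the step above, reading infoOut.txt
# content on stdin and writing ixIxx input lines on stdout.
make_name_index() {
  grep -v "^#" | awk '{printf "%s\t%s,%s,%s,%s,%s\n", $1,$2,$3,$8,$9,$10}'
}

# Invented header plus one row with 12 tab-separated columns, for illustration only.
printf '#c1\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tc10\tc11\tc12\ntx1\tgene1\tsrc\tchr1\t1000\t1800\t+\tprot1\tGENE1\tTX1-201\tcoding\tcoding\n' \
  | make_name_index
```

For the invented row this prints `tx1`, a tab, then `gene1,src,prot1,GENE1,TX1-201`.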

# Create index for position/search function in browser
ixIxx genome.nameIndex.txt genome.nameIndex.ix genome.nameIndex.ixx
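The files produced above still have to be referenced from the assembly's trackDb.txt. A minimal sketch of such a stanza, assuming the usual hub layout: the track name and labels are placeholders, and relative paths are resolved against the directory holding trackDb.txt, so genome.bb, genome.nameIndex.ix and genome.nameIndex.ixx are assumed to sit next to it.

```shell
# Sketch: append a track stanza for the converted annotation to trackDb.txt.
# "genes" and the labels are placeholders; searchIndex must name a field that
# was indexed with bedToBigBed -extraIndex (here: name).
cat >> trackDb.txt <<'EOF'

track genes
bigDataUrl genome.bb
shortLabel Genes
longLabel Gene annotation converted from GTF
type bigBed 12
visibility pack
searchIndex name
searchTrix genome.nameIndex.ix
EOF
```

The leading blank line keeps this stanza separate from any existing ones, since trackDb stanzas are separated by blank lines.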
