Question

Annotating Sequences For GBrowse - Which Is The Database And Which Is The Query?

3

13.9 years ago

Jeremy Leipzig 22k

Let's say I have some small sequences that I wish to display in GBrowse. I want to create tracks from BLAST results to show where genic regions might be.

Do I create a BLAST index of the small sequences or of the known gene database?

If I use the gene database as the index, are BLAST-to-GFF conversion scripts designed to use query coordinates instead of reference coordinates?

blast gff • 3.0k views

Updated 13.9 years ago by Neilfws 49k • written 13.9 years ago by Jeremy Leipzig 22k

Ram · Answer 1 · 2010-06-02

3

13.9 years ago

Neilfws 49k

GBrowse uses GFF files, in which column 1 is described as "The ID of the landmark used to establish the coordinate system for the current feature." So, you want "reference" coordinates.

The best way to think about this is that both your known genes and your BLAST alignments are features which can be mapped to a chromosome. Your BLAST database should not be the small sequences, but I'm not sure that it should be the "known gene database" either. I would approach this by creating a known gene track with chromosome as the reference and a BLAST track by BLASTing the small sequence (query) against the chromosome (database).
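As a rough illustration of how the choice of database decides which coordinates end up in the GFF, here is a minimal Python sketch that converts BLAST tabular output to GFF3. It assumes the default 12-column `-outfmt 6` layout from BLAST+ (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), nucleotide hits, and sequence IDs without spaces or characters that need GFF3 escaping. The function name `blast_tab_to_gff3` and its `reference` parameter are made up for this example and are not part of any existing conversion script.

```python
def blast_tab_to_gff3(lines, reference="subject", source="blastn",
                      feature_type="match"):
    """Convert BLAST tabular hits (assumed default -outfmt 6) to GFF3 lines.

    reference="subject": column 1 is the database sequence (e.g. a contig).
    reference="query":   column 1 is the query sequence.
    """
    if reference not in ("subject", "query"):
        raise ValueError("reference must be 'subject' or 'query'")

    gff = ["##gff-version 3"]
    hit_number = 0
    for line in lines:
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend, sstart, send = (int(x) for x in fields[6:10])
        evalue = fields[10]

        # For nucleotide hits, sstart > send indicates a minus-strand match.
        strand = "-" if sstart > send else "+"

        # GFF requires start <= end, so sort both coordinate pairs.
        q_lo, q_hi = sorted((qstart, qend))
        s_lo, s_hi = sorted((sstart, send))

        if reference == "subject":
            landmark, start, end = sseqid, s_lo, s_hi
            name, target = qseqid, f"{qseqid} {q_lo} {q_hi}"
        else:
            landmark, start, end = qseqid, q_lo, q_hi
            name, target = sseqid, f"{sseqid} {s_lo} {s_hi}"

        hit_number += 1
        attributes = f"ID=hit{hit_number};Name={name};Target={target}"
        gff.append("\t".join([landmark, source, feature_type, str(start),
                              str(end), evalue, strand, ".", attributes]))
    return gff
```

With `reference="subject"`, each hit is placed on the database sequence, which matches the "query against chromosome" setup described above. Passing `reference="query"` instead places hits on the query sequences, which may be what you need if the sequences shown in GBrowse were the BLAST query.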

Alternatively, it may be that you just want to show the BLAST alignment compared with a gene, in which case the BLAST database is known genes and you'd be creating a large number of "overview" features (one for each gene), with the reference coordinate system going from gene start to gene end. This could get quite messy in GBrowse.

If you're interested in "gene-centric" visualisations, it may be better to use BioPerl's Bio::Graphics module to generate individual PNG plots per gene.

Updated 4.6 years ago by Ram 43k • written 13.9 years ago by Neilfws 49k

0

Hmm, there are no chromosomes available yet for this species, and my known genes are actually ESTs (I oversold that to reduce confusion). Someone must have a pipeline for this type of small-scale BAC visualization.

Reply 13.9 years ago by Jeremy Leipzig 22k

0

You can use contig IDs in the first column of the GFF. Check the ESTs/genes for repeat sequences, then use them as queries against a database of the contigs. Watch for long FASTA headers in both; create your own shorter, unique IDs if needed.
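One possible way to shorten the headers, sketched in Python: replace each FASTA header with a short sequential ID and keep a lookup table so the original descriptions can be recovered later. The function name `shorten_fasta_ids`, the `prefix` parameter, and the zero-padded numbering scheme are arbitrary choices for this example.

```python
def shorten_fasta_ids(lines, prefix="seq"):
    """Replace FASTA headers with short, unique IDs.

    Returns (renamed_lines, id_map), where id_map maps each new ID
    back to the original header text (without the leading '>').
    """
    renamed = []
    id_map = {}
    for line in lines:
        if line.startswith(">"):
            # Sequential numbering guarantees uniqueness within one file.
            new_id = f"{prefix}{len(id_map) + 1:06d}"
            id_map[new_id] = line[1:].strip()
            renamed.append(">" + new_id)
        else:
            # Sequence lines are passed through unchanged.
            renamed.append(line.rstrip("\n"))
    return renamed, id_map
```

Using a different `prefix` for the ESTs and for the contigs keeps the two ID sets distinct when they later appear together in BLAST output.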

Reply 13.9 years ago by Darked89 4.6k

0

OK. I wrote the answer late last night, so apologies for lack of clarity. You definitely want to BLAST small sequences (query) versus ESTs (database). I'd still consider generating plots per EST, rather than GBrowse, if the ESTs are not mapped to some kind of larger reference sequence.

Reply 13.9 years ago by Neilfws 49k