What Steps Do You Take To Re-Annotate Sequences?
12.6 years ago
Matt ▴ 70

What steps do you take to re-annotate sequences to find out the gene or marker?

Given an input file of 1 to N rows, each holding just a unique identifier and a sequence (As, Cs, Gs, Ts), what are the best steps to annotate it?

  • Blast? Which Blast?
  • Filter results to unique matches, multiple matches, and no matches? How do you handle multiple matches?
  • Find coordinates and "match" to NCBI, Ensembl, others?

Any tools or high level workflow that you could share would be GREATLY appreciated.

sequence annotation gene gene function • 2.0k views

If these are something like microarray probes, you might consider answers to this question:

A: Pipeline To Map 60-Mers To Genes

12.6 years ago

What I will do here is give you the types of data that sit at the top level of my human genome database. These data types will likely apply to most eukaryotic gene annotation efforts. I do not have much experience with prokaryote gene annotation, so I will not offer suggestions there, other than to say that I think the environment in which the organism normally lives, and from which it was isolated, is important. I have a lot of gene expression and protein expression (proteomics) data that I use to assess function, or candidacy for further experiments in our lab, but I rely on all types of data to make that call.

As I work with human data, I am not so concerned with mapping sequence by BLAST. I simply note the genome build and the associated gene coordinates, ignoring exon coords (for now).
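For readers who do start from mapped coordinates, a lookup against stored gene coordinates could look like the minimal sketch below. It assumes 1-based inclusive coordinates on a single genome build; the `Gene` record, the `genes_overlapping` function, and the gene names are hypothetical and not taken from any particular database. A linear scan is fine for small tables; a large annotation set would want an interval index instead.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record: 1-based, inclusive coordinates on one build."""
    name: str
    chrom: str
    start: int
    end: int


def genes_overlapping(genes, chrom, start, end):
    """Return names of genes whose span overlaps [start, end] on chrom.

    start and end may be given in either order, since minus-strand
    alignments are often reported with start greater than end.
    """
    lo, hi = min(start, end), max(start, end)
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start <= hi and g.end >= lo
    ]


# Placeholder gene table (names and coordinates are made up for illustration).
example_genes = [
    Gene("GENE_A", "chr1", 100, 200),
    Gene("GENE_B", "chr1", 500, 900),
    Gene("GENE_C", "chr2", 100, 200),
]
```

A sequence that maps between genes returns an empty list, which is worth keeping as its own category rather than discarding.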

[?]

Lastly, I have a free-text "knowledge" field where I enter info from lab meeting, abstracts, etc. (typically with reference).

In addition, I have a metabolite database where I link small molecule to gene.

That list should give you some good ideas on what to collect in order to more confidently describe the potential function of a gene and its encoded protein.


I am searching for a "pipeline" process for this.

Step A. BLAST
Step B. Take output from A and do...
  Step B1. Take unique hits and keep.
  Step B2. Take non hitters.
  Step B3. Take multiple hits and ...
Step C. Take output from Step B and... Use some program that will find genes based upon locations?
Step D. ...
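One possible way to do the Step B split is sketched below. It assumes Step A wrote BLAST's default tabular output (`-outfmt 6`, with columns qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `classify_hits` and the `max_evalue` cutoff of 1e-5 are illustrative assumptions, not a recommended threshold, and what to do with the multiple-hit bucket is left open, as in the question.

```python
import csv
from collections import defaultdict


def classify_hits(blast_rows, query_ids, max_evalue=1e-5):
    """Split query IDs into unique, multiple, and no-hit buckets.

    blast_rows: iterable of tab-separated lines in BLAST outfmt 6.
    query_ids:  every identifier from the input file, so that queries
                with no hit at all can still be reported.
    max_evalue: assumed cutoff; hits with a larger e-value are ignored.
    """
    hits = defaultdict(set)
    for row in csv.reader(blast_rows, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = row[0], row[1]
        sstart, send, evalue = int(row[8]), int(row[9]), float(row[10])
        if evalue <= max_evalue:
            # A hit is identified by subject sequence plus subject coordinates.
            hits[qseqid].add((sseqid, sstart, send))

    buckets = {"unique": {}, "multiple": {}, "none": []}
    for qid in query_ids:
        found = sorted(hits.get(qid, ()))
        if not found:
            buckets["none"].append(qid)          # Step B2
        elif len(found) == 1:
            buckets["unique"][qid] = found[0]    # Step B1
        else:
            buckets["multiple"][qid] = found     # Step B3
    return buckets
```

With a results file on disk, something like `classify_hits(open("hits.tsv"), ids)` would work, where `hits.tsv` and `ids` are placeholders for the BLAST output and the list of input identifiers.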


Sure, I understand. The topics I present above would be run as parallel processes, except that Gene IDs and Mapping would be done up front. For example, collecting synteny info and gene expression data can be done simultaneously or independently of one another. BLAST is not necessary for this. I work from the gene IDs and the mapping info to proceed to each of the subsequent, parallel steps in data gathering/annotation.
