Question: Merging/Intersecting Different Gene Annotations - Should I Extend Coordinates?
Asked 5.5 years ago by PoGibas (Vilnius):

I want to create a gene data set that is as large as possible, so I am combining several gene annotations. However, the same gene often appears in more than one annotation, so entries overlap. To reduce bias, I intersect the annotations and, wherever genes overlap, keep only one of them.

Question:

To make sure such overlaps are detected, I was thinking of extending the gene coordinates. Is this necessary? If so, how large should the extension be (5 bp? 100 bp?)

Example:

I want to create an lncRNA data set (in later steps it will be used to search for genomic features).
Input:

  1. GENCODE lncRNA annotation (version 18 - 04/09/2013);
  2. Cabili lncRNA annotation (Cabili et al., 2011 (CSHLP)).

Workflow:

  1. Extract GENCODE gene start/end coordinates;
  2. Extract Cabili gene start/end coordinates;
  3. Extend Cabili coordinates (± n bp);
  4. Use bedtools intersect;
  5. If genes intersect, keep the GENCODE gene (as it is the newer annotation, though this choice is admittedly subjective).
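
The workflow above can be sketched in plain Python to make the effect of the extension explicit. This is only an illustration under several assumptions: intervals are BED-style (0-based, half-open) tuples, strand is ignored, and the gene names, coordinates, and the function names `extend`, `overlaps`, and `merge_annotations` are hypothetical. On real data, bedtools would do the intersection step.

```python
from collections import defaultdict

def extend(interval, n):
    """Pad a (chrom, start, end, name) interval by n bp on both sides."""
    chrom, start, end, name = interval
    return (chrom, max(0, start - n), end + n, name)

def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def merge_annotations(gencode, cabili, n=0):
    """Keep every GENCODE gene, plus each Cabili gene that does not
    overlap a GENCODE gene after being extended by n bp."""
    by_chrom = defaultdict(list)
    for g in gencode:
        by_chrom[g[0]].append(g)
    kept = list(gencode)
    for c in cabili:
        padded = extend(c, n)
        if not any(overlaps(padded, g) for g in by_chrom.get(c[0], [])):
            kept.append(c)  # keep the original, unextended coordinates
    return sorted(kept)

# Made-up example data (not taken from either annotation)
gencode = [("chr1", 1000, 2000, "GENCODE_A"), ("chr1", 5000, 6000, "GENCODE_B")]
cabili = [
    ("chr1", 1500, 2500, "CABILI_1"),  # overlaps GENCODE_A directly
    ("chr1", 6050, 7000, "CABILI_2"),  # starts 50 bp after GENCODE_B ends
    ("chr2", 100, 900, "CABILI_3"),    # no GENCODE gene on this chromosome
]

no_ext = [g[3] for g in merge_annotations(gencode, cabili, n=0)]
ext100 = [g[3] for g in merge_annotations(gencode, cabili, n=100)]
print(no_ext)
print(ext100)
```

With no extension only the directly overlapping Cabili gene is dropped; with n = 100 the gene lying 50 bp away is dropped as well, which is exactly the decision the extension size controls.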

I realize that the answer depends on the situation and on how reliable each annotation is, but I still hope someone can suggest something.


What do you plan on doing with this dataset?

Comment written 5.5 years ago by Damian Kao

I updated my question: "in the following steps it will be used to search for genomic features"

Reply written 5.5 years ago by PoGibas

You should think about exactly what you want to do with these features. RNA-seq? Wet-lab work (primers, probes)? Phylogenetic studies? The best strategy for merging the features may differ between these purposes. There probably isn't a single method of merging these annotations that works well for all of them.

Reply written 5.5 years ago by Damian Kao

This would simply be an enrichment analysis for any feature (e.g., sequence motif, chromatin modification, repeat count).

Reply written 5.5 years ago by PoGibas

Answer written 5.5 years ago by JacobS (Cleveland, Ohio):

My first instinct is that arbitrarily extending coordinates to try to resolve differences between two annotations is a dangerous practice. You wouldn't want to accidentally combine two nearby features with unrelated functions just because of their proximity. I'm not sure what kind of organism you are working with, but there are annotation combiners specifically designed to use various forms of evidence from several programs to build a final, comprehensive annotation. JIGSAW comes to mind, and a quick web search found this link, but you should search for other combiners to fit your needs. For example, I think JIGSAW is only for eukaryotes, while something like GenePRIMP is only for prokaryotes.
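
One way to see how much a given extension would change is to measure, for each gene in one annotation, the gap to the nearest gene in the other before choosing any padding. The following is a minimal sketch of that check, assuming BED-style (0-based, half-open) tuples and ignoring strand. The data and the helper names `gap` and `nearest_gaps` are hypothetical and do not come from JIGSAW or any other combiner.

```python
def gap(a, b):
    """Distance in bp between two half-open intervals on the same
    chromosome; 0 if they overlap or are directly adjacent."""
    return max(0, a[1] - b[2], b[1] - a[2])

def nearest_gaps(query, reference):
    """Map each query gene name to the gap to its nearest reference gene
    (None if the reference has no gene on that chromosome)."""
    result = {}
    for q in query:
        same_chrom = [gap(q, r) for r in reference if r[0] == q[0]]
        result[q[3]] = min(same_chrom) if same_chrom else None
    return result

# Made-up example data
reference = [("chr1", 1000, 2000, "REF_A"), ("chr1", 5000, 6000, "REF_B")]
query = [
    ("chr1", 1500, 2500, "QUERY_1"),  # overlaps REF_A
    ("chr1", 6050, 7000, "QUERY_2"),  # 50 bp from REF_B
    ("chr2", 100, 900, "QUERY_3"),    # nothing nearby
]

gaps = nearest_gaps(query, reference)

# Which query genes would be treated as duplicates at each extension size?
absorbed = {
    n: sorted(name for name, d in gaps.items() if d is not None and d < n)
    for n in (5, 100)
}
print(gaps)
print(absorbed)
```

Genes that are absorbed only at the larger extension are the ones worth inspecting by hand, since they are candidates for the "nearby but unrelated" case described above.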
