How to select the best gene annotation from multiple hits?
3 months ago
a.bibek52 • 0


I have been trying to reconstruct/reannotate the plastome of Arabidopsis thaliana (GenBank: MZ323108.1) using GeSeq. I downloaded the FASTA and GenBank (.gb) files for this accession, uploaded the FASTA file to the GeSeq website, and selected the following annotation settings:

  1. FASTA file to annotate: Circular, Plastid (land plant), Annotation Options: Annotate plastid Inverted Repeat (IR) and Annotate plastid trans-spliced rps12, Annotation Support: Support annotation by Chloë, Annotation revision: Keep the best annotation only.
  2. BLAT Search: Protein Identity = 50; rRNA, tRNA, DNA search identity = 90; Annotate: CDS, rRNA, tRNA; Options: Ignore genes annotated as locus tag
  3. HMMER Profile Search: Chloroplast land plants, Annotate CDS + rRNA
  4. 3rd Party tRNA Annotators: ARAGORN v1.2.38 and tRNAscan-SE v2.0.7 with default parameters
  5. 3rd Party Stand-Alone Annotators: Chloë v0.1.0 (Annotate CDS + tRNA + rRNA)

In the results, some genes are annotated multiple times, each with a different CDS, protein size, and structure, and the OGDRAW map shows two or three arrows for the same gene. I also checked these annotations with NCBI BLAST, and all of them return valid hits. Now I am unsure which CDS I should keep.
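
To make the problem concrete, here is a minimal sketch of how the duplicated annotations could be listed, assuming the CDS features have already been extracted from the GeSeq output into plain tuples. The function names, the tuple layout, and the example records are hypothetical placeholders for illustration, not real GeSeq output.

```python
from collections import defaultdict


def group_duplicate_cds(features):
    """Return genes that carry more than one CDS annotation.

    `features` is an iterable of (gene, start, end, strand, annotator)
    tuples; this layout is an assumption made for the sketch.
    """
    by_gene = defaultdict(list)
    for gene, start, end, strand, annotator in features:
        by_gene[gene].append((start, end, strand, annotator))
    # Keep only genes annotated more than once, sorted by coordinates.
    return {g: sorted(v) for g, v in by_gene.items() if len(v) > 1}


def overlaps(a, b):
    """True if two (start, end, strand, annotator) records overlap on the same strand."""
    return a[2] == b[2] and a[0] <= b[1] and b[0] <= a[1]


# Placeholder records (made-up names and coordinates, not real results).
example = [
    ("geneA", 100, 400, "+", "BLAT"),
    ("geneA", 130, 400, "+", "HMMER"),
    ("geneA", 100, 400, "+", "Chloe"),
    ("geneB", 900, 1200, "-", "BLAT"),
]

duplicates = group_duplicate_cds(example)
for gene, records in duplicates.items():
    print(gene, records)
```

Records for the same gene that overlap on the same strand are the competing annotations in question. Copies that sit at separate locations may instead be the two copies of a gene located in the inverted repeat.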

Moreover, when I use the original GenBank (.gb) file with OGDRAW, no gene is named more than once or drawn at multiple locations.

If anyone has run into this issue before and has a suggestion, please help me.

Thank you.

