Question: From evidence-based alignments on de-novo assembly, to gene identification
20 months ago by chefarov (Greece):

I am trying to perform gene prediction after a de novo assembly of DNA-seq reads (E. coli).

After producing the scaffolds, I used bowtie2 to map ESTs (random ones from E. coli) onto them. I therefore end up with SAM/BAM files that contain the alignments of the evidence-based data (e.g. ESTs) to the scaffolds. My goal is to identify gene regions on the scaffolds.

The all-time classic paper "A beginner's guide to eukaryotic genome annotation" suggests clustering the alignments in order to identify overlapping alignments and predictions. Any practical ideas on how to do that?

Thanks

PS1: I would prefer either a) ideas for a manual approach (simple steps) or b) Python/Bash-based toolkits.

PS2: An overview of the SAM file (alignments):

  @HD   VN:1.0  SO:unsorted
  @SQ   SN:scaffold1|size105789 LN:105789
  @SQ   SN:scaffold2|size142352 LN:142352
  @SQ   SN:scaffold3|size57540  LN:57540
  ....
  @SQ   SN:scaffold132|size37   LN:37
  @PG   ID:bowtie2  PN:bowtie2  VN:2.3.3    CL:"/usr/bin/bowtie2-align-s --wrapper basic-0 -f -x SRR001665_scaffolds -S SRR001665_on_scaffolds.sam -U ESTS/seven_ests.fasta"
  gi|14475471|gb|BI067949.1|    4   *   0   0   *   *   0   0   AGTGTATGATGGTGTTTTTGAGGTGCTCCAGTGGCTTCTGTTTCTATCNNCTGTCCCTCCTGTTCAGCTACTGACGGGGTGGTGCGTAACGGCAAAAGCACCGCCGGACATCAGCGCTATCTCTGCTCTCACTGCCGTAAAACATGGCAACTGCAGTTCACTTACACCGCTTCTCAACCCGGTACGCACCAGAAAATCATTGATATGGCCATGAATGGCGTTGGATGCCGGGCAACAGCCCGCATTATGGGCGTTGGCCTCAACACGATTTTACGTCACTTAAAAAACTCAGGCCGCAGTCGGTAACCTCGCGCATACAGCCGGGCAGTGACGTCATCGTCTGCGCGGAAATGGACGAACAGTGGGGCTATGTCGGGGCTAAATCGCGCCAGCGCTGGCTGTTTTACGCGTATGACAGTCTCCGGAAGACGGTTGTTGCGCACGTATTCGGTGAACGCACTATGGCGACGCTGGGGCGTCTTATGAGCCTGCTGTCACCCTTTGACGTGGTGATATGGATGACGGATGGCTGGCCGCTGTATGAATCCCGCCTGAAGGGAAAGCTGCACGTAATCAGCAAGCGATATACGCAGCGAATTGAGCGGCATAACCTGAATCTGAGGCAGCACCTNNNNCGNNN    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    YT:Z:UU 
  gi|14007620|gb|BG713670.1|    0   scaffold38|size43565    37568   42  629M    *   0   0 ACACAAAGAAAAATTGAATAAACTGTATGATTTAAAAGATTATCGGGAGAGTTACCTCCCGATATAAAAGGAAGGATTTACAGAATGTGACCTAAGGTCTGGCGTAAATGTGCACCGGAACCGAGAAGGCCCGGATTGTCATGGACGATGAGATACACCGGAATATCATGGACATATTCTTTAAAGCGCCCTTTATCTTCAAATGCGGCACGGAAACCGGAGGCTTTGAAGAACTCAAGGAAGCGCGGCACGATACCGCCCGCAATAAACACGCCGCCAAATGTCCCGAGATTGAGCGCCAGATTGCCGCCAAAACGGCCCATAATGACGCAAAACAGCGACAATGCGCGGCGGCAATCGGTGCAGCTGTCAGCCAGCGCGCGTTCGGTAATATCTTTTGGCTTGAGATTTTCTGGCAGGCGGTTGTCAGCTTTCACAATTGCGCGATACAAATTCACCAGCCCAGGGCCAGAAAGCACGCGCTCCGCCGAAACATGACCAATTTCCGCACGCAATATTTCGAGGATAATGGCCTCTTCTTCACTATTCGGCGCAAAATCAACGTGACCGCCTTCGCCTGGCAAGCTTACCCAACGCTTATCGACATGGACCNNNTGCGCAACCCCAAC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   AS:i:-9 XN:i:0  XM:i:4  XO:i:0  XG:i:0  NM:i:4  MD:Z:612A0G0A13G0   YT:Z:UU 
  gi|14007330|gb|BG713380.1|    16  scaffold21|size132647   11225   40  484M    *   0   0   GGTTGGCTGGGGGTATTCTTGCCCGGGTCNNATACGTCATCTAACGCCCTGTTCGCCGCGCTGCAAGCCGCCGCAGCTCANCAAATTGGCGTCTCTGATCTGTTGNNGGTTGCCGCCAATACCACCGGTGGCGTCGCCGGTAAGATGATCTCCCCGCAATCTATCGCTATCGCCTGTACGGCGGTAGGCCTGGTGGGCAAAGAGTNNGATTTGTTCCGCTTTACTGTCAAACACAGCCTGATCTTCACCTGTATAGTGGGCGTGATCACCACGCTTCAGGCTTATGTCTTAACGTGGATGATTCCTTAATGATTGTTTTACCCAGACGCCTGTCAGACGAGGTTGCCGATCGTGTGCGGGCNNNNNNTGATGAAAAAAACCTGTAAGCGGGCATGAAGTTGCCCGCTGAGCGCCAACTGGNTATGCAACTCGGCGTATCACGTCATTCACTGCGCGAGGCGCTGGCAAAACTGGTGNNNGAAGG    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    AS:i:-83    XN:i:0  XM:i:28 XO:i:0  XG:i:0  NM:i:28 MD:Z:2C15C2A7G0G4C33A7A2A24C0T28A41G27C0T154G0C0T0G0A0T16G36C0G21A32A0G0T5  YT:Z:UU 
  gi|14007281|gb|BG713331.1|    16  scaffold21|size132647   11064   42  645M    *   0   0   NCNNNNNCGGCAGCACGCTGAAAGAACTGNCTCTGCCCATCTACTCCATCGGTATGGTGCTGGCATTCGCCTTTATTTCGAACTATTCCGGACTGTCATCAACACTGGCGCTGGCACTGGCGCACACCGGTCATGCATTCACCTTCTTCTCGCCGTTCCTCGGCTGGCTGGGGGTATTCCTGACCGGGTCGGATACCTCATCTAACGCCCTGTTCGCCGCGCTGCAAGCCACCGCAGCACAACAAATTGGCGTCTCTGATCTGTTGCTGGTTGCCGCCAATACCACCGGTGGCGTCACCGGTAAGATGATCTCCCCGCAATCTATCGCTATCGCCTGTGCGGCGGTAGGCCTGGTGGGCAAAGAGTCTGATTTGTTCCGCTTTACTGTCAAACACAGCCTGATCTTCACCTGTATAGTGGGCGTGATCACCACGCTTCAGGCTTATGTCTTAACGTGGATGATTCCTTAATGATTGTTTTACCCAGACGCCTGTCAGACGAGGTTGCCGATCGTGTGCGGGCGCTGATTGATGAAAAAAACCTGGAAGCGGGCATGAAGTTGCCCGCTGAGCGCCAACTGGCGATGCAACTCGGCGTATCACGTAATTCACTGCGCGAGGCGCTGGCAAAACTGGTGAGTGAAGG   IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   AS:i:-7 XN:i:0  XM:i:7  XO:i:0  XG:i:0  NM:i:7  MD:Z:0G1A0C0C0T0T22G615 YT:Z:UU 
  gi|14006980|gb|BG713030.1|    0   scaffold38|size43565    24794   42  449M    *   0   0   TGCGATACAACAATTCGTATCTACAGAAGGTAACTATGTTTCCACAATGCAAATTTTCCCGCGAGTTTCTACATCCTCGCTACTGGCTCACATGGTTTGGGCTTGGTGTACTCTGGCTTTGGGTACAGCTTCCTTATCCTGTTCTCTGCTTTCTCGGCACGCGTATTGGCGCAATGGCGCGACCATTCCTGAAACGTCGTGAATCTATCGCCCGTAAAAACCTGGAACTTTGTTTCCCGCAGCATTCTGCGGAAGAACGCGAGAAGATGATTGCCGAAAACTTTCGTTCACTCGGCATGGCGCTGGTAGAAACCGGCATGGCATGGTTCTGGCCCGACAGTCGCGTACGTAAATGGTTTGATGTTGAAGGGTTGGATAACCTTAAACGCGCACAAATGCAAAATCGCGGCGTAATGGTTGTCGGCGTCCATTTTATGTCGCTGGAACTG   IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:449    YT:Z:UU 
  gi|14006658|gb|BG712708.1|    0   scaffold38|size43565    24794   42  417M1I32M   *   0   0   TGCGATACAACAATTCGTATCTACAGAAGGTAACTATGTTTCCACAATGCAAATTTTCCCGCGAGTTTCTACATCCTCGCTACTGGCTCACATGGTTTGGGCTTGGTGTACTCTGGCTTTGGGTACAGCTTCCTTATCCTGTTCTCTGCTTTCTCGGCACGCGTATTGGCGCAATGGCGCGACCATTCCTGAAACGTCGTGAATCTATCGCCCGTAAAAACCTGGAACTTTGTTTCCCGCAGCATTCTGCGGAAGAACGCGAGAAGATGATTGCCGAAAACTTTCGTTCACTCGGCATGGCGCTGGTAGAAACCGGCATGGCATGGTTCTGGCCCGACAGTCGCGTACGTAAATGGTTTGATGTTGAAGGGTTGGATAACCTTAAACGCGCACAAATGCAAAATCGCGGCGTAATGGNNNGTCGGCGTCCATTTTATGTCGCTGGAACTG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  AS:i:-10    XN:i:0  XM:i:2  XO:i:1  XG:i:1  NM:i:3  MD:Z:417T0T30   YT:Z:UU gi|14004118|gb|BG710168.1|  0   scaffold38|size43565    24794   42  449M    *   0   0   TGCGATACAACAATTCGTATCTACAGAAGGTAACTATGTTTCCACAATGCAAATTTTCCCGCGAGTTTCTACATCCTCGCTACTGGCTCACATGGTTTGGGCTTGGTGTACTCTGGCTTTGGGTACAGCTTCCTTATCCTGTTCTCTGCTTTCTCGGCACGCGTATTGGCGCAATGGCGCGACCATTCCTGAAACGTCGTGAATCTATCGCCCGTAAAAACCTGGAACTTTGTTTCCCGCAGCATTCTGCGGAAGAACGCGAGAAGATGATTGCCGAAAACTTTCGTTCACTCGGCATGGCGCTGGTAGAAACCGGCATGGCATGGTTCTGGCCCGACAGTCGCGTACGTAAATGGTTTGATGTTGAAGGGTTGGATAACCTTAAACGCGCACAAATGCAAAATCGCGGCGTAATGGTTGTCGGCGTCCATTTTATGTCGCTGGAACTG   
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:449    YT:Z:UU 
  gi|226767304|gb|GO523315.1|   4   *   0   0   *   *   0   0   ACTGGGGAAACCTTGCAGTTACGGAACTTAAACGCCTGGCAGCACGTGCCCCTTTCAGCACCTGGCGTAATCCGGAAGAGGCCCGCACCAATCGCCCTTCCAACGTGATGCGCAGCCTGAATGGTCAATGGGACT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU 
  gi|209377782|gb|GE310270.1|   4   *   0   0   *   *   0   0   AGTTGTAGTTTTTCAACTCATAGATGAGCACTACCCCTTTTGGGGGTTAATCACAAGTTTATCACCGATTGATGGCCCTTAAAGGGGGATTTCTTCTGGAGTTTCCCCTTCACCTGATTTGCAGGAAAGTAAATCACCGCTTTCACAACAGTGACCCACTACTACACACTAAACAACTGGTAAATCTTTTTAAGAGGATTGATCTTAACCAAGCTTAACAATCTTAATTTAATGCTAGGCACCATAGAGTGATGGTCTAGTTATATCATTTAAACCTGAATTAACTTTAACAAATTGAAAGCCTGGCTCCTCATGAGACTAGTTCTTTGTGCTAACCATATCTACTATTTCACATAGTAGAATACCTGAGTTTGCTACTAGGAATGTTCCTGGCTCAATTTCAAGTTTTAAATTTCTTTGATTTTACTGGTTGAATTCATTGATCTTATTTACTGT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    YT:Z:UU
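
Editorial sketch (not part of the original post): one simple manual first step is to turn each aligned SAM record like the ones above into a BED-style interval on its scaffold. The function names `cigar_ref_length` and `sam_to_intervals` are hypothetical helpers written for illustration; the sketch assumes tab-separated SAM text and uses only the standard fields (FLAG, RNAME, POS, CIGAR).

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume positions on the reference (scaffold).
REF_CONSUMING = set("MDN=X")


def cigar_ref_length(cigar):
    """Number of scaffold bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def sam_to_intervals(lines):
    """Convert SAM text lines to (scaffold, start, end, name, strand) tuples.

    Coordinates are 0-based, half-open (BED convention).
    Header lines and unmapped reads (FLAG bit 4) are skipped.
    """
    intervals = []
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, scaffold = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # read did not align
        start = pos - 1  # SAM POS is 1-based
        end = start + cigar_ref_length(cigar)
        strand = "-" if flag & 16 else "+"
        intervals.append((scaffold, start, end, name, strand))
    return intervals
```

For example, a record at POS 24794 with CIGAR `417M1I32M` covers 449 scaffold bases (the inserted base does not consume the reference), giving the interval 24793 to 25242.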
modified 20 months ago • written 20 months ago by chefarov

You could use a tool like Prokka to do gene identification/annotation easily. NCBI also makes their prokaryotic annotation pipeline available, assuming you will be making this data public at some point.

modified 20 months ago • written 20 months ago by genomax

Thank you for your reply. Unfortunately this tool isn't suitable for me for several reasons: a) it is too complicated for what I want, b) it doesn't have proper documentation, c) it is written in BioPerl, which means it will be difficult for me to integrate it into my Python/Bash-based pipeline. I am more interested in a manual approach (simple steps) to gene identification. However, if you could suggest a Python-based equivalent or a simpler tool, I could take a look inside it. Thanks again :) I have updated my question to clarify this.

modified 20 months ago • written 20 months ago by chefarov

Prodigal does an excellent job of predicting genes. I would suggest running cmscan (Rfam) on top of that. That's basically what Prokka does (plus some other things). Prodigal is standalone. You could use RNA-seq data to enhance the computational predictions: map the reads, convert the alignments to a BED file, and merge the overlapping intervals using bedtools.
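
Editorial sketch (not part of the original reply): for readers who want the merge step spelled out, the following plain-Python function clusters overlapping alignment intervals per scaffold, similar in spirit to what `bedtools merge` does. The name `merge_intervals` and the `max_gap` parameter are assumptions made for illustration. The example coordinates are the six aligned ESTs from the question, converted to 0-based half-open intervals.

```python
from collections import defaultdict


def merge_intervals(intervals, max_gap=0):
    """Cluster overlapping (scaffold, start, end) intervals.

    Returns (scaffold, start, end, n_alignments) tuples, one per cluster.
    Intervals closer than or equal to `max_gap` bases apart are joined,
    so with max_gap=0 overlapping and directly adjacent intervals merge.
    """
    by_scaffold = defaultdict(list)
    for scaffold, start, end in intervals:
        by_scaffold[scaffold].append((start, end))

    clusters = []
    for scaffold in sorted(by_scaffold):
        cur_start = cur_end = None
        count = 0
        # Sorting by start lets a single left-to-right sweep find clusters.
        for start, end in sorted(by_scaffold[scaffold]):
            if cur_start is not None and start <= cur_end + max_gap:
                cur_end = max(cur_end, end)  # extend the open cluster
                count += 1
            else:
                if cur_start is not None:
                    clusters.append((scaffold, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        if cur_start is not None:
            clusters.append((scaffold, cur_start, cur_end, count))
    return clusters


# The six mapped ESTs from the SAM excerpt in the question.
est_hits = [
    ("scaffold38", 37567, 38196),
    ("scaffold21", 11224, 11708),
    ("scaffold21", 11063, 11708),
    ("scaffold38", 24793, 25242),
    ("scaffold38", 24793, 25242),
    ("scaffold38", 24793, 25242),
]
candidate_regions = merge_intervals(est_hits)
```

Each resulting cluster is a candidate transcribed region, and the alignment count gives a rough idea of how much evidence supports it.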

written 20 months ago by Asaf

Thank you Asaf, Prodigal seems a very good choice! I am most probably going to use it. Do you have anything equivalent in mind for eukaryotic genomes?

written 20 months ago by chefarov

Unfortunately no. I'm not aware of a good public tool for eukaryotes.

written 20 months ago by Asaf

Indeed, things in eukaryotes are more complicated, so there isn't a simple self-training tool like Prodigal. Moreover, as a reference for people coming to this thread in the future: I was wrong earlier about the lack of documentation for Prokka, since I just found an external source, http://metagenomics-workshop.readthedocs.io/en/latest/annotation/index.html, which is a step-by-step annotation tutorial.

written 20 months ago by chefarov

A final question regarding the use of RNA-seq data to enhance the predictions: as I understand it, you mean mapping the reads to the scaffolds and then somehow combining the two separate results (1: genes from Prodigal, 2: alignments) using bedtools? I am asking because the only way I know to combine evidence-based data with ab initio results is to feed the evidence-based data to the ab initio predictor at runtime (something that Prodigal doesn't seem to be capable of). Thanks again!

written 20 months ago by chefarov

Prodigal will give you the coding regions, while the RNA-seq results will give you the transcripts, including ncRNAs. If you combine them, you get the genes with their UTRs. I'm not aware of a tool that does this combination neatly; bedtools might be a good tool for matching the ORFs from Prodigal with the experimental transcripts.
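
Editorial sketch (not part of the original reply): one possible way to do this combination by hand is to pair each predicted ORF with the transcript region that overlaps it most, and read the non-coding flanks as candidate UTRs. The function name `attach_utrs` and all coordinates below are hypothetical. The sketch assumes 0-based half-open intervals and that ORFs and merged transcript regions have already been extracted from the Prodigal output and the alignments.

```python
def attach_utrs(orfs, transcripts):
    """Pair each ORF with its best-overlapping transcript region.

    orfs:        iterable of (scaffold, start, end, strand)
    transcripts: iterable of (scaffold, start, end)
    Returns (scaffold, start, end, strand, five_prime_len, three_prime_len);
    both lengths are None when no transcript overlaps the ORF.
    """
    transcripts = list(transcripts)
    results = []
    for scaffold, orf_start, orf_end, strand in orfs:
        best = None  # (overlap_length, transcript_start, transcript_end)
        for t_scaffold, t_start, t_end in transcripts:
            if t_scaffold != scaffold:
                continue
            overlap = min(orf_end, t_end) - max(orf_start, t_start)
            if overlap > 0 and (best is None or overlap > best[0]):
                best = (overlap, t_start, t_end)
        if best is None:
            results.append((scaffold, orf_start, orf_end, strand, None, None))
            continue
        _, t_start, t_end = best
        left = max(0, orf_start - t_start)   # transcript bases before the ORF
        right = max(0, t_end - orf_end)      # transcript bases after the ORF
        # On the minus strand the 5' end is the right-hand side.
        five, three = (left, right) if strand == "+" else (right, left)
        results.append((scaffold, orf_start, orf_end, strand, five, three))
    return results


# Hypothetical coordinates, for illustration only.
orfs = [("s1", 100, 400, "+"), ("s1", 1000, 1300, "-"), ("s2", 10, 100, "+")]
transcripts = [("s1", 60, 450), ("s1", 980, 1350)]
annotated = attach_utrs(orfs, transcripts)
```

Note that in bacteria a single transcript often spans several ORFs (an operon), so a flank computed this way may contain a neighbouring gene rather than a true UTR. Treat the lengths as upper bounds to be checked against the other predictions.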

written 20 months ago by Asaf