Question: Augustus gene prediction for a non-model organism
Vijay Lakhujani wrote:

I am trying to use Augustus for gene prediction in a non-model organism and am currently looking at this link.

They used BLAT alignments to generate the "hints" file, but since I already have Illumina transcriptome data, I want to first generate a BAM file (maybe using bowtie) and then convert it to a wig file (as per the tutorial).

  1. UCSC recommends bigWig over wig. How should I proceed? There are programs available for SAM/BAM to bigWig conversion. If I convert my SAM file to bigWig, will Augustus support it?

  2. Is there an alternative approach to gene prediction in such cases? Any links to tutorials, tools, or other pointers would be highly appreciated. I am quite flexible about the approach, tool, or method.

modified 15 months ago • written 21 months ago by Vijay Lakhujani

Vijay Lakhujani wrote:

I am finding this tutorial helpful, in case anybody wants it -

written 15 months ago by Vijay Lakhujani

Macspider wrote:

I benchmarked gene prediction with Augustus for around 1 year. The standard procedure includes:

  1. Mapping of RNASeq reads (bam output)
  2. Obtain intron hints with bam2hints
  3. Obtain coverage wig file with bam2wig
  4. Obtain exon hints with wig2hints
  5. Combine hints into gff hintsfile and feed it to Augustus
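
The five steps above can be sketched as a single shell function. This is only an outline under stated assumptions, not a tested recipe: the file names, the thread count, the wig2hints.pl thresholds (borrowed from the Augustus RNA-Seq readme), and the extrinsic.cfg file are all placeholders, and the species argument has to name a parameter set that has already been trained for your organism or a close relative. The bam2wig meant here is the one shipped with Augustus, which writes wig text to standard output.

```shell
#!/usr/bin/env bash
# Sketch only: every file name and threshold below is an illustrative
# assumption; check each tool's help output for your installed version.
run_hints_pipeline() {
    local genome=$1 r1=$2 r2=$3 species=$4   # placeholders supplied by the caller

    # 1. Splice-aware mapping of RNA-Seq reads, then coordinate sorting
    hisat2-build "$genome" genome_idx
    hisat2 -p 8 -x genome_idx -1 "$r1" -2 "$r2" -S rnaseq.sam
    samtools sort -o rnaseq.sorted.bam rnaseq.sam
    samtools index rnaseq.sorted.bam

    # 2. Intron hints from spliced alignments
    bam2hints --in=rnaseq.sorted.bam --out=hints.intron.gff

    # 3. Coverage track in wig format
    bam2wig rnaseq.sorted.bam > coverage.wig

    # 4. Exon-part hints from the coverage track
    wig2hints.pl --width=10 --margin=10 --minthresh=2 --minscore=4 \
        --prune=0.1 --src=W --type=ep --radius=4.5 --pri=4 --strand="." \
        < coverage.wig > hints.ep.gff

    # 5. Combine both hint types and pass them to Augustus
    #    (extrinsic.cfg is assumed to define the hint sources E and W)
    cat hints.intron.gff hints.ep.gff > hints.gff
    augustus --species="$species" --extrinsicCfgFile=extrinsic.cfg \
        --hintsfile=hints.gff "$genome" > augustus.gff
}
```

A call would look like `run_hints_pipeline genome.fa reads_R1.fastq.gz reads_R2.fastq.gz my_species`, where all four arguments are hypothetical names.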

There are some useful checkpoints that you could add to improve your results.

  1. Pool as many RNA-Seq datasets, public and private, as you can.
  2. Map with two aligners, for example HISAT2 and BLAT: HISAT2 gives high-confidence mappings, while BLAT tolerates lower sequence identity, which accounts for samples that may come from slightly different genomes. In case of conflict, hints from HISAT2 take precedence.
  3. Read this and pay attention to the RNASeq-noise-reduction.pl script.
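
One way to express "HISAT2 hints come first" is through the pri= attribute in column 9 of the hints file, since Augustus ignores the lower-priority hint when two hints contradict each other. The sketch below is an assumption about how this could be done, not the procedure used in the benchmark: the function names and the priority values 5 and 3 are made up for illustration.

```shell
# Rewrite (or append) the pri= attribute in column 9 of a hints GFF on stdin.
set_priority() {
    awk -v pri="$1" 'BEGIN { FS = OFS = "\t" }
        /^#/ || NF < 9 { next }            # skip comments and malformed rows
        {
            if ($9 ~ /pri=[0-9]+/) sub(/pri=[0-9]+/, "pri=" pri, $9)
            else $9 = $9 ";pri=" pri
            print
        }'
}

# merge_hints HIGH.gff LOW.gff
# The high-confidence file gets the larger pri value (5 and 3 are arbitrary);
# the result is sorted by sequence name, start, and end.
merge_hints() {
    tab=$(printf '\t')
    { set_priority 5 < "$1"; set_priority 3 < "$2"; } |
        sort -t "$tab" -k1,1 -k4,4n -k5,5n
}
```

For example, `merge_hints hisat2.hints.gff blat.hints.gff > hints.gff`, with both input names being hypothetical.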

Hope this will help you!

modified 21 months ago • written 21 months ago by Macspider

Hi Macspider

Thanks for the input. I mapped the transcriptome data to the genome and got a BAM file. I then sorted it by coordinate and used bam2hints to get the hints file. There is no error message; however, the hints.gff file is empty.

My Commands

1. Mapping by bowtie2

bowtie2 -p 16 -x my_assembly -1 my_data_R1.fastq.gz -2 my_data_R2.fastq.gz -S my_data.sam

samtools view -bS  my_data.sam > my_data.bam

samtools sort my_data.bam my_data.sorted

2. BAM to hints file

bam2hints --in my_data.sorted.bam --out hints.gff

I see a message at the end, which does not look like an error:

Wait a moment, calculating maximum block size that needs to be allocated... .. done
modified 21 months ago • written 21 months ago by Vijay Lakhujani

I used bam2hints to calculate only introns, with --intronsonly. However, this flag is deprecated because it is now the default, so it shouldn't be the issue.

How did you sort your bam file? By name or by coordinate?
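
Besides the sort order, it may be worth checking whether the BAM file contains any spliced alignments at all. Intron hints come from gaps in the alignments (the N operation in the CIGAR string), and bowtie2 is not a splice-aware aligner, so a file with no such gaps could plausibly give an empty intron hints file without any error. The helper below is a hypothetical check, not part of Augustus; it counts SAM records whose CIGAR contains N.

```shell
# Count alignment records whose CIGAR string (column 6) contains an N
# operation, i.e. a gap that could be reported as an intron.
count_spliced() {
    awk -F '\t' '!/^@/ && $6 ~ /N/ { n++ } END { print n + 0 }'
}

# Usage on a real file (assumes samtools is installed):
#   samtools view my_data.sorted.bam | count_spliced
```

A result of 0 would suggest remapping with a splice-aware aligner such as HISAT2 before running bam2hints again.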

written 21 months ago by Macspider

default, by coordinate

written 21 months ago by Vijay Lakhujani

Read this: http://www.vcru.wisc.edu/simonlab/bioinformatics/programs/augustus/docs/readme.rnaseq.html

You might find useful info!

written 21 months ago by Macspider