Augustus gene prediction for a non-model organism
5.4 years ago

I am trying to use Augustus for gene prediction in a non-model organism and am currently following this link.

They used BLAT alignments to generate the "hints" file, but since I already have Illumina transcriptome data, I want to first generate a BAM file (maybe with bowtie) and then convert it to a wig file (as per the tutorial).

  1. UCSC recommends the bigWig format over wig. How should I proceed? There are programs available for SAM/BAM to bigWig conversion. If I convert my SAM file to bigWig, will Augustus support it?

  2. Is there an alternative approach to gene prediction in such cases? Any link to a tutorial or tool, or any other pointers, would be highly appreciated. I am quite flexible about the approach, tool, or method.

4.8 years ago

I am finding this tutorial helpful, in case anybody wants it -

5.4 years ago
Macspider ★ 3.6k

I benchmarked gene prediction with Augustus for about a year. The standard procedure is:

  1. Map the RNA-Seq reads (BAM output)
  2. Obtain intron hints with bam2hints
  3. Obtain a coverage wig file with bam2wig
  4. Obtain exon hints with wig2hints
  5. Combine the hints into one GFF hints file and feed it to Augustus
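
The five steps above can be sketched as a single shell function. This is a minimal sketch, not a definitive pipeline: every file name is hypothetical, the wig2hints.pl thresholds are illustrative values rather than tuned ones, and it assumes that hisat2, samtools (1.3 or later) and the Augustus helper programs bam2hints, bam2wig and wig2hints.pl are on the PATH, with a HISAT2 index already built for the assembly.

```shell
#!/bin/sh
# make_hints INDEX READS_1 READS_2 PREFIX
# Writes PREFIX.hints.gff. All names passed in are placeholders (assumptions).
make_hints() {
    index=$1; r1=$2; r2=$3; prefix=$4

    # 1. Spliced mapping of the RNA-Seq reads, written as a coordinate-sorted BAM
    hisat2 -p 8 -x "$index" -1 "$r1" -2 "$r2" \
        | samtools sort -o "$prefix.bam" - || return 1

    # 2. Intron hints from the gaps in spliced alignments
    bam2hints --in="$prefix.bam" --out="$prefix.introns.gff" || return 1

    # 3. Coverage track; bam2wig writes wig format to standard output
    bam2wig "$prefix.bam" > "$prefix.wig" || return 1

    # 4. Exonpart hints from the coverage track (illustrative thresholds)
    wig2hints.pl --width=10 --margin=10 --minthresh=2 --minscore=4 \
        --prune=0.1 --src=W --type=ep --radius=4.5 --pri=4 --strand="." \
        < "$prefix.wig" > "$prefix.exonparts.gff" || return 1

    # 5. Combine both hint sets into one GFF hints file
    cat "$prefix.introns.gff" "$prefix.exonparts.gff" > "$prefix.hints.gff"
}
```

A call might look like `make_hints my_assembly my_data_R1.fastq.gz my_data_R2.fastq.gz my_data`. The result would then be passed to Augustus, for example `augustus --species=SPECIES --hintsfile=my_data.hints.gff --extrinsicCfgFile=extrinsic.cfg genome.fa > predictions.gff`, where SPECIES stands for a parameter set trained for (or close to) your organism and extrinsic.cfg for a config that defines the E and W hint sources.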

There are some useful checkpoints that you could add to improve your results.

  1. Pool as many RNA-Seq datasets, public and private, as you can.
  2. Map with two mapping algorithms, perhaps HISAT2 and BLAT: HISAT2 gives you high-confidence mappings and BLAT gives you mappings at lower sequence identity, so you can account for samples that come from slightly different genomes. In case of conflict, hints from HISAT2 take precedence.
  3. Read this and pay attention to the RNASeq-noise-reduction.pl script.
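
One way to make HISAT2 hints win over BLAT hints is the `pri=` attribute in the last column of the hints file: when hints contradict each other, Augustus ignores the ones with the lower priority number. The helper below is a hypothetical sketch (the name `set_pri` and the file names are assumptions, not part of Augustus) that rewrites or appends `pri=` in a tab-separated hints GFF.

```shell
#!/bin/sh
# set_pri PRI FILE: print FILE with the pri= attribute in column 9 set to PRI.
# Hypothetical helper; expects tab-separated GFF hint lines.
set_pri() {
    awk -v pri="$1" 'BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }                      # keep comment lines as they are
        {
            if ($9 ~ /(^|;)pri=[0-9]+/)
                sub(/pri=[0-9]+/, "pri=" pri, $9) # replace an existing priority
            else
                $9 = ($9 == "" ? "" : $9 ";") "pri=" pri  # or append one
            print
        }' "$2"
}
```

For example, `{ set_pri 5 hisat2.introns.gff; set_pri 3 blat.introns.gff; } > hints.gff` would give the HISAT2-derived hints the higher priority before the two sets are combined.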

Hope this will help you!


Hi Macspider

Thanks for the input. I mapped the transcriptome data to the genome and got a BAM file, then sorted it by coordinate and ran bam2hints to get the hints file. There is no error message; however, the hints.gff file is empty.

My Commands

1. Mapping by bowtie2

bowtie2 -p 16 -x my_assembly -1 my_data_R1.fastq.gz -2 my_data_R2.fastq.gz -S my_data.sam

samtools view -bS  my_data.sam > my_data.bam

samtools sort my_data.bam my_data.sorted

2. BAM to hints file

bam2hints --in my_data.sorted.bam --out hints.gff

I see a message at the end, which does not look like an error:

Wait a moment, calculating maximum block size that needs to be allocated... .. done

I used bam2hints to calculate only introns, with --intronsonly. However, this flag is deprecated because it is now the default, so it shouldn't be the issue.

How did you sort your bam file? By name or by coordinate?


default, by coordinate
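
A possible explanation, which the thread does not confirm: bowtie2 is not a splice-aware aligner, so the BAM file may contain no alignments with intron gaps (an `N` operation in the CIGAR string) from which bam2hints could derive intron hints. A quick check, sketched here with a hypothetical helper named `count_spliced`, is to count such alignments in the SAM records.

```shell
#!/bin/sh
# count_spliced: read SAM records on stdin and print "<spliced> <mapped>",
# where "spliced" counts alignments whose CIGAR (column 6) contains an N.
count_spliced() {
    awk -F '\t' '
        /^@/      { next }    # skip SAM header lines
        $6 == "*" { next }    # skip unmapped reads (no CIGAR)
        { mapped++; if ($6 ~ /N/) spliced++ }
        END { printf "%d %d\n", spliced, mapped }'
}
```

It could be run as `samtools view my_data.sorted.bam | count_spliced`. If the first number is 0, remapping with a splice-aware aligner such as HISAT2 would be the next thing to try.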
