Question: Augustus gene prediction for non model organism
Asked 2.3 years ago by lakhujanivijay (4.5k), India:

I am trying to use Augustus for gene prediction in a non-model organism and am currently looking at this link.

They used BLAT for alignment to generate the "hints" file, but since I already have Illumina transcriptome data, I want to first generate a BAM file (maybe using bowtie) and then convert it to a wig file (as per the tutorial).

  1. UCSC recommends bigWig over wig. How should I proceed? There are programs for SAM/BAM to bigWig conversion. If I convert SAM to bigWig, will Augustus support it?

  2. Is there an alternative approach to gene prediction in such cases? Any link to a tutorial or tool, or any other pointers, would be highly appreciated. I am quite flexible about the approach, tool, or method.

Modified 21 months ago • written 2.3 years ago by lakhujanivijay (4.5k)
Answered 2.3 years ago by Macspider (3.0k), Vienna - BOKU:

I benchmarked gene prediction with Augustus for around 1 year. The standard procedure includes:

  1. Map the RNA-Seq reads (BAM output)
  2. Obtain intron hints with bam2hints
  3. Obtain a coverage wig file with bam2wig
  4. Obtain exon hints with wig2hints
  5. Combine the hints into a GFF hints file and feed it to Augustus
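The five steps above can be sketched as a single shell function. This is only an illustration under assumptions: HISAT2 stands in as the splice-aware mapper, the file names and thread count are placeholders, and the wig2hints.pl thresholds are the kind of values shown in the AUGUSTUS RNA-Seq readme, not tuned settings. The function name `make_hints` and the `DRY_RUN` switch are hypothetical helpers. `--species` must name a parameter set that already exists in your AUGUSTUS config, which for a non-model organism usually means training one first or borrowing a close relative.

```shell
#!/usr/bin/env bash
# Sketch of the five-step hints workflow. All file names, the thread
# count, and the wig2hints.pl thresholds are placeholders.

# Print a command line; execute it unless DRY_RUN=1.
run() {
    echo "+ $1"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        eval "$1"
    fi
}

# Usage: make_hints GENOME_FASTA READS_1 READS_2 SPECIES
# Paths must not contain spaces, because commands are built as strings.
make_hints() {
    local genome=$1 r1=$2 r2=$3 species=$4

    # 1. Splice-aware mapping, then coordinate sorting.
    run "hisat2-build $genome genome_idx" || return
    run "hisat2 -p 8 -x genome_idx -1 $r1 -2 $r2 -S rnaseq.sam" || return
    run "samtools sort -o rnaseq.bam rnaseq.sam" || return

    # 2. Intron hints from gapped (spliced) alignments.
    run "bam2hints --intronsonly --in=rnaseq.bam --out=hints.intron.gff" || return

    # 3. Coverage track in wig format.
    run "bam2wig rnaseq.bam > rnaseq.wig" || return

    # 4. Exonpart hints derived from the coverage track.
    run "wig2hints.pl --width=10 --margin=10 --minthresh=2 --minscore=4 --prune=0.1 --src=W --type=ep --radius=4.5 --pri=4 --strand=. < rnaseq.wig > hints.ep.gff" || return

    # 5. Combine the hints and run the prediction.
    run "cat hints.intron.gff hints.ep.gff > hints.gff" || return
    run "augustus --species=$species --extrinsicCfgFile=extrinsic.M.RM.E.W.cfg --hintsfile=hints.gff $genome > augustus.gff"
}
```

Running `DRY_RUN=1 make_hints genome.fa R1.fastq.gz R2.fastq.gz my_species` only prints the commands, which is a cheap way to review them before committing compute time.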

There are some useful checkpoints that you could add to improve your results.

  1. Pool as many RNA-Seq datasets, public and private, as you can.
  2. Map with two aligners, for example HISAT2 and BLAT: HISAT2 gives high-confidence mappings, while BLAT tolerates lower sequence identity, which lets you account for samples that come from slightly different genomes. In case of conflict, hints from HISAT2 take precedence.
  3. Read this and pay attention to the RNASeq-noise-reduction.pl script.
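One way to express "HISAT2 hints come first" is through the `pri=` attribute in the hints GFF: when hints contradict each other, AUGUSTUS ignores the ones with the lower priority number. The sketch below rewrites that attribute with awk. The function name `set_hint_priority`, the file names, and the priority values 5 and 3 are illustrative assumptions, not values from the answer.

```shell
# Rewrite the pri= attribute (column 9) of an AUGUSTUS hints GFF.
# Usage: set_hint_priority PRIORITY HINTS_GFF
set_hint_priority() {
    local pri=$1 gff=$2
    awk -F'\t' -v OFS='\t' -v p="$pri" '
        /^#/ || NF < 9 { print; next }   # pass comments and short lines through
        {
            if ($9 ~ /pri=[0-9]+/) {
                sub(/pri=[0-9]+/, "pri=" p, $9)   # replace an existing priority
            } else {
                $9 = $9 ";pri=" p                 # or append one if missing
            }
            print
        }' "$gff"
}

# Example with hypothetical file names: HISAT2 hints outrank BLAT hints.
# { set_hint_priority 5 hints.hisat2.gff; set_hint_priority 3 hints.blat.gff; } > hints.gff
```

Setting the priority when the hints are generated is an alternative; rewriting afterwards keeps the two hint files reusable.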

Hope this will help you!

Modified 2.3 years ago • written 2.3 years ago by Macspider (3.0k)

Hi Macspider

Thanks for the input. I mapped the transcriptome data to the genome and got a BAM file. I then sorted it by coordinate and ran bam2hints to get the hints file. There is no error message; however, the hints.gff file is empty.

My Commands

1. Mapping by bowtie2

bowtie2 -p 16 -x my_assembly -1 my_data_R1.fastq.gz -2 my_data_R2.fastq.gz -S my_data.sam

samtools view -bS  my_data.sam > my_data.bam

samtools sort my_data.bam my_data.sorted

2. BAM to hints file

bam2hints --in my_data.sorted.bam --out hints.gff

At the end I see a message that does not look like an error:

Wait a moment, calculating maximum block size that needs to be allocated... .. done
Reply modified 2.3 years ago • written 2.3 years ago by lakhujanivijay (4.5k)
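One possible explanation for the empty hints file, offered as an assumption and not as a confirmed diagnosis: bowtie2 is not a splice-aware aligner, so its alignments contain no `N` (skipped region) operations in the CIGAR column, and bam2hints derives intron hints from exactly those gaps. A quick check is to count spliced alignments in the BAM. The helper name `count_spliced` is hypothetical.

```shell
# Count alignments whose CIGAR string (SAM column 6) contains an N operation,
# i.e. a skipped reference region such as an intron.
# Reads SAM text on stdin, e.g.:
#   samtools view my_data.sorted.bam | count_spliced
count_spliced() {
    awk -F'\t' '
        /^@/ { next }                 # skip header lines
        { total++ }                   # every remaining line is one alignment record
        $6 ~ /N/ { spliced++ }        # CIGAR with a skipped region
        END { printf "%d spliced of %d alignments\n", spliced, total }'
}
```

If the count is zero, remapping with a splice-aware aligner such as HISAT2 before running bam2hints is worth trying.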

I ran bam2hints with --intronsonly to compute intron hints only. However, that flag is now deprecated because it has become the default behaviour, so it shouldn't be the issue.

How did you sort your bam file? By name or by coordinate?

Reply written 2.3 years ago by Macspider (3.0k)

With the default, i.e. by coordinate.

Reply written 2.3 years ago by lakhujanivijay (4.5k)

Read this: http://www.vcru.wisc.edu/simonlab/bioinformatics/programs/augustus/docs/readme.rnaseq.html

You might find useful info!

Reply written 2.2 years ago by Macspider (3.0k)
Answered 21 months ago by lakhujanivijay (4.5k), India:

I am finding this tutorial helpful, in case anybody wants it -

Written 21 months ago by lakhujanivijay (4.5k)