getting "std::bad_alloc" message when using filterBam (augustus3.1)
8.7 years ago
freddiejung ▴ 60

Hi,

I am trying to build gene models for a new species using AUGUSTUS 3.1.

I have some RNA-seq data, so I used it for intron hints.

The AUGUSTUS tutorial says the filterBam program can be used for more accurate gene prediction (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=IncorporatingRNAseq.Tophat), so I decided to use it.

However, every time I run filterBam, it ends with a message like this:

processed line 74100terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

Despite this message, a filtered BAM file was generated, but I do not know whether it is reliable.

Where did I go wrong?

paired-end augustus filterBam • 4.3k views
ADD COMMENT
8.7 years ago
Mehmet ▴ 820

This may be caused by insufficient memory. On which platform are you running AUGUSTUS? (I guess you are in Japan and use a supercomputer system, like I do?)
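
Whatever the cause, a crash like this can leave the output BAM cut off partway through. Here is a minimal sketch of one narrow check, assuming a standard BGZF-compressed BAM: a cleanly closed BAM ends with a fixed 28-byte empty BGZF block, so a file without it was truncated. The function name `bam_has_eof` is made up for this example, and passing the check does not prove that filterBam processed every read.

```shell
# Hypothetical helper: succeed if FILE ends with the 28-byte BGZF EOF marker
# that a cleanly closed BAM file carries.
bam_has_eof() {
    # The marker, written as seven 4-byte groups of hex digits.
    eof_hex="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"
    # Read the last 28 bytes, dump them as hex, and strip all whitespace.
    tail_hex=$(tail -c 28 "$1" | od -An -tx1 | tr -d '[:space:]')
    [ "$tail_hex" = "$eof_hex" ]
}
```

For example, `bam_has_eof filtered.bam && echo "EOF marker present"` (the file name is a placeholder). If samtools is installed, `samtools quickcheck` performs a similar check.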

ADD COMMENT

Hi, Mehmet,

I run AUGUSTUS on my Mac (3.5 GHz quad-core Intel Core i5, 16 GB 1,600 MHz DDR3 SDRAM).

ADD REPLY

OK. As far as I understand from your e-mail, you want to predict genes for your new species using RNA-seq hints (intron, exon, and intron+exon hints, separately). For this:

  1. Produce intron hints:

    You don't need to filter the BAM file. Instead, run:

    augustus-3.0.1/auxprogs/bam2hints/bam2hints --intronsonly --minintronlen= --in=yourbamfile --out=intronhints.gff
    #Give --minintronlen a value, for example 15. Filtering the BAM file takes a long time, depending on its size. I tried filtering before and the results were not good; the filterBam program does not work well.
    
  2. Produce exon hints:

    cufflinks options yourbamfile
    #This produces a transcripts.gtf file that contains only the exon parts (exon hints). You then need to convert it to .gff format:
    cufflinkGTF2augustusExonParthints.pl transcripts.gtf      #this produces a transcripts.gtf.augustusexonpart.gff file
    
  3. Exon+intron hints:

    cat intronhints.gff transcripts.gtf.augustusexonpart.gff > intron_exon_hints.gff
    #The output name intron_exon_hints.gff is only an example. You can then use each of these three hint files separately as the hints file in AUGUSTUS.
    #I use a Japanese supercomputer system for all my bioinformatics analyses. You should ask your supervisor about getting access to one. Otherwise your analyses will take too long on a personal computer, and for other analyses your RAM will not be sufficient.
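
As a rough sanity check after building hint files like the ones above, it can help to count how many hints of each type ended up in a file. This is a minimal sketch assuming tab-separated GFF with the feature type in column 3; the function name `count_hints_by_feature` is made up for this example.

```shell
# Hypothetical helper: print "<feature><TAB><count>" for each feature type
# (GFF column 3) in a hints file, sorted by feature name.
count_hints_by_feature() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }   # skip comments and lines with too few columns
        { n[$3]++ }               # tally by feature type
        END { for (f in n) printf "%s\t%d\n", f, n[f] }
    ' "$1" | sort
}
```

Run it as `count_hints_by_feature intronhints.gff`. A file that reports no features at all, or only unexpected ones, is worth inspecting before it goes into AUGUSTUS.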
    

If you need more help, please ask.

ADD REPLY

Mehmet,

Thank you for answering my question!

I will try the steps you suggested, but I have a quick question.

The AUGUSTUS manual says it is better to run the Bowtie/TopHat mapping with untrimmed reads, so when I created intron hints through the Bowtie/TopHat-AUGUSTUS pipeline, I used untrimmed RNA-seq reads and obtained an "accepted_hits.bam" file.

Is this file usable for generating the Cufflinks transcripts.gtf file?

Or should I do quality trimming first?

ADD REPLY

Hi,

Yes, it is usable. You should use the accepted_hits.bam file to create the intron and exon-part hints. I always use the accepted_hits.bam file to get hints.

ADD REPLY

Thank you so much for your kind advice.

I am now merging the transcripts.gtf files from different samples with cuffmerge.

Where can I get cufflinkGTF2augustusExonParthints.pl?

Is gtf2gff.pl in the /scripts directory not compatible with Cufflinks output *.gtf files?

ADD REPLY

The script that I sent you was written by my friend. You just need to convert GTF to GFF. You can use gtf2gff.pl with the --printExon option.
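
Since that script is not posted in this thread, here is only a minimal sketch of the kind of conversion involved, not the script itself: it rewrites the exon lines of a Cufflinks GTF as tab-separated exonpart hints. The function name `gtf_exons_to_exonpart_hints`, the `src=E` source tag, and the choice to drop the GTF attributes are all assumptions; the source tag has to match one defined in your extrinsic config file.

```shell
# Hypothetical sketch: turn the "exon" lines of a Cufflinks GTF (file
# arguments or stdin) into AUGUSTUS-style exonpart hints on stdout.
gtf_exons_to_exonpart_hints() {
    awk -F '\t' -v OFS='\t' '
        $3 == "exon" && NF >= 9 {
            # Keep coordinates, score, strand and frame; replace the GTF
            # attribute column with a source tag (src=E is an assumption).
            print $1, $2, "exonpart", $4, $5, $6, $7, $8, "src=E"
        }
    ' "$@"
}
```

For example: `gtf_exons_to_exonpart_hints transcripts.gtf > exonhints.gff` (the output name is a placeholder).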

ADD REPLY

Hi, may I ask one more question?

(I have been busy masking my species' genome with RepeatMasker and RepeatModeler, which took almost two weeks!)

Anyway, I just merged the hints (intron hints, exon hints, repeat-sequence hints) and ran AUGUSTUS.

Then got this message:

No source specified (e.g. by source=M in the last column)

Error in hint line: scaffold*XX*_cov*XXX*    Cufflinks    exon   * ddddddd*    *ddddddd*    .    .    .    gene_id "*XXXX*_*dddddd*"; transcript_id "*XXXX*_*dddddd*"; exon_number "*d*"; oId "CUFF.*ddddd*.*d*"; tss_id "TSS*ddddd*";

...

Could not read strand.
Maybe you used spaces instead of tabulators?

I think this message means that my exon hints were not used for gene prediction.

If you don't mind, please help me solve this problem.
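
Both messages point at the format of individual hint lines. Here is a minimal sketch for locating such lines, assuming AUGUSTUS wants nine tab-separated columns with a source tag such as `src=E` (or `source=E`) in the last one; the function name `find_bad_hint_lines` is made up for this example.

```shell
# Hypothetical helper: report hint lines that have fewer than nine
# tab-separated columns or no src=/source= tag in column 9.
# Returns non-zero if any such line is found.
find_bad_hint_lines() {
    awk -F '\t' '
        /^#/ || /^[ \t]*$/ { next }     # ignore comments and blank lines
        NF < 9 {
            printf "line %d: fewer than 9 tab-separated columns (found %d)\n", NR, NF
            bad = 1; next
        }
        $9 !~ /(^|;)[ \t]*(src|source)=/ {
            printf "line %d: no src= or source= tag in column 9\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Running `find_bad_hint_lines merged_hint.gff` would flag an unconverted Cufflinks GTF line like the one in the error above, because its last column holds `gene_id "..."` attributes rather than a source tag.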

ADD REPLY

Hi,

Why did you use masked sequence? You don't need to use it.

Can you post the AUGUSTUS command that you used for gene prediction with RNA-seq hints?

This is the command that I used:

/home/user/applications/augustus.2.5.5/bin/augustus \
  --species=myspecies \
  --extrinsicCfgFile=/home/user/applications/augustus.2.5.5/config/extrinsic/extrinsic.M.RM.E.W.cfg \
  --alternatives-from-evidence=true \
  --alternatives-from-sampling=false \
  --hintsfile=yourfile.gff \
  --gff3=on genome.fa > output

For RepeatMasker and RepeatModeler:

Were you able to run these two programs? If not, I can show you how to use them.

ADD REPLY

Thank you,

This is the command I used:

augustus \
  --species=My_species \
  --exonnames=on \
  --codingseq=on \
  --protein=on \
  --extrinsicCfgFile=./param_file/extrinsic.M.RM.E.W.cfg \
  --alternatives-from-evidence=true \
  --hintsfile=merged_hint.gff \
  --allow_hinted_splicesites=atac genome.fa > output

You asked: "Why did you use masked sequence? You don't need to use it."

To be exact, I ran AUGUSTUS on the unmasked genome ("genome.fa") but supplied the repeat information as nonexonpart hints.

I generated the repeat-sequence hints from the RepeatMasker output file (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.IncorporateRepeats) and merged them with the other hint files (exon hints, intron hints) when I ran AUGUSTUS.
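
For readers who want to reproduce this step, here is a minimal sketch of turning RepeatMasker's `.out` table into nonexonpart hints. It assumes the standard whitespace-separated `.out` layout (query sequence, start, and end in columns 5 to 7) and a `src=RM` tag matching the `RM` source in extrinsic.M.RM.E.W.cfg. The function name `repeatmasker_out_to_hints` and the `repmask` label in column 2 are made up for this example, and this is not necessarily the exact command from the wiki page.

```shell
# Hypothetical sketch: convert RepeatMasker .out rows (file arguments or
# stdin) into tab-separated nonexonpart hints on stdout.
repeatmasker_out_to_hints() {
    awk -v OFS='\t' '
        # Data rows start with a numeric Smith-Waterman score; the header
        # rows and the blank line after them do not, so they are skipped.
        $1 ~ /^[0-9]+$/ && NF >= 7 {
            print $5, "repmask", "nonexonpart", $6, $7, 0, ".", ".", "src=RM"
        }
    ' "$@"
}
```

For example: `repeatmasker_out_to_hints genome.fa.out > repeathints.gff` (both file names are placeholders).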

ADD REPLY

You can use the masked genome file as the genome file, but you don't need to use the repeat sequences as a hints file. Instead, you can remove the repeats identified by RepeatMasker using the HaploMerger tool. HaploMerger removes repeats from your genome.

  1. RepeatModeler finds repeats
  2. RepeatMasker masks them
  3. HaploMerger removes them

With the new genome file, without repeats, you can run a new gene prediction.

ADD REPLY

Hi Mehmet,

I'm looking for this script as well, as I'm following the steps you posted on your blog. Could you share the script with me too?

Thanks in advance, Stefany

ADD REPLY
