Question: getting "std::bad_alloc" message when using filterBam (augustus3.1)
3.7 years ago by
freddiejung20
Japan
freddiejung20 wrote:

Hi,

I am trying to construct a gene model for a new species using augustus3.1.

I have some RNA-seq data, so I used it for 'intron hints'.

The Augustus tutorial says the filterBam program can be used for more accurate gene prediction (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=IncorporatingRNAseq.Tophat), so I decided to use it.

However, every time I run filterBam, it ends with a message like this:

processed line 74100terminate called after throwing an instance of 'std::bad_alloc' 

what(): std::bad_alloc

Despite this message, a filtered BAM file was generated, but I do not know whether this BAM file is reliable.

Where did I go wrong?

filterbam pair-end augustus • 1.6k views
modified 3.7 years ago by Mehmet460 • written 3.7 years ago by freddiejung20
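
On the question of whether the BAM file left behind by the crash can be trusted, one cheap check is sketched below. A BAM file that was written to completion ends with a fixed 28-byte BGZF end-of-file block (defined in the SAM/BAM format specification), so a file cut short by an aborted program usually lacks it. The function name `bam_has_eof_marker` is made up for illustration, and the sketch assumes a POSIX shell with `tail`, `od` and `tr`.

```shell
# Hypothetical helper: succeed only if the file ends with the 28-byte BGZF EOF block.
bam_has_eof_marker() {
    local bam="$1"
    # The EOF block written as hex, as given in the SAM/BAM format specification.
    local expected="1f8b08040000000000ff0600424302001b0003000000000000000000"
    local actual
    # Read the last 28 bytes and print them as one continuous hex string.
    actual=$(tail -c 28 "$bam" | od -An -v -tx1 | tr -d '[:space:]')
    [ "$actual" = "$expected" ]
}

# Usage (file name is a placeholder):
#   bam_has_eof_marker filtered.bam && echo "EOF block present" || echo "file looks truncated"
```

Passing this check only suggests that the file was closed properly. It does not show that filterBam processed every read before it ran out of memory.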
3.7 years ago by
Mehmet460
Japan
Mehmet460 wrote:

This may be caused by insufficient memory. Which platform are you running Augustus on? (I guess you are in Japan and use a supercomputer system like I do?)

written 3.7 years ago by Mehmet460

Hi, Mehmet,

I run Augustus on my Mac (3.5 GHz quad-core Intel Core i5, 16 GB 1,600 MHz DDR3 SDRAM).

written 3.7 years ago by freddiejung20

OK. As far as I understand from your e-mail, you want to predict genes for your new species and you want to use RNA-seq hints (intron, exon, and intron+exon hints, separately). For this:

1. Produce intron hints:

You don't need to filter the BAM file. Instead, you should run:

augustus-3.0.1/auxprogs/bam2hints/bam2hints --intronsonly --minintronlen= --in=yourbamfile --out=intronhints.gff

#For the minintronlen option you can write, for example, 15. (Filtering a BAM file takes a long time, depending on the size of the file. I also tried filtering before, but the result was not so good. The filterBam script does not work well.)

2. Produce exon hints:

cufflinks options yourbamfile

#This will produce a transcripts.gtf file which contains only exon parts (exon hints). Then you need to convert it into .gff format. For this:

cufflinkGTF2augustusExonParthints.pl transcripts.gtf      #this will produce a transcripts.gtf.augustusexonpart.gff file

3. Exon+intron hints:

cat intronhints.gff transcripts.gtf.augustusexonpart.gff

#Then you can use each of these three hint files separately as the hints file in Augustus.

#I use a Japanese supercomputer system for all bioinformatics analyses. You should talk to your supervisor to get access to the supercomputer system. Otherwise, on your personal computer your analyses will take too long, and for other analyses your RAM will not be sufficient.

If you need more help, please ask.

written 3.7 years ago by Mehmet460
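
Note that the `cat` command in step 3 prints the combined hints to the terminal unless the output is redirected to a file. A minimal sketch of that step follows. The function name `merge_hints` and the output file name `merged_hints.gff` are made up for illustration, and the sorting by sequence name and coordinates is only for readability (it is not something the thread says Augustus requires).

```shell
# Hypothetical helper: concatenate hint files and sort the lines by
# sequence name (column 1), then start (column 4), then end (column 5).
merge_hints() {
    local tab
    tab=$(printf '\t')
    cat "$@" | sort -t "$tab" -k1,1 -k4,4n -k5,5n
}

# Usage, with the file names from the steps above:
#   merge_hints intronhints.gff transcripts.gtf.augustusexonpart.gff > merged_hints.gff
```

Comment lines starting with `#` would be sorted along with the data lines, so this sketch assumes the hint files contain only tab-separated hint rows.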

Mehmet,

Thank you for answering my question!

I will try the steps you suggested, but I have a quick question.

The Augustus manual says it is better to run the Bowtie/TopHat mapping for Augustus with untrimmed reads, so when I created intron hints through the Bowtie/TopHat-Augustus pipeline, I used untrimmed RNA-seq reads and obtained the "accepted_hits.bam" file.

Is this file usable for generating the Cufflinks transcripts.gtf file?

Or should I do quality trimming first?

written 3.7 years ago by freddiejung20

Hi,

Yes, it is usable. You should use the accepted_hits.bam file to create the intron and exon-part hints. I always use the accepted_hits.bam file to get hints.

written 3.7 years ago by Mehmet460

Thank you so much for your kind advice.

I am now merging the transcripts.gtf files from different samples with cuffmerge.

Where can I get cufflinkGTF2augustusExonParthints.pl?

Is "gtf2gff.pl" in the /scripts dir not compatible with Cufflinks output *.gtf files?

written 3.7 years ago by freddiejung20

The script that I sent you was written by my friend. You just need to convert GTF to GFF. You can use gtf2gff.pl with the --printExon option.

written 3.7 years ago by Mehmet460
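
Since cufflinkGTF2augustusExonParthints.pl is not publicly linked in this thread, here is a rough stand-in written with awk. It is not that script and may well differ from it. It assumes that each Cufflinks `exon` row should become one `exonpart` hint, and that the hints should carry `source=E` in the last column (the letter E is an assumption chosen to match the E column of extrinsic.M.RM.E.W.cfg). The function name is made up for illustration.

```shell
# Hypothetical stand-in, NOT the script mentioned in the thread.
# Turns each "exon" row of a Cufflinks GTF file into an "exonpart" hint row.
cufflinks_gtf_to_exonpart_hints() {
    awk -F'\t' -v OFS='\t' '
        # Keep only exon rows that have all nine tab-separated GTF columns.
        NF >= 9 && $3 == "exon" {
            # Assumption: source=E marks these hints as RNA-seq/EST-like evidence.
            print $1, "Cufflinks", "exonpart", $4, $5, ".", $7, ".", "source=E"
        }
    ' "$1"
}

# Usage:
#   cufflinks_gtf_to_exonpart_hints transcripts.gtf > transcripts.exonpart.gff
```

The GTF attributes (gene_id, transcript_id and so on) are dropped on purpose, because the last column of a hint line is reserved for hint attributes such as the source.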

Hi, may I ask one more question?

(I have been busy masking my species' genome with RepeatMasker and RepeatModeler, and it took me almost 2 weeks!)

Anyway, I just merged the hints (intron hints, exon hints, repeat-sequence hints) and ran Augustus.

Then got this message:

No source specified (e.g. by source=M in the last column)

Error in hint line: scaffoldXX_covXXX    Cufflinks    exon    ddddddd    ddddddd    .    .    .    gene_id "XXXX_dddddd"; transcript_id "XXXX_dddddd"; exon_number "d"; oId "CUFF.ddddd.d"; tss_id "TSSddddd";

Could not read strand.
Maybe you used spaces instead of tabulators?

##########

I think this message means my exon hints were not used for gene prediction.

If you don't mind, please help me solve this problem.

written 3.6 years ago by freddiejung20
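
The hint line quoted in the error is a raw Cufflinks GTF row: its last column holds gene_id and transcript_id attributes but no `source=` entry, which is what the first line of the message complains about. A small pre-flight check along those lines is sketched below. The function name `check_hints` is made up, and accepting `src=` as well as `source=` is an assumption based on the form commonly seen in Augustus hint files.

```shell
# Hypothetical helper: report hint lines that Augustus is likely to reject.
# Exit status is 0 if no problem was found, 1 otherwise.
check_hints() {
    awk -F'\t' '
        /^#/ || /^$/ { next }                      # skip comments and blank lines
        NF != 9 {
            printf "line %d: expected 9 tab-separated columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $7 != "+" && $7 != "-" && $7 != "." {
            printf "line %d: unexpected strand \"%s\"\n", NR, $7
            bad = 1
        }
        $9 !~ /(^|;)[ ]*(src|source)=/ {
            printf "line %d: no src= or source= in column 9\n", NR
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Usage:
#   check_hints merged_hint.gff
```

A line that uses spaces instead of tabs shows up here as having fewer than nine columns, which matches the "Maybe you used spaces instead of tabulators?" part of the message.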

Hi,

Why did you use masked sequence? You don't need to use it.

Can you post the Augustus command that you used for gene prediction with RNA-seq hints?

This is the command that I used:

 /home/user/applications/augustus.2.5.5/bin/augustus --species=myspecies --extrinsicCfgFile=/home/user/applications/augustus.2.5.5/config/extrinsic/extrinsic.M.RM.E.W.cfg --alternatives-from-evidence=true --alternatives-from-sampling=false --hintsfile=yourfile.gff --gff3=on genome.fa > output.

For RepeatMasker and RepeatModeler:

Were you able to run these two programs? If not, I can show you how to use them.

written 3.6 years ago by Mehmet460

Thank you,

This is the command I used:

augustus --species=My_species --exonnames=on --codingseq=on --protein=on --extrinsicCfgFile=./param_file/extrinsic.M.RM.E.W.cfg --alternatives-from-evidence=true --hintsfile=merged_hint.gff --allow_hinted_splicesites=atac genome.fa > output

>Why did you use masked sequence? You don't need to use it.

To be exact, I ran Augustus on the unmasked genome ("genome.fa") but supplied the repeat information as nonexonpart hints.

I generated the repeat-sequence hints from the RepeatMasker output file (http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=Augustus.IncorporateRepeats) and merged them with the other hint files (exon hints, intron hints) when I ran Augustus.

written 3.6 years ago by freddiejung20
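
For readers who want to see what turning RepeatMasker output into nonexonpart hints can look like, here is a rough awk sketch. It is not copied from the linked wiki page. It assumes the standard RepeatMasker `.out` table, where column 5 is the query sequence and columns 6 and 7 are the begin and end positions in the query, and it assumes `source=RM` as the source tag (chosen to match the RM column of extrinsic.M.RM.E.W.cfg). The function name and the `repmask` label in column 2 are made up for illustration.

```shell
# Hypothetical helper: convert a RepeatMasker .out table into nonexonpart hints.
repeatmasker_out_to_hints() {
    awk -v OFS='\t' '
        # Data rows start with a numeric Smith-Waterman score; header rows do not.
        $1 ~ /^[0-9]+$/ && NF >= 7 {
            # Assumption: source=RM tags these hints as repeat-derived.
            print $5, "repmask", "nonexonpart", $6, $7, "0", ".", ".", "source=RM"
        }
    ' "$1"
}

# Usage:
#   repeatmasker_out_to_hints genome.fa.out > hints.repeats.gff
```

Matching on a numeric first column, rather than skipping a fixed number of header lines, keeps the sketch working whether or not the header is present.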

You can use the masked genome file as the genome file, but you don't need to use the repeat sequences as a hints file. Instead, you can remove the repeats that were identified by RepeatMasker using the HaploMerger tool. HaploMerger removes repeats from your genome.

1. RepeatModeler finds repeats.

2. RepeatMasker masks them.

3. HaploMerger removes them.

With the new genome file, without repeats, you can run a new gene prediction.

written 3.6 years ago by Mehmet460

Hi Mehmet,

I'm looking for this script as well, as I'm following the steps you posted on your blog. I was wondering if you could share the script with me as well?

Thanks in advance, Stefany

written 2.6 years ago by stefanysolano.cr0