Question

BUSCO genome mode giving poor results

2

8.8 years ago

Chris Cole ▴ 800

I've successfully managed to run BUSCO on transcriptome data with some problems (see here), but it's working now.

In genome mode, however, I'm struggling to get decent results and I can't figure out why. For the exact same species I get 342 (79%) 'Complete BUSCOs' in transcriptome mode, yet only 151 (35%) in genome mode.

Has anyone else seen this?

busco error • 7.8k views

ADD COMMENT • link 8.8 years ago by Chris Cole ▴ 800

1

This is the command:

python3 BUSCO_v1.22.py -o output -in genome.fasta -l /db/busco/eukaryota -m genome

With these versions of the tools:

hmmer 3.1b2
blast 2.2.29+
augustus 3.1.0 (having problems locally with 3.0.3)
python 3.4.3

ADD REPLY • link 8.8 years ago by Chris Cole ▴ 800

0

Is the transcriptome extracted from the genome using gene annotations? If not, it's possible that your transcriptome is just more complete/correct than the genome.

ADD REPLY • link 8.8 years ago by Damian Kao 16k

1

I have heard (second-hand, admittedly) of cases where >95% of transcripts map back to the genome, but the genome completeness is low; see my reply below about nematodes. However, if one uses the transcripts to derive gene models (e.g. using MAKER) and then runs BUSCO on the gene model sequences, the % completeness goes up.

Based on the BUSCO manual:

BUSCO genome assembly assessment first identifies candidate regions from the genome to be assessed with tBLASTn searches using BUSCO consensus sequences. Gene structures are then predicted using Augustus with BUSCO block profiles. Finally, these predicted genes, or all genes from an annotated gene set or transcriptome, are assessed using HMMER and lineage-specific BUSCO profiles to classify matches as complete, duplicated, or fragmented, or when there are no matches, as missing.

So maybe Augustus has a hard time deriving accurate gene models de novo, leading to poor BUSCO scores, whereas BUSCO works more effectively on gene models built with the help of transcriptome data?

ADD REPLY • link 8.8 years ago by Chris Fields ★ 2.2k

0

Yeah, could well be.

I'm not convinced that BUSCO is doing the right thing here either, but working through the code is like walking through treacle...

ADD REPLY • link 8.8 years ago by Chris Cole ▴ 800

0

Both data sets come from a curated species, Dictyostelium discoideum, so they should be pretty reliable and similar.

ADD REPLY • link 8.8 years ago by Chris Cole ▴ 800

score 1 · Answer 1 · 2016-09-22

1

8.8 years ago

colindaven 7.7k

Weird, we run it in genome mode all the time and generally get 70-90%. I have never seen results like 35%.

(These are for large plant genomes, where Augustus works pretty well).

ADD COMMENT • link 8.8 years ago by colindaven 7.7k

1

I have seen this for some nematode genomes, even when doing the extended run; the % varies quite a bit but is always low (in some cases, less than 20%). Interestingly, CEGMA gave more consistent results.

We have wondered whether this has something to do w/ Augustus making poor calls, though I'm not sure how BUSCO is using it internally.

ADD REPLY • link 8.8 years ago by Chris Fields ★ 2.2k

0

I have tried to decipher the code, but am struggling to work out where it's going wrong. I agree with you, it's likely to be something to do with Augustus (seeing as that's the key difference between 'trans' and 'genome' mode), but am not sure whether it's the software itself or BUSCO's implementation of it.

For your nematodes, have you compared the genome result with an ORF or transcriptome run?

ADD REPLY • link 8.8 years ago by Chris Cole ▴ 800

1

We haven't done this directly. But we have had another group report much lower scores when using whole genome vs. just gene models (which were derived via assembled RNA-Seq + Braker I believe). It's something I'd like to confirm but I wouldn't be terribly surprised if that does hold true.

ADD REPLY • link 8.8 years ago by Chris Fields ★ 2.2k

0

So one problem was that Augustus was crashing consistently for some genes, but BUSCO pipes all errors to /dev/null, so it was never reported until I removed the /dev/null redirect.

Managed to fix some of the (local) causes of failure, but am still getting core dumps for a small number of genes.

I am circumspect regarding this software.
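
To illustrate the point about hidden failures, here is a minimal Python sketch of running an external tool while keeping its stderr in a log file and flagging non-zero exit codes, rather than discarding errors in /dev/null. This is not BUSCO's code; the helper name run_logged and the log path are hypothetical, chosen only for illustration.

```python
import subprocess
import sys


def run_logged(cmd, log_path):
    """Run cmd, write its stderr to log_path, and report failures.

    Hypothetical helper for illustration; not part of BUSCO or Augustus.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,   # capture normal output for the caller
            stderr=log,               # keep error messages instead of dropping them
            universal_newlines=True,  # decode stdout as text
        )
    if result.returncode != 0:
        # On POSIX, a negative return code means the process was killed by a
        # signal (for example a segmentation fault that produces a core dump).
        print(
            "FAILED (exit code {}): {}".format(result.returncode, " ".join(cmd)),
            file=sys.stderr,
        )
    return result.returncode, result.stdout
```

It could be called as run_logged(["augustus", "--species=SPECIES", "region.fasta"], "augustus.err"), where SPECIES and the file names are placeholders. Any crash then shows up both in the log file and as a non-zero return code.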

ADD REPLY • link 8.8 years ago by Chris Cole ▴ 800

1

Tools that hide failures w/o documenting them are definitely a worry.

ADD REPLY • link 8.8 years ago by Chris Fields ★ 2.2k

0

Hi Colin,

Do you have an example species that works for you, which I could use as a test, please? Preferably a relatively small one for speed reasons. Thanks.

ADD REPLY • link 8.8 years ago by Chris Cole ▴ 800

0

I don't have a small test; these are all very large plant genomes, so they probably wouldn't help much. Have you tried running Augustus on its own, or from within MAKER, to do gene prediction in Dictyostelium? Perhaps Dicty genes are not well predicted.

ADD REPLY • link 8.8 years ago by colindaven 7.7k

score 1 · Answer 2 · 2016-09-22

1

8.8 years ago

Farbod ★ 3.4k

Dear Chris, do you intend to assess your transcriptome assembly (as you mentioned "transcriptome data"), or a genome assembly?

If the former is your aim, you should use:

python BUSCO_v1.1b.py -o NAME -in TRANSCRIPTOME -l LINEAGE -m trans

It is also usually recommended to run the closest lineage set available for the species being analysed. If your species is a fish, it is better to choose the vertebrate set instead of eukaryota.

In addition, depending on the level of duplication in the genome (shown as "D" in the BUSCO results), the number of complete BUSCOs may be lower and the number of duplicated BUSCOs higher.

ADD COMMENT • link 8.8 years ago by Farbod ★ 3.4k

0

I'm wanting to do both and compare the results.

I've already done this with CEGMA and the results are similar. A reviewer recommended we use BUSCO instead, but now am getting these very different results. Am tempted to ditch BUSCO and just stick with CEGMA.

ADD REPLY • link 8.8 years ago by Chris Cole ▴ 800

score 1 · Answer 3 · 2016-09-27

1

8.8 years ago

Chris Cole ▴ 800

I have given up on BUSCO v1.22 as it appears the Augustus step is not running correctly on our set-up for some reason.

Reverting to v1.1b1 gives more sensible results (70-80% complete) in genome mode, but the results in transcriptome mode are now different from those of v1.22. Nothing in the changelog suggests this should be the case.

There's something odd going on with BUSCO and different versions, but I don't currently have time to dig any further into this.

ADD COMMENT • link 8.8 years ago by Chris Cole ▴ 800

0

Dear Chris,

This was a long time ago, but I'm facing a similar situation.

For transcriptome completeness assessment, the results from BUSCO v2.0 and CEGMA can be very different: BUSCO completeness is C:55.1%[S:35.6%,D:19.5%],F:6.3%,M:38.6%,n:978, but CEGMA produces higher scores (74.19 completeness; 83.06 partial).

So I wonder how you dealt with this in the end. Did you just stick with CEGMA?
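
For anyone comparing such scores across runs or tools, the BUSCO summary string quoted above can be split into its fields with a few lines of Python. This is a minimal sketch that assumes the one-line notation exactly as quoted (C, S, D, F, M as percentages and n as the number of BUSCOs searched); the function name parse_busco_summary is hypothetical.

```python
import re

# Pattern for the one-line summary notation, e.g.
# C:55.1%[S:35.6%,D:19.5%],F:6.3%,M:38.6%,n:978
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the summary fields as a dict: percentages as floats, n as int."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no BUSCO summary found in: {!r}".format(line))
    scores = {key: float(value)
              for key, value in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    return scores


scores = parse_busco_summary("C:55.1%[S:35.6%,D:19.5%],F:6.3%,M:38.6%,n:978")
print(scores["C"], scores["F"], scores["M"], scores["n"])
```

In this notation C (complete) is the sum of S (single-copy) and D (duplicated), so the 55.1% above already includes the 19.5% of duplicated BUSCOs.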

ADD REPLY • link 8.4 years ago by qingxiangg ▴ 40