Tool: BBTools 39.03 released!
6 months ago

Hi Everyone,

I just released a new version of BBTools (39.03). It has some exciting new features, including neural networks. Here is the list:

1) New program called "bbcrisprfinder.sh". It finds CRISPRs and is designed for short reads in metagenomes (though it also works on full genomes). It works better if you supply a reference list against which to align all the suspected CRISPRs, or use an iterative approach.

2) "checkstrand.sh". I think it is fantastic. The goal is to analyze RNA-seq data and determine how stranded it is. It's probably most relevant to institutions like JGI that try various protocols and need to evaluate them, but if you ever have a (supposedly) strand-specific library and want to know how successful it was, this could help! The important point is that it does not require a reference, and thus works on metagenomes or novel species. Checkstrand works on anything: proks, euks, shotgun, RNA-seq, etc. The more information you can provide, the better, but it still works on raw, unannotated sequence. For example, if you have a bacterium that has been assembled, you should use something like "checkstrand.sh in=reads.fq ref=assembly.fa passes=2". Checkstrand has multiple metrics for strandedness and is designed to work without an assembly, on random metagenomic data. You can get more detailed information with a reference, and (for proks, at least) even more detail without needing a gff, because an internal prok-only gene-caller is used. You CAN still set a gff with "gff=", and it will be used instead of the internal gene-caller. Basically, the point is to glean all possible information on library strandedness (such as kmer frequency, stop codon frequency, etc.) from individual reads or read pairs, and report it to the user.

3) SIMD vectorization. You can enable this in some programs via the "simd" flag. It makes no difference except in neural-network-heavy programs like bbmerge or train.sh, but it makes some programs 2.5x faster.

4) Neural networks. I'm still working on this. They make BBMerge much more accurate. Fortunately, with BBMerge I can generate all the synthetic data I want; training is much more difficult for things like CRISPRs and gene-calling, where the correct answer is not known.

5) Oh... and there are some other new things; for example, callgenes.sh now allows a "passes=2" flag. You can do however many passes you want, but it seems to converge after 3 passes, and the result is almost identical to 2 passes. The way it works is that it calls genes based on a model made from every prokaryotic genome in RefSeq that has annotations (an accompanying gff file). I thought it worked really well, but 2-pass mode was dramatically better. It now calls genes using the existing frequencies, and gets them ~95% concordant with the NCBI annotations (which may or may not be correct), up from 92%. Then, with a second pass, it re-estimates the coding and noncoding hexamer probabilities, and all the hexamers around starts and stops, according to the organism being analyzed. Anyway, I'd make passes=2 the default except that it is not very useful for metagenomes, so just enable it if you have an isolate prokaryote.
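
For anyone who wants to try the flags mentioned in items 2, 3 and 5, here is a minimal dry-run sketch. It only prints the command lines unless RUN=1 is set. The checkstrand.sh command with "ref=" and "passes=2", and the "gff=", "simd" and "passes=2" flags, come from the post above. Everything else is an assumption for illustration: the `run` helper, the file names (reads.fq, assembly.fa, genes.gff, merged.fq, calls.gff), and the in=/out= arguments shown for bbmerge.sh and callgenes.sh. Check each program's built-in help before relying on them.

```shell
#!/usr/bin/env bash
# Dry-run helper (hypothetical, not part of BBTools): prints the command
# unless RUN=1 is set, in which case it executes it (BBTools must be on PATH).
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Placeholder file names; substitute your own.
reads=reads.fq
assembly=assembly.fa
annotation=genes.gff

# Item 2: strandedness from raw reads only (no reference supplied).
run checkstrand.sh in="$reads"

# Item 2: assembled isolate, command as given in the post.
run checkstrand.sh in="$reads" ref="$assembly" passes=2

# Item 2: supply a gff to use instead of the internal prok-only gene-caller.
run checkstrand.sh in="$reads" ref="$assembly" gff="$annotation"

# Item 3: enable SIMD (in=/out= are assumed arguments for this sketch).
run bbmerge.sh in="$reads" out=merged.fq simd

# Item 5: two-pass gene calling for an isolate prokaryote
# (in=/out= are assumed arguments for this sketch).
run callgenes.sh in="$assembly" out=calls.gff passes=2
```

Running the script as-is just echoes the five command lines, which makes it easy to review them before executing anything with RUN=1.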


WooHoo!

if you ever have a (supposedly) strand-specific library and want to know how successful it was, this could help!

Looking forward to trying checkstrand.sh.


Is the SourceForge link the main entry point to the release? It is called bbmap and not bbtools, so that could be a source of confusion.

https://sourceforge.net/projects/bbmap/

Since we are on this topic, would you consider moving to a more modern source code management system, say GitHub? I find the SourceForge interface painfully antiquated.


Actually, it's on Bitbucket too: https://bitbucket.org/berkeleylab/jgi-bbtools. I like SourceForge because it lets me host compiled code and assorted non-plaintext files, which is (or at least was, last time I checked) more difficult in version control systems. But more to the point, the Department of Energy has a policy that the number of downloads must be tracked for all open source software, which SourceForge does.

Unfortunately, "bbtools" was already taken at SourceForge, so...
