Tool: ASCIIGenome: Text Only Genome Viewer!
2.6 years ago, dariober (Glasgow - UK) wrote:

Hi All- Since thousands of excellent genome browsers are not enough, I decided to roll my own.

ASCIIGenome displays genomic tracks directly on the terminal screen by means of ASCII characters, with no graphical interface (I know, it sounds weird). The purpose is to allow quick visualization of alignment and annotation files without the need to pop up a GUI, scroll through menus, etc.

As far as I know, the program most similar to ASCIIGenome is samtools tview, but I thought it was far too limited. Ideally ASCIIGenome combines samtools tview with IGV.

It goes without saying, ASCIIGenome is still in development, but I'm using it for my own work and I find it useful. There is a fairly comprehensive README, so feel free to give it a try. Installation is quite minimal as it is written in Java.

Here's a screenshot to give an idea.

[Image: ASCIIGenome screenshot]

I'm curious to know how much interest there is out there for such a tool, so please post comments, bugs, criticism etc.

modified 3 months ago by quokka10 • written 2.6 years ago by dariober

It is actually a great idea. I prefer to use the terminal rather than opening the browser. Good work!

written 2.6 years ago by Giovanni M Dall'Olio

very cool ! which java library do you use to display on terminal, jcurses ?

written 2.6 years ago by Pierre Lindenbaum

Hi Pierre, thanks. I use jline to set up the console and enable some Unix shortcuts like autocomplete and the UP and DOWN keys to scroll through commands. Otherwise colours are rendered simply by wrapping text in ANSI escape codes (see here). Does it answer your question?

Oh, and I got some code from jvarkit (see credits)!
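
The colouring approach mentioned above, wrapping text in ANSI escape codes, can be sketched in a few lines. This is only an illustration in Python, not ASCIIGenome's Java code: the colourize helper and the small colour table are invented for the example, while the escape sequences are the standard SGR codes.

```python
# Standard ANSI SGR foreground colour codes; the names are chosen for this sketch.
ANSI_FG = {"red": 31, "green": 32, "blue": 34}
RESET = "\033[0m"  # SGR 0: back to the terminal's default rendering

def colourize(text: str, colour: str) -> str:
    """Wrap text in an ANSI escape sequence so a terminal renders it in colour."""
    return f"\033[{ANSI_FG[colour]}m{text}{RESET}"

if __name__ == "__main__":
    # One feature drawn in blue next to another drawn in red.
    print(colourize(">>>>>>>>", "blue") + colourize("<<<<<<<<", "red"))
```

On a terminal that does not understand ANSI codes the raw escape characters would show up in the output, which is presumably why an option to switch formatting off is useful.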

modified 2.6 years ago • written 2.6 years ago by dariober

Ah, I see. I didn't test your program. As far as I understand what you said, it "just" prints the result; there is no interaction like in "samtools tview" (which requires the curses library to "gotoxy"). Am I right?

written 2.6 years ago by Pierre Lindenbaum

Mmm... There is user interaction in the sense that once ASCIIGenome is started you can navigate around the genome, find stuff etc. But yes, every time a command is issued (say ff, move forward), the jline console clears the screen before new text is printed by means of a simple System.out.println(). As far as I can tell, from a user's perspective there is little difference from samtools tview, but under the hood the two programs might be handling things differently. (I'll have a look at jcurses)
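
The redraw cycle described here (clear the screen, then print the new view) needs nothing beyond standard output. A minimal Python sketch, assuming an ANSI-capable terminal; render_frame is a hypothetical helper and not part of ASCIIGenome or jline:

```python
import sys

CLEAR_SCREEN = "\033[2J"  # ANSI: erase the entire display
CURSOR_HOME = "\033[H"    # ANSI: move the cursor to row 1, column 1

def render_frame(lines: list[str]) -> str:
    """Build one frame: clear the terminal, then the new text from the top."""
    return CLEAR_SCREEN + CURSOR_HOME + "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Each navigation command would produce a fresh frame like this one.
    sys.stdout.write(render_frame(["chr7:5566757-5566829", "GGGGGGGGGG"]))
```

Unlike a curses program, nothing is addressed by screen coordinates; the whole view is simply reprinted after every command.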

written 2.6 years ago by dariober

In your case, using System.out is fine (furthermore, jcurses requires a JNI library).

written 2.6 years ago by Pierre Lindenbaum

Fiiiiiinally :D Thank you Dariober! Awesome work!

written 2.6 years ago by John

wow! +100 for this. cool stuff.

written 2.6 years ago by poisonAlien

Thank you all for the positive feedback, at least the idea got a good reception. But please, feel free to test it and report bugs, missing features, unclear docs, etc. As I said above, this is still in its early stages and I am virtually the only one who has used it.

written 2.6 years ago by dariober

I've created a brew formula to install this, however it won't be accepted in homebrew/science because the github repository is not notable enough (<20 forks, <20 watchers and <50 stars). https://github.com/dariober/ASCIIGenome/issues/3

written 2.6 years ago by Giovanni M Dall'Olio

This really is a great viewer I must say!

written 3 months ago by quokka10

2.6 years ago, robin.friedman wrote:

Created an account just to reply: Already using it, love the idea, and definitely has huge potential! However, a few small feature adds would help a lot with my genome browser workflow (searching for a gene, zooming out to see it in its entirety, browsing around to compare densities across exons, etc.):

  • Separating the regex for "print" from that for "visible". For example, I'd like to print only "transcript" lines of my GTF but view exons as well.
  • Making "print" toggle printing on/off would save time compared to "print ^$". Or providing another command to turn off printing would work fine.
  • A shortcut for "zoom out to the highest level that the BAM tracks still display" would also save time.
  • Being able to repeat other commands like with "zi" and "zo". For example, "f 2" or "next 2".

Also on my feature wishlist would be gene name display on GTF tracks (perhaps in the middle of the GGGGGGGGGGG), and zooming to both boundaries of a gene rather than just the start. But those sound a bit more complicated to do...

written 2.6 years ago by robin.friedman

Hi- Thanks for feedback! Here's some thoughts:

Separating the regex for "print" from that for "visible"

I see... I figured one would pretty much always want to see what you print and vice-versa. As a temporary work around, you could load the same gtf track twice. This can be done because the same file can be loaded more than once and the copies are independent of each other (which is a waste of resources, by the way...). Then for example assign visible transcript ^$ my.gtf#1 to one track to show only transcripts and visible transcript|exon ^$ my.gtf#2 to the other track to show both exons and transcripts (play around with regexes to be more specific).
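
What the request amounts to is two independent regexes applied to the same records, one deciding what is drawn and one deciding what is printed. A rough Python sketch of that idea; the GTF lines and the select helper are made up for illustration and say nothing about how ASCIIGenome filters internally:

```python
import re

# Three invented GTF records for one gene (tab-separated, as in real GTF).
GTF_LINES = [
    "chr1\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id \"g1\";",
    "chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id \"g1\";",
    "chr1\tsrc\tCDS\t150\t300\t.\t+\t.\tgene_id \"g1\";",
]

def select(lines, pattern):
    """Keep the lines whose raw text matches the regex anywhere."""
    rx = re.compile(pattern)
    return [ln for ln in lines if rx.search(ln)]

visible = select(GTF_LINES, r"\t(transcript|exon)\t")  # what would be drawn
printed = select(GTF_LINES, r"\ttranscript\t")          # what would be listed as text
```

Anchoring the feature type between tabs avoids accidental hits elsewhere in the line, which is the kind of regex tuning hinted at above.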

Making "print" toggle printing on/off

True. The way it works now is a bit fiddly; it was easier to implement, but I should change it to toggle mode.

A shortcut for "zoom out to the highest level that the BAM tracks still display"

At the moment bam files are "turned off" when the window size is >100kb, which is an educated guess to trade off between speed and usefulness. I guess it should be possible to zoom out to span 100kb max, it's a good idea.
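
A capped zoom-out of this kind is mostly coordinate arithmetic. The sketch below is one possible way to do it in Python, assuming 1-based inclusive coordinates and a known chromosome length; zoom_out_capped and its parameters are hypothetical names, and only the 100kb threshold comes from the post.

```python
MAX_BAM_SPAN = 100_000  # BAM tracks are switched off above this window size (per the post)

def zoom_out_capped(start: int, end: int, chrom_len: int,
                    max_span: int = MAX_BAM_SPAN) -> tuple[int, int]:
    """Widen [start, end] around its midpoint to max_span bases, staying on the chromosome."""
    span = min(max_span, chrom_len)
    mid = (start + end) // 2
    new_start = max(1, mid - span // 2)
    new_end = min(chrom_len, new_start + span - 1)
    new_start = max(1, new_end - span + 1)  # re-anchor if the right edge was clipped
    return new_start, new_end
```

Near a chromosome end the window is shifted rather than shrunk, so the view still spans the full 100kb whenever the chromosome is long enough.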

Being able to repeat other commands

I see... In fact having both f and ff is a bit redundant. I could put a f <float> to say "move forward <float> times the current window size", the same for other commands. Note that pressing enter without any argument will repeat the previous command.
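
The proposed "f <float>" boils down to shifting both window edges by a multiple of the window width. A small Python sketch under that reading; move_forward is an invented name and clamping at the chromosome end is left out:

```python
def move_forward(start: int, end: int, times: float = 1.0) -> tuple[int, int]:
    """Shift the window [start, end] right by `times` window widths."""
    width = end - start + 1          # inclusive coordinates
    shift = round(width * times)     # whole bases only
    return start + shift, end + shift
```

With a 100 bp window, times=2 moves the view two full screens forward and times=0.5 moves it half a screen; a negative value would move it backward.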

gene name display on GTF tracks

Absolutely! There are quite a few things that should be improved in the visualisation of GTF records, and this is one. Another is that CDSs are also present as exons in the GTF format and both are displayed, which is unnecessary; only the CDS should show. Also, each GTF line is independent of the others; there is no parsing into transcripts or genes (it was easier this way...), which means exons are not connected into transcripts.

zooming to both boundaries of a gene rather than just the start

To some extent this is implemented in the find_all command, for example find_all "ACTB" will find all records on the same chromosome containing ACTB (again, get the regex right).
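
Zooming to both boundaries amounts to taking the smallest start and the largest end over all matching records. A hedged Python sketch of that logic; the record tuples, coordinates and span_of_matches are invented for illustration and are not ASCIIGenome's find_all implementation:

```python
import re

def span_of_matches(records, pattern):
    """Return (chrom, min_start, max_end) over records whose name matches pattern.

    records: iterable of (chrom, start, end, name) tuples; all matches are
    assumed to lie on one chromosome. Returns None when nothing matches.
    """
    rx = re.compile(pattern)
    hits = [r for r in records if rx.search(r[3])]
    if not hits:
        return None
    return hits[0][0], min(r[1] for r in hits), max(r[2] for r in hits)

# Invented records: two features of one gene plus a name that merely contains "ACTB".
records = [
    ("chr7", 5566779, 5567522, "ACTB"),
    ("chr7", 5568792, 5569031, "ACTB"),
    ("chr7", 5589000, 5590000, "ACTBX"),
]
```

The unanchored pattern "ACTB" also picks up ACTBX here, whereas "^ACTB$" does not, which is what getting the regex right means in practice.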

But thanks! Keep testing and report back any issues!

written 2.6 years ago by dariober

Thanks for the tips!

written 2.6 years ago by robin.friedman

Keep sending issues, requests etc... I should upload a new version soon, much improved over the current one, which is quite preliminary.

written 2.6 years ago by dariober

2.6 years ago, Damian Kao (USA) wrote:

This is awesome. I've been wanting something like this for a while now. Having to get out of terminal to visualize stuff can really interrupt your workflow sometimes.

written 2.6 years ago by Damian Kao

2.3 years ago, dariober (Glasgow - UK) wrote:

Hi All- If still interested in ASCIIGenome, I uploaded version 0.4.0 which is a large improvement over previous versions.

Apart from several bug fixes, here are some additions:

  • Batch processing with --batchFile option: ASCIIGenome will iterate through each interval in a batch file in bed or gff format. This is super useful to generate a gallery of screenshots in target regions (e.g. genes, peaks, etc). Much, much faster than iterating ASCIIGenome in a for-loop since the JVM and files are loaded only once!

  • Colours

    • Now the png output has colours.

    • ASCIIGenome sets the background colour of the terminal to white, unless started with --noFormat. In this way the visual look of ASCIIGenome should be independent of the user's colour scheme of the terminal.

  • Save session: Session settings can be saved to file to be reloaded in a new run of ASCIIGenome (Experimental).

  • New commands: Some new commands not listed above: bookmark to mark positions of interest, cmdHistory to show the list of executed commands, hideTitle for a more compact view, dropTracks, and the l (left) and r (right) commands. See here for a list of commands.

  • Better API: Some commands have been renamed and their API improved. Some examples: filter is now grep -i incl_regex -e excl_re <tracks>; commands mapq -f -F have been grouped in the single samtools command. Bookmarks and regex tracks can be saved to file with the familiar *nix operators > and >>.

  • Performance

    • Memory: The memory footprint is now smaller since files are never fully read into memory. Bed or gff files without a tabix index are sorted, block compressed and indexed as needed to temporary files.

    • Speed: Operations that don't change the underlying data, e.g. changing colour, do not parse the raw files again. This makes quite a difference in speed by saving unnecessary computation.

  • TDF files can switch to be normalized to reads per million.

  • Major refactoring should make further development easier.
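
To make the batch idea from the list above concrete: a batch file in BED format is just a list of intervals, and each one has to become a region the viewer can jump to. The Python sketch below shows only that conversion, with invented intervals; bed_to_regions is a hypothetical helper, and how ASCIIGenome handles the batch file internally is not described in this post.

```python
import io

def bed_to_regions(handle):
    """Yield 'chrom:start-end' strings (1-based, inclusive) from BED lines."""
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and BED header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        # BED starts are 0-based and ends are exclusive, so only the start shifts.
        yield f"{chrom}:{int(start) + 1}-{int(end)}"

# Two invented peaks standing in for a real batch file.
bed = io.StringIO("chr1\t999\t2000\tpeak1\nchr2\t0\t500\tpeak2\n")
regions = list(bed_to_regions(bed))
```

Walking through such a list inside one process is what saves time compared with a shell for-loop, since the JVM and the input files are loaded only once.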

With lots of changes there are probably some bugs creeping in. Feel free to file bugs, issues, any and all comments here or on GitHub.

written 2.3 years ago by dariober

2.0 years ago, dariober (Glasgow - UK) wrote:

Hi All- I just wanted to mention that a paper describing ASCIIGenome is out in Bioinformatics.

But most importantly, if you still like the idea but were disappointed by earlier versions, please give the newer one a shot.

written 2.0 years ago by dariober

2.6 years ago, brent_wilson (Cofactor Genomics, St. Louis, MO) wrote:

I completely agree with the above! I can see a lot of people being interested in this.

modified 2.6 years ago • written 2.6 years ago by brent_wilson

2.6 years ago, dariober (Glasgow - UK) wrote:

Hi All- Encouraged by the positive feedback I did quite a bit of work on ASCIIGenome. If anyone is still interested, I uploaded the new version 0.2.0 which has lots of improvements compared to the previous one, which was quite experimental. See the CHANGELOG for a list of changes.

Feel free to report bugs, issues, complaints, comments...

written 2.6 years ago by dariober

2.6 years ago, endrebak wrote:

That is brains exploding level of genius. I'm glad I stole the SICER rewrite away from you so you could come up with this :)

Various ideas in random order (the list will probably grow):

1

Making UCSC tracks easy to discover and include would be neat, like for example if you know you want to display all tracks from UCSC from the HaCaT cell line you could easily do it with a regex, like

java asciigenome mytrack.bigwig --ucsc '*hacat*' # displays my track, and some HaCaT ones from UCSC

(You could probably think of a better CLI than the one suggested above.)

Perhaps a UCSC command line track downloader/getter is a good idea on its own?

2

Is it possible to write the "images" to a regular text file? If not, that would be a neat option.

3

Would be nice if the user could supply a gene track and get the distance to the nearest gene (in each direction) on a line at the bottom, together with the names of any genes overlapping the currently viewed region. If I want to view regions from a TF ChIP-seq experiment it would be nice to know which genes are close.

4

Can arbitrarily many tracks be viewed? I have a time series experiment with ten timepoints and would love to view some regions with all times on top of each other.

5

It would be great if I could input a file of regions I want displayed and have them all saved in one big pdf/png. After ChIP-Seq experiments I am left with a few hundred regions I want to look at more closely. It would be nice if I could just input one file with these regions together with a bigwig/bedgraph, and asciigenome would create one big report that I could scroll through afterwards. It would be nice if asciigenome could also allow the user to give each region included in the report a name and a description like:

# label, organism, assembly, chromosome, start, end, description, upstream, downstream
IL10, human, hg18, chr1, 205007571, 205012462, "involved in immunity", 10000, 1000
PRNP, human, hg18, chr20, 4615157, 4630234, "Prion Protein", 10000, 10000
HSPB4, human, hg18, chr21, 43462210, 43465982, "Heat-shock protein", 10000, 10000
    ...

See the example report here: http://bioinfoblog.it/2011/12/a-script-to-fetch-images-from-the-ucsc-browser/
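
A table like the one above is easy to turn into padded regions. The following Python sketch parses it with the standard csv module; the column names are taken from the header line, while read_regions and the choice to subtract "upstream" from the start and add "downstream" to the end (ignoring strand) are assumptions made for the example.

```python
import csv
import io

FIELDS = ["label", "organism", "assembly", "chromosome", "start", "end",
          "description", "upstream", "downstream"]

def read_regions(handle):
    """Parse the comma-separated table and pad each region by its flanks."""
    rows = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(rows, fieldnames=FIELDS, skipinitialspace=True)
    for row in reader:
        # Assumption: upstream extends the start, downstream extends the end.
        start = max(1, int(row["start"]) - int(row["upstream"]))
        end = int(row["end"]) + int(row["downstream"])
        yield row["label"], f'{row["chromosome"]}:{start}-{end}', row["description"]

table = io.StringIO(
    '# label, organism, assembly, chromosome, start, end, description, upstream, downstream\n'
    'IL10, human, hg18, chr1, 205007571, 205012462, "involved in immunity", 10000, 1000\n'
)
regions = list(read_regions(table))
```

The label and description could then be used as the title and caption of each image in the report.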

Ps. will start to use this tomorrow as I hate GUIs/the mouse.

modified 2.6 years ago • written 2.6 years ago by endrebak

Hi endrebak, thanks for your comments! Here's some thoughts.

Making UCSC tracks easy to discover and include would be neat

I can clearly see how this could be useful. I need to figure out how to capture UCSC tracks, and I'm not sure how easy this is going to be. For now, if you have a file with track URLs (e.g. from ENCODE) you can pass the URLs of interest as input to ASCIIGenome. But note that connecting to remote data can be slow.

Is it possible to write the "images"

Yes, either as txt or png. There are various options, see Saving screenshots.

get the distance to the nearest gene

I see... I'll think about it; it seems a task for which bedtools is ideal. About gene names in the current view, feature names are shown provided there is enough space on the feature. See also Formatting of reads and features.
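
For a handful of genes the requested summary can be computed directly; for real annotation files bedtools (for example its closest subcommand) remains the better tool. A minimal Python sketch with invented gene coordinates; nearest_genes is a hypothetical helper and not an ASCIIGenome feature:

```python
def nearest_genes(genes, win_start, win_end):
    """Summarise genes relative to the window [win_start, win_end].

    genes: list of (name, start, end) on the same chromosome as the window.
    Returns (overlapping names, nearest gene to the left, nearest to the right),
    where each neighbour is a (name, distance) tuple or None.
    """
    def closest(hits):
        return min(hits, key=lambda h: h[1]) if hits else None

    overlapping = [n for n, s, e in genes if s <= win_end and e >= win_start]
    left = [(n, win_start - e) for n, s, e in genes if e < win_start]
    right = [(n, s - win_end) for n, s, e in genes if s > win_end]
    return overlapping, closest(left), closest(right)

# Invented annotation: one gene on each side of the window and one overlapping it.
genes = [("geneA", 100, 400), ("geneB", 900, 1500), ("geneC", 3000, 3500)]
```

The linear scan is fine for a few thousand genes; an indexed or sorted structure would be needed for anything genome-wide and interactive.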

Can arbitrarily many tracks be viewed

There is no hardcoded limit. I think in practice you are limited by the fact that after so many tracks it may get a bit messy, but that's up to you. Note also that bed and gtf files are loaded in memory unless they are indexed with tabix, but for smallish tracks, like chipseq peaks it should not be a problem.

It would be great if I could input a file of regions I want displayed and have them all saved in one big pdf/png.

I think the answer is yes, you can. See again the example in Saving screenshots, which uses a for-loop to go through regions (a little bit of familiarity with bash is assumed). For manipulating images in batch I can recommend ImageMagick's convert.

modified 2.6 years ago • written 2.6 years ago by dariober

What a wonderful piece of software! Thanks. Ps. the UCSC trackfinder seems to be a very good idea, but it is only suitable as a standalone program I think.

written 2.5 years ago by endrebak