Question

Forum: How to Use Biostars Part-3: Formatting Text and Using GitHub Gists

10

5.5 years ago

Ram 43k

This post addresses the following points:

Formatting text: Strikethrough
Formatting text: Tabular data
Using GitHub Gist to post code/text
Formatting text: The Edit/Formatting bar (Work in Progress)

meta how-to documentation • 3.5k views

3 months ago by Ram 43k

0

Ram, are there options for mathematical formulas? As an example, consider the expressions on the Wikipedia page for the Gamma distribution. What is the recommended way to handle these?

3 months ago by LauferVA 4.2k

0

I think LaTeX works to an extent. I'd appreciate any help in testing this, as it's not a feature I use.

3 months ago by Ram 43k

0

There was certainly talk of having TeX rendering built in when the new forum rollout happened. Not sure if it ever materialised though.

Based on a quick test, inline formatting ($ ... $) doesn't work.

3 months ago by Joe 21k

0

I tested code fences and plain text, and neither worked. I think only the developers can speak to this feature.

3 months ago by Ram 43k

score 5 · Answer 1 · 2018-10-30

Using GitHub Gist to post code/text

TL;DR: Create a new gist on GitHub by pasting your content/uploading a file, copy the link to the gist and paste it here on Biostars. Paste it directly in the text; don't use the hyperlink toolbar option.

Biostars has a character limit on each new post. I think the current limit is 5000 characters, which is rarely hit unless one is posting a lot of code or text.

The way to paste a long piece of code (or any text content, for that matter) is to embed a GitHub gist in the body of the post. This way, the content is hosted at GitHub and can be reused anywhere that supports embedding gists (which is a lot of places).

GitHub gists also have the advantage of supporting language-specific syntax highlighting, which makes embedded code look better (and thus easier on the eyes). The embed panel also minimizes to not take up too much screen space, and users can always retrieve the raw text with a few clicks.

I'll be pasting the texts from above in a GitHub gist to demonstrate how to do this.

A GitHub account is a prerequisite for this - these are free and as a bioinformatician you should have one :-)

1. Go to https://gist.github.com
2. Sign in to GitHub.
3. You should be taken to the "New gist" page, as shown below:
4. Add a Gist description and a filename (including the proper extension for the content of the post). I'm pasting plain text here, so my extension will be .txt. This is an example for a gist with Perl code (see it in action here) and this gist contains Java code (see it in action here).
5. Paste the content in the large box. If you already have a file that you'd like to upload, use the Add file button. Choose your preferences for Indenting (Spaces/Tabs), Indent Size and Wrap (or leave them at their defaults).
6. Click on one of the Create gist buttons. A public gist is visible to search engines, whereas a secret gist is only accessible via the exact URL. I prefer the second, but it is a matter of personal preference.
7. This should take you to your new gist. Copy the URL from the URL bar/address bar.
8. Paste it directly in the text of your Biostars post. Do not use the hyperlink option in the toolbar, as that will bypass the engine's embedding algorithm.
9. You're done; the Biostars engine will take care of embedding.
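
If you'd rather script the gist creation than click through the web page, here is a minimal sketch. It assumes GitHub's REST API "create a gist" endpoint (POST to https://api.github.com/gists) and a personal access token that you supply yourself; the function names are mine and made up for illustration, and none of this is needed for the steps above.

```python
import json
import urllib.request


def build_gist_payload(filename, content, description="", public=False):
    """Build the JSON body for a new gist (hypothetical helper name)."""
    return {
        "description": description,
        # False gives a secret gist, reachable only via its exact URL.
        "public": public,
        # The filename's extension decides the syntax highlighting.
        "files": {filename: {"content": content}},
    }


def create_gist(payload, token):
    """POST the payload to GitHub and return the URL of the new gist.

    Assumes `token` is a personal access token allowed to create gists.
    """
    request = urllib.request.Request(
        "https://api.github.com/gists",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["html_url"]


# Building the payload needs no network access, so it is safe to inspect.
payload = build_gist_payload("table.txt", "HGNC ID\tApproved Symbol\n", "demo table")
print(json.dumps(payload, indent=2))
```

Calling create_gist(payload, token) should hand back the URL that you would then paste, on its own, into the body of the post.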

My gist URL (where I pasted the tab-separated text from the section on formatting tab-delimited text) is: https://gist.github.com/RamRS/2fae39017a6e04acf3fa7abc3a8fe6ef

And when I paste just the URL, this is what happens:

Now you know how to use GitHub gists to your advantage!

score 2 · Answer 2 · 2018-10-30

Formatting text: Strikethrough

TL;DR: Use <s>TEXT TO STRIKETHROUGH</s> to strike-through text.

There are times we wish to change our statements, but retain the context. This often happens when we say something we believe to be true, but we gain some information and then go back and change this statement. This transition can be represented by striking out the previous thought process.

For example, my original statement could be:

I don't think it's possible to perform decimal point calculations in bash.

Someone then replies to it showing how the bc tool can be used for bash calculations, and this leaves me with three options:

Leave my comment as-is, so readers would need to read the entire thread to know I was mistaken
Edit my comment, remove all the content and mention bc (Might get confusing owing to a lack of context)
Retain my old context and add new content, showing that my thinking has evolved, like so:

~~I don't think it's possible to perform decimal point calculations in bash.~~ EDIT: I was mistaken, see comments below for a description of the bc tool that performs bash calculations.

That is where strikethroughs are useful.

How do we use them? Unfortunately, Biostars markdown does not support the strikethrough tokens (~~TEXT TO STRIKETHROUGH~~) yet. However, it does support the HTML tag that does strikethrough (<s>). By surrounding text with the tag, as in <s>TEXT TO STRIKETHROUGH</s>, we can produce the strikethrough effect like so: ~~TEXT TO STRIKETHROUGH~~.
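
If you amend posts often, the pattern is easy to wrap in a helper. This is only a sketch: strike and amend are made-up names, and the only thing taken from the post is that the <s> tag renders as strikethrough.

```python
def strike(text):
    """Wrap text in the HTML <s> tag so it renders struck through."""
    return f"<s>{text}</s>"


def amend(old_statement, correction):
    """Keep the old statement visible but struck out, then append the correction."""
    return f"{strike(old_statement)} EDIT: {correction}"


print(amend(
    "I don't think it's possible to perform decimal point calculations in bash.",
    "I was mistaken, see comments below.",
))
```

The printed string can be pasted into a comment as-is.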

Of course, there are many more contexts where strikethroughs are useful, I'll leave it up to you to explore them!

score 2 · Answer 3 · 2018-10-30

Formatting text: Tabular Data

TL;DR: Use cat file.tsv | column -t -s $'\t' to output text in a visually pleasing manner, then copy-paste it to either Biostars (using code formatting) or a GitHub gist.

Often, we need to use tabular data in our post content, and machine-parseable tabular content is not always easy on the eyes, as it can look quite misaligned. All content pasted below has been code-formatted (using the 101010 button in the toolbar, highlighted in the image below).

[Image: code_formatting, showing the 101010 button highlighted in the toolbar]

See, for example, a dataset from HGNC:

HGNC ID Approved Symbol Approved Name   Previous Symbols    Synonyms
HGNC:5  A1BG    alpha-1-B glycoprotein
HGNC:37133  A1BG-AS1    A1BG antisense RNA 1    NCRNA00181, A1BGAS, A1BG-AS FLJ23569
HGNC:24086  A1CF    APOBEC1 complementation factor      ACF, ASP, ACF64, ACF65, APOBEC1CF
HGNC:7  A2M alpha-2-macroglobulin       FWP007, S863-7, CPAMD5
HGNC:27057  A2M-AS1 A2M antisense RNA 1
HGNC:23336  A2ML1   alpha-2-macroglobulin like 1    CPAMD9  FLJ25179, p170
HGNC:41022  A2ML1-AS1   A2ML1 antisense RNA 1
HGNC:41523  A2ML1-AS2   A2ML1 antisense RNA 2
HGNC:8  A2MP1   alpha-2-macroglobulin pseudogene 1  A2MP

You can see it's messy. Fields are separated by a single TAB character, and while that makes it easy for utilities such as awk or cut, the human eye cannot pick out the 5th column in the 7th row without some effort (and a wide margin for error). This problem is amplified when the columns are homogeneous, for example, when we're viewing normalized log2 counts in RNAseq data.

gene    TCGA.A1.A0SE    TCGA.A1.A0SH    TCGA.A1.A0SJ    TCGA.A1.A0SK    TCGA.A1.A0SM    TCGA.A1.A0SO    TCGA.A1.A0SP    TCGA.A2.A04P    TCGA.A2.A04Q
hsa-let-7a-1    12.6169 12.5752 12.6773 11.8037 12.7343 11.3008 12.4393 12.7181 11.8223
hsa-let-7a-2    13.6169 13.5573 13.6806 12.8041 13.7251 12.3484 13.4698 13.7182 12.8123
hsa-let-7a-3    12.6344 12.5841 12.692  11.8315 12.7827 11.3531 12.4921 12.7756 11.8996
hsa-let-7b  15.4405 15.5052 15.6086 14.5116 16.037  12.8137 15.033  14.0804 13.7427
hsa-let-7c  12.0564 12.8274 11.4256 9.6178  10.8023 11.5737 10.8517 12.7046 11.4696
hsa-let-7d  8.6969  9.3829  8.6306  10.4122 8.2413  10.2243 9.9569  10.8403 9.8383

How can this be made better for display? Do we manually move each column around so they're aligned? No! This is where the column utility is really handy. column is used to format delimiter-separated text to make it pretty. The output is a little more challenging to parse with utilities, but is a lot easier on the eyes.

This is how the above snippets look when formatted by piping the content to column -t -s $'\t':

HGNC ID     Approved Symbol  Approved Name                                    Previous Symbols             Synonyms
HGNC:5      A1BG             alpha-1-B glycoprotein
HGNC:37133  A1BG-AS1         A1BG antisense RNA 1                             NCRNA00181, A1BGAS, A1BG-AS  FLJ23569
HGNC:24086  A1CF             APOBEC1 complementation factor                                                ACF, ASP, ACF64, ACF65, APOBEC1CF
HGNC:7      A2M              alpha-2-macroglobulin                                                         FWP007, S863-7, CPAMD5
HGNC:27057  A2M-AS1          A2M antisense RNA 1
HGNC:23336  A2ML1            alpha-2-macroglobulin like 1                     CPAMD9                       FLJ25179, p170
HGNC:41022  A2ML1-AS1        A2ML1 antisense RNA 1
HGNC:41523  A2ML1-AS2        A2ML1 antisense RNA 2
HGNC:8      A2MP1            alpha-2-macroglobulin pseudogene 1               A2MP

And the RNAseq dataset:

gene          TCGA.A1.A0SE  TCGA.A1.A0SH  TCGA.A1.A0SJ  TCGA.A1.A0SK  TCGA.A1.A0SM  TCGA.A1.A0SO  TCGA.A1.A0SP  TCGA.A2.A04P  TCGA.A2.A04Q
hsa-let-7a-1  12.6169       12.5752       12.6773       11.8037       12.7343       11.3008       12.4393       12.7181       11.8223
hsa-let-7a-2  13.6169       13.5573       13.6806       12.8041       13.7251       12.3484       13.4698       13.7182       12.8123
hsa-let-7a-3  12.6344       12.5841       12.692        11.8315       12.7827       11.3531       12.4921       12.7756       11.8996
hsa-let-7b    15.4405       15.5052       15.6086       14.5116       16.037        12.8137       15.033        14.0804       13.7427
hsa-let-7c    12.0564       12.8274       11.4256       9.6178        10.8023       11.5737       10.8517       12.7046       11.4696
hsa-let-7d    8.6969        9.3829        8.6306        10.4122       8.2413        10.2243       9.9569        10.8403       9.8383

Don't these look prettier and easier on the eyes? You can combine this with the gist trick to make your content look better and enable people to download the raw text!
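
If the column utility isn't available on your system, the same kind of alignment can be approximated in a few lines of Python. This is a rough sketch of the idea (pad every field to the width of the longest entry in its column, plus a gap), not a reimplementation of column; align_columns and its gap parameter are names I made up.

```python
def align_columns(tsv_text, gap=2):
    """Pad tab-separated fields so every column starts at the same offset."""
    rows = [line.split("\t") for line in tsv_text.splitlines()]
    n_cols = max(len(row) for row in rows)
    # Rows may have fewer fields than the header; pad them with empty strings.
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    # Width of each column is the length of its longest field.
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    aligned = []
    for row in rows:
        cells = [cell.ljust(width + gap) for cell, width in zip(row, widths)]
        # Drop the padding after the last non-empty field.
        aligned.append("".join(cells).rstrip())
    return "\n".join(aligned)


sample = (
    "HGNC ID\tApproved Symbol\tApproved Name\n"
    "HGNC:5\tA1BG\talpha-1-B glycoprotein\n"
    "HGNC:37133\tA1BG-AS1\tA1BG antisense RNA 1\n"
)
print(align_columns(sample))
```

Empty fields in the middle of a row keep their place here because the text is split on every single TAB.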

score 1 · Answer 4 · 2024-01-10

Work in Progress -- Formatting text: Code

TL;DR: Use one of three options: surround a few words with backticks (`); use the 101010 option highlighted below; or use fenced code blocks (a set of three backticks marking the beginning and the end of a code block). DO NOT use the double quote button.

When we need to copy-paste content from the command prompt/terminal or showcase a piece of code, we need the symbols in that code to appear as-is. We also need the content to be monospaced, which means that each character occupies the same width so AAAA and IIII line up. Also, markdown treats single line breaks as spaces but with code formatting, they're treated as proper new lines.

Without code formatting, the content below looks odd:

amino acid alignment

AAAA |||| IIII

The same content with code formatting:

# amino acid alignment
AAAA
||||
IIII

Notice how the l, | (pipe) and I characters look different with code formatting.
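
The odd-looking version comes from markdown folding single line breaks into spaces. Here is a very rough imitation of just that one behaviour, ignoring everything else markdown does (headings, emphasis and so on); fold_line_breaks is a made-up name.

```python
def fold_line_breaks(text):
    """Join the lines of each paragraph with spaces, keeping blank-line breaks."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(" ".join(p.splitlines()) for p in paragraphs)


alignment = "AAAA\n||||\nIIII"
print(fold_line_breaks(alignment))  # AAAA |||| IIII
```

Code formatting skips this folding, which is why the three rows stay stacked.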

There are also times where we need to include keywords in the body of text that we are typing. For example, I had to include the characters being shown differently in my previous sentence and they had to be part of the sentence. This is called inline code formatting and is part of basic markdown syntax. It can be done simply by surrounding the text with backticks. `text content` becomes text content. It is still monospaced, but is good only for short stretches of content. If we use this for multiple lines or even a single long line, things can start looking wonky.

We can also use the highlighted 101010 button in the toolbar, which will indent any selected content by 4 spaces, giving it the basic code-block formatting. This is equivalent to using plain code fences. The way to use this is to select the content to be formatted before clicking the 101010 button.

[Image: code_formatting, showing the 101010 button highlighted in the toolbar]

I prefer code fences to indentation for two reasons:

It uses fewer characters - the indentation adds 4 spaces to every line. This can waste quite a bit of space especially in long posts that near the character limit.
Indented blocks are always treated as bash code whereas we can specify code language in fenced code. By adding language details right after the opening code fence, we can leverage the parser's syntax highlighting capabilities. See the difference in the code blocks below:

from os import listdir

listdir('./')

vs

from os import listdir

listdir('./')
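
The character-count argument can be checked with a small sketch. It assumes the 101010 button puts 4 spaces in front of every selected line (including blank ones) and that a fence costs two lines of three backticks plus an optional language tag; indent_block and fence_block are made-up names.

```python
FENCE = "`" * 3  # three backticks, built this way so this example stays readable


def indent_block(code):
    """Format code as an indented block: 4 spaces in front of every line."""
    return "\n".join("    " + line for line in code.splitlines())


def fence_block(code, language=""):
    """Format code as a fenced block, optionally tagged with a language."""
    return f"{FENCE}{language}\n{code}\n{FENCE}"


snippet = "\n".join(f"step_{i} = {i}" for i in range(20))  # a 20-line block
print(len(indent_block(snippet)) - len(snippet))           # extra characters: 4 per line
print(len(fence_block(snippet, "python")) - len(snippet))  # extra characters: fixed cost
```

The fence overhead is constant while the indentation overhead grows with every line, so the saving only shows up once a block is longer than a few lines.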