Forum:How to Use Biostars Part-3: Formatting Text and Using GitHub Gists
2.9 years ago
Ram 34k

This post addresses the following points:

• Formatting text: Strikethrough
• Formatting text: Tabular data
• Using GitHub Gist to post code/text
• Formatting text: The Edit/Formatting bar (Work in Progress)

## Using GitHub Gist to post code/text

TL;DR: Create a new gist on GitHub by pasting your content or uploading a file, copy the link to the gist, and paste it here on Biostars. Paste it directly in the text; don't use the hyperlink toolbar option.

Biostars has a character limit for each new post. I think the current limit is 5000 characters, which is rarely hit unless one is posting a lot of code or text.
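
As a rough way to tell whether a draft is long enough to need a gist, the character count can be checked before posting. This is a minimal sketch: `needs_gist` is a hypothetical helper name, and the default of 5000 is only the estimate given above, not a confirmed Biostars setting.

```shell
# Hypothetical helper: succeed (exit status 0) when a draft file is longer
# than the assumed character limit.
needs_gist() {
    draft=$1
    limit=${2:-5000}   # assumption: 5000 is the estimate from the post, not a confirmed limit
    # wc -m counts characters; tr strips the padding some wc versions add.
    chars=$(wc -m < "$draft" | tr -d ' ')
    [ "$chars" -gt "$limit" ]
}
```

Something like `if needs_gist draft.txt; then echo "Use a gist"; fi` would then flag drafts that are probably too long to paste directly.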

The way to paste a long piece of code (or any text content, for that matter) is to embed a GitHub gist in the body of the post. This way, the content is hosted at GitHub and can be reused anywhere that supports embedding gists (which is a lot of places).

GitHub gists also have the advantage of supporting language-specific syntax highlighting, which makes embedded code look better (and thus easier on the eyes). The embed panel also minimizes to not take up too much screen space, and users can always retrieve the raw text with a few clicks.

I'll be pasting the texts from above in a GitHub gist to demonstrate how to do this.

A GitHub account is a prerequisite for this. Accounts are free, and as a bioinformatician you should have one :-)

1. Go to gist.github.com (sign in to GitHub if prompted).

2. You should be taken to the "New gist" page, as shown below:

3. Add a Gist description and a filename (including the proper extension for the content of the post). I'm pasting plain text here, so my extension will be .txt. This is an example for a gist with Perl code (see it in action here) and this gist contains Java code (see it in action here).

4. Paste the content in the large box. If you already have a file that you'd like to upload, use the Add file button. Choose your preferences for Indenting (Spaces/Tabs), Indent Size and Wrap (or leave them at their defaults).

5. Click on one of the Create gist buttons. A public gist is visible to search engines, whereas a secret gist is only accessible via its exact URL. I prefer the latter, but it is a matter of personal preference.

6. This should take you to your new gist. Copy the URL from the URL bar/address bar.

7. Paste it directly in the text of your Biostars post. Do not use the hyperlink option in the toolbar, as that will bypass the engine's embedding algorithm.

8. You're done; the Biostars engine will take care of the embedding.
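
For those who prefer the terminal, the web steps above can probably be replaced by the GitHub CLI. This is a sketch under assumptions the original post does not make: that `gh` is installed and already authenticated (`gh auth login`). `create_secret_gist` is a hypothetical helper name.

```shell
# Hypothetical helper: upload a file as a secret gist; gh prints the gist URL.
# Assumes the GitHub CLI (gh) is installed and authenticated.
create_secret_gist() {
    file=$1
    description=$2
    # gh creates secret gists by default; add --public for a public gist.
    gh gist create "$file" --desc "$description"
}

# Usage (not run here, since it needs network access and a GitHub login):
#   create_secret_gist table.txt "Tab-separated example for Biostars"
```

The URL that `gh` prints is what gets pasted into the Biostars post, exactly as in step 7.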

My gist URL (where I pasted the tab-separated text from the section on formatting tab-delimited text) is:

And when I paste just the URL, this is what happens:

Is there any size limit for using gists?

A quick Google search says file size should not exceed 100 MB. I'm not sure whether that applies to GitHub gists as well.

I don't think there is an easily reachable upper limit, but gists should still be used judiciously so as not to make people scroll for days to reach the next post. To my knowledge, Biostars will not abridge long gists vertically (but will horizontally).

Ah, so Biostars does not squelch long gists; that is good to know! EDIT: All we need to do is add `.gist-data { max-height: 500px }` to the CSS, and we can override the defaults used by the gist embedder.

## Formatting text: Strikethrough

TL;DR: Use `<s>TEXT TO STRIKETHROUGH</s>` to strike through text.

There are times we wish to change our statements, but retain the context. This often happens when we say something we believe to be true, but we gain some information and then go back and change this statement. This transition can be represented by striking out the previous thought process.

For example, my original statement could be:

I don't think it's possible to perform decimal point calculations in bash.

Someone then replies to it showing how the bc tool can be used for bash calculations, and this leaves me with three options:

• Leave my comment as-is, so readers would need to read the entire thread to know I was mistaken
• Edit my comment, remove all the content and mention bc (Might get confusing owing to a lack of context)
• Retain my old context and add new content, showing that my thinking has evolved, like so:

<s>I don't think it's possible to perform decimal point calculations in bash.</s> EDIT: I was mistaken, see comments below for a description of the bc tool that performs bash calculations.

That is where strikethroughs are useful.

How do we use them? Unfortunately, Biostars markdown does not support the strikethrough tokens (`~TEXT TO STRIKETHROUGH~`) yet. However, it does support the HTML tag that does strikethrough (`<s>`). By surrounding text with the tag, as in `<s>TEXT TO STRIKETHROUGH</s>`, we can produce the strikethrough effect like so: <s>TEXT TO STRIKETHROUGH</s>.

Of course, there are many more contexts where strikethroughs are useful; I'll leave it up to you to explore them!

## Formatting text: Tabular Data

TL;DR: Use `cat file.tsv | column -t -s $'\t'` to output text in a visually pleasing manner, then copy and paste it either to Biostars (using code formatting) or to a GitHub gist. In bash, write the delimiter as `$'\t'`; a plain `'\t'` reaches column as a backslash followed by the letter t, not as a TAB character.

Often, we need to use tabular data in our post content, and machine-parseable tabular content is not always easy on the eyes, as it can look quite misaligned. All content pasted below has been subject to code formatting (using the 101010 button in the toolbar, highlighted in the image below).

See, for example, a dataset from HGNC:

HGNC ID	Approved Symbol	Approved Name	Previous Symbols	Synonyms
HGNC:5	A1BG	alpha-1-B glycoprotein
HGNC:37133	A1BG-AS1	A1BG antisense RNA 1	NCRNA00181, A1BGAS, A1BG-AS	FLJ23569
HGNC:24086	A1CF	APOBEC1 complementation factor		ACF, ASP, ACF64, ACF65, APOBEC1CF
HGNC:7	A2M	alpha-2-macroglobulin		FWP007, S863-7, CPAMD5
HGNC:27057	A2M-AS1	A2M antisense RNA 1
HGNC:23336	A2ML1	alpha-2-macroglobulin like 1	CPAMD9	FLJ25179, p170
HGNC:41022	A2ML1-AS1	A2ML1 antisense RNA 1
HGNC:41523	A2ML1-AS2	A2ML1 antisense RNA 2
HGNC:8	A2MP1	alpha-2-macroglobulin pseudogene 1	A2MP

You can see it's messy. Fields are separated by a single TAB character, and while that makes parsing easy for utilities such as awk or cut, the human eye cannot pick out the 5th column in the 7th row without some effort (and a real risk of error). This problem is amplified when the columns are homogeneous, for example, when we're viewing normalized log2 counts in RNAseq data.

gene	TCGA.A1.A0SE	TCGA.A1.A0SH	TCGA.A1.A0SJ	TCGA.A1.A0SK	TCGA.A1.A0SM	TCGA.A1.A0SO	TCGA.A1.A0SP	TCGA.A2.A04P	TCGA.A2.A04Q
hsa-let-7a-1	12.6169	12.5752	12.6773	11.8037	12.7343	11.3008	12.4393	12.7181	11.8223
hsa-let-7a-2	13.6169	13.5573	13.6806	12.8041	13.7251	12.3484	13.4698	13.7182	12.8123
hsa-let-7a-3	12.6344	12.5841	12.692	11.8315	12.7827	11.3531	12.4921	12.7756	11.8996
hsa-let-7b	15.4405	15.5052	15.6086	14.5116	16.037	12.8137	15.033	14.0804	13.7427
hsa-let-7c	12.0564	12.8274	11.4256	9.6178	10.8023	11.5737	10.8517	12.7046	11.4696
hsa-let-7d	8.6969	9.3829	8.6306	10.4122	8.2413	10.2243	9.9569	10.8403	9.8383

How can this be made better for display? Do we manually move each column around so they're aligned? No! This is where the column utility is really handy. column formats delimiter-separated text to make it pretty. The output is a little more challenging to parse with utilities, but is a lot easier on the eyes. This is how the above snippets look when formatted by piping the content to `column -t -s $'\t'`:

HGNC ID     Approved Symbol  Approved Name                                    Previous Symbols             Synonyms
HGNC:5      A1BG             alpha-1-B glycoprotein
HGNC:37133  A1BG-AS1         A1BG antisense RNA 1                             NCRNA00181, A1BGAS, A1BG-AS  FLJ23569
HGNC:24086  A1CF             APOBEC1 complementation factor                                                ACF, ASP, ACF64, ACF65, APOBEC1CF
HGNC:7      A2M              alpha-2-macroglobulin                                                         FWP007, S863-7, CPAMD5
HGNC:27057  A2M-AS1          A2M antisense RNA 1
HGNC:23336  A2ML1            alpha-2-macroglobulin like 1                     CPAMD9                       FLJ25179, p170
HGNC:41022  A2ML1-AS1        A2ML1 antisense RNA 1
HGNC:41523  A2ML1-AS2        A2ML1 antisense RNA 2
HGNC:8      A2MP1            alpha-2-macroglobulin pseudogene 1               A2MP


And the RNAseq dataset:

gene          TCGA.A1.A0SE  TCGA.A1.A0SH  TCGA.A1.A0SJ  TCGA.A1.A0SK  TCGA.A1.A0SM  TCGA.A1.A0SO  TCGA.A1.A0SP  TCGA.A2.A04P  TCGA.A2.A04Q
hsa-let-7a-1  12.6169       12.5752       12.6773       11.8037       12.7343       11.3008       12.4393       12.7181       11.8223
hsa-let-7a-2  13.6169       13.5573       13.6806       12.8041       13.7251       12.3484       13.4698       13.7182       12.8123
hsa-let-7a-3  12.6344       12.5841       12.692        11.8315       12.7827       11.3531       12.4921       12.7756       11.8996
hsa-let-7b    15.4405       15.5052       15.6086       14.5116       16.037        12.8137       15.033        14.0804       13.7427
hsa-let-7c    12.0564       12.8274       11.4256       9.6178        10.8023       11.5737       10.8517       12.7046       11.4696
hsa-let-7d    8.6969        9.3829        8.6306        10.4122       8.2413        10.2243       9.9569        10.8403       9.8383


Don't these look prettier and easier on the eyes? You can combine this with the gist trick to make your content look better and enable people to download the raw text!
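
The alignment step can be sketched as a small reusable function. This is an illustration rather than a definitive recipe: `align_tsv` is a hypothetical helper name, the two sample rows are the first columns of the RNAseq example above, and it assumes the column utility is installed. It builds the delimiter with `printf '\t'`, which yields a real TAB in any POSIX shell, not only in bash.

```shell
# Hypothetical helper: align TAB-separated input (files or stdin) for display.
align_tsv() {
    # printf '\t' emits a real TAB; command substitution keeps it intact.
    tab=$(printf '\t')
    column -t -s "$tab" "$@"
}

# Two rows from the RNAseq example, rebuilt with real TABs between fields.
if command -v column >/dev/null 2>&1; then
    printf 'gene\tTCGA.A1.A0SE\tTCGA.A1.A0SH\nhsa-let-7a-1\t12.6169\t12.5752\n' | align_tsv
else
    echo "column is not installed" >&2
fi
```

Passing a file name instead of piping (`align_tsv file.tsv`) should work as well, since the arguments are forwarded to column unchanged.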