Question: make a different color for a given amino acid
1
gravatar for Learner
2.0 years ago by
Learner 240
Learner 240 wrote:

I have a xls file which I want to color some strings differently

My xlsx file is like below lets call it sample.xlsx

>sp|Q96B97|SH3K1_HUMAN SH3 domain-containing kinase-binding protein 1 OS=Homo sapiens OX=9606 GN=SH3KBP1 PE=1 SV=2
MVEAIVEFDYQAQHDDELTISVGEIITNIRKEDGGWW
>sp|Q9UBS4|DJB11_HUMAN DnaJ homolog subfamily B member 11 OS=Homo sapiens OX=9606 GN=DNAJB11 PE=1 SV=1
KLALQLHPDRNPDDPQAQEKFQDLGAAYEVLSDSEKRKQYD
>sp|P61916|NPC2_HUMAN NPC intracellular cholesterol transporter 2 OS=Homo sapiens OX=9606 GN=NPC2 PE=1 SV=1
CQLSKGQSYSVNVTFTSNIQSKSSKAVVHGILMGVP

I want to read it and then save it when I color some letters. I want to color D in a yellow color

>sp|Q96B97|SH3K1_HUMAN SH3 domain-containing kinase-binding protein 1 OS=Homo sapiens OX=9606 GN=SH3KBP1 PE=1 SV=2
MVEAIVEF**D**YQAQH**DD**ELTISVGEIITNIRKE**D**GGWW
>sp|Q9UBS4|DJB11_HUMAN DnaJ homolog subfamily B member 11 OS=Homo sapiens OX=9606 GN=DNAJB11 PE=1 SV=1
KLALQLHPDRNP**DD**PQAQEKFQ**D**LGAAYEVLS**D**SEKRKQY**D**
>sp|P61916|NPC2_HUMAN NPC intracellular cholesterol transporter 2 OS=Homo sapiens OX=9606 GN=NPC2 PE=1 SV=1
CQLSKGQSYSVNVTFTSNIQSKSSKAVVHGILMGVP

Now I can read the xls and print them

#!/usr/bin/perl

use strict;
use warnings;
use Excel::Reader::XLSX;

    my $reader   = Excel::Reader::XLSX->new();
    my $workbook = $reader->read_file( 'sample.xlsx' );
    if ( !defined $workbook ) {
        die $reader->error(), "\n";
    }
    for my $worksheet ( $workbook->worksheets() ) {
        my $sheetname = $worksheet->name();
        print "Sheet = $sheetname\n";
        while ( my $row = $worksheet->next_row() ) {
            while ( my $cell = $row->next_cell() ) {
                my $row   = $cell->row();
                my $col   = $cell->col();
                my $value = $cell->value();
                print "  Cell ($row, $col) = $value\n";
            }
        }
    }

#Sheet = Sheet1
#  Cell (0, 0) = >sp|Q96B97|SH3K1_HUMAN SH3 domain-containing kinase-binding protein 1 OS=Homo sapiens OX=9606 GN=SH3KBP1 PE=1 SV=2
#  Cell (1, 0) = MVEAIVEFDYQAQHDDELTISVGEIITNIRKEDGGWW
#  Cell (2, 0) = >sp|Q9UBS4|DJB11_HUMAN DnaJ homolog subfamily B member 11 OS=Homo sapiens OX=9606 GN=DNAJB11 PE=1 SV=1
#  Cell (3, 0) = KLALQLHPDRNPDDPQAQEKFQDLGAAYEVLSDSEKRKQYD
#  Cell (4, 0) = >sp|P61916|NPC2_HUMAN NPC intracellular cholesterol transporter 2 OS=Homo sapiens OX=9606 GN=NPC2 PE=1 SV=1
#  Cell (5, 0) = CQLSKGQSYSVNVTFTSNIQSKSSKAVVHGILMGVP

Now here I want to focus on the Cell(1,0) and Cell (3, 0) and Cell (5, 0)

perl • 559 views
ADD COMMENTlink modified 2.0 years ago by Joe19k • written 2.0 years ago by Learner 240

I'm confused, are you dealing with fasta or xlsx? If you want to color in the terminal you can print in color, see https://stackoverflow.com/questions/287871/print-in-terminal-with-colors for instance.

ADD REPLYlink written 2.0 years ago by Asaf8.5k

@Asaf I converted the Fasta to xlsx , it is because I am doing some other analysis. No I am not interested in coloring them in terminal. I want to save them with another color in xlsx

ADD REPLYlink written 2.0 years ago by Learner 240

would an R-based solution also work for you? The openxlsx package is great for formatting, see their exhaustive set of examples.

My guess is your code may look something along those lines (taken from the documentation linked above):

wb <- createWorkbook()
addWorksheet(wb, "cellIs")

# add data without styling
writeData(wb, "cellIs", -5:5)
writeData(wb, "cellIs", LETTERS[1:11], startCol=2)

# styling
negStyle <- createStyle(fontColour = "#9C0006", bgFill = "#FFC7CE") # define the colors
conditionalFormatting(wb, "cellIs", cols=1, rows=1:11, rule="!=0", style = negStyle) # define the condition for applying the style you just defined. Here the rule is that there shouldn't be a zero, in your case you may want to require letters
ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Friederike6.8k

irrespective of this, it may actually be easier to write a VBA script, like this one here or here. That way you won't have to deal with reading in and writing out etc.

I also thought this solution described here sounded nifty:

IF you had a consistent word you were looking for, you could create another column (and hide if needed) and use the formula =Find("Text",CellName) command to find a specific text in another cell. It would return a number of where that word starts. You could then conditional format based on if you received a number back or if it gave you a zero, the word is not there.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Friederike6.8k

@Friederike R could be a good choice, can you put more info there? a better script , it has no start no end ;-)

ADD REPLYlink written 2.0 years ago by Learner 240

well, the start would be something like:

library(openxslx)
indata <- read.xlsx(xlsxFile, sheet = 1, startRow = 1, colNames = TRUE, rowNames = FALSE)

I'm actually not 100% sure if that would allow you to specify part of a string within a cell to be colored because I have a feeling that openxlsx mostly reproduces the formula capabilities of Excel (which do not seem to allow for conditional formatting of parts of strings within the same cell). I would, in fact, suggest to look into an Excel-based solution to this unless you have compelling reasons not to use those.

ADD REPLYlink written 2.0 years ago by Friederike6.8k

@Friederike this package is not even available for unix!!! use library("xlsx") if you can please

ADD REPLYlink written 2.0 years ago by Learner 240

sorry, you will need to install the library first, of course:

install.packages("openxlsx")

Works fine on my unix server

ADD REPLYlink written 2.0 years ago by Friederike6.8k

@Friederike I just ran your code! then what ? it is doing nothing ? what are you trying to do with R exactly ? I am a bit confused

ADD REPLYlink written 2.0 years ago by Learner 240

Please, please, don't use Excel.

ADD REPLYlink written 2.0 years ago by Joe19k

@jrj.healey ok, lets do it in text , what is your solution ?

ADD REPLYlink written 2.0 years ago by Learner 240

what do you want to achieve with the coloring?

ADD REPLYlink written 2.0 years ago by Friederike6.8k

@Friederike part of a code that I am writing

ADD REPLYlink written 2.0 years ago by Learner 240

I can give you a solution, but it won’t be in Perl... (see answer below).

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Joe19k
4
gravatar for Joe
2.0 years ago by
Joe19k
United Kingdom
Joe19k wrote:

Here’s a bash Function I keep handy:

quickcolor(){
 sed -e "s/A/$(tput setaf 1)A$(tput sgr0)/g" \
     -e "s/T/$(tput setaf 2)T$(tput sgr0)/g" \
     -e "s/C/$(tput setaf 3)C$(tput sgr0)/g" \
     -e "s/G/$(tput setaf 4)G$(tput sgr0)/g" $1 | cat
}

I can’t remember off then top of my head which nucleotide gets which colour, but you can change the number given to setaf to tweak it (it will need some modification to read a fasta/ignore headers)

Edit:

Sorry just noticed this thread was about amino acids. The same approach will work, you'll just have to modify the substitutions to suit.

Edit II:

Incorporated Asaf's addressing of headers, and added the ability to pass multiple fastas at once (e.g. $ quickcolor *.fasta)

quickcolor(){
 for i in "$@" ; do
  sed -e "/^>/!s/A/$(tput setaf 1)A$(tput sgr0)/g" \
      -e "/^>/!s/T/$(tput setaf 2)T$(tput sgr0)/g" \
      -e "/^>/!s/C/$(tput setaf 3)C$(tput sgr0)/g" \
      -e "/^>/!s/G/$(tput setaf 4)G$(tput sgr0)/g" "$i" | cat
 done
}
ADD COMMENTlink modified 2.0 years ago • written 2.0 years ago by Joe19k
1

I think this one should receive "post of the year".

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by cschu1812.6k
1

Added to my .bashrc should be handy. And added a condition to avoid fasta headers:

quickcolor(){
 sed -e "/^>/!s/A/$(tput setaf 1)A$(tput sgr0)/g" \
 -e "/^>/!s/T/$(tput setaf 2)T$(tput sgr0)/g" \
 -e "/^>/!s/C/$(tput setaf 3)C$(tput sgr0)/g" \
 -e "/^>/!s/G/$(tput setaf 4)G$(tput sgr0)/g" $1 | cat
 }
ADD REPLYlink written 2.0 years ago by Asaf8.5k

That's the syntax I was looking for! ;)

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Joe19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1633 users visited in the last hour
_