Compile and run the C answer (below) with:
gcc -o biostar342247 biostar342247.c
./biostar342247 ATTATGCGGGGAATTT AGACTGATCGATCGTAGCAA
Imagine I have a DNA sequence (e.g. a short dummy one, ATTATGCGGGGAATTT) and I would like to colour different nucleotides in different colours, given vectors indicating the positions of the nucleotides to be coloured:
(1,3,6,7,8,10) <- to be coloured in red
(2,4,5,12) <- to be coloured in green
How would you do that without doing it manually?
Here's a solution using R:
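library(crayon)
string <- "ATTATGCGGGGAATTT"
# Split the sequence into single nucleotides
sp <- strsplit(string, split = "")[[1]]
df <- data.frame(nucleotide = sp, stringsAsFactors = FALSE)
# 1-based positions to colour
redVector <- c(1, 3, 6, 7, 8, 10)
greenVector <- c(2, 4, 5, 12)
# Start from the plain letters, then wrap the selected ones in ANSI colour codes
df$ntColored <- df$nucleotide
df[redVector, "ntColored"] <- red(df[redVector, "ntColored"])
df[greenVector, "ntColored"] <- green(df[greenVector, "ntColored"])
# sep = "" prints the sequence without spaces between nucleotides
cat(df$ntColored, "\n", sep = "")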
edit: just for fun, it's also easy to colour by letter:
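# Lookup table: each nucleotide mapped to its coloured version
ntColours <- c(A = blue("A"), C = red("C"), G = green("G"), T = yellow("T"))
df$byLetter <- unname(ntColours[df$nucleotide])
# Leave any other character (e.g. N) uncoloured instead of printing a yellow T
unmatched <- is.na(df$byLetter)
df$byLetter[unmatched] <- df$nucleotide[unmatched]
cat(df$byLetter, "\n", sep = "")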
How do you intend to use the output? The answers here rely on so-called ‘escape’ sequences: invisible characters that colour-capable terminals interpret.
These characters are not supported in every possible application though. I’m not personally aware of any image editors that support them natively.
The best solution I can think of is a screenshot?
You can view the escape characters by piping the output of the tools (maybe not the R one; I’m not 100% sure how that one works) to cat -v.
E.g.
$ Colorise_script -arg ATGC | cat -v
Apologies for the bad formulation. I asked about colours in general and different solutions have been proposed. Among these, I followed the one that looked most suitable for me. What I am aiming to do is:
Thanks
From what I can find, there is no better option than screenshotting the output.
It is theoretically possible to pipe STDOUT from the terminal, as this post explains. The only option that supports colours, however, is enscript, which would mean you could only generate PostScript files. enscript's colouring escape sequences are also not the same as an xterm's, so an intermediate script to transliterate everything would be needed.
In short, it's f*cking difficult.
The alternative would be to start from scratch in a language with some support for creating images, but then this becomes less about text manipulation and more of a rendering problem, and none of the solutions here are in that vein.
This is convoluted, but you can use the textGrob function from the grid package in R. A textGrob is a graphical object (grob), similar to the objects ggplot builds on, that just contains text. You'll need to figure out the coloring, but once you create the textGrob, you can ggsave it to get your image.
Good luck!
OP, you should have specified the "save to file" part at the outset. Colors depend on the renderer, not the file itself (image and PDF files are, of course, the way to achieve portability). Because that was left out, people spent their time helping you without knowing the actual goal. This kind of formulation frustrates people and makes them less inclined to help you or to follow up on questions you might have later.
In C, using ANSI escape codes:
#include <stdio.h>

/* Colour fixed 1-based positions of each sequence given on the command line:
   red for 1,3,6,7,8,10 and green for 2,4,5,12 (hardcoded from the question). */
int main(int argc, char **argv) {
    int i, n;
    for (i = 1; i < argc; i++) {
        const char *p = argv[i];
        for (n = 0; p[n] != '\0'; n++) {
            switch (n + 1) { /* 1-based position */
                case 1: case 3: case 6: case 7: case 8: case 10:
                    printf("\x1b[31m%c\x1b[0m", p[n]); /* red, then reset */
                    break;
                case 2: case 4: case 5: case 12:
                    printf("\x1b[32m%c\x1b[0m", p[n]); /* green, then reset */
                    break;
                default:
                    putchar(p[n]); /* uncoloured */
                    break;
            }
        }
        putchar('\n');
    }
    return 0;
}
A pure bash option (because I apparently have nothing better to do).
Note that this script will not be particularly forgiving of differently formatted command-line arguments...
#!/usr/bin/env bash
# Usage:
# $ bash col_seq.sh <Sequence> <red> <green> <yellow> <blue>
#
# Indexes (1-based) must be provided as a comma-separated quoted string, e.g.:
# $ bash col_seq.sh ATGTACGATCG "1,2" "3,4" "5,6" "7,8"
#
# You can leave a colour out, but you will need to pass empty quotes: ""
# $ bash col_seq.sh ATGTACGATCG "1,2" "3,4" "" "7,8"

# Succeed if $1 is one of the space-separated words in $2
# ($2 is deliberately unquoted in the loop so that it splits into words)
in_array() {
    local e
    for e in $2 ; do
        if [[ "$e" == "$1" ]] ; then
            return 0
        fi
    done
    return 1
}

# Print one character in a colour, then reset; '%s' keeps it from being read as a format
red()    { printf '\033[31m%s\033[0m' "$1"; }
green()  { printf '\033[32m%s\033[0m' "$1"; }
yellow() { printf '\033[33m%s\033[0m' "$1"; }
blue()   { printf '\033[34m%s\033[0m' "$1"; }

string=$(echo "$1" | tr '[:lower:]' '[:upper:]')
# Argument order matches the usage line: red, green, yellow, blue
IFS=',' read -r -a Rarray <<< "$2"
IFS=',' read -r -a Garray <<< "$3"
IFS=',' read -r -a Yarray <<< "$4"
IFS=',' read -r -a Barray <<< "$5"

for (( i = 1; i <= ${#string}; i++ )) ; do
    base="${string:i-1:1}"
    if in_array "$i" "${Rarray[*]}" ; then
        red "$base"
    elif in_array "$i" "${Garray[*]}" ; then
        green "$base"
    elif in_array "$i" "${Yarray[*]}" ; then
        yellow "$base"
    elif in_array "$i" "${Barray[*]}" ; then
        blue "$base"
    else
        printf '%s' "$base"
    fi
done
printf '\n'
Now you can do assorted bash magic:
I got carried away...
I had this idea once for schoolboys, and the only way I found was to go through HTML code.
Color is managed by the CSS class.
This is hardcoded, but you could write a function that takes a sequence as input and generates the appropriate
<span class="X">X</span>
for each base.
Do you want to use multiple different colour arrays?