I am studying DNA compression algorithms, because the amount of DNA data is expected to be enormous and to keep growing. The other day I got some DNA data in FASTA format, about 2 GB in size. Then I realized that FASTA stores the bases 'A', 'T', 'G', 'C' as character codes, so each base takes 8 bits. But a base could be expressed in 2 bits with a binary code, for instance A=00, T=01, G=10, C=11. Using a binary format would remove this redundancy and allow the sequence to be stored in roughly a quarter of the space. I think a binary format is a better choice than character codes.
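To make the idea concrete, here is a minimal sketch in Python of the 2-bit packing I have in mind. The function names `pack_bases` and `unpack_bases` are only my own illustration, not part of any existing tool, and the sketch assumes the input is a bare sequence containing only uppercase A, T, G and C (no header lines, no line breaks, no other symbols).

```python
# Mapping taken from the example above: A=00, T=01, G=10, C=11
CODE = {"A": 0b00, "T": 0b01, "G": 0b10, "C": 0b11}
BASES = "ATGC"  # BASES[i] is the base whose 2-bit code is i


def pack_bases(seq: str) -> bytes:
    """Pack a string of A/T/G/C into 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            # Raises KeyError for anything other than A, T, G, C (assumption)
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking can read from the high bits
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_bases(data: bytes, n_bases: int) -> str:
    """Inverse of pack_bases; n_bases is needed because the last byte may be padded."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


if __name__ == "__main__":
    seq = "ATGCATG"
    packed = pack_bases(seq)
    print(len(seq), "bases ->", len(packed), "bytes")
    print(unpack_bases(packed, len(seq)) == seq)
```

With this sketch the number of bases has to be stored next to the packed bytes, because the last byte may contain padding bits that would otherwise be decoded as extra 'A's.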
Are there any particular reasons why the data is stored as character codes instead?