11 months ago by
EMBL Heidelberg, Germany
The main difference between these character data types is the maximum size of the string you can store in a column. VARCHAR(x) limits strings to x characters (and the column still has to fit within MySQL's 65,535-byte row size limit), whereas TEXT holds up to 65,535 bytes, MEDIUMTEXT up to 16 MB and LONGTEXT up to 4 GB. Apart from a small length prefix (1 to 4 bytes, depending on the type), the storage space occupied by a given string is the same for all of these types: the string 'mystring' is always stored as 8 characters, but the actual disk space required in bytes depends on the encoding. The collation is the set of rules governing string comparisons (because the order of characters is alphabet/language dependent, and because comparisons may or may not be case sensitive). The collation matters for searches, sorting and index creation.
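To make the point about characters versus bytes concrete, here is a small Python sketch. It is only an illustration outside MySQL: the codec names are Python's (roughly corresponding to MySQL's ascii, latin1 and utf8mb4), the helper `byte_length` and the sample strings are made up for this example, and the sorting at the end is just an analogy for a binary versus a case-insensitive collation.

```python
# Byte cost of the same strings under three encodings.
# Codec names are Python's; they only approximate MySQL's character sets.
samples = ["mystring", "Müller"]
codecs = ("ascii", "latin-1", "utf-8")


def byte_length(text, codec):
    """Return the encoded size in bytes, or None if the codec cannot represent the text."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        return None


# Map each sample string to its size in bytes under every codec.
sizes = {s: {c: byte_length(s, c) for c in codecs} for s in samples}
for s, row in sizes.items():
    print(f"{s!r}: {len(s)} characters, bytes per codec: {row}")

# Collation analogy: code point ("binary") order vs. a case-insensitive order.
names = ["beta", "Alpha", "alpha", "Beta"]
binary_order = sorted(names)
case_insensitive_order = sorted(names, key=str.casefold)
print(binary_order)
print(case_insensitive_order)
```

'mystring' costs 8 bytes in all three encodings, while 'Müller' cannot be encoded as ASCII at all and needs one more byte in UTF-8 than in Latin-1. The two sort orders differ in the same way that two collations can give different results for ORDER BY or for an index.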
You should choose the data type and encoding that can represent all your sequences (e.g. are they going to contain characters outside plain ASCII, such as accented letters or symbols?) and the collation that gives you a relevant ordering and indexing for your purpose.
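For sequences that contain only plain ASCII characters, the ASCII encoding should be fine; for text in most West European languages, Latin1 is enough. If Unicode support is needed, use UTF-8 (in MySQL, prefer the utf8mb4 character set, since the older utf8 is limited to 3 bytes per character and cannot store every Unicode character). Each character set has an associated default collation, which should be used unless you have a specific reason not to, to avoid surprises.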
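One way to apply this rule of thumb is to check your data before creating the table. The sketch below is an assumption-laden illustration, not a MySQL feature: the function name `smallest_charset` is hypothetical, and Python's `latin-1` codec is used as a stand-in for MySQL's latin1, which is close but not identical (MySQL's latin1 is based on cp1252).

```python
# Candidate character sets, from narrowest to widest, paired with the
# Python codec used here to approximate each one (an assumption).
CANDIDATES = [
    ("ascii", "ascii"),
    ("latin1", "latin-1"),
    ("utf8mb4", "utf-8"),
]


def smallest_charset(strings):
    """Return the name of the narrowest candidate that can encode every string."""
    strings = list(strings)
    for charset_name, codec in CANDIDATES:
        try:
            for s in strings:
                s.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one string does not fit, try the next candidate
        return charset_name
    # Not reached in practice: UTF-8 can encode any ordinary Python string.
    raise ValueError("no candidate character set can represent the input")


print(smallest_charset(["ACGT", "MKVLAAGIVG"]))  # plain sequence data
print(smallest_charset(["Müller", "ACGT"]))      # West European text
print(smallest_charset(["αβγ", "ACGT"]))         # needs Unicode
```

The result can then be used as the character set of the column or table, together with that character set's default collation.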
For more info, check the MySQL documentation on character sets and collations.