I suspect that greater entropy or "randomness" in most genomic sequences — as well as a Huffman table often using more bits per character than 2bit does — is why gzip could perform worse than 2bit.
To explain, the 2bit format packs each base into two bits (T, C, A, or G), and records runs of unknown bases (N) and soft-masked, lowercase regions as separate block lists within each sequence record. That is in addition to some minor overhead for metadata for the file and for each FASTA record.
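As a rough illustration of the packing idea — a toy sketch, not the real UCSC tooling, and handling plain uppercase ACGT only — four bases fit into one byte at two bits apiece:

```python
# Toy sketch of 2bit-style packing: 2 bits per base, 4 bases per byte.
# (The real .2bit format also stores N-blocks and mask-blocks separately;
# this toy ignores them and assumes uppercase T/C/A/G input.)
BASE_CODE = {"T": 0, "C": 1, "A": 2, "G": 3}  # the 2bit base numbering

def pack_2bit(seq):
    """Pack an uppercase T/C/A/G string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_CODE[base]
        byte <<= 2 * (4 - len(chunk))  # zero-pad a final partial byte
        out.append(byte)
    return bytes(out)

print(len(pack_2bit("ACGTACGTACGT")))  # 12 bases pack into 3 bytes
```

Note the fixed rate: the output is always one quarter the length of the input, no matter what the sequence looks like.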
The gzip DEFLATE procedure combines two stages: LZ77, which looks for repeated regions and encodes them as back-references, and Huffman coding, which builds a variable-bit-length code table over the LZ77 output, assigning shorter codes to more frequent symbols.
It seems reasonable to believe that 2bit would be more efficient with genomic regions containing "noisy" or "random" sequence data, i.e., a good portion of a given genome, I'd wager. With 2bit, every base in a FASTA record's sequence data is packed into the same two bits, regardless of the overall information content in the sequence being squeezed.
Some special areas of the genome can be redundant - transcription factor motifs, repeats like satellite DNA, etc. - and those let LZ77 perform better, squeezing the redundancies to a greater degree.
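A quick experiment makes that difference visible, using Python's `zlib` as a stand-in for gzip (both use DEFLATE); the motif and lengths here are made up for illustration:

```python
import random
import zlib

random.seed(1)
n = 100_000

# "Noisy" sequence: bases drawn uniformly at random
noisy = "".join(random.choice("ACGT") for _ in range(n))
# Redundant sequence: a made-up satellite-like motif, repeated end to end
redundant = ("ACGTACGTTTAGGG" * (n // 14 + 1))[:n]

noisy_gz = zlib.compress(noisy.encode(), 9)
redundant_gz = zlib.compress(redundant.encode(), 9)
print(f"noisy:     {8 * len(noisy_gz) / n:.2f} bits/base")
print(f"redundant: {8 * len(redundant_gz) / n:.2f} bits/base")
```

On a run like this, the random sequence typically stays above two bits per base — worse than 2bit's fixed rate — while the repetitive one drops to a tiny fraction of that.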
Once redundant data are compressed with LZ77, those data are reduced further with a Huffman coding table, whose codes can easily exceed two bits per symbol: the literal alphabet has to cover at least ten base characters (actgACTGnN), the lexicographical characters of FASTA headers, and the length codes DEFLATE uses to describe the LZ77 back-references, all sharing one Huffman alphabet.
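To see how an alphabet bigger than four symbols pushes code lengths past two bits, here is a toy Huffman construction over a made-up, FASTA-like symbol frequency table (the counts are illustrative, not measured from any genome):

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return the Huffman code length, in bits, for each symbol."""
    heap = [(f, [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    while len(heap) > 1:
        f1, syms1 = heapq.heappop(heap)
        f2, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge pushes these symbols one level deeper
        heapq.heappush(heap, (f1 + f2, syms1 + syms2))
    return lengths

# Illustrative (made-up) symbol counts for a FASTA-like stream: upper- and
# lowercase bases, Ns, newlines, and a header character
freqs = Counter({"A": 900, "T": 890, "G": 860, "C": 850, "\n": 60,
                 "a": 40, "g": 38, "t": 37, "c": 35, "N": 10, "n": 5, ">": 1})
lengths = huffman_code_lengths(freqs)
avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / sum(freqs.values())
print(f"average code length: {avg_bits:.2f} bits per symbol")
```

Even though the four common bases dominate, the rare symbols force some codes well past two bits, and the weighted average lands above 2bit's flat rate.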
So, in sum, 2bit always spends a fixed two bits per base (plus minor overhead), regardless of whether the sequence data is repetitive or not. As there is probably less redundancy in most genomic data than gzip would need to beat that fixed rate, the gzip procedure likely performs worse for typical, average inputs.
To answer your last question, it is difficult to compress a file with a series of two or more modern algorithms and expect better results. Compression works by removing redundancy, so the output of the first pass looks nearly random and leaves the next algorithm little or nothing to take out; that's the definition of compression, after all!
Otherwise, you could just apply a magical compression algorithm to the same data repeatedly, in series, and shrink any input down to almost nothing. With real-world compression algorithms, at some point you can't take any more squished stuff out and still be able to get your old, un-squished stuff back.
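You can watch this happen by running DEFLATE twice (again via Python's `zlib`): the second pass finds no redundancy left in the first pass's output and can only add overhead:

```python
import random
import zlib

random.seed(0)
# A noisy, mostly incompressible stand-in for typical genomic sequence
data = "".join(random.choice("ACGT") for _ in range(50_000)).encode()

once = zlib.compress(data, 9)    # first pass: removes what redundancy it can
twice = zlib.compress(once, 9)   # second pass: nothing left but overhead
print(len(data), len(once), len(twice))
```

The second number drops well below the first; the third does not drop below the second.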
What you could do, I think, is take the 2bit representation of what you know in advance is highly redundant DNA sequence data and squeeze that with a compression tool designed to take advantage of redundant input, like gzip or bzip2. You might be able to squeeze it a bit more.
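A toy version of that idea, with a made-up satellite-like repeat: pack the sequence at two bits per base (2bit-style, sketched inline here rather than using the real format), then let DEFLATE squeeze the repetition that fixed-rate packing leaves behind:

```python
import zlib

BASE_CODE = {"T": 0, "C": 1, "A": 2, "G": 3}

# A deliberately redundant, made-up satellite-like repeat, 70,000 bases
# (length divisible by 4, so every byte holds exactly four bases)
seq = "ACGTACGTTTAGGG" * 5000

# Pack four bases per byte at 2 bits each, as the .2bit payload does
packed = bytes(
    (BASE_CODE[seq[i]] << 6) | (BASE_CODE[seq[i + 1]] << 4)
    | (BASE_CODE[seq[i + 2]] << 2) | BASE_CODE[seq[i + 3]]
    for i in range(0, len(seq), 4)
)

gz_packed = zlib.compress(packed, 9)
print(len(seq), len(packed), len(gz_packed))
```

Whether this wins in practice depends on how redundant the input really is; on mostly random sequence, gzip on top of 2bit gains little or nothing.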