To me it's easiest to understand the key biological issues underlying protein scoring matrices by taking a historical view. In the 1960s researchers solved the primary amino acid sequence of many hundreds of proteins and grouped them into families such as cytochromes c and globins. When you align two (or more) proteins, it's easy to search for identities, and it's sometimes straightforward to introduce gaps into the proteins to help create the alignment. But when aligned residues are not identical, what score(s) should be assigned? You need to answer this to figure out an overall score for the alignment of any two proteins. Pauling, Zukerkandl and others realized there are 20 amino acids, and they made a 20 x 20 matrix of all the known globins (they started with a total of a dozen!). They looked at the sequence data empirically: which substitutions occur in nature? We need rules to know what scores to assign when we compare any proteins. Cysteine rarely changed to any other residue, so aligning cys with another residue was assigned a large negative score. Glutamate and aspartate exchanged frequently, as did lysine and arginine, so those alignments were assigned positive scores.
By the mid-1960s it was appreciated that each protein family has its own "personality": for cytochromes c very few amino acid substitutions were tolerated anywhere along the length of the protein (why? because substitutions are fatal in those particularly highly conserved protein families). Other proteins (e.g. globins) tolerated substitutions fairly well, while yet others (fibrinopeptides) changed very rapidly. Thus the molecular clock hypothesis was introduced: within a given protein family, the rate of substitution was observed to be constant across lineages.
All this thinking led to the introduction of PAM matrices (Margaret Dayhoff and colleagues, 1966 or so). Later BLOSUM matrices were introduced and were slightly more effective at helping identify related proteins (e.g. if you have a favorite protein and you search a database for homologs, you need a scoring matrix to help guide an algorithm that matches significantly related proteins). Today we use programs like DELTA-BLAST (NCBI) which dynamically, quickly, and effectively "customizes" your scoring matrix to your particular query. In a related approach we use hidden Markov models (e.g. HMMer) to customize searches in a probabilistic framework.