I've recently been given the task of collecting all available amino acid sequences for a disease-associated protein. The idea is to gather all known mutations of this protein that are associated with the disease and draw conclusions from their pattern.
My first approach was Ensembl BioMart. I added the Ensembl gene ID as a filter, along with the disease name (since it is already available as a filter option), and selected the protein ID and the sequences as output attributes. However, the query returned some 67K sequences, which is unlikely to be correct, and I noticed that normal (non-mutant) sequences were also among the results, so I discarded the whole output. (Please comment if you think I did something wrong.)
My second approach was to go through all the protein sequences for that gene in NCBI and check them one by one to see whether each is actually a mutant. Obviously, this is taking forever.
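The one-by-one comparison itself could at least be scripted. Below is a minimal sketch, assuming the candidate sequences have already been downloaded as FASTA text and that the ones of interest are the same isoform as the reference (same length), so a position-by-position comparison is valid. Sequences of a different length (other isoforms, fragments) would need a proper pairwise alignment instead, which this sketch does not do. The names `read_fasta` and `find_substitutions` are hypothetical helpers, and the sequences are toy strings, not real APP data.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {accession: sequence} (accession = first word of header)."""
    records: Dict[str, str] = {}
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def find_substitutions(reference: str, candidate: str) -> List[str]:
    """List substitutions like 'I6V' (1-based); only valid for equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("lengths differ; an alignment is needed")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, candidate), start=1)
        if ref != alt
    ]


# Toy data standing in for a reference sequence and downloaded NCBI records.
reference = "MKTAYIAKQR"
fasta = ">seq1 toy\nMKTAYIAKQR\n>seq2 toy\nMKTAYVAKQR\n>seq3 toy\nMKTA\n"
for acc, seq in read_fasta(fasta).items():
    if len(seq) != len(reference):
        print(acc, "different length, needs alignment")
    else:
        subs = find_substitutions(reference, seq)
        print(acc, subs if subs else "identical to reference")
```

This only flags differences among records that were actually deposited; it cannot find mutations that exist only as annotations in variant databases.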
Is there a more efficient way to do this, i.e. search for a protein and retrieve the sequences of all its mutants? The protein whose mutant forms I'm trying to find is the beta-amyloid precursor protein (APP), which is associated with Alzheimer's disease.
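Mutant sequences are often not stored as separate full-length records; variant resources typically describe each mutation as a change relative to one reference sequence (for human APP, the canonical UniProt entry is P05067). Assuming a list of single-residue substitutions numbered against that same reference, each mutant sequence can be generated locally. This is a minimal sketch: `apply_substitution` and `build_mutants` are hypothetical helpers, only simple substitutions such as `I6V` are handled (no deletions, insertions, or multi-residue changes), and the sequence and variants below are toy values, not real APP data.

```python
import re
from typing import Dict, List

# Simple substitution notation: reference residue, 1-based position, new residue.
SUB_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(reference: str, variant: str) -> str:
    """Return the reference with one substitution applied, checking the reference residue."""
    match = SUB_PATTERN.match(variant)
    if match is None:
        raise ValueError(f"unsupported variant notation: {variant}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(reference):
        raise ValueError(f"position {pos} outside sequence of length {len(reference)}")
    if reference[pos - 1] != ref:
        # A mismatch often means the variant is numbered against a different isoform.
        raise ValueError(f"{variant}: reference has {reference[pos - 1]} at {pos}")
    return reference[: pos - 1] + alt + reference[pos:]


def build_mutants(reference: str, variants: List[str]) -> Dict[str, str]:
    """Map each variant name to its full mutant sequence."""
    return {v: apply_substitution(reference, v) for v in variants}


# Toy example: a 10-residue stand-in reference and two made-up variants.
reference = "MKTAYIAKQR"
for name, seq in build_mutants(reference, ["I6V", "K8E"]).items():
    print(name, seq)
```

The reference-residue check is the main safeguard: APP has several isoforms, so a variant list and a reference sequence that use different numbering will fail loudly instead of silently producing wrong sequences.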