I am interested in characterizing how a given amino acid is distributed along the length of a protein sequence, and whether that distribution differs significantly from a random distribution of the same number of that amino acid over a sequence of equal length. I would like to determine this for each sequence in a FASTA file containing thousands of sequences.
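To make the idea concrete, here is a rough sketch of one possible approach, written in Python with only the standard library rather than R. It is an illustration under my own assumptions, not an existing package: the function names are made up, the "random distribution" is taken to mean placing the same number of residues uniformly at random along the sequence, and non-uniformity is scored with a Kolmogorov-Smirnov-style distance whose p-value is estimated by Monte Carlo resampling.

```python
import random


def ks_statistic(positions):
    """Largest gap between the empirical CDF of the positions and the uniform CDF on (0, 1)."""
    xs = sorted(positions)
    k = len(xs)
    return max(max((j + 1) / k - x, x - j / k) for j, x in enumerate(xs))


def permutation_pvalue(seq, residue, n_perm=2000, seed=0):
    """Monte Carlo p-value for non-uniform placement of `residue` along `seq`.

    Hypothetical helper: returns None when the residue does not occur.
    """
    rng = random.Random(seed)
    n = len(seq)
    idx = [i for i, aa in enumerate(seq) if aa == residue]
    k = len(idx)
    if k == 0:
        return None
    # Scale positions to (0, 1) so sequences of different length are comparable.
    observed = ks_statistic([(i + 0.5) / n for i in idx])
    hits = 0
    for _ in range(n_perm):
        # Null model (assumption): the same k residues dropped at random positions.
        sample = rng.sample(range(n), k)
        if ks_statistic([(i + 0.5) / n for i in sample]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


clustered = "K" * 10 + "A" * 90        # all lysines at the N-terminus
spread = ("K" + "A" * 9) * 10          # lysines evenly spaced
print(permutation_pvalue(clustered, "K"))
print(permutation_pvalue(spread, "K"))
```

With thousands of sequences, the resulting p-values would presumably need a multiple-testing correction before any of them are called significant.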
Ideally, I would also like to identify where in the sequence the amino acid is enriched (for example, toward the N-terminus versus the C-terminus).
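For the N- versus C-terminal part, something like the following minimal sketch is what I have in mind, again in plain Python and under stated assumptions: the sequence is simply split into two halves (the split point is arbitrary), the function name `terminal_enrichment` is hypothetical, and the p-value comes from an exact two-sided hypergeometric test of how many occurrences fall in the N-terminal half.

```python
from math import comb


def terminal_enrichment(seq, residue):
    """Compare residue counts in the N- and C-terminal halves of `seq`.

    Hypothetical helper: returns None when the residue does not occur.
    """
    n = len(seq)
    half = n // 2                      # assumption: N-terminal half = first n // 2 positions
    k = seq.count(residue)
    if k == 0:
        return None
    x = seq[:half].count(residue)      # occurrences in the N-terminal half
    total = comb(n, k)

    def pmf(i):
        # Probability of i occurrences in the N half if the k residues are placed at random.
        return comb(half, i) * comb(n - half, k - i) / total

    p_obs = pmf(x)
    lo, hi = max(0, k - (n - half)), min(k, half)
    # Two-sided p-value: total probability of outcomes no more likely than the observed one.
    p = sum(pmf(i) for i in range(lo, hi + 1) if pmf(i) <= p_obs * (1 + 1e-9))
    expected = k * half / n
    side = "N" if x > expected else "C" if x < expected else "none"
    return {
        "n_term_count": x,
        "c_term_count": k - x,
        "enriched": side,
        "p_value": min(1.0, p),
    }


print(terminal_enrichment("K" * 10 + "A" * 90, "K"))
print(terminal_enrichment("AK" * 50, "K"))
```

Splitting into halves is the crudest choice; the same counting would work for any window, such as the first and last 10% of the sequence.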
Is anyone familiar with any packages (preferably in R) that do something similar?
The closest I have found is the extractCTDD() function in protr, but unfortunately it only looks at the distribution of pre-set groups of amino acids.
Any help would be much appreciated!