I am working with a FASTA file in which each read begins with a repetitive sequence of variable length at its 5' end. For instance, in the file below:
>seq1
CCCCAAAACCCCAAAACCCCGATGATCATGGATC
>seq2
CCCCAAAACCCCGATGGCATCATTCA
>seq3
CCCCAAAACCCCAAAATATGTTGCTACTAG
I would like to remove the repetitive sequence of C's and A's from the 5' end of each read, but whatever solution I use should take into account that there may be any number of repetitive units, including a repetitive C block without a subsequent A block (see "seq2" above).
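To make the requirement concrete, here is a minimal Python sketch of the intended behavior. It assumes the repeat unit is exactly CCCC followed by AAAA, as in the three example reads, and that each record's sequence sits on a single line. The names `REPEAT_5P`, `trim_repeat`, and `trim_fasta` are hypothetical and used only for illustration.

```python
import re

# Assumption: the 5' repeat is zero or more CCCCAAAA units, optionally
# followed by one CCCC block with no A block after it (as in seq2).
REPEAT_5P = re.compile(r"^(?:CCCCAAAA)*(?:CCCC)?")


def trim_repeat(seq):
    """Return seq with the leading CCCCAAAA repeats (and a final CCCC) removed."""
    return REPEAT_5P.sub("", seq, count=1)


def trim_fasta(lines):
    """Yield FASTA lines with each sequence line trimmed.

    Assumes one sequence line per record; header lines (starting with '>')
    are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield trim_repeat(line)
```

With this pattern, seq1 would become GATGATCATGGATC, seq2 GATGGCATCATTCA, and seq3 TATGTTGCTACTAG. One caveat of the sketch: a read whose real sequence happens to start with CCCC immediately after the repeat cannot be told apart from a trailing C block, so those bases would be trimmed as well.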
If this can be done from the Mac OS X command line, that would be ideal. I am also interested in software packages that can accomplish this. Thank you for any help you can offer.