How do I remove the invalid characters E, F, I, L, P, Q from a FASTA file?
MUSCLE reports these characters as invalid when I run it for multiple sequence alignment.
Those are amino acid codes. Does MUSCLE assume the input is DNA for some reason?
Yes it does. Can I clean these characters from the file?
No, you should make sure your input is DNA.
Please can you help me with the muscle command line for that?
Please read the manual, especially section 3. Asking us to do your work for you is not good etiquette.
I did use the -seqtype nucleo option in the muscle command, but I still got an error message. That is why I was asking for a command that does not give an error such as:
*** ERROR *** Invalid parameter -SeqType nucleo
That doesn't change the fact that your sequences aren't nucleotides.
You already have an answer: muscle can only align multiple sequences of a single type (protein or DNA). If you're sure your sequences are all of the same type, we can help you with any error message you're seeing.
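If it helps to confirm which records are the problem, here is a minimal sketch in Python that lists, for each FASTA record, any letters outside the IUPAC nucleotide alphabet. E, F, I, L, P and Q are all valid amino acid codes that have no meaning as nucleotide codes, so records that trigger this report are most likely protein. The function names (`read_fasta`, `non_nucleotide_report`) and the example records are made up for illustration, and treating `-` and `.` as gap characters is an assumption, not something muscle itself requires.

```python
# Letters defined by the IUPAC nucleotide code (including ambiguity codes).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
# Assumption: '-' and '.' are tolerated as gap characters.
GAP_CHARS = set("-.")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def non_nucleotide_report(lines):
    """Map each record header to the sorted non-nucleotide letters it contains."""
    allowed = IUPAC_NUCLEOTIDES | GAP_CHARS
    report = {}
    for header, seq in read_fasta(lines):
        bad = set(seq.upper()) - allowed
        if bad:
            report[header] = sorted(bad)
    return report


if __name__ == "__main__":
    # Hypothetical records: one DNA, one protein.
    example = [">dna_1", "ACGTNACGT", ">protein_1", "MEFILPQK"]
    for name, letters in non_nucleotide_report(example).items():
        print(name, "".join(letters))  # prints: protein_1 EFILPQ
```

To check a real file, pass an open file handle instead of the example list, for instance `non_nucleotide_report(open("seqs.fa"))`. An empty result means every record uses only nucleotide letters. This only detects the problem; it deliberately does not strip characters, since deleting residues from a protein sequence would not turn it into DNA.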
A genuine nucleotide FASTA file cannot produce this type of error, so make sure your file actually contains DNA. For the muscle command, please try this simple one and check that you get the desired result:
muscle -in seqs.fa -out seqs.afa
Asaf has already stated the point you're trying to make - what is the value you're adding to the discussion?