I am trying to filter an MSA for phylogenetic tree construction. I removed columns with >33% gaps and also sequences with >33% gaps. I also removed sequences with >70% sequence identity to reduce the number of sequences. I am wondering whether this is the correct way of doing it. Thanks.
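For anyone wanting to reproduce the gap filtering described in the question, here is a minimal Python sketch. It is not the poster's actual script: the function names (`gap_fraction`, `filter_alignment`), the dict-of-strings input, and the choice to measure per-sequence gaps after the columns have been trimmed are all assumptions made for illustration.

```python
GAP_CHARS = set("-.")  # assumption: '-' and '.' both denote gaps


def gap_fraction(chars):
    """Fraction of gap characters in an iterable of alignment characters."""
    chars = list(chars)
    if not chars:
        return 1.0
    return sum(c in GAP_CHARS for c in chars) / len(chars)


def filter_alignment(seqs, max_col_gap=0.33, max_seq_gap=0.33):
    """Trim gappy columns, then drop gappy sequences.

    seqs: dict mapping sequence name -> aligned sequence (all the same length).
    Columns with a gap fraction above max_col_gap are removed first; sequences
    whose remaining gap fraction exceeds max_seq_gap are removed afterwards.
    """
    if not seqs:
        return {}
    names = list(seqs)
    length = len(seqs[names[0]])
    # Indices of columns that are not too gappy.
    keep = [
        i for i in range(length)
        if gap_fraction(seqs[n][i] for n in names) <= max_col_gap
    ]
    trimmed = {n: "".join(seqs[n][i] for i in keep) for n in names}
    # Drop sequences that are still mostly gaps after column trimming.
    return {n: s for n, s in trimmed.items() if s and gap_fraction(s) <= max_seq_gap}


# Toy alignment (hypothetical data): two gappy columns and one all-gap sequence.
msa = {"a": "ACGT-A", "b": "ACGT-A", "c": "AC-T-A", "d": "------"}
filtered = filter_alignment(msa)
```

Doing the two steps in the opposite order (sequences first, then columns) can give a different result, so it is worth stating which order was used.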
I agree with Aldo that trimAl is a very good tool for trimming alignments and getting rid of gappy regions. That may be fine for phylogeny, but it might not be optimal for other analyses.
I don't think removing sequences with >70% similarity is a good idea, since you lose a lot of information, although that may depend on the species used to build the MSA and their divergence times. Especially for very closely related species, the most informative alignments will be the most similar ones (>90%).
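To see how much an identity cutoff actually removes, a rough sketch like the one below can help. It assumes identity is computed only over columns where both sequences have a residue, that sequences are in the same case, and that redundancy is reduced greedily in input order; the function names are made up for this example, and dedicated clustering tools may define identity differently.

```python
def pairwise_identity(a, b, gaps="-."):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in gaps and y not in gaps]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def reduce_redundancy(seqs, max_identity=0.70):
    """Greedy filter: keep a sequence only if it is at most max_identity
    identical to every sequence kept so far (order-dependent)."""
    kept = {}
    for name, seq in seqs.items():
        if all(pairwise_identity(seq, other) <= max_identity for other in kept.values()):
            kept[name] = seq
    return kept


# Hypothetical aligned sequences: s1 and s2 differ at a single position.
seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TTTTACGGGG"}
strict = reduce_redundancy(seqs, max_identity=0.70)   # drops s2
relaxed = reduce_redundancy(seqs, max_identity=0.95)  # keeps all three
```

With closely related species, a 70% cutoff like `strict` may discard most of the dataset, which is the concern raised above.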
You may be interested in selecting isoforms before building the MSA, and thus obtaining better alignments that do not require as much trimming. That will not only improve your phylogenetic reconstruction but will also give you better results if you run further analyses on those MSAs, such as tests for positive selection. Reference: http://gbe.oxfordjournals.org/content/5/2/457
It sounds good to me. These kinds of filtering steps can be specific to the alignment you're using. You might need to vary the cutoffs for gaps and sequence identity to get it right. You'll know from the topology of the tree whether you've removed too many positions: removing too many variable positions will cause nodes to sit on top of each other. Based on the settings you've used, though, I don't think that will happen.
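One quick check before building the tree is to count how many variable columns survive the trimming. The sketch below is only an illustration: `count_variable_columns` is a hypothetical helper, and it treats a column as variable if it contains more than one distinct non-gap residue.

```python
def count_variable_columns(seqs, gaps="-."):
    """Count alignment columns that contain more than one distinct residue.

    seqs: list of aligned sequences of equal length. Gap characters are
    ignored and residues are compared case-insensitively.
    """
    count = 0
    for column in zip(*seqs):
        residues = {c.upper() for c in column if c not in gaps}
        if len(residues) > 1:
            count += 1
    return count


# Hypothetical alignments before and after trimming.
before = ["ACGT-A", "ACGA-A", "ACCA-T"]
after = ["ACGA", "ACGA", "ACCT"]
n_before = count_variable_columns(before)
n_after = count_variable_columns(after)
```

If the count drops sharply after trimming, that is a hint the cutoffs may be too aggressive for this alignment.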