You touch on some really important questions. The answers will depend on what works best for your dataset, so I highly recommend trying several approaches and checking whether the results match your expectations of the data. Methods for estimating K can also give different results, so again, try several. I recently wrote a tutorial on estimating K for RNA-seq data that you might find useful!
For these questions, I'm assuming you're referring to expression data.
1) Is it mandatory to scale the dataset beforehand? If yes, why?
I personally think this is important if you want to extract clusters based on gene expression profiles (the pattern across samples) rather than absolute expression values. Without scaling, your most highly expressed genes will cluster as one group, your lowest as another, and so on. Scaling each gene (e.g. to a z-score across samples) allows genes with similar profiles to cluster together regardless of their absolute expression level.
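To illustrate the point about scaling, here is a minimal sketch assuming a genes × samples matrix of already normalized values. The toy data and the helper names (`zscore_genes`, `euclidean`) are hypothetical. It compares Euclidean distances before and after z-scoring each gene: on raw values, gene A looks closer to gene C (similar magnitude, opposite profile) than to gene B (same profile, lower magnitude), while after scaling A and B coincide.

```python
import numpy as np

def zscore_genes(expr):
    """Scale each gene (row) to mean 0 and standard deviation 1 across samples."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # leave constant genes at 0 instead of dividing by zero
    return (expr - mean) / std

def euclidean(x, y):
    return float(np.linalg.norm(x - y))

# Hypothetical toy data: rows are genes, columns are samples.
expr = np.array([
    [1000.0, 2000.0, 3000.0],  # gene A: high expression, rising profile
    [10.0, 20.0, 30.0],        # gene B: low expression, same rising profile
    [1000.0, 500.0, 100.0],    # gene C: high expression, falling profile
])

scaled = zscore_genes(expr)

raw_ab = euclidean(expr[0], expr[1])
raw_ac = euclidean(expr[0], expr[2])
scaled_ab = euclidean(scaled[0], scaled[1])
scaled_ac = euclidean(scaled[0], scaled[2])

print("raw    A-B:", raw_ab, " A-C:", raw_ac)
print("scaled A-B:", scaled_ab, " A-C:", scaled_ac)
```

Any distance-based method (k-means, hierarchical clustering) run on the raw matrix would tend to group A with C; on the scaled matrix it groups A with B.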
2) Is it advisable to remove outliers beforehand? If yes, what's the best method to decide which outliers to exclude?
Outlier removal should be considered very carefully and done only for good reason. You can always run the analysis both ways and compare. Outlier samples can be identified by hierarchical clustering of the samples; outlier genes are another story. People sometimes filter out lowly expressed genes as noise. You can also filter genes a posteriori using their correlation to the cluster mean.
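The two gene-level filters mentioned above could look roughly like this. This is a sketch only: the function names, the `min_mean` and correlation thresholds, and the toy data are hypothetical choices, and in practice the cluster labels would come from your own clustering run. The correlation is Pearson's r between each gene and the mean profile of its own cluster; if you clustered on scaled data, compute the cluster means on the same scaled matrix.

```python
import numpy as np

def filter_low_expression(expr, min_mean):
    """Boolean mask keeping genes whose mean expression across samples is >= min_mean."""
    return np.asarray(expr, dtype=float).mean(axis=1) >= min_mean

def correlation_to_cluster_mean(expr, labels):
    """Pearson correlation of each gene with the mean profile (centroid) of its own cluster."""
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    r = np.empty(expr.shape[0])
    for k in np.unique(labels):
        members = labels == k
        centroid = expr[members].mean(axis=0)
        for i in np.flatnonzero(members):
            r[i] = np.corrcoef(expr[i], centroid)[0, 1]
    return r

# Hypothetical toy data: rows are genes, columns are samples.
expr = np.array([
    [5.0, 10.0, 15.0, 20.0],  # rising profile
    [6.0, 11.0, 16.0, 21.0],  # rising profile
    [20.0, 5.0, 18.0, 6.0],   # assigned to the same cluster but fits it poorly
    [0.1, 0.2, 0.1, 0.2],     # barely expressed
])
labels = np.array([0, 0, 0, 1])  # hypothetical cluster assignments

# Step 1 (a priori): drop lowly expressed genes.
keep = filter_low_expression(expr, min_mean=1.0)
expr_kept, labels_kept = expr[keep], labels[keep]

# Step 2 (a posteriori): drop genes that correlate weakly with their cluster mean.
r = correlation_to_cluster_mean(expr_kept, labels_kept)
final = r >= 0.5
print(keep, r.round(2), final)
```

Here the zig-zag gene ends up with r of roughly 0.15 against its cluster mean and is dropped, while the two rising genes (r of roughly 0.8) are kept. Whichever thresholds you pick, it's worth rerunning the analysis with and without the filter and comparing.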
3) I have seen that the estimated K could also be used for hierarchical clustering.
It is indeed possible to extract K clusters from a tree by cutting it so that it yields K groups. Comparing these with the clusters from your original method is one way to cross-validate your clusters.
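As a rough sketch of that cross-check, the snippet below builds a gene tree with SciPy's `linkage` (average linkage on correlation distance, i.e. 1 - Pearson r, both of which are my assumptions here, not the only reasonable choices), cuts it with `fcluster(..., criterion="maxclust")`, which returns at most K flat clusters, and measures agreement with another partition using a plain Rand index. The toy data, the helper names and `other_labels` (standing in for, say, k-means results with the same K) are hypothetical. For real data, an adjusted Rand index such as scikit-learn's `adjusted_rand_score` corrects for chance agreement.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cut_tree_into_k(expr, k, method="average"):
    """Build a gene tree on correlation distance (1 - Pearson r) and cut it into at most k clusters."""
    tree = linkage(np.asarray(expr, dtype=float), method=method, metric="correlation")
    return fcluster(tree, t=k, criterion="maxclust")

def rand_index(labels_a, labels_b):
    """Fraction of gene pairs on which two partitions agree (both together or both apart)."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # each unordered pair counted once
    return float(np.mean(same_a[upper] == same_b[upper]))

# Hypothetical toy data: three rising and three falling genes at different expression levels.
rng = np.random.default_rng(0)
rising = np.array([1.0, 2.0, 3.0, 4.0])
falling = rising[::-1]
expr = np.vstack(
    [rising * s + rng.normal(0, 0.1, 4) for s in (1, 5, 20)]
    + [falling * s + rng.normal(0, 0.1, 4) for s in (1, 5, 20)]
)

tree_labels = cut_tree_into_k(expr, k=2)
other_labels = np.array([0, 0, 0, 1, 1, 1])  # hypothetical labels from another method
agreement = rand_index(tree_labels, other_labels)
print(tree_labels, agreement)
```

Because the Rand index only asks whether pairs of genes are grouped together, the arbitrary label numbers from the two methods don't need to match; low agreement would suggest the clusters are sensitive to the method rather than robust features of the data.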