But one thing I do not understand is why people would want to convert VCF to MAF format. My understanding of VCF files is that they can be annotated by tools such as Annovar to identify which amino acid residues are mutated, and this information is more convenient than the chromosomal coordinates in MAF files.
I've also noticed that almost all TCGA data are in MAF format. If I want to study a cancer type of interest, what is the appropriate way to annotate MAF files from TCGA?
The only reason MAFs exist is to provide a human-readable list of mutations that people can load into a spreadsheet for manual review. VCFs are preferred in bioinformatics pipelines because they are a superset of the information that a MAF can contain, and they discourage the use of spreadsheets! TCGA generates VCFs for some cancer types, but you need TCGA credentials to access those, because they contain some germline calls. Given a TCGA MAF, the maf2vcf script will convert it into a generic VCF, which you can then annotate with Ensembl's VEP, snpEff, Annovar, etc.
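To make the MAF-to-VCF idea concrete, here is a rough Python sketch of what the conversion looks like for the simplest case. It is not how maf2vcf is implemented; it is only an illustration under some assumptions: the MAF has the standard columns Chromosome, Start_Position, Reference_Allele and Tumor_Seq_Allele2, only single-base substitutions are converted, and indels are skipped because a valid VCF needs the preceding reference base, which requires a reference FASTA. The function name `maf_snvs_to_vcf` and the example rows are made up for illustration.

```python
import csv

def maf_snvs_to_vcf(maf_text):
    """Turn the SNV rows of a tab-separated MAF into minimal VCF lines.

    Illustrative sketch only: indels (alleles written as "-" or longer
    than one base) are skipped, since anchoring them correctly needs
    the reference genome.
    """
    # Drop blank lines and comment lines such as "#version 2.4".
    rows = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(rows, delimiter="\t")

    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    bases = set("ACGT")
    for rec in reader:
        ref = rec["Reference_Allele"]
        alt = rec["Tumor_Seq_Allele2"]
        # Keep only single-base substitutions.
        if ref not in bases or alt not in bases or ref == alt:
            continue
        vcf.append("\t".join(
            [rec["Chromosome"], rec["Start_Position"], ".", ref, alt, ".", ".", "."]
        ))
    return vcf

# Made-up example rows (hypothetical gene names and positions).
header = ["Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
          "Reference_Allele", "Tumor_Seq_Allele1", "Tumor_Seq_Allele2"]
snv = ["GENE1", "17", "1000", "1000", "C", "C", "T"]
deletion = ["GENE2", "12", "2000", "2001", "AG", "AG", "-"]
example_maf = "\n".join(
    ["#version 2.4"] + ["\t".join(r) for r in (header, snv, deletion)]
)

vcf_lines = maf_snvs_to_vcf(example_maf)
print("\n".join(vcf_lines))
```

The resulting lines carry only coordinates and alleles, which is all an annotator such as VEP, snpEff or Annovar needs as input. For real TCGA data, including indels, use the maf2vcf script rather than something like this.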