Someone suggested to me that the ratio of mean non-synonymous substitutions per non-synonymous site (dN) to mean synonymous substitutions per synonymous site (dS) should be calculated using at least three sequences from each species. However, I did not understand the explanation given. Can anyone please explain why we should consider multiple sequences from each species when calculating dN/dS? Also, what if only one sequence is available for some of the species for the coding sequence under study?
There is no general reason to use 3 sequences per species when you calculate dN/dS.
The dN/dS ratio tells you about differences that have been fixed between species. Variation within a species doesn't help you with that, since within-species differences are transient polymorphisms, not fixations.
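To make that concrete, here is a minimal sketch of a Nei-Gojobori-style dN/dS calculation for one pair of sequences, one from each species. It assumes the differences and sites have already been counted from a codon alignment; the function names and the example counts are hypothetical and only illustrate the arithmetic. Note that nothing in the calculation asks for more than one sequence per species.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor correction: turn a proportion of differing sites
    into an estimated number of substitutions per site."""
    if p >= 0.75:
        raise ValueError("proportion too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def dn_ds(nonsyn_diffs, syn_diffs, nonsyn_sites, syn_sites):
    """Return (dN, dS, dN/dS) for one pairwise comparison.

    All four inputs are assumed to be pre-computed counts from a
    codon alignment of two sequences (one per species).
    """
    p_n = nonsyn_diffs / nonsyn_sites  # proportion of non-synonymous differences
    p_s = syn_diffs / syn_sites        # proportion of synonymous differences
    d_n = jukes_cantor(p_n)
    d_s = jukes_cantor(p_s)
    if d_s == 0:
        raise ZeroDivisionError("dS is zero, so dN/dS is undefined")
    return d_n, d_s, d_n / d_s

# Hypothetical counts for illustration only.
d_n, d_s, omega = dn_ds(nonsyn_diffs=10, syn_diffs=20,
                        nonsyn_sites=700.0, syn_sites=300.0)
print(f"dN={d_n:.4f} dS={d_s:.4f} dN/dS={omega:.3f}")
```

In practice you would use an established tool (for example codeml or yn00 from PAML) rather than hand-rolled code, since those also handle site counting and more realistic substitution models.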
There may be specific cases in which sampling within a species is important. The only one I can think of off the top of my head is recently diverged species, in which polymorphisms from the ancestral population might be retained by the 'daughter' species. However, dN/dS is probably not a useful measure in that case anyway.