I have a distance matrix with more than 1M nodes. I wonder whether there exist any Neighbour Joining implementation which could calculate a tree structure within a reasonable amount of time. The complexity for Neighbour Joining is N^3, so I am not sure whether there is any heuristic algorithm could do this.

Currently, I have a distance matrix. If there is another comparable tree building algorithm starting from distance matrix, I would like to try that method also.

1M*1M=1TB. If you have a matrix in float, that is 4TB. An algorithm are likely to keep a few such matrices. Do you have a machine with ~10TB RAM? If your main purpose is clustering, there are more appropriate algorithms than NJ.

There are in fact efficient algorithms that do not require having the entire matrix loaded into memory.

I haven't seen a popular implementation of hierarchical clustering with complexity less than N^2

What will you do next after you get the tree? If you are to cluster nodes into groups, you can do clustering directly without a tree. There are many clustering algorithms. Hierarchical clustering is just a very basic one.

I need the tree structure for visualization.

Suppose you managed to do the clustering. How do you imagine showing the hierarchical structure of a tree with 1M leaf nodes? It sounds to me like your actual problem is much more about big-data visualization than it is about the computational challenge.

A tree with 1M leaf nodes is about 2M nodes and 2M edges. I was thinking to use Gephi.. But, I guess I should reduce the size of the data first.

If the starting point is an all-against-all similarity matrix, it is theoretically impossible to do better than N^2, since just reading through the similarity matrix is N^2.

Yes, you are right. N^2 is the best the algorithm can do.