I have written a Hadoop program that may need to run on a large data set. I could use our local cluster, which has 10 nodes, but that might take a month to finish, so I would like to use more nodes (30-50, or ideally 100) to cut the running time down to a few days. I vaguely remember there being some free cloud resources available for academic use, but I forgot where they are. Can anyone point me to some computing resources? Thanks a lot.
I am not aware of any free cluster/cloud computing offerings that would let you run Hadoop. Perhaps you were thinking of Galaxy, but that is not what you are looking for.
You mentioned Amazon's free tier, but I think you would quickly exceed the 750 hours of free compute granted per month (and only on tiny micro instances, mind you). However, it might still be financially viable to use Amazon Web Services, which provide Hadoop-specific MapReduce configuration instructions; "spot instances" can be used to get extremely cheap EC2 Linux virtual machines up and running.
For example, last week I ran "High-Memory Quadruple Extra Large" instances (68 GB memory, 8 cores, 1.6 TB storage) at US$0.14 per hour; the same instances normally cost US$1.80 per hour when rented as "on-demand instances." The main drawback of spot instances is that they are not guaranteed to run for as long as you need: you set the maximum price you are willing to pay, and if the hourly recalculated spot price exceeds your bid, your instances can be reclaimed.
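To get a rough sense of what the difference means for a job like yours, here is a back-of-the-envelope sketch using the prices quoted above (the 50-node/3-day scenario is hypothetical, loosely based on the cluster sizes and runtimes you mention; actual spot prices fluctuate):

```python
# Back-of-the-envelope cost comparison: spot vs. on-demand pricing,
# using the illustrative prices quoted above (these change over time).
ON_DEMAND_PER_HOUR = 1.80  # "High-Memory Quadruple Extra Large", on demand
SPOT_PER_HOUR = 0.14       # the spot price I happened to pay last week

def cluster_cost(nodes, hours, price_per_hour):
    """Total cost of running `nodes` instances for `hours` hours each."""
    return nodes * hours * price_per_hour

# Hypothetical scenario: 50 nodes for 3 days (72 hours).
nodes, hours = 50, 72
print(f"on-demand: ${cluster_cost(nodes, hours, ON_DEMAND_PER_HOUR):,.2f}")
print(f"spot:      ${cluster_cost(nodes, hours, SPOT_PER_HOUR):,.2f}")
```

At those rates the spot cluster comes out more than an order of magnitude cheaper, which is why it can be worth tolerating the risk of losing instances mid-run.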
Google Compute Engine
You could also enquire about Google's Compute Engine, which is currently in preview and open only to selected members of the community. I did not get in, but perhaps you will have better luck.
Hope that helps.