Below is the configuration of a live production cluster, which should serve your purpose.


For NameNode:


RAM: 64 GB

Hard disk: 1 TB

Processor: Xeon with 8 cores

Ethernet: 3 x 10 Gb/s

OS: 32bit CentOS


For DataNode:


RAM: 16 GB

Hard disk: 6 x 2 TB

Processor: Xeon with 2 cores

Ethernet: 3 x 10 Gb/s

OS: 32bit CentOS



For Secondary-NameNode:

RAM: 32 GB

Hard disk: 1 TB

Processor: Xeon with 4 cores

Ethernet: 3 x 10 Gb/s

OS: 32bit CentOS


This configuration handled about 10-15 TB of data per customer on average, for the 3-4 customers who were using this functionality, and it served us well. We were also running some quite complex queries on it, slicing and dicing the data.

So this configuration should serve you well until you hit a petabyte of data. But the configuration of a production system depends not only on the size of the data you have, but also on its variety and on the complexity of the analysis you plan to run on it.


Hence there is no such thing as an ideal cluster configuration for a production environment. It depends on your data set and on the type of analysis you want to run on that data.
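
As a rough illustration of how data size drives the number of DataNodes, here is a small sizing sketch in Python. It uses the figures above (6 x 2 TB of disk per DataNode, up to 4 customers at 15 TB each) together with the HDFS default replication factor of 3. The function name min_datanodes and the 75% usable-disk fraction are my own assumptions for illustration and not part of our setup, so adjust them to your environment.

```python
import math

def min_datanodes(logical_tb, replication=3, raw_tb_per_node=12.0,
                  usable_fraction=0.75):
    """Estimate the smallest DataNode count that can hold logical_tb of data.

    replication=3 is the HDFS default (dfs.replication).
    raw_tb_per_node=12.0 matches the 6 x 2 TB disks listed above.
    usable_fraction=0.75 is an ASSUMPTION: headroom kept free for
    intermediate job output and other non-HDFS use.
    """
    raw_needed_tb = logical_tb * replication
    usable_per_node_tb = raw_tb_per_node * usable_fraction
    return math.ceil(raw_needed_tb / usable_per_node_tb)

# Upper end of the workload described above: 4 customers x 15 TB each.
print(min_datanodes(4 * 15))

# The 1 PB mark mentioned above, taking 1 PB as 1000 TB.
print(min_datanodes(1000))
```

This only covers storage. It says nothing about CPU, memory or network, which is exactly where the variety of data and the complexity of the analysis come in.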


Please feel free to reply if you need any further help.