Hadoop: The Definitive Guide

It looks like the book has been released by O'Reilly: http://oreilly.com/catalog/9780596521974/
According to the publisher, it covers the following topics:

  • Use the Hadoop Distributed File System (HDFS) for storing large
    datasets, and run distributed computations over those datasets using
    MapReduce
  • Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Take advantage of HBase, Hadoop’s database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

I have analyzed the Hadoop source and written Hadoop-based applications before, but I am thinking of buying this book because I would like a more systematic understanding.
I just wonder whether I will find the time to read it, busy as I am ~(~_~)~


Adding new data nodes to Hadoop without restarting the cluster

I have often wondered how to add new data nodes (or recovered nodes) to a running Hadoop cluster without restarting it. Recently, I found the solution on the Hadoop core-user mailing list.

The procedure is very simple (a concrete sketch appears after the steps):

1. configure conf/slaves and the *.xml files on the master machine
2. configure conf/masters and the *.xml files on the new slave machine
3. run ${HADOOP}/bin/hadoop datanode on the new slave machine
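
For example, here is a minimal sketch of the single-node case. The hostnames (master01, datanode03), the hadoop-site.xml file name, and the fs.default.name value are my illustrative assumptions, not details from the mailing-list thread:

# On master01: register the new node in the slaves list
echo "datanode03" >> ${HADOOP}/conf/slaves

# On datanode03: make the configuration point at the master, e.g. in
# conf/hadoop-site.xml (assumed pre-0.20 layout):
#   <property>
#     <name>fs.default.name</name>
#     <value>hdfs://master01:9000</value>
#   </property>

# On datanode03: start the data node (runs in the foreground; use
# ${HADOOP}/bin/hadoop-daemon.sh start datanode to run it as a daemon)
${HADOOP}/bin/hadoop datanode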

If you have to add more than one data node, run the following command on the master machine instead of step 3 above.

${HADOOP}/bin/start-all.sh
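
Either way, you can verify that the new nodes have registered with the namenode by asking for a datanode report (this verification step is my addition; the command itself is standard):

${HADOOP}/bin/hadoop dfsadmin -report

The report lists every live data node with its capacity and usage, so the new nodes should appear there shortly after starting.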

Additionally, the way to add a region server to an HBase cluster without restarting everything is similar (a sketch follows the steps).

1. configure conf/regionservers and the *.xml files on the master machine
2. configure the conf/*.xml files on the new slave machine
3. run ${HBASE}/bin/hbase regionserver start on the new slave machine
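
A minimal sketch of the HBase case, again with assumed hostnames (master01, regionserver03) and assuming an older HBase release where a slave locates the master through the hbase.master property:

# On master01: register the new node in the region server list
echo "regionserver03" >> ${HBASE}/conf/regionservers

# On regionserver03: point conf/hbase-site.xml at the running master:
#   <property>
#     <name>hbase.master</name>
#     <value>master01:60000</value>
#   </property>

# On regionserver03: start the region server process
${HBASE}/bin/hbase regionserver start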


Three nice articles that address Very Large Data Bases