Hadoop: The Definitive Guide
Posted: June 9, 2009 Filed under: FOSS | Tags: cloud computing, hadoop, hbase, Pig, zookeeper
It looks like the book has been released by O'Reilly: http://oreilly.com/catalog/9780596521974/
It is said to cover the following topics:
- Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
- Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
- Discover common pitfalls and advanced features for writing real-world MapReduce programs
- Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
- Use Pig, a high-level query language for large-scale data processing
- Take advantage of HBase, Hadoop’s database for structured and semi-structured data
- Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
I have analyzed the source code and written Hadoop-based applications before, but I am thinking of buying it because I want a more systematic understanding.
The question is whether I will be too busy to actually read it ~(~_~)~
Adding new data nodes to Hadoop without rebooting
Posted: October 23, 2008 Filed under: FOSS | Tags: hadoop
I had often wondered how to add new data nodes (or recovered nodes) to Hadoop without rebooting. Recently, I found the solution on the Hadoop core-user mailing list.
The procedure is very simple:
1. configure conf/slaves and *.xml files on the master machine
2. configure conf/masters and *.xml files on the slave machine
3. run ${HADOOP}/bin/hadoop datanode
If you have to add more than one data node to Hadoop, run the following command (instead of the third command above) on the master machine:
${HADOOP}/bin/start-all.sh
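The steps above can be sketched as a small script. Note that `HADOOP_HOME` and the host name `node05` are assumptions for illustration, and a scratch directory stands in for the real `${HADOOP}/conf` so the sketch runs anywhere:

```shell
# Minimal sketch of the add-a-datanode procedure, under assumed names.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}   # assumed install path
CONF=$(mktemp -d)                         # stand-in for ${HADOOP}/conf

# 1. On the master: register the new node in conf/slaves.
echo "node05" >> "$CONF/slaves"

# 2. On the new node: conf/masters and the *.xml files must point at
#    the same NameNode as the rest of the cluster -- usually they are
#    just copied from an existing slave.

# 3. On the new node, start only the DataNode process:
#      $HADOOP_HOME/bin/hadoop datanode
#    To bring up several new nodes at once, run on the master instead:
#      $HADOOP_HOME/bin/start-all.sh

cat "$CONF/slaves"    # the new node is now listed
```

The point of step 3 is that only the new daemon is started; the already-running NameNode and DataNodes are untouched, which is why no reboot is needed.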
Additionally, the way to add a region server to an HBase cluster without restarting everything is similar to the Hadoop procedure:
1. configure conf/regionservers and *.xml files on the master machine
2. configure conf/*.xml files on the slave machine
3. run ${HBASE}/bin/hbase regionserver start
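The HBase variant follows the same pattern; again, `HBASE_HOME` and the host name `node05` are assumptions, with a scratch directory standing in for the real `${HBASE}/conf`:

```shell
# Minimal sketch of adding a region server, under assumed names.
HBASE_HOME=${HBASE_HOME:-/opt/hbase}   # assumed install path
CONF=$(mktemp -d)                      # stand-in for ${HBASE}/conf

# 1. On the master: register the new node in conf/regionservers.
echo "node05" >> "$CONF/regionservers"

# 2. On the new node: the *.xml files (hbase-site.xml) must name the
#    same HBase master and ZooKeeper quorum as the rest of the cluster.

# 3. On the new node, start only the region server daemon:
#      $HBASE_HOME/bin/hbase regionserver start

cat "$CONF/regionservers"    # the new region server is now listed
```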
Three nice articles that address Very Large Data Bases
Posted: September 26, 2008 Filed under: Research | Tags: bigTable, google, hadoop, hbase, map-reduce, vldb
- Big Data: The future of biocuration, Nature
- Greenplum MapReduce for the Petabyte Database
- Aster nCluster: In-Database MapReduce