MapReduce Online Comes Out!

Posted: October 20, 2009 | Author: Hyunsik Choi | Filed under: Research | Tags: hadoop, map-reduce, online aggregation, stream queries | 3 Comments

MapReduce has been gaining a lot of attention in the data-intensive computing field. It is best known as a popular framework for batch processing.

Recently, however, Tyson Condie, a Ph.D. student at UC Berkeley, released MapReduce Online. I heard the news today from Data Beta. It is impressive work, since the original MapReduce was designed specifically for batch processing, and most people believed that MapReduce would remain a batch-processing framework.

The essence of MapReduce Online is that it tries to preserve the fault-tolerance model of the original MapReduce while pipelining results across tasks and jobs, instead of materializing the output of each MapReduce task and job to disk. Consequently, MapReduce Online lets a program return early results from a big job.
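
To make the idea of pipelining a little more concrete, here is a minimal sketch in plain Java. It is not the MapReduce Online implementation and it does not use any Hadoop API; the class name PipelineSketch and its methods are hypothetical names made up for illustration. The reduce step consumes map output line by line and keeps a snapshot of the running word counts, so an early (partial) result is available long before all of the input has been processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: these names are not part of Hadoop
// or MapReduce Online.
final class PipelineSketch {

    // "Map" step: split one input line into words.
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Pipelined "reduce": consume the map output of each line as soon as
    // it is produced, and record a snapshot of the running counts after
    // every line instead of waiting for the whole input.
    static List<Map<String, Integer>> pipelinedCounts(List<String> lines) {
        Map<String, Integer> running = new TreeMap<>();
        List<Map<String, Integer>> snapshots = new ArrayList<>();
        for (String line : lines) {
            for (String word : map(line)) {
                running.merge(word, 1, Integer::sum);
            }
            // Copy the map so later updates do not change this snapshot.
            snapshots.add(new TreeMap<>(running));
        }
        return snapshots;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a b a", "b c", "a");
        List<Map<String, Integer>> snapshots = pipelinedCounts(input);
        for (int i = 0; i < snapshots.size(); i++) {
            System.out.println("after line " + (i + 1) + ": " + snapshots.get(i));
        }
    }
}
```

A batch-style job would expose only the last snapshot. The pipelined version exposes every intermediate one, which is the property that makes early answers possible.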

You can get further information from MapReduce Online.

BSP Library on Hadoop?

Posted: October 9, 2009 | Author: Hyunsik Choi | Filed under: FOSS, Research | Tags: angrapa, apache, bsp, bulk synchronization parallel, distributed systems, hadoop, hama | 2 Comments

Recently, I started participating in the Hama project (a distributed scientific package on Hadoop for massive matrix and graph data), and I have been taking the time to develop a bulk synchronous parallel (BSP) library on Hadoop (HAMA-195). I’m getting help from Edward Yoon, the founder of the Hama project. The motivation for the BSP library is clear.

Hadoop is installed at cloud computing service providers and many companies, as you can see at http://wiki.apache.org/hadoop/PoweredBy. However, most of them probably run only MapReduce programs. Although MapReduce is very scalable, it provides only a simple programming model. Many programmers want to use a wider variety of programming models without changing the platform (i.e., Hadoop). The BSP library is a first step toward meeting that need. Like MapReduce, however, BSP is no Swiss Army knife. Once we find suitable applications, the BSP library on Hadoop will be valued for its scalability and capabilities.
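
For readers who have not seen the BSP model, here is a minimal sketch of its shape in plain Java. It is not the Hama BSP library and says nothing about the HAMA-195 design; the class BspSketch and the method allSum are hypothetical names, and threads with a CyclicBarrier stand in for distributed peers. Each peer works in supersteps: it computes locally, sends messages, and then waits at a barrier until every peer has finished the same superstep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration only: threads play the role of BSP peers.
final class BspSketch {

    // Every peer ends up with the sum of all peers' local values.
    static int[] allSum(int[] localValues) {
        int n = localValues.length;
        int[] results = new int[n];
        if (n == 0) {
            return results;
        }
        // One inbox per peer; messages are delivered by adding to it.
        List<ConcurrentLinkedQueue<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            inboxes.add(new ConcurrentLinkedQueue<>());
        }
        CyclicBarrier barrier = new CyclicBarrier(n);
        Thread[] peers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            peers[i] = new Thread(() -> {
                try {
                    // Superstep 1: send the local value to every peer.
                    for (ConcurrentLinkedQueue<Integer> inbox : inboxes) {
                        inbox.add(localValues[id]);
                    }
                    // Barrier: no peer proceeds until all have sent.
                    barrier.await();
                    // Superstep 2: combine the messages received.
                    int sum = 0;
                    for (int v : inboxes.get(id)) {
                        sum += v;
                    }
                    results[id] = sum;
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            peers[i].start();
        }
        try {
            for (Thread t : peers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        int[] sums = allSum(new int[] {1, 2, 3, 4});
        for (int i = 0; i < sums.length; i++) {
            System.out.println("peer " + i + " sees sum " + sums[i]);
        }
    }
}
```

The barrier is the key difference from MapReduce: peers can exchange messages over many rounds, and each round sees a consistent view of the messages sent in the previous one.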

Soon, I’ll post articles about the progress of the BSP library and Angrapa (the graph package on Hama).

Java Universal Network/Graph Framework

Posted: September 15, 2009 | Author: Hyunsik Choi | Filed under: Research | Tags: graph, java, jung, visualization tools | 2 Comments

Recently, I have been working mainly on large-scale graph data processing. Visualizing a graph can be a good way to observe some properties of a graph data set. Today, I’m going to introduce a graph framework called the Java Universal Network/Graph Framework (Jung). Jung provides graph data structures, a programming interface built around familiar graph concepts, some fundamental graph algorithms (e.g., minimum spanning tree, depth-first search, breadth-first search, and Dijkstra’s algorithm), and even visualization methods. I’m especially interested in its visualization methods.

The following Java source shows Jung’s programming interface. The program creates a graph, adds three vertices to it, and connects the vertices with edges. The code is taken from the Jung tutorial. As you can see, Jung’s API is very easy to use.

  // Imports needed by this snippet (package names as in Jung 2.0).
  import java.awt.Dimension;
  import javax.swing.JFrame;
  import edu.uci.ics.jung.algorithms.layout.KKLayout;
  import edu.uci.ics.jung.algorithms.layout.Layout;
  import edu.uci.ics.jung.graph.Graph;
  import edu.uci.ics.jung.graph.SparseMultigraph;
  import edu.uci.ics.jung.graph.util.EdgeType;
  import edu.uci.ics.jung.visualization.BasicVisualizationServer;
  import edu.uci.ics.jung.visualization.decorators.ToStringLabeller;

  // Create a graph backed by a SparseMultigraph instance.
  Graph<Integer, String> g = new SparseMultigraph<Integer, String>();
  g.addVertex((Integer)1); // Add a vertex whose value is the integer 1.
  g.addVertex((Integer)2);
  g.addVertex((Integer)3);
  g.addEdge("Edge-A", 1, 3); // Add an edge connecting vertices 1 and 3.
  g.addEdge("Edge-B", 2, 3, EdgeType.DIRECTED);
  g.addEdge("Edge-C", 3, 2, EdgeType.DIRECTED);
  g.addEdge("Edge-P", 2, 3); // A parallel edge between vertices 2 and 3.

  // Create the objects for graph layout and visualization.
  Layout<Integer, String> layout = new KKLayout<Integer, String>(g);
  BasicVisualizationServer<Integer, String> vv =
      new BasicVisualizationServer<Integer, String>(layout);
  vv.setPreferredSize(new Dimension(800, 800));

  // Determine how each vertex's value is shown as a label in the diagram.
  ToStringLabeller<Integer> vertexPaint = new ToStringLabeller<Integer>() {
    public String transform(Integer i) {
      return "" + i;
    }
  };

  vv.getRenderContext().setVertexLabelTransformer(vertexPaint);

  // Show the visualization in a Swing window.
  JFrame frame = new JFrame("Simple Graph View");
  frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
  frame.getContentPane().add(vv);
  frame.pack();
  frame.setVisible(true);

Some of Jung’s APIs are based on generic programming, so vertices and edges can easily carry user-defined data. For more details, visit http://jung.sourceforge.net.

The source code above renders the graph as a diagram in a window titled "Simple Graph View".

Dive Into A Data Deluge

Discussion about Newly Emerging Issues on Database
