<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Dive Into A Data Deluge &#187; database</title>
	<atom:link href="http://diveintodata.org/tag/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://diveintodata.org</link>
	<description>Discussion about Newly Emerging Issues on Database</description>
	<lastBuildDate>Wed, 28 Dec 2011 14:16:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='diveintodata.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Dive Into A Data Deluge &#187; database</title>
		<link>http://diveintodata.org</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://diveintodata.org/osd.xml" title="Dive Into A Data Deluge" />
	<atom:link rel='hub' href='http://diveintodata.org/?pushpress=hub'/>
		<item>
		<title>A Brief Introduction to Skyline Problem (Pareto-optimal Tuples) (1)</title>
		<link>http://diveintodata.org/2009/09/06/a-brief-introduction-to-skyline-problem-pareto-optimal-tuples-1/</link>
		<comments>http://diveintodata.org/2009/09/06/a-brief-introduction-to-skyline-problem-pareto-optimal-tuples-1/#comments</comments>
		<pubDate>Sun, 06 Sep 2009 06:27:09 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[decision making]]></category>
		<category><![CDATA[pareto tuples]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[skyline]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=78</guid>
		<description><![CDATA[The skyline problem is to compute the best tuples from a set of ordered d-tuples. The name comes from the fact that the solution, plotted on a 2D plane, resembles the outline of a city's buildings. The skyline query is a kind of recommendation query that considers multiple criteria. It is a very interesting problem as well as [...]]]></description>
			<content:encoded><![CDATA[<p><span class="dropcaps">The s</span>kyline problem is to compute the best tuples from a set of ordered <em>d</em>-tuples. The name comes from the fact that the solution, plotted on a 2D plane, resembles the outline that a city's buildings form against the sky. The skyline query is a kind of recommendation query that considers multiple criteria. It is a very interesting problem as well as a very useful query, and it has been studied intensively in recent years. Today, I’m going to present the problem definition of the skyline. Next time, I&#8217;ll describe several algorithms for the skyline problem.</p>
<p><a style="float:left;margin-right:5px;" title="Singapore Skyline (#12) by Christopher Chan, on Flickr" href="http://www.flickr.com/photos/chanc/469796567/"><img src="http://farm1.static.flickr.com/226/469796567_311f4a3b79.jpg" alt="Singapore Skyline (#12)" width="250" /></a> First, let us look at the input data. The input <img src="http://www.codecogs.com/eq.latex?D%5E%7Bd%7D" alt="D^{d}" /> of a skyline query is a set of <em>n</em> ordered <em>d</em>-tuples, each of which consists of <em>d</em> ordered scalar values, as shown in the formulas below:</p>
<p><img style="display:block;float:none;margin-left:auto;margin-right:auto;" src="http://www.codecogs.com/eq.latex?D%5E%7Bd%7D%20=%20%5C%7Btp_%7B1%7D,tp_%7B2%7D,...,tp_%7Bn%7D%5C%7D" alt="D^{d} = \{tp_{1},tp_{2},...,tp_{n}\}" /></p>
<div id="equationview" style="text-align:center;">
<div id="equationview"><img src="http://www.codecogs.com/eq.latex?tp_%7Bi%7D%20=%20%28v_%7B1%7D,v_%7B2%7D,...,v_%7Bd%7D%29" border="0" alt="tp_{i} = (v_{1},v_{2},...,v_{d})" align="absmiddle" /></div>
</div>
<p><em> </em></p>
<div id="equationview"><img src="http://www.codecogs.com/eq.latex?tp_%7Bi%7D" border="0" alt="tp_{i}" align="absmiddle" /> denotes a <em>d</em>-tuple. Next, we need the definition of the dominance relation. Because the skyline problem is to find the better tuples, we also need an assumption about what &#8216;better&#8217; means. Most of the literature assumes that a smaller value is better, so we follow this assumption.</div>
<blockquote><p><span style="background-color:#ffffff;"><strong>Definition 1 (Dominance). </strong></span><span style="background-color:#ffffff;">Let <em>tp</em> and <em>tp’</em> be tuples in <img src="http://www.codecogs.com/eq.latex?D^{d}" alt="D^{d}" /> where </span><img src="http://www.codecogs.com/eq.latex?v_%7Bi%7D" border="0" alt="v_{i}" align="absmiddle" /> <span style="background-color:#ffffff;">is an element of <em>tp</em> and </span><img src="http://www.codecogs.com/eq.latex?u_%7Bi%7D" border="0" alt="u_{i}" align="absmiddle" /> <span style="background-color:#ffffff;">is an element of <em>tp&#8217; </em>for </span><img src="http://www.codecogs.com/eq.latex?1%20%5Cleq%20i%20%5Cleq%20d" border="0" alt="1 \leq i \leq d" align="absmiddle" /><span style="background-color:#ffffff;">. Then, <em>tp</em> <strong>dominates</strong> <em>tp’</em> </span><span style="background-color:#ffffff;">if and only if <img src="http://www.codecogs.com/eq.latex?%5Cforall%20i,%20v_%7Bi%7D%20%5Cleq%20u_%7Bi%7D%20%5Cland%20%5Cexists%20j,%20v_%7Bj%7D%20%3C%20u_%7Bj%7D" alt="\forall i, v_{i} \leq u_{i} \land \exists j, v_{j} &lt; u_{j}" width="182" height="17" />.</span></p></blockquote>
<p>In other words, it is said that one tuple <img src="http://www.codecogs.com/eq.latex?tp" border="0" alt="tp" align="absmiddle" /> dominates another tuple <img src="http://www.codecogs.com/eq.latex?tp%27" border="0" alt="tp'" align="absmiddle" /> if <img src="http://www.codecogs.com/eq.latex?tp" border="0" alt="tp" align="absmiddle" /> is not worse (not greater) than <img src="http://www.codecogs.com/eq.latex?tp%27" border="0" alt="tp'" align="absmiddle" /> in all dimensions and<em> </em><img src="http://www.codecogs.com/eq.latex?tp" border="0" alt="tp" align="absmiddle" /> is better (less) than <img src="http://www.codecogs.com/eq.latex?tp%27" border="0" alt="tp'" align="absmiddle" /> in at least one dimension.</p>
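<p>The dominance test can be written down in a few lines. The following is only a minimal sketch under the smaller-is-better assumption; the function name <em>dominates</em> and the sample values are made up for illustration and do not come from any particular paper or library.</p>

```python
from typing import Sequence


def dominates(tp: Sequence[float], tp_prime: Sequence[float]) -> bool:
    """Return True if tp dominates tp_prime (smaller values are better)."""
    if len(tp) != len(tp_prime):
        raise ValueError("both tuples must have the same number of dimensions")
    # tp must not be worse (greater) than tp_prime in any dimension ...
    not_worse_everywhere = all(v <= u for v, u in zip(tp, tp_prime))
    # ... and must be strictly better (less) in at least one dimension.
    better_somewhere = any(v < u for v, u in zip(tp, tp_prime))
    return not_worse_everywhere and better_somewhere


print(dominates((1, 2), (2, 2)))  # True: equal in one dimension, better in the other
print(dominates((1, 5), (2, 2)))  # False: worse in the second dimension
print(dominates((2, 2), (2, 2)))  # False: identical tuples do not dominate each other
```

<p>Note that the second condition makes the relation strict, so a tuple never dominates itself or an exact duplicate.</p>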
<blockquote><p><strong>Definition 2 (Skyline).</strong> Given a data set <img src="http://www.codecogs.com/eq.latex?D^{d}" alt="D^{d}" />, the skyline is the set of tuples that are not dominated by any other tuple in <img src="http://www.codecogs.com/eq.latex?D%5E%7Bd%7D" alt="D^{d}" />.</p></blockquote>
<p>As the definition above states, a skyline is a set of tuples, none of which is dominated by any other tuple in <img src="http://www.codecogs.com/eq.latex?D%5E%7Bd%7D" alt="D^{d}" />. In the literature, a <em>d</em>-dimensional data set and the two definitions above are usually illustrated, for ease of understanding, as points in a space with <em>d</em> axes.</p>
<p style="text-align:left;">To keep the example simple, we assume that <img src="http://www.codecogs.com/eq.latex?D%5E%7Bd%7D" alt="D^{d}" /> is a 2D data set (i.e., <em>d</em>=2). The data set is given as follows:</p>
<ul>
<li>a = (3,2)</li>
<li>b = (8,1)</li>
<li>c = (1,10)</li>
<li>d = (4,3)</li>
<li>e = (8,6)</li>
</ul>
<p style="text-align:left;">Each element of a tuple in <img src="http://www.codecogs.com/eq.latex?D%5E%7Bd%7D" alt="D^{d}" /> can be mapped to one axis. In other words, the first and second elements of each tuple are mapped to the X and Y axes, respectively. The tuples in the list above then become 2D points, as shown in Fig. 1.</p>
<div id="attachment_324" class="wp-caption aligncenter" style="width: 300px"><img class="size-full wp-image-324" title="Fig. 1. An example of a skyline" src="http://diveintodata.files.wordpress.com/2009/09/skyline_intro.png?w=590" alt="Fig. 1. An example of a skyline"   /><p class="wp-caption-text">Fig. 1. An example of a skyline</p></div>
<p>In Fig. 1, let us look at the dominance relation. The point <em>a</em> dominates the points {<em>d,e</em>} since both coordinates of <em>a</em> are less than those of <em>d</em> and <em>e</em>. The point <em>b</em> dominates only <em>e</em>: the X values of {<em>b,e</em>} are the same (i.e., X=8), but the Y value of <em>b</em> (i.e., 1) is less than that of <em>e</em> (i.e., 6). The points {<em>d,e</em>} cannot belong to the skyline because they are dominated by other tuples. Consequently, the points <em>a</em>, <em>b</em>, and <em>c</em> belong to the skyline since they are not dominated by any other tuple.</p>
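<p>The same result can be checked mechanically. The sketch below applies Definitions 1 and 2 directly to the five example points by comparing every pair of tuples. It is a naive quadratic scan written only for illustration (the names <em>dominates</em> and <em>skyline</em> are my own), not one of the published skyline algorithms.</p>

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def dominates(p: Point, q: Point) -> bool:
    """Definition 1: p is no greater than q everywhere and smaller somewhere."""
    return all(v <= u for v, u in zip(p, q)) and any(v < u for v, u in zip(p, q))


def skyline(points: Dict[str, Point]) -> List[str]:
    """Definition 2: keep the points that no other point dominates."""
    result = []
    for name, p in points.items():
        dominated = any(
            dominates(q, p) for other, q in points.items() if other != name
        )
        if not dominated:
            result.append(name)
    return result


# The example data set from the list above.
data = {"a": (3, 2), "b": (8, 1), "c": (1, 10), "d": (4, 3), "e": (8, 6)}

print(skyline(data))  # ['a', 'b', 'c']
```

<p>Comparing all pairs takes time proportional to the square of the number of tuples, which is fine for five points but is the cost that dedicated skyline algorithms try to avoid.</p>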
<p>Initially, the skyline problem was known as the <em><a href="http://portal.acm.org/citation.cfm?id=321910" target="_blank">maxima vector problem (H. T. Kung et al. 1975)</a></em> and was studied for traditional processing systems. The problem was later revisited by <a href="http://portal.acm.org/citation.cfm?id=656550&amp;dl=" target="_blank">the Skyline Operator (Stephan Börzsönyi et al. 2001)</a>. Since then, it has been intensively studied in the database area.</p>
<p>Next time, I&#8217;ll describe several algorithms, including those from the papers above, in detail.</p>
]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/09/06/a-brief-introduction-to-skyline-problem-pareto-optimal-tuples-1/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4213567e11cad51fc02bc2038e9ace27?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Hyunsik Choi</media:title>
		</media:content>

		<media:content url="http://farm1.static.flickr.com/226/469796567_311f4a3b79.jpg" medium="image">
			<media:title type="html">Singapore Skyline (#12)</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?D%5E%7Bd%7D" medium="image">
			<media:title type="html">D^{d}</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?Dd%20=%20tp_1,tp_2,...tp_n" medium="image">
			<media:title type="html">D^{d} = {tp_{1},tp_{2},...tp_{n}}</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?tp_%7Bi%7D%20=%20%28v_%7B1%7D,v_%7B2%7D,...,v_%7Bd%7D%29" medium="image">
			<media:title type="html">tp_{i} = (v_{1},v_{2},...,v_{d})</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?tp_%7Bi%7D" medium="image">
			<media:title type="html">tp_{i}</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?Dd" medium="image">
			<media:title type="html">D^{d}</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?v_%7Bi%7D" medium="image">
			<media:title type="html">v_{i}</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?u_%7Bi%7D" medium="image">
			<media:title type="html">u_{i}</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?1%20%3C%20i%20%5Cleq%20d" medium="image">
			<media:title type="html">1 &#60; i leq d</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?foralli,%20v_i%20leq%20u_i%20land%20existsj,%20v_j%20%3C%20u_j" medium="image">
			<media:title type="html">forall{i}, v_{i} leq u_{i} land exists{j}, v_{j} &#60; u_{j}</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?tp" medium="image">
			<media:title type="html">tp</media:title>
		</media:content>

		<media:content url="http://www.codecogs.com/eq.latex?tp%27" medium="image">
			<media:title type="html">tp&#039;</media:title>
		</media:content>

		<media:content url="http://diveintodata.files.wordpress.com/2009/09/skyline_intro.png" medium="image">
			<media:title type="html">Fig. 1. An example of a skyline</media:title>
		</media:content>
	</item>
		<item>
		<title>Some Interesting Papers of ACM SIGMOD Conference 2009</title>
		<link>http://diveintodata.org/2009/08/08/some-interesting-papers-of-acm-sigmod-conference-2009/</link>
		<comments>http://diveintodata.org/2009/08/08/some-interesting-papers-of-acm-sigmod-conference-2009/#comments</comments>
		<pubDate>Sat, 08 Aug 2009 13:31:48 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[paper]]></category>
		<category><![CDATA[SIGMOD]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=87</guid>
		<description><![CDATA[ACM SIGMOD Conference 2009 was held in Providence, Rhode Island from June 29 through July 2, and the electronic proceedings are now available online. Among the many nice papers, I picked out some interesting ones: MapReduce &#38; Hadoop “A Comparison of Approaches to Large Scale Data Analysis,” Andrew Pavlo, Samuel Madden, David DeWitt, Michael [...]]]></description>
			<content:encoded><![CDATA[<p>ACM SIGMOD Conference 2009 was held in Providence, Rhode Island from June 29 through July 2, and the electronic proceedings are now available online. Among the many nice papers, I picked out some interesting ones:</p>
<h4>MapReduce &amp; Hadoop</h4>
<ul>
<li>“A Comparison of Approaches to Large Scale Data Analysis,” Andrew Pavlo, Samuel Madden, David DeWitt, Michael Stonebraker, Alexander Rasin, Erik Paulson, Lakshmikant Shrinivas and Daniel Abadi.</li>
</ul>
<p><span style="color:#400080;"><strong>Some of the authors are members of Vertica, a parallel database. Prof. DeWitt strongly criticized MapReduce (<em><a href="http://databasecolumn.vertica.com/2008/01/mapreduce-a-major-step-back.html" target="_blank">MapReduce: A major step backwards</a></em>, </strong></span><span style="color:#400080;"><strong><a href="http://databasecolumn.vertica.com/2008/01/mapreduce-continued.html" target="_blank">MapReduce II</a></strong></span><span style="color:#400080;"><strong>). So I wonder how they benchmarked both architectures.</strong></span></p>
<h4>Skyline Queries</h4>
<ul>
<li>“Minimizing the Communication Cost for Continuous Skyline Maintenance,” Zhenjie Zhang, Reynold Cheng, Dimitris Papadias, Anthony K. H. Tung.</li>
<li>“Scalable Skyline Computation Using Object-based Space Partitioning,” ZHANG Shiming, Nikos Mamoulis, David Cheung.</li>
<li>“Kernel-Based Skyline Cardinality Estimation,” Zhenjie Zhang, Yin Yang, Ruichu Cai, Dimitris Papadias, and Anthony K. H. Tung.</li>
</ul>
<p><strong><span style="color:#400080;">Ever since I first encountered the skyline problem, I have been interested in skyline queries. Considering multiple criteria, skyline queries retrieve the best tuples among multi-dimensional objects.</span></strong></p>
<h4>Graph Query Processing</h4>
<ul>
<li>“3-HOP: A High-Compression Indexing Scheme for Reachability Query,” Ruoming Jin, Yang Xiang, Ning Ruan, and Dave Fuhry.</li>
</ul>
<p><span style="color:#400080;"><strong>A reachability query determines whether one given vertex can be reached from another. It is one of the most fundamental operations in graph querying, and it is often used as a primitive operation for complex graph queries.</strong></span></p>
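<p>To make the query itself concrete, here is a minimal sketch that answers a reachability query with a plain breadth-first search over an adjacency list. This is only the index-free baseline, not the 3-HOP indexing scheme from the paper; the function name <em>reachable</em> and the tiny example graph are made up for illustration.</p>

```python
from collections import deque
from typing import Dict, Hashable, List


def reachable(graph: Dict[Hashable, List[Hashable]], source: Hashable, target: Hashable) -> bool:
    """Return True if target can be reached from source by following directed edges."""
    if source == target:
        return True
    visited = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == target:
                return True
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return False


# Hypothetical directed graph: x -> u -> v -> w
example_graph = {"x": ["u"], "u": ["v"], "v": ["w"], "w": []}

print(reachable(example_graph, "u", "w"))  # True
print(reachable(example_graph, "w", "u"))  # False
```

<p>Each query may visit the whole graph in the worst case, which is why precomputed reachability indexes are attractive for large graphs.</p>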
<h4>RDF Query Processing</h4>
<ul>
<li>“Scalable Join Processing on Very Large RDF Graphs,” Thomas Neumann and Gerhard Weikum.</li>
</ul>
<p><strong><span style="color:#400080;">The issue with which I’m primarily concerned is RDF query processing. As linked data gains attention, this issue will be dealt with more in the database community.</span></strong></p>
<h4>Spatial Query Processing</h4>
<ul>
<li>“Quality and Efficiency in High Dimensional Nearest Neighbor Search,” Yufei Tao, Ke Yi, Cheng Sheng and Panos Kalnis.</li>
<li>“Continuous Obstructed Nearest Neighbor Queries in Spatial Databases,” Yunjun Gao and Baihua Zheng.</li>
<li>“A Revised R*-tree in Comparison with Related Index Structures,” Norbert Beckmann and Bernhard Seeger.</li>
</ul>
<p><strong><span style="color:#400080;">While I was in my M.S. program, I studied many spatial query processing issues. Hence, I try to keep up with recent spatial database issues.</span></strong></p>
<p>They all seem very interesting. Later, I will post reviews of the papers above.</p>
]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/08/08/some-interesting-papers-of-acm-sigmod-conference-2009/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4213567e11cad51fc02bc2038e9ace27?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Hyunsik Choi</media:title>
		</media:content>
	</item>
		<item>
		<title>HadoopDB: An Open Source Parallel Database for Analytical Workloads</title>
		<link>http://diveintodata.org/2009/07/31/hadoopdb-releases/</link>
		<comments>http://diveintodata.org/2009/07/31/hadoopdb-releases/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 15:01:15 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hadoopdb]]></category>
		<category><![CDATA[map-reduce]]></category>
		<category><![CDATA[vldb]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=155</guid>
		<description><![CDATA[With the rapidly growing volume of data, techniques to manage big data are needed in many areas. The open source community and many companies have attempted to develop solutions to deal with big data. Recently, Prof. Daniel Abadi, an Assistant Professor of Computer Science at Yale University, announced the HadoopDB release and the paper published [...]]]></description>
			<content:encoded><![CDATA[<p><span class="dropcaps">W</span>ith the rapidly growing volume of data, techniques to manage big data are needed in many areas. The open source community and many companies have attempted to develop solutions to deal with big data.</p>
<p>Recently, <a href="http://cs-www.cs.yale.edu/homes/dna/" target="_blank">Prof. Daniel Abadi</a>, an Assistant Professor of Computer Science at Yale University, announced <a href="http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html" target="_blank">the HadoopDB release and the paper</a> published in <a href="http://vldb2009.org/" target="_blank">VLDB’09</a>. HadoopDB is an open source analytical database being developed by him and his students. The paper states that HadoopDB is a hybrid of MapReduce and parallel databases that takes the best features from both.</p>
<p><img style="display:inline;margin-left:0;margin-right:0;" title="Hadoop Logo" src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="Hadoop Logo" width="198" height="47" align="right" />Actually, MapReduce has been controversial from a database point of view, and there have been some debates. Notably, <a href="http://pages.cs.wisc.edu/~dewitt/" target="_blank">Prof. David DeWitt</a>, who is well known as a great master of (parallel) databases, argued that <a href="http://databasecolumn.vertica.com/2008/01/mapreduce-a-major-step-back.html" target="_blank">MapReduce is a major step backwards</a>. On the other hand, proponents of MapReduce argue that MapReduce outperforms parallel databases in terms of scalability, fault tolerance, and flexibility in handling unstructured data.</p>
<p>The paper concludes that HadoopDB approaches the performance of parallel databases while scoring similarly to Hadoop on fault tolerance and on the ability to run in heterogeneous environments.</p>
<p>In sum, HadoopDB is a hybrid system of MapReduce and parallel DBMS. It is quite an interesting achievement. I respect their decision to release HadoopDB as open source because it will contribute more broadly to Hadoop and to analytical databases. I have not read the paper completely yet; I will discuss HadoopDB in detail soon.</p>
<h3>Some interesting points:</h3>
<ul>
<li>They carried out experiments on a 100-node Amazon EC2 cluster.</li>
<li>They try to handle semantic web data (i.e., RDF) with HadoopDB.</li>
<li>HadoopDB is a fully open source project.</li>
<li>HadoopDB isn’t well suited for real-time data yet.</li>
<li>I will be able to attend his presentation at the VLDB session.</li>
</ul>
<h3>See Also:</h3>
<ul>
<li><a href="http://news.idg.no/cw/art.cfm?id=9D2C109A-1A64-6A71-CE90BD44D98F12B1" target="_blank">Yale researchers create database-Hadoop hybrid</a>, Computer World</li>
<li><a href="http://radar.oreilly.com/2009/07/hadoopdb-an-open-source-parallel-database.html" target="_blank">HadoopDB: An Open Source Parallel Database</a>, <a href="http://radar.oreilly.com/" target="_blank">O’REILLY radar</a></li>
<li><a href="http://databasecolumn.vertica.com/2008/01/mapreduce-a-major-step-back.html" target="_blank">MapReduce: A major step backwards</a></li>
<li><a href="http://databasecolumn.vertica.com/2008/01/mapreduce-continued.html" target="_blank">MapReduce: A major step backwards (II)</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/07/31/hadoopdb-releases/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4213567e11cad51fc02bc2038e9ace27?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Hyunsik Choi</media:title>
		</media:content>

		<media:content url="http://hadoop.apache.org/images/hadoop-logo.jpg" medium="image">
			<media:title type="html">Hadoop Logo</media:title>
		</media:content>
	</item>
		<item>
		<title>Recommended Blogs for Computer Scientists (1)</title>
		<link>http://diveintodata.org/2009/06/24/computer-scientist%eb%93%a4%ec%9d%84-%ec%9c%84%ed%95%9c-%ec%b6%94%ec%b2%9c-%eb%b8%94%eb%a1%9c%ea%b7%b8-1/</link>
		<comments>http://diveintodata.org/2009/06/24/computer-scientist%eb%93%a4%ec%9d%84-%ec%9c%84%ed%95%9c-%ec%b6%94%ec%b2%9c-%eb%b8%94%eb%a1%9c%ea%b7%b8-1/#comments</comments>
		<pubDate>Tue, 23 Jun 2009 15:11:35 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[computer science]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[P=NP]]></category>
		<category><![CDATA[scalable computing]]></category>

		<guid isPermaLink="false">http://diveintodata.org/2009/06/computer-scientist%eb%93%a4%ec%9d%84-%ec%9c%84%ed%95%9c-%ec%b6%94%ec%b2%9c-%eb%b8%94%eb%a1%9c%ea%b7%b8-1/</guid>
		<description><![CDATA[오늘은 Computer Science 분야의 문제들 및 최신 이슈들을 다루는 몇몇 유명 블로그들을 소개하려고 한다. 워낙 유명한 블로그들이라 이미 많은 분들이 아실꺼라 생각이 들지만 혹시 모르는 분들이 있을까 이렇게 소개해 본다. The Database Column &#8211; 말 그대로 데이터베이스 이슈들을 다룬다. 최근에는 클라우드 컴퓨팅에 대한 이슈도 언급된다. 이 블로그는 진짜 짱인게 Michael Stonebraker, Daniel Abadi, David DeWitt, [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=diveintodata.org&amp;blog=12237478&amp;post=59&amp;subd=diveintodata&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><span style="font-size:9pt;">오늘은 Computer Science 분야의 문제들 및 최신 이슈들을 다루는 몇몇 유명 블로그들을 소개하려고 한다. 워낙 유명한 블로그들이라 이미 많은 분들이 아실꺼라 생각이 들지만 혹시 모르는 분들이 있을까 이렇게 소개해 본다. </span><span style="font-weight:bold;"><br />
</span> <span style="font-size:10pt;font-weight:normal;"><span style="font-size:10pt;"><a title="[http://www.databasecolumn.com/]로 이동합니다." href="http://www.databasecolumn.com/" target="_blank"><span style="font-size:10pt;"></span></a></span></span></p>
<ul>
<li><span style="font-size:10pt;font-weight:normal;"><a title="Go to [http://www.databasecolumn.com/]." href="http://www.databasecolumn.com/" target="_blank"><span style="font-size:9pt;">The Database Column</span></a><span style="font-size:9pt;"> &#8211; As the name says, it covers database issues. Lately it has also touched on cloud computing. What makes this blog truly great is that luminaries such as </span></span><span class="entry-author-name" style="font-weight:normal;"><span style="font-size:9pt;">Michael Stonebraker, Daniel Abadi, David DeWitt, Stan Zdonik, and Samuel Madden write for it. If you want to know which topics the database research community is currently interested in, skimming the post titles is enough.</span></span></li>
<li><span style="font-size:10pt;font-weight:normal;"><a title="Go to [http://rjlipton.wordpress.com/]." href="http://rjlipton.wordpress.com/" target="_blank"><span style="font-size:9pt;">Gödel’s Lost Letter and P=NP</span></a><span style="font-size:9pt;"> &#8211; Judging by the title alone, it seems to deal mainly with NP problems, but it actually covers a wide range of problems and algorithms (in fact, I only discovered it today). It looks very useful, but also difficult (@_@).</span></span></li>
<li><a title="Go to [http://www.allthingsdistributed.com/]." href="http://www.allthingsdistributed.com/" target="_blank"><span style="font-size:10pt;"><span style="font-size:9pt;">All Things Distributed</span></span></a><span style="font-size:10pt;"><span style="font-size:9pt;"> &#8211; The blog of Werner Vogels, Amazon's CTO. It covers issues in scalable and distributed computing.</span></span></li>
</ul>
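<p><span style="font-size:9pt;"> My original plan was to introduce five blogs per post, ten in total over two posts, but I don't have much else to post about these days&#8230; so I will continue with the rest next time. </span></p>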
<p><span style="font-size:9pt;"><br />
P.S. There are many posts on those blogs that I want to read, but the volume of updates is no joke&#8230; it is really hard to keep up ~(~_~)~</span></p>
]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/06/24/computer-scientist%eb%93%a4%ec%9d%84-%ec%9c%84%ed%95%9c-%ec%b6%94%ec%b2%9c-%eb%b8%94%eb%a1%9c%ea%b7%b8-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/4213567e11cad51fc02bc2038e9ace27?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Hyunsik Choi</media:title>
		</media:content>
	</item>
	</channel>
</rss>
