Java Universal Network/Graph Framework

Recently, I’m primarily concerned with large-scale graph data processing. Occasionally, the visualization of graph can be a good way for us to observe some properties from graph data sets. Today, I’m going to introduce a graph framework, called Java Universal Network/Graph Framework (Jung). Jung provides data structures for graph, a programming interface familiar with graph features, some fundamental graph algorithms (e.g., minimum spanning tree, depth-first search, breath-first search, and dijkstra algorithm), and even visualization methods. Especially, I’m interested in its visualization methods.

The following java source shows the programming interface of Jung. In more detail, this program make a graph, add three vertices to the graph, and connect vertices. This source code is brought from Jung tutorial. As you can see, Jung’s APIs are very easy.

  // Make a graph by a SparseMultigraph instance.
  Graph<Integer, String> g = new SparseMultigraph<Integer, String>();
  g.addVertex((Integer)1); // Add a vertex with an integer 1
  g.addVertex((Integer)2);
  g.addVertex((Integer)3);
  g.addEdge("Edge-A", 1,3); // Added an edge to connect between 1 and 3 vertices.
  g.addEdge("Edge-B", 2,3, EdgeType.DIRECTED);
  g.addEdge("Edge-C", 3, 2, EdgeType.DIRECTED);
  g.addEdge("Edge-P", 2,3); // A parallel edge

  // Make some objects for graph layout and visualization.
  Layout<Integer, String> layout = new KKLayout<Integer, String>(g);
  BasicVisualizationServer<Integer, String> vv =
  new BasicVisualizationServer<Integer, String>(layout);
  vv.setPreferredSize(new Dimension(800,800));

  // It determine how each vertex with its value is represented in a diagram.
  ToStringLabeller<Integer> vertexPaint = new ToStringLabeller<Integer>() {
    public String transform(Integer i) {
    return ""+i;
   }
  };

  vv.getRenderContext().setVertexLabelTransformer(vertexPaint);

  JFrame frame = new JFrame("Simple Graph View");
  frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
  frame.getContentPane().add(vv);
  frame.pack();
  frame.setVisible(true);

Some APIs of the Jung are based on generic programming, so you can use easily vertices or edges to contains user-defined data. If you want more detail information, visit http://jung.sourceforge.net.

The above source code shows the following diagram.
Jung example


Zipf Distribution Random Generator in Java

When I carry out some experiments, I usually make synthetic data sets generated by  some probability distributions.  Especially, Zipf distribution is frequently used for a synthetic data set. Zipf distribution is  one of the discrete power law probability distributions. You can get detail information from Zipf’s law in Wikipedia. Anyway, I attached my own java class for zip distribution. Below graphs are generated by my own java code and the gnuplot.

Zipf Distribution (s=1)Zipf Distribution with log scale (s=1)
import java.util.Random;

public class ZipfGenerator {
 private Random rnd = new Random(System.currentTimeMillis());
 private int size;
 private double skew;
 private double bottom = 0;

 public ZipfGenerator(int size, double skew) {
  this.size = size;
  this.skew = skew;

  for(int i=1;i < size; i++) {
  this.bottom += (1/Math.pow(i, this.skew));
  }
 }

 // the next() method returns an random rank id.
 // The frequency of returned rank ids are follows Zipf distribution.
 public int next() {
   int rank;
   double friquency = 0;
   double dice;

   rank = rnd.nextInt(size);
   friquency = (1.0d / Math.pow(rank, this.skew)) / this.bottom;
   dice = rnd.nextDouble();

   while(!(dice &lt; friquency)) {
     rank = rnd.nextInt(size);
     friquency = (1.0d / Math.pow(rank, this.skew)) / this.bottom;
     dice = rnd.nextDouble();
   }

   return rank;
 }

 // This method returns a probability that the given rank occurs.
 public double getProbability(int rank) {
   return (1.0d / Math.pow(rank, this.skew)) / this.bottom;
 }

 public static void main(String[] args) {
   if(args.length != 2) {
     System.out.println("usage: ./zipf size skew");
     System.exit(-1);
   }

   ZipfGenerator zipf = new ZipfGenerator(Integer.valueOf(args[0]),
   Double.valueOf(args[1]));
   for(int i=1;i <= 100; i++)
     System.out.println(i+" "+zipf.getProbability(i));
 }
}

One-column abstract in two-column layouts in articles on Latex

If you want one-column abstract in two-column layouts in articles on Latex, just add abstract package and follow below source code. Someone who uses ubuntu linux can install the abstract package from ‘texlive-latex-extra’ package via synaptic. Others can install from http://www.tex.ac.uk/tex-archive/macros/latex/contrib/abstract/

usepackage{abstract}

twocolumn[
  maketitle
  begin{onecolabstract}
    Here in which one-column abstract resides
  end{onecolabstract}
]

You should move ‘maketitle’ within ‘twocolumn’ like above code and remove ‘abstract’.

You can find further information about the abstract package from http://www.tex.ac.uk/tex-archive/macros/latex/contrib/abstract/abstract.pdf.


A Brief Introduction to Skyline Problem (Pareto-optimal Tuples) (1)

The skyline problem is to compute the best tuples from a set of ordered d-tuples. The name is originated from what the solution represented on 2d plane resembles the scene that urban buildings comprise. Skyline is one of the recommendation queries, and it is considering multi criteria. It is very interesting problem as well as very useful query. This problem has been being intensively studied for recent years. Today, I’m going to present the problem definition of skyline. Next time, I’ll describe several algorithms for the skyline problem.

Singapore Skyline (#12) First of all, let us know the input data. The input data D^{d} of skyline is a set of n ordered d-tuples, each of which consists of ordered d scalar values. They are shown in below formulas:

D^{d} = {tp_{1},tp_{2},...tp_{n}}

tp_{i} = (v_{1},v_{2},...,v_{d})

tp_{i} denotes a d-tuple. And, we need to understand the definition of the dominance relation. In addition, because the skyline problem is to find the better tuples, we need an assumption about ‘better’. In most literature, it is assumed that the less value is better, so we follow this assumption.

Definition 1 (Dominance). Let tp and tp’ be tuples in D^{d} where v_{i} is an element of tp and u_{i} is an element of tp’ for 1 < i leq d. Then, tp dominates tp’ if and only if  forall{i}, v_{i} leq u_{i} land exists{j}, v_{j} < u_{j}.

In other words, it is said that one tuple tp dominates another tuple tp' if tp is not worse (not greater) than tp' in all dimensions and tp is better (less) than tp' in at least one dimension.

Definition 2 (Skyline) Given a data set D^{d}, a skyline contains tuples that is not dominated any other tuples in D^{d}.

As I described above definition, a skyline is a set of tuples and the tuples are not dominated by any other tuples in D^{d}. In literature, a d-dimensional data set and above two definitions are usually represented for comprehensive description to d-points on d-axies.

Without loss of generality, we assume that D^{d} is a 2d data set (i.e., d=2). A data set is given as follows:

  • a = (3,2)
  • b = (8,1)
  • c = (1,10)
  • d = (4,3)
  • e = (8,6)

Each element of a tuple in D^{d} can be represented to one axis. In other words, the first element and the second element of tuples are represented to X and Y axies respectively. Then, tuples of above list are represented to 2d points as shown in Fig. 1.

Fig. 1. An example of a skyline

Fig. 1. An example of a skyline

In Fig. 1, let us look into a dominance relation. The point a dominates the points {d,e} since elements of the point a less than those of {d,e} in X and Y. The point b dominates only e since X values of {b,e} are same (i.e., X=8) but Y of b (i.e., 1) is less than that (i.e., 6) of e. The points {d,e} cannot belong to the skyline because they are dominated by other tuples. Consequently, the points a,b, and c belong to the skyline since they are not dominated by any other tuples.

Initially, the skyline problem was known as the maxima vector problem (H. T. Kung et. al 1975) for traditional processing system. However, this problem was revisited by the Skyline Operator (Stephan Börzsönyi et. al 2001). Since then, this problem has been intensively studied in database area.

Next time, I’ll describe several algorithms including above algorithms in detail.


Follow

Get every new post delivered to your Inbox.

Join 440 other followers