# Zipf Distribution Random Generator in Java

When I run experiments, I often generate synthetic data sets from probability distributions. The Zipf distribution, one of the discrete power-law probability distributions, is used especially often for this. See the Wikipedia article on Zipf's law for details. Below is my own Java class for the Zipf distribution. The graphs for this post were generated with this Java code and gnuplot.
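For reference, this is the standard Zipf probability mass function over ranks $k = 1, \dots, N$ with exponent $s$. Mapping it onto the class is an assumption based on reading the code: $N$ plays the role of `size`, $s$ the role of `skew`, and the denominator is what the `bottom` field accumulates.

$$P(k; s, N) = \frac{1/k^{s}}{\sum_{n=1}^{N} 1/n^{s}}, \qquad k = 1, \dots, N$$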

```
import java.util.Random;

public class ZipfGenerator {
    private final Random rnd = new Random(System.currentTimeMillis());
    private final int size;
    private final double skew;
    // Normalization constant: sum of 1/i^skew for i = 1..size.
    private double bottom = 0;

    public ZipfGenerator(int size, double skew) {
        this.size = size;
        this.skew = skew;

        // Include rank `size` itself so the probabilities of ranks 1..size sum to 1.
        for (int i = 1; i <= size; i++) {
            this.bottom += (1 / Math.pow(i, this.skew));
        }
    }

    // Returns a random rank in 1..size.
    // The frequency of the returned ranks follows the Zipf distribution.
    public int next() {
        int rank;
        double frequency;
        double dice;

        // Rejection sampling: draw a rank uniformly, then accept it with
        // probability equal to its Zipf probability.
        do {
            // nextInt(size) yields 0..size-1; shift to 1..size because
            // rank 0 would divide by zero and always be accepted.
            rank = rnd.nextInt(size) + 1;
            frequency = getProbability(rank);
            dice = rnd.nextDouble();
        } while (!(dice < frequency));

        return rank;
    }

    // Returns the probability that the given rank (1..size) occurs.
    public double getProbability(int rank) {
        return (1.0d / Math.pow(rank, this.skew)) / this.bottom;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: java ZipfGenerator <size> <skew>");
            System.exit(-1);
        }

        int size = Integer.parseInt(args[0]);
        ZipfGenerator zipf = new ZipfGenerator(size, Double.parseDouble(args[1]));
        // Print "rank probability" pairs (at most the first 100 ranks) for plotting.
        for (int i = 1; i <= Math.min(100, size); i++) {
            System.out.println(i + " " + zipf.getProbability(i));
        }
    }
}
```
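The `next()` method uses rejection sampling: it draws a rank uniformly and accepts it with probability equal to that rank's Zipf probability, so the expected number of attempts per sample grows with `size`. The following is a minimal sketch of one common alternative, not part of the original class: precompute the cumulative distribution once and locate each sample with a binary search. The class name `ZipfTableSampler` and the explicit `seed` parameter are hypothetical, chosen only for this illustration.

```java
import java.util.Random;

// Hypothetical alternative sampler: O(size) setup, O(log size) per sample.
final class ZipfTableSampler {
    // cdf[i] holds P(rank <= i + 1); the last entry is forced to 1.0.
    private final double[] cdf;
    private final Random rnd;

    ZipfTableSampler(int size, double skew, long seed) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        cdf = new double[size];
        double sum = 0;
        for (int i = 1; i <= size; i++) {
            sum += 1.0 / Math.pow(i, skew);
            cdf[i - 1] = sum;            // unnormalized running total
        }
        for (int i = 0; i < size; i++) {
            cdf[i] /= sum;               // normalize to a proper CDF
        }
        cdf[size - 1] = 1.0;             // guard against rounding error
        rnd = new Random(seed);
    }

    // Returns a rank in 1..size by inverting the CDF.
    int next() {
        double u = rnd.nextDouble();     // uniform in [0, 1)
        int lo = 0;
        int hi = cdf.length - 1;
        // Find the smallest index whose cumulative probability exceeds u.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] > u) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo + 1;
    }

    // Probability of a single rank, recovered from adjacent CDF entries.
    double getProbability(int rank) {
        double below = (rank > 1) ? cdf[rank - 2] : 0.0;
        return cdf[rank - 1] - below;
    }

    public static void main(String[] args) {
        ZipfTableSampler sampler = new ZipfTableSampler(1000, 1.0, 42L);
        int[] counts = new int[6];
        for (int i = 0; i < 100_000; i++) {
            int rank = sampler.next();
            if (rank <= 5) {
                counts[rank]++;
            }
        }
        // Print observed counts next to the expected probability for ranks 1..5.
        for (int rank = 1; rank <= 5; rank++) {
            System.out.println(rank + " " + counts[rank] + " " + sampler.getProbability(rank));
        }
    }
}
```

The trade-off is memory: the table needs one `double` per rank, which is fine for moderate `size` but may not suit very large rank spaces.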

### 10 Comments on “Zipf Distribution Random Generator in Java”

1. bart says:

I recall this often being cited as a skewed data distribution…
Thanks for the nice post.

3. Hyunsik Choi says:

Hello! I often visit your blog too, bart.
Nice to meet you here.

5. Bart says:

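I see. Nice to meet you.
It looks like you are doing research on MPP and graph databases.
I have been following that area with interest as well, but I don't seem to have the capacity to try it yet.
I used to work on XML, and lately I'm thinking of trying multi-core.
These days the "reminiscence of parallelism" (inter-node parallelism and intra-node parallelism)
seems to be the main trend.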

7. Hyunsik Choi says:

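Yes, I'm currently doing research on large-scale graph data processing :)
Actually, I was also studying spatial databases and changed fields when I started my PhD program.
I've only just started, so I still have a lot to learn.

Anyway, I look forward to seeing you around :)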

9. Reitffunk says:

Hi!

Thanks for this good solution for Zipf. It's working like a charm, but it is a little bit slow because of the while loop in the next() function.

Do you have a solution to make it faster?

10. Reitffunk says:

Just for your information, with this code you can speed up the Zipf generator:

http://stackoverflow.com/questions/27105677/zipfs-law-in-java-for-text-generation-too-slow