Friday, August 28, 2009

Select k records randomly from n records in Java

Assume a situation wherein you have an array of records, let them be anything varying from marks (integers), names (string), student information (user defined class object), or ANYTHING ARBITRARY which derives itself from Java's object. Suppose, you have such n records, each with an identifier say, 1 to n, such that you have then stored in array of any form in Java.

Now, suppose you want to chose k random records from this array, this is how you proceed.

Given n, generate a random permutation of {1, 2, ... n} as {1', 2', ... , n'} and pick first k numbers from the permuted array.
For example, say you want to pick 3 numbers randomly from 8, s.t. n = 8, k = 3.
1. {1, ... 8} ===> {2,3,1,7,6,4,8,5} ===> {2,3,1}
2. {1, ... 8} ===> {8,4,3,1,6,5,7,2} ===> {8,4,3}
Thus, once you have k such indexes, you just need to probe your original array with these indexes and get corresponding k random records.

The Java code snippet is as shown:

The output as expected is :
randomly selecting 3 out of them
1 a
7 g
5 e
randomly selecting 5 out of them
6 f
5 e
7 g
8 h
2 b

I needed this for:
I had to partition dataset containing some @1,000 records, for cross-validation. It is a method of training a classifier when you have small number of training records. In this, you will first decide how many folds of dataset you want to make, say 5-fold, then 1,000 records will be partitioned in the ration 4 : 1 such that every time 4/5 th of the data records will be used for training and remaining 1/5 th for testing (validating in this case). Thus, I needed to randomly pick 4/5 * 1000 of my records every time, for which I wrote above piece of code.

Note 1: Not listing the code, but adding image so that, whoever wants to try it, will actually type in the things, and thus they will never forget the trick! :)
Note 2: Reference - Weka software

No comments: