Sunday, November 27, 2005

Cumulative Normal Distribution Function

The cumulative normal distribution function is defined as an integral; it has no closed-form solution in terms of elementary functions. A good polynomial approximation can be found in Milton Abramowitz and Irene A. Stegun, Handbook of Mathematical Functions, National Bureau of Standards, 1964.

The following code is a Java function for calculating the cumulative normal distribution.

public double CNDF(double z)
{
    // Standardize the input using the sample mean and standard deviation
    double mean = calculateMean();
    double sd = calculateSD();
    z = (z - mean) / sd;

    // Beyond six standard deviations the tail probability is negligible
    if (z > 6.0) return 1.0;
    if (z < -6.0) return 0.0;

    // Coefficients from Abramowitz & Stegun, formula 26.2.17
    double b1 = 0.31938153;
    double b2 = -0.356563782;
    double b3 = 1.781477937;
    double b4 = -1.821255978;
    double b5 = 1.330274429;
    double p = 0.2316419;
    double c2 = 0.3989423; // 1/sqrt(2*pi)

    double a = Math.abs(z);
    double t = 1.0 / (1.0 + a * p);
    double b = c2 * Math.exp((-z) * (z / 2.0));
    double n = ((((b5 * t + b4) * t + b3) * t + b2) * t + b1) * t;
    n = 1.0 - b * n;
    if (z < 0.0) n = 1.0 - n;
    return n;
}
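As a quick sanity check, the polynomial part above can be factored into a standalone method for the standard normal (mean 0, sd 1) and compared against well-known values. The class and method names below are my own; the coefficients are the same ones used above.

```java
public class CndfCheck {
    // Standard normal CDF via Abramowitz & Stegun 26.2.17 (same
    // coefficients as the CNDF method above, without standardization)
    static double stdNormalCdf(double z) {
        if (z > 6.0) return 1.0;
        if (z < -6.0) return 0.0;
        double p = 0.2316419, c2 = 0.3989423;
        double b1 = 0.31938153, b2 = -0.356563782, b3 = 1.781477937;
        double b4 = -1.821255978, b5 = 1.330274429;
        double a = Math.abs(z);
        double t = 1.0 / (1.0 + a * p);
        double b = c2 * Math.exp(-z * z / 2.0);
        double n = ((((b5 * t + b4) * t + b3) * t + b2) * t + b1) * t;
        n = 1.0 - b * n;
        return (z < 0.0) ? 1.0 - n : n;
    }

    public static void main(String[] args) {
        // Known values of the standard normal CDF
        System.out.println(stdNormalCdf(0.0));    // ~0.5
        System.out.println(stdNormalCdf(1.96));   // ~0.975
        System.out.println(stdNormalCdf(-1.96));  // ~0.025
    }
}
```

The stated error of this approximation is below 7.5e-8, which is more than enough for most statistical work.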

Normality Testing

There are many algorithms for testing the normality of a data set.
Some of the tests include:

1. Kolmogorov-Smirnov test
2. Lilliefors test
3. Shapiro-Wilk W test

The Shapiro-Wilk test seems to be the best because of its good power properties, but the Kolmogorov-Smirnov test is easier to implement.
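To show why the Kolmogorov-Smirnov test is easy to implement, here is a minimal sketch of the one-sample KS statistic against a standard normal null. The normal CDF reuses the Abramowitz & Stegun approximation from the earlier post; the class name, method names, and sample data are illustrative, and the sample is assumed to be already standardized.

```java
import java.util.Arrays;

public class KsTest {
    // Standard normal CDF (Abramowitz & Stegun 26.2.17)
    static double normalCdf(double z) {
        if (z > 6.0) return 1.0;
        if (z < -6.0) return 0.0;
        double p = 0.2316419, c2 = 0.3989423;
        double[] b = {0.31938153, -0.356563782, 1.781477937,
                      -1.821255978, 1.330274429};
        double t = 1.0 / (1.0 + Math.abs(z) * p);
        double poly = ((((b[4]*t + b[3])*t + b[2])*t + b[1])*t + b[0])*t;
        double n = 1.0 - c2 * Math.exp(-z * z / 2.0) * poly;
        return (z < 0.0) ? 1.0 - n : n;
    }

    // One-sample KS statistic: the maximum distance between the
    // empirical CDF of the sorted sample and the theoretical CDF
    static double ksStatistic(double[] x) {
        double[] s = x.clone();
        Arrays.sort(s);
        int n = s.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = normalCdf(s[i]);
            d = Math.max(d, Math.max((i + 1.0) / n - f, f - (double) i / n));
        }
        return d;
    }

    public static void main(String[] args) {
        double[] sample = {-1.2, -0.5, -0.1, 0.0, 0.3, 0.7, 1.1, 1.8};
        System.out.println("D = " + ksStatistic(sample));
    }
}
```

For large n, D can be compared against the approximate 5% critical value 1.36/sqrt(n); a larger D rejects normality.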

Tuesday, November 22, 2005

NSF grant proposal: Organization

I found Expert Opinion's link useful and interesting.

Monday, November 21, 2005

CLASSPATH

I have worked only on C# for the past year. Today, I got a chance to work in my favorite programming language, Java. I always forget to set the CLASSPATH after installing the Java SDK. This solved a lot of issues.

Wednesday, November 16, 2005

Types of output

Binary classification problem - A learning problem with binary outputs
Multi-class classification problem - A learning problem with a finite number of categories
Regression - A learning problem with real-valued outputs
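As an illustrative sketch (the class, rules, and thresholds below are my own inventions), the three output types map naturally onto Java return types:

```java
public class OutputTypes {
    // Binary classification: the output is one of two labels
    static boolean binaryClassify(double[] features) {
        return features[0] > 0.0; // hypothetical threshold rule
    }

    // Multi-class classification: the output is one of a finite set of categories
    enum Category { LOW, MEDIUM, HIGH }
    static Category multiClassify(double[] features) {
        double s = features[0];
        return s < 0.0 ? Category.LOW : (s < 1.0 ? Category.MEDIUM : Category.HIGH);
    }

    // Regression: the output is a real value
    static double regress(double[] features) {
        return 2.0 * features[0] + 1.0; // hypothetical linear model
    }

    public static void main(String[] args) {
        double[] x = {0.5};
        System.out.println(binaryClassify(x)); // true
        System.out.println(multiClassify(x));  // MEDIUM
        System.out.println(regress(x));        // 2.0
    }
}
```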

Monday, November 14, 2005

Report

The report that I would recommend for a typical NN experiment can be divided into the following sections:

  1. Objective of the experiment
  2. Data description - Preprocessing of the data (if any)
  3. Architecture of the Neural Network used
  4. Pseudo code
  5. Results
  6. Discussion
  7. Conclusion
  8. Matlab code as an appendix

Sunday, November 13, 2005

Matlab script

I used the following MATLAB script with the Neural Network Toolbox. The important features were selected based on an efficiency threshold.



clear all;

% Load training and test data (features in rows, examples in columns)
P  = load('train1.txt');
T  = load('trainout1.txt');
Ps = load('test1.txt');
Ts = load('testout1.txt');

% Feed-forward network: hidden layers of 300, 200, and 150 units plus one
% output, all log-sigmoid, trained with scaled conjugate gradient
Net = newff(minmax(P), [300, 200, 150, 1], ...
            {'logsig', 'logsig', 'logsig', 'logsig'}, 'trainscg');
Net.trainParam.epochs = 1000;
Net.trainParam.goal = 0.0001;
[Net_t, TR] = train(Net, P, T);
y = sim(Net_t, Ps);
% figure
x = 1:37; % x-axis values (37 test examples)
efficiency = 1 - sum(abs(round(y) - Ts))/37;
efficiency
% plot(x, y, 'bo--', x, Ts, 'r*-');
% legend('network', 'actual');

% Selection of features: retrain with each feature removed in turn;
% if efficiency drops below the threshold, that feature is important
Threshold = 0.90;
for i = 1:300
    disp('Current iteration:');
    disp(i);
    FP  = P;
    FPs = Ps;
    FP(i,:)  = []; % drop feature i from the training inputs
    FPs(i,:) = []; % drop feature i from the test inputs
    Net = newff(minmax(FP), [299, 200, 150, 1], ...
                {'logsig', 'logsig', 'logsig', 'logsig'}, 'trainscg');
    Net.trainParam.epochs = 1000;
    Net.trainParam.goal = 0.0001;
    [Net_t, TR] = train(Net, FP, T);
    y = sim(Net_t, FPs);
    efficiency = 1 - sum(abs(round(y) - Ts))/37;
    disp(efficiency);
    if (efficiency < Threshold)
        disp(i);
        disp('is an important feature');
    end
end