Tuesday, July 26, 2005

Today's Job

Before going to work, I had planned to write up a description of the high-frequency data and to run some experiments with the neural network.

I had to find the daily average price for each of the 1826 days, in order to check whether there is any correlation between daily temperature and price. I also computed the first-order serial correlation coefficient of the hourly prices, using the CORREL function of Excel. More details to be added later.
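The two calculations above can be sketched in C# as follows. This is only a minimal sketch, not the Excel workbook itself: the class and method names are made up for illustration, the prices in Main are invented, and it assumes the serial correlation was obtained by applying CORREL to the hourly series and a copy of it shifted by one hour.

```csharp
using System;
using System.Linq;

// Hypothetical helper class; all names are illustrative only.
static class PriceStats
{
    // Average each consecutive block of 24 hourly prices into one daily value.
    public static double[] DailyAverages(double[] hourly)
    {
        if (hourly.Length % 24 != 0)
            throw new ArgumentException("Expected a whole number of 24-hour days.");
        double[] daily = new double[hourly.Length / 24];
        for (int d = 0; d < daily.Length; d++)
        {
            double sum = 0;
            for (int h = 0; h < 24; h++)
                sum += hourly[d * 24 + h];
            daily[d] = sum / 24;
        }
        return daily;
    }

    // Pearson correlation coefficient of two equally long series.
    public static double Correlation(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Need two series of equal length (at least 2).");
        double meanX = x.Average(), meanY = y.Average();
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.Sqrt(sxx * syy);
    }

    // Assumed definition of first-order serial correlation:
    // correlate the series with itself shifted by one step.
    public static double Lag1Correlation(double[] series)
    {
        double[] current = series.Take(series.Length - 1).ToArray();
        double[] next = series.Skip(1).ToArray();
        return Correlation(current, next);
    }

    static void Main()
    {
        // Invented prices: two days of 24 hourly values each.
        double[] hourly = Enumerable.Range(0, 48).Select(h => 20.0 + h % 24).ToArray();
        Console.WriteLine(string.Join(", ", DailyAverages(hourly)));
        Console.WriteLine(Lag1Correlation(hourly));
    }
}
```

The same Correlation method could be applied to the daily average prices and the daily temperatures, once both are arrays of equal length.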

During the winter break of 2005 (January), I created an RBFF network in MATLAB, in a file named ANN.m. The data used at that time was simple price and load data. The high-frequency data set now has more dimensions, so I modified the old ANN.m file and saved it as ANNNew.m. The new data has 43824 (1826 x 24) points. The neural network gives an out-of-memory error when it is run on the whole data set, so I have to redo the experiment using only part of the data set for training and testing.

Printed the tutorial about CodeDOM.

I have almost 5 years of data. I have to redo the tests like this: train with the data of the first year, then test with the data of the second year. Simply alternate the training and testing data.
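One way to read the plan above is sketched below in C#. It is a sketch under assumptions, not the MATLAB code: it assumes the data years run from 1 June to 31 May (matching the 1 June 2000 to 31 May 2005 range), that "alternate" means stepping through consecutive year pairs, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; all names are illustrative only.
static class YearSplit
{
    // Number of whole years between the start of the data and the given date:
    // 0 for the first June-to-May year, 1 for the second, and so on.
    public static int YearIndex(DateTime start, DateTime date)
    {
        int years = date.Year - start.Year;
        if (date < start.AddYears(years))
            years--;
        return years;
    }

    // (train, test) pairs of year indices: {0,1}, {1,2}, ...
    public static List<int[]> ConsecutivePairs(int yearCount)
    {
        var pairs = new List<int[]>();
        for (int y = 0; y + 1 < yearCount; y++)
            pairs.Add(new[] { y, y + 1 });
        return pairs;
    }

    static void Main()
    {
        DateTime start = new DateTime(2000, 6, 1);
        DateTime end = new DateTime(2005, 5, 31);

        // Count the days that fall into each data year.
        var daysPerYear = new SortedDictionary<int, int>();
        for (DateTime d = start; d <= end; d = d.AddDays(1))
        {
            int y = YearIndex(start, d);
            int n;
            daysPerYear.TryGetValue(y, out n); // n stays 0 if the key is missing
            daysPerYear[y] = n + 1;
        }

        // Each run uses one year for training and the next one for testing.
        foreach (int[] pair in ConsecutivePairs(daysPerYear.Count))
        {
            Console.WriteLine("train on year {0} ({1} hourly points), test on year {2} ({3} hourly points)",
                pair[0] + 1, daysPerYear[pair[0]] * 24,
                pair[1] + 1, daysPerYear[pair[1]] * 24);
        }
    }
}
```

Training on a single year keeps each run to roughly a fifth of the 43824 points, which may be enough to avoid the out-of-memory error; swapping the two indices of a pair gives the reverse experiment.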

Sunday, July 24, 2005

Genetic Programming

Currently I am reading the C# article on genetic programming. I also borrowed John Koza's Genetic Programming: On the Programming of Computers by Means of Natural Selection from the library. This is the same book that the article refers to.

Thursday, July 21, 2005

Genetic Algorithms

I was reading the book "An Introduction to Genetic Algorithms" by Melanie Mitchell from the online bookstore on the IEEE Computer Society website, and also the user guide of the GA toolbox, so that I can start implementing at least a simple genetic algorithm.

Tuesday, July 19, 2005

Open Source Software in C# and Library

http://csharp-source.net/

I went to the library to pick up the ILL book Tomorrow's Professor: Preparing for Academic Careers in Science and Engineering. Later, I went to the journals section and browsed through some of the journals. I found the following:


  1. The journal Applied Stochastic Models in Business & Industry has a special issue on Statistical Learning in March-April 2005 (Vol. 21, No. 2). It is also available electronically.
  2. www.kansascityfed.org has some interesting articles in its Economic Review section. I skimmed through the article "How long is a long term investment?"
  3. MSDN Magazine has some really nice and interesting articles. Check out http://msdn.microsoft.com/msdnmag
  4. A couple of articles that I thought might be useful for my project:
    "Concurrency: What Every Dev Must Know About Multithreaded Apps" and "Winsock: Get Closer to the Wire with High-Performance Sockets in .NET". I have printed these articles.

Friday, July 15, 2005

Pattern Matching

I finally figured out the algorithm for pattern matching of time-series data. It is very similar to the one in the following paper:

F.-L. Chung, T.-C. Fu, V. Ng, R. W. P. Luk, An evolutionary approach to
pattern-based time series segmentation, IEEE Transactions on Evolutionary
Computation, 8 (5) (2004) 471-489.

I have written the pseudocode on paper and I am trying to implement it in C#. The paper does not discuss many implementation issues, but I agree with the methodology. It is sound and convincing.

Thursday, July 14, 2005

Oscar Wilde, Genetic Algorithms for Matlab

I like this quote by Oscar Wilde:

The true mystery of the world is the visible, not the invisible.

Regarding genetic algorithms, read this first; then read this full-length tutorial. Request a copy of the Genetic Algorithm Toolbox for MATLAB from here. I got an email from them with a zipped attachment. I decompressed the files into a folder "genetic" inside the "toolbox" folder of MATLAB and added that folder to the MATLAB path (MATLAB -> File -> Set Path).

Wednesday, July 13, 2005

Data Processing

Finally, I am done with the processing of all raw data. I have managed to create a single file with the following fields: Date, Day of the Week, Temperature, Hour, Day-ahead Price, Real-time Price, and Load. Since the daily temperature of Philadelphia was not available up to 30 June 2005, I had to change the date range to 1 June 2000 through 31 May 2005.

I used the DayOfWeek property of the DateTime structure to extract the correct day of the week for a given date. Depending on the results, I am thinking of changing the Day of the Week field to a binary field that indicates whether the date is a weekday (Mon-Fri) or a weekend day (Sat, Sun).
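A minimal sketch of that binary field is shown below. DateTime.DayOfWeek is the real .NET property mentioned above; the class and method names are hypothetical, and national holidays are not handled.

```csharp
using System;

// Hypothetical helper class; the name is illustrative only.
static class DayType
{
    // True for Saturday and Sunday, false for Monday through Friday.
    public static bool IsWeekend(DateTime date)
    {
        return date.DayOfWeek == DayOfWeek.Saturday
            || date.DayOfWeek == DayOfWeek.Sunday;
    }

    static void Main()
    {
        DateTime date = new DateTime(2005, 7, 13);
        Console.WriteLine(date.DayOfWeek + ", weekend: " + IsWeekend(date));
    }
}
```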

The data is organized on my desktop as raw, pre-processed, and processed data.

Tuesday, July 12, 2005

Data Analysis


Finally, using the C# program, I extracted the required data from the colossal data files on the PJM website. 2009 data points were chosen from 1 June 2000 to 30 June 2005. The summary of the data files is as below:
The numbers in each cell represent the column numbers in the csv file. The same format was followed for both RT price and DA price. The query string was changed from 'PJM' to 'PJM-RTO' after 1 May 2004.

Friday, July 08, 2005

To Do List

Data pre-processing is an important job.

  1. Download data from the website (www.pjm.com) into a folder on the desktop.
  2. Using the FileInfo and Directory classes in C#, process all the csv filenames in the folder.
  3. Using the filenames from the above step, parse each file and extract the records satisfying the query (zone). A single record consists of the date and the hourly prices for that day.
  4. Get the temperature archive of a chosen city from the University of Dayton website.
  5. Categorize the dates into weekend, weekday, and national holiday as far as possible.
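Steps 2 and 3 could look roughly like the sketch below. Everything about the row layout is an assumption made for illustration (date as yyyyMMdd in the first column, zone name in the second, then 24 hourly prices); the real PJM files may be laid out differently. The class and method names are hypothetical, and plain string.Split stands in for a proper csv parser.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical sketch; the row layout is assumed, not taken from the PJM files.
static class ZoneExtractor
{
    // Assumed layout of one csv row: date (yyyyMMdd), zone name, 24 hourly prices.
    public static bool TryParseRecord(string line, string zone,
                                      out DateTime date, out double[] prices)
    {
        date = default(DateTime);
        prices = null;
        string[] cells = line.Split(',');
        if (cells.Length < 26 || cells[1].Trim() != zone)
            return false;
        if (!DateTime.TryParseExact(cells[0].Trim(), "yyyyMMdd",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
            return false;
        double[] parsed = new double[24];
        for (int h = 0; h < 24; h++)
        {
            if (!double.TryParse(cells[2 + h], NumberStyles.Float,
                    CultureInfo.InvariantCulture, out parsed[h]))
                return false;
        }
        prices = parsed;
        return true;
    }

    // Steps 2 and 3: walk every csv file in the folder and print matching records.
    public static void PrintZoneRecords(string folder, string zone)
    {
        foreach (FileInfo file in new DirectoryInfo(folder).GetFiles("*.csv"))
        {
            foreach (string line in File.ReadLines(file.FullName))
            {
                DateTime date;
                double[] prices;
                if (TryParseRecord(line, zone, out date, out prices))
                    Console.WriteLine("{0:yyyy-MM-dd}: first hour {1}", date, prices[0]);
            }
        }
    }

    static void Main()
    {
        // One invented row, so the sketch runs without the downloaded files.
        string row = "20050601,PJM," + string.Join(",", Enumerable.Range(1, 24));
        DateTime date;
        double[] prices;
        bool ok = TryParseRecord(row, "PJM", out date, out prices);
        Console.WriteLine(ok + " " + date.ToString("yyyy-MM-dd"));
    }
}
```

Rows for other zones, header rows, and rows with missing prices are simply skipped, since TryParseRecord returns false for them.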

Thursday, July 07, 2005

Pre-Processing of Data and Coding

Collected the data. Used an FTP client to download the data from PJM. All the data was in the form of zip files, so I used a batch zip extraction tool (Zipghost). Then I downloaded the JHLib library and used it for parsing the csv files.

The following code was used to get the filenames of all the files from a folder:

using System;
using System.IO;

// Folder that holds the extracted csv files.
string directory = @"D:\Data\Real Time Hourly Market Price";
DirectoryInfo directoryInfo = new DirectoryInfo(directory);
// Get every file in the folder and print its name.
FileInfo[] fileInfo = directoryInfo.GetFiles("*.*");
foreach (FileInfo fi in fileInfo)
{
    Console.Write(fi.Name + ", ");
}

JHLib

JHLib - Jouni Heikniemi's .NET tool library

I used the JHLib library for parsing csv files. It is a very useful tool.

PJM data

I want to use hourly data, so I need load and price for the DA and RT markets. After a preliminary look at the data on the website (www.pjm.com), I have decided to consider only the PJM zone. At this time it appears that the PJM zone has been changed to the PJM-EAST zone.

DA hourly load

DA hourly market price
  Available at ftp://www.pjm.com/pub/account/lmpda/index.html
  From June 2000 to present

RT hourly load
  Load data is available from 1998 to 2005

RT hourly market price
  Available at ftp://www.pjm.com/pub/account/lmp/index.html
  From January 2000 to present

Wednesday, July 06, 2005

Spatial Data Analysis

After reading the first few pages of "Spatial Data Analysis," I realized that this is not what I was looking for. According to the author, "spatial" means that each item of data has a geographical reference, so we know where each case occurs on a map.