Before going to work, I had planned to write up a description of the high-frequency data and to run some experiments with the neural network.
I had to compute the daily average price for each of the 1826 days, in order to check whether there is any correlation between daily temperature and price. I also computed the first-order serial correlation coefficient for the hourly prices, using Excel's CORREL function. More details to be added later.
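The two calculations above can be sketched in C# as follows. This is only a minimal sketch, not the spreadsheet actually used: the class and method names are made up for illustration, and the serial correlation is taken to be the Pearson correlation (the quantity CORREL returns) between the series and a copy of itself shifted by one hour, which can differ slightly from the textbook autocorrelation estimator.

```csharp
using System;
using System.Linq;

// Hypothetical helper class; names are illustrative only.
public static class PriceStats
{
    // Average each consecutive block of 24 hourly prices into one daily value.
    public static double[] DailyAverages(double[] hourly)
    {
        if (hourly.Length % 24 != 0)
            throw new ArgumentException("Expected a whole number of 24-hour days.");
        var daily = new double[hourly.Length / 24];
        for (int d = 0; d < daily.Length; d++)
        {
            double sum = 0;
            for (int h = 0; h < 24; h++)
                sum += hourly[d * 24 + h];
            daily[d] = sum / 24;
        }
        return daily;
    }

    // Pearson correlation coefficient of two equally long series.
    public static double Correl(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Series must have equal length of at least 2.");
        double meanX = x.Average(), meanY = y.Average();
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.Sqrt(sxx * syy);
    }

    // First-order serial correlation: correlate the series with itself shifted by one step.
    public static double Lag1Correlation(double[] series)
    {
        double[] current = series.Take(series.Length - 1).ToArray();
        double[] next = series.Skip(1).ToArray();
        return Correl(current, next);
    }
}
```

With the 43824 hourly prices in one array, DailyAverages would give the 1826 daily values to compare against temperature, and Lag1Correlation would give the hourly coefficient.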
During the winter break of 2005 (Jan), I created an RBFF network using MATLAB, in a file named ANN.m. The data used at that time was a simple price and load data set. Now I have more dimensions in the high-frequency data set, so I modified the old ANN.m file and created a new file, ANNNew.m. The new data has 43824 (1826 x 24) points. The neural network gives an out-of-memory error when run on the whole data set, so I have to redo the run using only part of the data set for training and testing.
Printed out the tutorial on CodeDOM.
I have almost 5 years of data. I have to redo the tests like this: train with the data of the first year, then test with the data of the second year. Simply alternate the training and testing data.
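One way to set up the yearly split is sketched below in C#. It is only an illustration under assumptions: the class and method names are hypothetical, a "year" is taken to be a 12-month block counted from the first date in the data set, and "alternate" is read as training on one block and testing on the next. Swapping the roles of the two blocks would be an equally valid reading.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; names are illustrative only.
public static class YearlySplit
{
    // Zero-based index of the 12-month block (counted from 'start') that 'date' falls in.
    public static int YearIndex(DateTime start, DateTime date)
    {
        int years = date.Year - start.Year;
        // Step back one block if the anniversary of 'start' has not been reached yet.
        if (date < start.AddYears(years)) years--;
        return years;
    }

    // Group dates into consecutive 12-month blocks, keyed by block index.
    public static Dictionary<int, List<DateTime>> SplitByYear(DateTime start, IEnumerable<DateTime> dates)
    {
        return dates.GroupBy(d => YearIndex(start, d))
                    .ToDictionary(g => g.Key, g => g.ToList());
    }

    // Pair each block with the following one: train on block k, test on block k + 1.
    public static IEnumerable<(int Train, int Test)> ConsecutivePairs(int blockCount)
    {
        for (int k = 0; k + 1 < blockCount; k++)
            yield return (k, k + 1);
    }
}
```

Training on one block at a time also keeps each run much smaller than the full 43824 points.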
Tuesday, July 26, 2005
Today's Job
Posted by gt at 10:50 PM 0 comments
Sunday, July 24, 2005
Genetic Programming
Currently I am reading the C# article on genetic programming. I also borrowed John Koza's Genetic Programming: On the Programming of Computers by Means of Natural Selection from the library. This is the same book that the article refers to.
Posted by gt at 9:45 AM 0 comments
Thursday, July 21, 2005
Genetic Algorithms
I was reading the book "An Introduction to Genetic Algorithms" by Melanie Mitchell from the online bookstore on the IEEE Computer Society website, and also the user guide of the GA toolbox, so that I can start implementing at least a simple genetic algorithm.
Posted by gt at 10:52 AM 0 comments
Tuesday, July 19, 2005
Open Source Software in C# and Library
http://csharp-source.net/
I went to the library to pick up the ILL book Tomorrow's Professor: Preparing for Academic Careers in Science and Engineering. Later, I went to the journals section and browsed some of the journals. I found the following:
- The journal Applied Stochastic Models in Business & Industry has a special issue on Statistical Learning in March-April 2005 (Vol 21, No. 2). It is also available electronically.
- www.kansascityfed.org has some articles in their Economic Review section which are interesting to read. I skimmed through their article "How long is a long term investment?"
- MSDN Magazine has some really nice and interesting articles. Check out http://msdn.microsoft.com/msdnmag
- A couple of articles which I thought might be of interest for my project:
"Concurrency: What Every Dev Must Know About Multithreaded Apps" and "Winsock: Get Closer to the Wire with High-Performance Sockets in .NET". I have printed these articles.
Posted by gt at 12:46 AM 0 comments
Friday, July 15, 2005
Pattern Matching
I finally figured out the algorithm for pattern matching of time-series data. It is very similar to the one in the following paper:
F.-L. Chung, T.-C. Fu, V. Ng, R. W. P. Luk, An evolutionary approach to pattern-based time series segmentation, IEEE Transactions on Evolutionary Computation, 8 (5) (2004) 471-489.
I have written the pseudocode on paper and I am trying to implement it in C#. The paper does not discuss many implementation issues. But I agree with the methodology. It is sound and convincing.
Posted by gt at 9:01 PM 0 comments
Thursday, July 14, 2005
Oscar Wilde, Genetic Algorithms for Matlab
I like this quote by Oscar Wilde:
The true mystery of the world is the visible, not the invisible.
Regarding genetic algorithms, read this first; then read this full-length tutorial. Request a copy of the Genetic Algorithm Toolbox for MATLAB from here. I got an email from them with a zipped attachment. I decompressed the files into a folder "genetic" inside the "toolbox" folder of MATLAB, then added that folder to the MATLAB path (MATLAB->FILE->SETPATH).
Posted by gt at 7:39 AM 0 comments
Labels: Software
Wednesday, July 13, 2005
Data Processing
Finally, I am done with the processing of all raw data. I have managed to create a single file with the following fields: Date, Day of the Week, Temperature, Hour, Day-ahead Price, Real-time Price, and Load. Since the daily temperature for Philadelphia was not yet available up to 30 June 2005, I had to change the date range to 1 June 2000 through 31 May 2005.
I used the DayOfWeek property of the DateTime structure to extract the correct day of the week for a given date. Depending on the results, I am thinking of changing the Day of the Week field to a binary datatype that holds whether the date is a weekday (Mon-Fri) or a weekend (Sat, Sun).
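The binary weekday/weekend flag could be derived from the same DayOfWeek property along these lines. This is a small sketch; the class and method names are invented for illustration.

```csharp
using System;

// Hypothetical helper class; names are illustrative only.
public static class DayType
{
    // True for Saturday and Sunday, false for Monday through Friday.
    public static bool IsWeekend(DateTime date)
    {
        return date.DayOfWeek == DayOfWeek.Saturday
            || date.DayOfWeek == DayOfWeek.Sunday;
    }
}
```

For example, IsWeekend is false for 13 July 2005 (a Wednesday) and true for 24 July 2005 (a Sunday). National holidays would need a separate lookup, which is not covered here.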
The data is organized on my desktop as: raw, pre-processed, and processed data.
Posted by gt at 2:19 PM 0 comments
Labels: Data
Tuesday, July 12, 2005
Data Analysis
Finally, using the C# program, I extracted the required data from the colossal data files on the PJM website. 2009 data points were chosen from 1 June 2000 to 30 June 2005. The summary of the data files is as below:
The numbers in each cell represent the column numbers in the csv file. The same format was followed for both RT price and DA price. The query string was changed from 'PJM' to 'PJM-RTO' after 1 May 2004.
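The switch in the query string can be expressed as a small date check. This is a sketch with hypothetical names; whether 1 May 2004 itself should use the old or the new name is an assumption here and would need checking against the files.

```csharp
using System;

// Hypothetical helper class; names are illustrative only.
public static class ZoneQuery
{
    // Cut-over date taken from the text; treating 1 May 2004 itself as "old name" is an assumption.
    private static readonly DateTime Cutover = new DateTime(2004, 5, 1);

    // Returns the zone name to search for in the file for the given date.
    public static string ForDate(DateTime date)
    {
        return date > Cutover ? "PJM-RTO" : "PJM";
    }
}
```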
Posted by gt at 10:54 AM 0 comments
Labels: Data
Friday, July 08, 2005
To Do List
Data pre-processing is an important job.
- Download data from the website (www.pjm.com) into a folder on the desktop.
- Using the FileInfo and Directory classes in C#, process all the csv filenames in the folder.
- Using the filenames from the above step, parse each file and extract the record satisfying the query (zone). A single record consists of the date and the hourly prices for that day.
- Get the temperature archive of a chosen city from the University of Dayton website.
- Categorize the dates into weekend, weekday, and national holiday as much as possible.
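The extraction step in the list above might look roughly like the following. This is a sketch under stated assumptions, not the program actually used: the column layout (date in column 0, zone name in column 1, 24 hourly prices in columns 2 to 25) is hypothetical, the class and method names are invented, and a plain Split on commas stands in for a proper csv parser.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper class; names and column layout are illustrative only.
public static class ZoneExtractor
{
    // Assumed layout: column 0 = date, column 1 = zone name, columns 2..25 = 24 hourly prices.
    // Returns the hourly prices of the first row matching 'zone', or null if none matches.
    public static double[] ExtractHourlyPrices(IEnumerable<string> csvLines, string zone)
    {
        foreach (string line in csvLines)
        {
            string[] fields = line.Split(',');
            // Skip header lines, short rows, and rows for other zones.
            if (fields.Length < 26 || fields[1].Trim() != zone)
                continue;
            var prices = new double[24];
            for (int h = 0; h < 24; h++)
                prices[h] = double.Parse(fields[2 + h], CultureInfo.InvariantCulture);
            return prices;
        }
        return null; // no row matched the zone
    }
}
```

The lines of one file can be passed in with File.ReadLines(path) from System.IO. A simple Split breaks on quoted fields that contain commas, which is one reason to prefer a dedicated csv library.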
Posted by gt at 1:38 PM 0 comments
Labels: Procedure
Thursday, July 07, 2005
Pre-Processing of Data and Coding
The following code was used to get the filenames of all the files from a folder:
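using System;
using System.IO;
// 'directory' holds the path of the folder containing the downloaded files.
DirectoryInfo directoryInfo = new DirectoryInfo(directory);
// List the files in that folder.
FileInfo[] fileInfo = directoryInfo.GetFiles("*.*");
foreach (FileInfo fi in fileInfo)
{
    // Print the file names as a comma-separated list.
    Console.Write(fi.Name + ", ");
}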
Posted by gt at 10:19 PM 0 comments
Labels: Programming
JHLib
JHLib - Jouni Heikniemi's .NET tool library
I used the JHLib library for parsing csv files. It is a very useful tool.
Posted by gt at 12:31 PM 0 comments
Labels: Software
PJM data
I want to use hourly data, so I need load and price for the DA and RT markets. After a preliminary look at the data on the website (www.pjm.com), I have decided to consider only the PJM zone. At this time it appears that the PJM zone has been changed to the PJM-EAST zone.
DA Hourly load
DA Hourly market price
Available at ftp://www.pjm.com/pub/account/lmpda/index.html
From June, 2000 to present
RT Hourly load
Load data is available from 1998-2005
RT hourly market price
Available at ftp://www.pjm.com/pub/account/lmp/index.html
From Jan, 2000 to present
Posted by gt at 10:53 AM 0 comments
Labels: Data
Wednesday, July 06, 2005
Spatial Data Analysis
After reading the first few pages of "Spatial Data Analysis," I realized that this is not what I was looking for.
According to the author, "spatial" means that each item of data has a geographical reference, so we know where each case occurs on a map.
Posted by gt at 1:25 PM 0 comments
Labels: Definition