CALIFORNIA STATE UNIVERSITY, NORTHRIDGE. DMForex: A Data Mining Application to Predict Currency Exchange Rates and Trends. Graduate Project in Computer Science.

DMForex: Currency Exchange Rate Prediction

Document information

Author: Jorge Orrantia
Instructor/Editor: Gloria Melara, Ph.D.
School: California State University, Northridge
Major: Computer Science
Document type: Graduate Project
Place: Northridge
Language: English
Format: PDF
Size: 1.21 MB

Summary

I. DMForex: A WPF Application for Foreign Exchange Rate and Trend Prediction

DMForex is a composite WPF application built using the PRISM 4 framework and the Model-View-ViewModel (MVVM) pattern. It leverages Microsoft SQL Server 2008 Analysis Services for data mining and time series analysis to predict currency exchange rates and trends. The application uses historical daily price data for major currency pairs (e.g., EUR/USD, USD/JPY, USD/GBP) covering twelve years (01/01/1999 to 12/31/2011), obtained from FXCM. Technical indicators such as the Simple Moving Average (SMA) and Relative Strength Index (RSI), calculated with the TA-lib library, enhance prediction accuracy. Trends are predicted with Microsoft's Decision Trees algorithm, and rates are forecast with a time series algorithm.

1. Application Overview and Purpose

DMForex is a composite WPF application designed for foreign exchange (FX) rate and trend prediction, with data mining techniques and algorithms at its core. The application is built on the PRISM 4 framework and follows the Model-View-ViewModel (MVVM) design pattern, separating the presentation layer from business logic. Forecasting of exchange rates and trends is handled by SQL Server 2008 with Analysis Services, using daily historical price data for major world currency pairs across a twelve-year period; this extensive dataset supports robust analysis and prediction modeling. The application's primary goal is to predict future values and trends in the FX market accurately, providing useful insight for market participants. The design lets the user modify the dataset by adding technical indicators such as the Simple Moving Average (SMA) and Relative Strength Index (RSI) through the TA-lib library, which improves the prediction model's precision. Microsoft's Decision Trees algorithm handles trend prediction, while a time series algorithm is employed for rate forecasting. Accurate time series prediction is particularly valuable in the highly liquid and dynamic FX market.
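
As an illustration of this composite structure, the sketch below shows how a PRISM 4 module might register a view with a named shell region; the module, view, and region names (ChartModule, ChartView, "ChartRegion") are hypothetical and not taken from DMForex.

```csharp
using Microsoft.Practices.Prism.Modularity;
using Microsoft.Practices.Prism.Regions;

// Placeholder view; in the real application this would be a XAML UserControl.
public class ChartView : System.Windows.Controls.UserControl { }

// Hypothetical PRISM 4 module: it contributes its view to a region declared by the
// shell, so the shell never references the module's concrete types.
public class ChartModule : IModule
{
    private readonly IRegionManager regionManager;

    public ChartModule(IRegionManager regionManager)
    {
        this.regionManager = regionManager;
    }

    public void Initialize()
    {
        // "ChartRegion" is an assumed region name defined in the shell's XAML.
        regionManager.RegisterViewWithRegion("ChartRegion", typeof(ChartView));
    }
}
```

Wiring views into regions this way is what allows each feature to be developed and tested as a separate module within the shell.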

2. Data Sources and Processing

The application's data foundation is twelve years of historical daily price data for various currency pairs. This data, obtained from FXCM, a forex trading company offering both demo and real accounts, is crucial for the application's predictive capabilities. The data is stored in comma-separated value (CSV) format and undergoes preprocessing to ensure consistency, including adjustments to headers and date formatting so it integrates cleanly into the application's data mining process. Each record contains the date and the open, high, low, and close prices. The choice of FXCM as the data provider reflects consideration of the dataset's reliability and suitability for the chosen algorithms, and the data is reformatted to match the formats the application expects, simplifying both import and mining. Data of this type is freely available on the internet, making it accessible to researchers and traders interested in building similar applications.
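
A minimal sketch of loading one of these CSV files into typed records is shown below; the header layout and date format of the actual FXCM export may differ, and the class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical record for one daily bar; fields mirror the columns described
// in the text (date, open, high, low, close).
public class DailyBar
{
    public DateTime Date { get; set; }
    public double Open { get; set; }
    public double High { get; set; }
    public double Low { get; set; }
    public double Close { get; set; }
}

public static class CsvLoader
{
    // Minimal loader for a header-prefixed CSV file such as "EURUSD.csv".
    // Assumes a "Date,Open,High,Low,Close" header and parseable dates.
    public static List<DailyBar> Load(string path)
    {
        return File.ReadLines(path)
            .Skip(1) // skip the header row
            .Select(line => line.Split(','))
            .Select(f => new DailyBar
            {
                Date = DateTime.Parse(f[0], CultureInfo.InvariantCulture),
                Open = double.Parse(f[1], CultureInfo.InvariantCulture),
                High = double.Parse(f[2], CultureInfo.InvariantCulture),
                Low = double.Parse(f[3], CultureInfo.InvariantCulture),
                Close = double.Parse(f[4], CultureInfo.InvariantCulture)
            })
            .ToList();
    }
}
```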

3. Technological Choices and Implementation

The project's technological choices significantly shape its functionality and performance. Microsoft's Visual Studio 2010 Ultimate edition serves as the Integrated Development Environment (IDE). WPF was chosen for the user interface because of its rich graphical capabilities, and the application is organized into functional layers, a layered architecture that supports modularity and maintainability. The Model-View-ViewModel (MVVM) design pattern is applied consistently, separating data handling, user interaction, and presentation, while the PRISM 4 framework guides the application's overall structure. Microsoft SQL Server 2008 Enterprise Edition with Analysis Services is key to the data mining capabilities, providing access to algorithms such as time series and classification trees. The application also incorporates the Fluent Ribbon Control Suite for a user-friendly interface, and a custom chart control was developed to visualize the data. The TA-lib library is integrated for calculating technical indicators such as SMA and RSI, and ADOMD.Net with DMX commands is used to query the SQL Server Analysis Services database. Hosting the source code on Amazon's Elastic Compute Cloud (EC2) provides scalability and remote accessibility.
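
As a sketch of how an SMA column might be computed from the closing prices, the following assumes the TicTacTec.TA.Library .NET wrapper for TA-lib; the wrapper and exact call signatures the project used may differ, and RSI would follow the same pattern with Core.Rsi.

```csharp
using System;
using TicTacTec.TA.Library;

public static class Indicators
{
    // Simple moving average over closing prices via the assumed TA-lib .NET wrapper.
    public static double[] Sma(double[] closes, int period)
    {
        var raw = new double[closes.Length];
        int begIdx, nbElement;
        Core.RetCode rc = Core.Sma(0, closes.Length - 1, closes, period,
                                   out begIdx, out nbElement, raw);
        if (rc != Core.RetCode.Success)
            throw new InvalidOperationException("TA-lib SMA failed: " + rc);

        // TA-lib writes results starting at raw[0]; realign them so that
        // aligned[i] corresponds to closes[i], leaving the warm-up period as NaN.
        var aligned = new double[closes.Length];
        for (int i = 0; i < aligned.Length; i++) aligned[i] = double.NaN;
        for (int i = 0; i < nbElement; i++) aligned[begIdx + i] = raw[i];
        return aligned;
    }
}
```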

II. Methodology: Data Mining and Prediction Algorithms

The core of DMForex lies in its data mining capabilities. It employs time series analysis algorithms within SQL Server Analysis Services to forecast future exchange rates. For trend prediction, a classification trees algorithm is used after incorporating SMA and RSI indicators to transform the original time series data. The user can define trend parameters (days and pips). The application uses ADOMD.Net and DMX commands to interact with the Analysis Services database. Future improvements could include incorporating Support Vector Machines (SVM) and Artificial Neural Networks (ANN).

1. Rate Prediction Methodology

DMForex employs a time series algorithm for predicting future currency exchange rates. The algorithm uses the closing price of a currency pair as input to forecast the next closing value. This approach is a core component of the application's predictive capabilities, focusing on the direct numerical prediction of future exchange rate values. The algorithm's implementation utilizes data mining techniques and algorithms available within Microsoft SQL Server 2008 Analysis Services. A module dedicated to generating and testing past predictions was developed to assess the model's accuracy and reliability. The results of these backtests are displayed in a grid showing predicted versus actual values, along with the difference and percentage difference for each prediction, allowing for a comprehensive evaluation of the model's performance. The algorithm's reliance on historical data highlights the importance of data quality and quantity in achieving accurate predictions. This approach focuses on predicting the precise numerical value of the next closing price, providing a quantitative forecast for trading decisions.
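
The following is a small sketch of how such a backtest grid could be assembled from predicted and realized closing prices; the types and names are illustrative rather than the project's own, and the prediction source (for example, time series query results from Analysis Services) is abstracted behind the method's arguments.

```csharp
using System;
using System.Collections.Generic;

// Illustrative row for the backtest grid: predicted vs. actual closing values,
// their difference, and the percentage difference.
public class BacktestRow
{
    public DateTime Date { get; set; }
    public double Predicted { get; set; }
    public double Actual { get; set; }
    public double Difference { get { return Predicted - Actual; } }
    public double PercentDifference { get { return 100.0 * Difference / Actual; } }
}

public static class Backtester
{
    // Pairs each historical prediction with the realized closing price for that date.
    public static List<BacktestRow> Evaluate(
        IDictionary<DateTime, double> predictions,
        IDictionary<DateTime, double> actualCloses)
    {
        var rows = new List<BacktestRow>();
        foreach (var p in predictions)
        {
            double actual;
            if (actualCloses.TryGetValue(p.Key, out actual))
                rows.Add(new BacktestRow { Date = p.Key, Predicted = p.Value, Actual = actual });
        }
        return rows;
    }
}
```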

2. Trend Prediction Methodology

The prediction of currency exchange rate trends takes a different approach, employing a classification trees algorithm. To improve trend prediction accuracy, the user modifies the original dataset to incorporate additional attributes calculated from popular technical indicators: Simple Moving Average (SMA) and Relative Strength Index (RSI) columns are added using the TA-lib library. The user defines the trend criteria in terms of days and pips, which significantly influences the algorithm's output; this user-defined input allows the trend evaluation to be customized to different trading strategies and risk profiles. A 'trend' column is then calculated from the price movement within the user-defined time frame and categorized into four possible outcomes: up, down, both, or no change, which keeps the prediction interpretable. The input to the algorithm is thus a modified dataset containing the technical indicators and the calculated trend, and the output classifies the next trend based on these parameters. The choice of a classification tree algorithm provides an interpretable model for analyzing and understanding the predicted trend. This methodology transforms the problem from a numerical prediction into a classification task.
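
One plausible way to compute the four-valued trend column from the user's days and pips parameters is sketched below; the precise rule DMForex applies is user-defined, and the pip size is an assumption (for example 0.0001 for EUR/USD).

```csharp
using System;

public enum Trend { Up, Down, Both, NoChange }

public static class TrendLabeler
{
    // Labels each bar by looking 'days' bars ahead and checking whether the close
    // moved by at least 'pips' in either direction. This is one interpretation of
    // the user-defined days/pips criteria, not the project's exact rule.
    public static Trend[] Label(double[] closes, int days, int pips, double pipSize)
    {
        double threshold = pips * pipSize;
        var labels = new Trend[closes.Length];

        for (int i = 0; i < closes.Length; i++)
        {
            bool movedUp = false, movedDown = false;
            int last = Math.Min(i + days, closes.Length - 1);

            for (int j = i + 1; j <= last; j++)
            {
                if (closes[j] - closes[i] >= threshold) movedUp = true;
                if (closes[i] - closes[j] >= threshold) movedDown = true;
            }

            labels[i] = movedUp && movedDown ? Trend.Both
                      : movedUp ? Trend.Up
                      : movedDown ? Trend.Down
                      : Trend.NoChange;
        }
        return labels;
    }
}
```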

3. Data Mining Implementation and Tools

The data mining module is implemented on Microsoft SQL Server 2008 Analysis Services, leveraging its built-in time series and classification tree algorithms. ADOMD.Net and Data Mining Extensions (DMX) commands are used to query the Analysis Services database and retrieve predictions efficiently. This technology supports the management and processing of large datasets and provides a robust, scalable foundation for building prediction models. Relying on algorithms readily available in Analysis Services keeps the methodology accessible and reproducible, leaves room for future expansion and updates, and demonstrates the effectiveness of established data mining tools for time series forecasting. CSV files are used to load the data into the system, keeping the data import process straightforward.
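
A minimal sketch of issuing a DMX prediction query over ADOMD.Net is shown below; the connection string, mining model name (EurUsdRates), and column name (Close) are placeholders rather than the project's actual identifiers.

```csharp
using System;
using Microsoft.AnalysisServices.AdomdClient;

public static class RatePredictionQuery
{
    // Runs a DMX PredictTimeSeries statement against an Analysis Services mining
    // model and prints the predicted value(s) for the next time step.
    public static void PrintNextClose(string connectionString)
    {
        const string dmx =
            "SELECT FLATTENED PredictTimeSeries([Close], 1) FROM [EurUsdRates]";

        using (var connection = new AdomdConnection(connectionString))
        {
            connection.Open();
            using (var command = new AdomdCommand(dmx, connection))
            using (AdomdDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // FLATTENED returns the time key and the predicted value as columns.
                    Console.WriteLine("{0}: {1}", reader.GetValue(0), reader.GetValue(1));
                }
            }
        }
    }
}
```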

III. Application Development and Architecture

The application utilizes a layered architecture, with WPF for the user interface and C# as the programming language. The MVVM pattern separates presentation and business logic. A shell structure allows for modularity. The Fluent Ribbon Control Suite provides an Office-style interface. A custom chart control visualizes historical data and predictions. The application's lifecycle is managed using Team Foundation Server (TFS), hosted on Amazon EC2.

1. Architectural Design and Framework

DMForex employs a layered architecture, a design choice that separates the application into distinct layers, each handling specific concerns. This approach enhances modularity and maintainability. The application's user interface (UI) is built using Windows Presentation Foundation (WPF), a framework known for its rich graphical capabilities and extensibility. The Model-View-ViewModel (MVVM) design pattern is implemented to decouple the UI from business logic, improving code organization and testability. The PRISM 4 framework is specifically utilized to guide the application's design and structure, leveraging its features for creating maintainable and scalable applications. These architectural choices are crucial for ensuring the application's overall structure and functionality, emphasizing the importance of maintainability and scalability from the design stage.
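
To illustrate the MVVM separation in this context, here is a minimal view model sketch using plain INotifyPropertyChanged; PRISM 4 supplies base classes that reduce this boilerplate, and the class and property names here are hypothetical.

```csharp
using System.Collections.ObjectModel;
using System.ComponentModel;

// Minimal view model: the view binds to these properties and never touches the
// data layer directly, which is the separation MVVM is meant to enforce.
public class PredictionViewModel : INotifyPropertyChanged
{
    private string selectedPair;

    public ObservableCollection<string> CurrencyPairs { get; private set; }

    public PredictionViewModel()
    {
        CurrencyPairs = new ObservableCollection<string> { "EUR/USD", "USD/JPY" };
    }

    public string SelectedPair
    {
        get { return selectedPair; }
        set
        {
            if (selectedPair != value)
            {
                selectedPair = value;
                OnPropertyChanged("SelectedPair");
                // A fuller view model would trigger a prediction refresh here.
            }
        }
    }

    public event PropertyChangedEventHandler PropertyChanged;

    private void OnPropertyChanged(string name)
    {
        var handler = PropertyChanged;
        if (handler != null) handler(this, new PropertyChangedEventArgs(name));
    }
}
```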

2. User Interface and Visualization

The user interface incorporates the Fluent Ribbon Control Suite, giving the application a familiar Office-style look that eases interaction with its features and functionality. A custom chart control is implemented to visualize both historical data and prediction results. These design choices indicate a focus on a clean, functional user interface and on effective data visualization: presenting the raw data and the model outputs together lets users compare the two and interpret the predictions in context.

3. Development Environment and Tools

The development environment consisted of two laptops and an Amazon Elastic Compute Cloud (EC2) instance for source code management, combining local development with remote access to the codebase. Microsoft's Visual Studio 2010 Ultimate edition served as the Integrated Development Environment (IDE), providing a rich set of tools for application development, including architecture modeling and team collaboration features. Team Foundation Server (TFS) was employed for application lifecycle management (ALM), supporting version control and work item tracking, and the cloud-based environment facilitated remote work and collaboration. Although the developer worked solo, TFS's support for team collaboration leaves the project ready for future team-based development. The TA-lib library was incorporated for calculating technical indicators, an example of integrating third-party libraries for added functionality.

4. Development Process and Iteration

The application's construction involved three iterations. Iteration 1 focused on implementing the overall architecture, laying the foundation for subsequent development phases. Iteration 2 concentrated on developing the rate prediction functionality, testing and refining the time series algorithm. Iteration 3 addressed trend prediction functionality, which required adapting the dataset with technical indicators and employing the classification trees algorithm. This iterative approach facilitated a phased development process, allowing for thorough testing and refinement of each feature. The three iterations highlight the complexity of the model and the careful incremental development approach employed. The description emphasizes the structured, iterative nature of the development, enabling the systematic implementation and testing of features.

IV. Experimentation and Results

The accuracy of DMForex was evaluated through experiments using three major currency pairs (EUR/USD, USD/JPY, USD/GBP). Results from the time series algorithm for rate prediction and the classification tree algorithm for trend prediction showed promising potential, although specifics regarding the accuracy metrics are not detailed in this summary.

1. Experimental Setup and Data

Two sets of experiments were conducted to evaluate DMForex's predictive capabilities: one for exchange rate prediction and another for trend prediction. The experiments used data from three major currency pairs (the specific pairs are referenced in Table 3, whose content is not available here), selected for their high turnover as reported by the Bank for International Settlements (BIS). Daily historical price data from January 1, 1999, to December 31, 2011 was used, a considerable dataset whose long timeframe ensures that the models are tested across varied market conditions. The data was obtained as comma-delimited files and formatted to match the application's requirements. The choice of these currency pairs and this time period reflects a focus on testing the models against significant, widely traded pairs over a substantial period.

2. Rate Prediction Experiment and Results

The exchange rate prediction was performed using a time series algorithm. This algorithm attempts to predict the numerical value of a currency pair's future closing price, based on the historical data. The results, while not detailed numerically in this section, are described as 'promising'. This suggests that the time series algorithm demonstrated some level of predictive accuracy. Further details regarding the evaluation metrics are needed for a complete understanding of the performance. The focus on the numerical prediction of the next closing value emphasizes the algorithm's ability to provide a precise quantitative forecast. The lack of specific quantitative results in this summary section highlights the need for a more detailed analysis of the results.

3. Trend Prediction Experiment and Results

The trend prediction experiment involved transforming the original dataset into a classification problem. This was achieved by adding SMA and RSI technical indicators and defining a 'trend' column based on user-specified days and pips parameters. The dataset transformation highlights the importance of feature engineering in improving the model's prediction capabilities. The user-defined trend parameters provide flexibility in defining the model's sensitivity to price changes. A classification tree algorithm was then used to predict the trend. Similar to the rate prediction results, the trend prediction results are also described qualitatively as showing 'promising use'. More detailed analysis and quantitative metrics are necessary to thoroughly assess the performance of this methodology. The transformation of the data into a classification problem illustrates the adaptability of the data mining approach.

V. Future Work and Enhancements

Future development will focus on integrating additional algorithms like Support Vector Machines (SVM) and Artificial Neural Networks (ANN). This is facilitated by the application's modular architecture. Further research into optimizing the prediction models and expanding the range of supported technical indicators is also planned.

1. Algorithm Expansion and Enhancement

The document identifies several potential improvements for DMForex, primarily focusing on the integration of new prediction algorithms. The most prominent suggestion is the incorporation of Support Vector Machines (SVM) and Artificial Neural Networks (ANN). These algorithms are widely used in time series forecasting and could potentially enhance the application's accuracy and predictive power. The modular architecture, facilitated by the PRISM framework, makes the integration of these algorithms feasible and relatively straightforward. This suggests a future development path focusing on the enhancement of the predictive capabilities of the system by utilizing more sophisticated and potentially more accurate prediction models. The mention of these algorithms indicates an awareness of the current state of research in the field of financial time series prediction.

2. Model Optimization and Refinement

Beyond adding new algorithms, the document implicitly suggests further refinement of the existing models. While the current implementation uses time series and classification tree algorithms, there is room for optimization: adjusting algorithm parameters, experimenting with different data preprocessing techniques, or exploring alternative feature engineering approaches. Such ongoing refinement could yield significant improvements in accuracy and reliability, implying a continuous process of model optimization in which the algorithms are regularly reviewed and updated in light of new research and findings.

3. Expansion of Technical Indicators and Features

Another avenue for future work is expanding the range of supported technical indicators. The current implementation includes SMA and RSI, but additional indicators, including more advanced or niche ones, could help the model capture other aspects of market dynamics, improve its sensitivity to different patterns, and thereby improve the accuracy of its predictions. The existing use of TA-lib shows how easily new indicators can be incorporated from the library's extensive catalog, underscoring the iterative and expandable nature of the design.