Best programming language for algorithmic trading systems
One of the most common questions I receive in the QS mailbox is: "What is the best programming language for algorithmic trading?". The short answer is that there is no "best" language. Strategy parameters, performance, modularity, development, resiliency, and cost all need to be considered. This article outlines the necessary components of an algorithmic trading system architecture and it explains how implementation decisions impact language choice.
First, the most important components of an algorithmic trading system are considered, such as the research tools, the portfolio optimizer, the risk manager and the execution engine. Various trading strategies and their effects on the design of the system are then examined. In particular, the frequency of trading and the expected trading volume are discussed.
Once the trading strategy has been selected, the entire system needs to be designed. This includes the choice of hardware, operating system(s), and system resilience in the event of rare, potentially catastrophic events. Architecture considerations also need to take performance into account - both in the research tools and in the live execution environment.
What should the trading system do?
Before deciding on the “best” language with which to write an automated trading system, the requirements must be defined. Will the system be purely execution-based? Will the system require a risk management or portfolio construction module? Does the system need a powerful backtester? For most strategies, the trading system can be divided into two categories: research and signal generation.
Research involves evaluating the performance of a strategy based on historical data. The process of evaluating a trading strategy against previous market data is called backtesting. The size of the data and the complexity of the algorithm have a large impact on the computational intensity of the backtester. CPU speed and concurrency are often the limiting factors in optimizing research execution speed.
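As a rough illustration of what a backtester does, the sketch below replays a price history through a simple moving-average crossover rule. The rule, the function names and the default window lengths are assumptions chosen for illustration only; they are not taken from any particular trading system, and costs and slippage are ignored.

```python
from typing import List


def sma(prices: List[float], window: int, t: int) -> float:
    """Simple moving average of the `window` prices ending at index t (inclusive)."""
    return sum(prices[t - window + 1:t + 1]) / window


def backtest_crossover(prices: List[float], fast: int = 2, slow: int = 3) -> float:
    """Total return of a long-only crossover rule (assumes fast <= slow).

    The signal at bar t uses only prices up to t, and the position is held
    from t to t + 1, so the test does not look ahead.
    """
    equity = 1.0
    for t in range(slow - 1, len(prices) - 1):
        if sma(prices, fast, t) > sma(prices, slow, t):
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0
```

Even this toy loop shows why data size and algorithm complexity drive the cost: every bar, and every parameter combination, adds to the work.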
Signal generation involves generating a series of trading signals from an algorithm and sending these orders to the market, usually through a broker. Certain strategies require a high level of performance. I/O issues such as network bandwidth and latency are often the limiting factor in optimizing execution systems. Therefore, the choice of languages for each component of your overall system can be quite different.
Type, frequency and scope of the strategy
The type of algorithmic strategy used has a significant impact on the design of the system. Considerations include the markets traded, connectivity to external data providers, the frequency and volume of the strategy, the trade-off between ease of development and performance optimization, and any custom hardware that may be required, including custom servers, GPUs or FPGAs.
The technology selection for a low-frequency US equities strategy will differ significantly from that of a high-frequency statistical arbitrage strategy in the futures market. Before choosing a language, the data providers suitable for the strategy at hand must be evaluated.
What needs to be taken into account is the connectivity with the provider, the structure of the APIs, the timeliness of the data, the storage requirements and the resilience of the system if a provider goes offline. It is also advisable to have quick access to multiple providers! The different instruments all have their own storage quirks, e.g. multiple ticker symbols for stocks and expiration dates for futures (not to mention any specific OTC data). This must be taken into account when designing the platform.
The frequency of the strategy will likely be one of the most important factors in defining the technology stack. Strategies that use data more frequently than minute or second bars require significant performance considerations.
For a strategy operating below one-second bars (i.e. on tick data), performance-oriented design is the most important requirement. High-frequency strategies require a significant amount of market data to be stored and evaluated. Software such as HDF5 or kdb+ is commonly used for these tasks.
To process the large amounts of data required for HFT applications, a comprehensively optimized backtester and execution system must be deployed. C/C++ (possibly with some assembly language) is probably the most suitable language. Ultra-high frequency strategies will almost certainly require custom hardware such as FPGAs, exchange co-location, and kernel/network interface tuning.
Research systems
Research systems typically involve a mix of interactive development and automated scripting. The former often takes place within an IDE such as Visual Studio, MatLab or R Studio. The latter involves extensive numerical calculations across numerous parameters and data points. This leads to the need to choose a language that provides a straightforward environment for testing code, but is also powerful enough to evaluate strategies across multiple parameter dimensions.
Typical IDEs in this area are Microsoft Visual C++/C#, which includes extensive debugging tools, code completion capabilities (via "Intellisense"), and simple overviews of the entire project stack (via the LINQ database ORM); MatLab, designed for large-scale numerical linear algebra and vectorized operations, but in an interactive console form; R Studio, which packages the console of the R statistical language into a full-fledged IDE; Eclipse IDE for Linux Java and C++; and semi-proprietary IDEs such as the Anaconda distribution, which includes the Spyder IDE for Python. This distribution includes data analysis libraries such as NumPy, SciPy, scikit-learn and pandas in a single interactive (console) environment.
For numerical backtesting, all of the above languages are suitable, although it is not necessary to use a GUI/IDE as the code runs "in the background". The most important consideration at this stage is execution speed. A compiled language (like C++) is often useful when the backtesting parameter dimensions are large. Remember to be wary of such systems if that is the case!
Interpreted languages like Python often use high-performance libraries such as NumPy/pandas for the backtesting step in order to remain reasonably competitive with compiled counterparts. Ultimately, the language chosen for backtesting is determined by the specific algorithmic requirements as well as the libraries available in the language (more on this below). However, the language used for the backtester and research environments can be completely independent of the languages used for portfolio construction, risk management and execution components, as we will see.
Portfolio construction and risk management
The portfolio construction and risk management components are often overlooked by retail algorithmic traders. This is almost always a mistake. These tools are the mechanism through which capital is preserved. They not only attempt to reduce the number of "risky" bets, but also minimize churn in the trades themselves, thereby reducing transaction costs.
Sophisticated versions of these components can have a significant impact on the quality and consistency of profitability. It is straightforward to build up a stable of strategies, as the portfolio construction mechanism and risk manager can easily be modified to handle multiple systems. They should therefore be considered essential components from the outset of the design of an algorithmic trading system.
The job of the portfolio construction system is to take a set of desired trades and produce the set of actual trades that minimizes churn, maintains exposure to various factors (such as sectors, asset classes, volatility, etc.), and optimizes the allocation of capital to the different strategies in a portfolio.
Portfolio construction often boils down to a linear algebra problem (e.g. matrix factorization), and therefore performance depends heavily on the efficiency of the available numerical linear algebra implementation. Common libraries include uBLAS, LAPACK and NAG for C++. MatLab also has extensively optimized matrix operations. Python uses NumPy/SciPy for such calculations. A frequently rebalanced portfolio requires a compiled (and well-optimized!) matrix library to perform this step so that the trading system does not become bottlenecked.
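To make the linear algebra concrete, here is a minimal sketch of one textbook construction: fully invested minimum-variance weights, obtained by solving the system "covariance matrix times x equals a vector of ones" and normalizing x to sum to one. The choice of objective, the function names and the hand-written solver are assumptions for illustration; a production system would call an optimized library such as those named above rather than pure-Python elimination.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def min_variance_weights(cov: List[List[float]]) -> List[float]:
    """Unconstrained minimum-variance weights that sum to one (shorts allowed)."""
    raw = solve_linear(cov, [1.0] * len(cov))
    total = sum(raw)
    return [r / total for r in raw]
```

For two uncorrelated assets with variances 0.04 and 0.01, the weights come out inversely proportional to variance, i.e. 20% and 80%.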
Risk management is another extremely important part of an algorithmic trading system. Risk can come in many forms: increased volatility (although this may be considered desirable for certain strategies!), increased correlations between asset classes, counterparty failure, server failures, "black swans" and undetected errors in trading code, to name a few.
The risk management components attempt to anticipate the impact of excessive volatility and correlation between asset classes and the resulting impact on trading capital. Often this boils down to a series of statistical calculations such as Monte Carlo "stress tests". This is very similar to the computational needs of a derivative price calculation engine and is therefore CPU bound. These simulations are highly parallelizable (see below), and to some extent it is possible to "throw hardware at the problem".
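A Monte Carlo stress test of this kind might be sketched as follows. The model (independent, normally distributed daily returns), the function name and the default parameters are simplifying assumptions made purely for illustration; real risk engines use far richer return models.

```python
import random


def simulate_var(mu: float, sigma: float, horizon_days: int, n_paths: int,
                 confidence: float = 0.99, seed: int = 42) -> float:
    """Estimate Value at Risk (as a fraction of capital) by simulation.

    Assumption: daily returns are i.i.d. normal with mean mu and std sigma.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    outcomes = []
    for _ in range(n_paths):
        value = 1.0
        for _ in range(horizon_days):
            value *= 1.0 + rng.gauss(mu, sigma)
        outcomes.append(value - 1.0)
    outcomes.sort()  # worst outcomes first
    index = int((1.0 - confidence) * n_paths)
    return -outcomes[index]
```

Each path is independent of every other path, which is why the work splits so readily across cores or machines.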
Execution systems
The job of the execution system is to receive filtered trading signals from the portfolio construction and risk management components and forward them to a broker or other market gateway. For the majority of retail algorithmic trading strategies, this means an API or FIX connection to a broker like Interactive Brokers. Key considerations when deciding on a language include the quality of the API, the availability of language wrappers for an API, execution frequency, and expected slippage.
The "quality" of the API refers to how well it is documented, what kind of performance it provides, whether standalone software is required to access it, or whether a gateway can be set up without a user interface. In the case of Interactive Brokers, the Trader WorkStation tool must run in a GUI environment to access the API. I once had to install an Ubuntu desktop edition on an Amazon cloud server to access Interactive Brokers remotely, for this very reason!
Most APIs offer a C++ and/or Java interface. It is usually left to the community to develop language-specific wrappers for C#, Python, R, Excel and MatLab. Note that with any additional plugin (especially API wrappers) there is a chance that bugs will creep into the system. Always test such plugins and make sure they are actively maintained. A good indicator is how many new updates have been made to a codebase in the last few months.
Execution frequency is of utmost importance for the execution algorithm. Keep in mind that hundreds of orders may be sent every minute, so performance is critical. A poorly performing execution system incurs slippage, which can dramatically impact profitability.
Statically typed languages (see below) like C++/Java are generally optimal for execution, but there is a trade-off in development time, testing and maintainability. Dynamically typed languages like Python and Perl are generally "fast enough" today. Always ensure that the components are modular (see below) so that they can be "swapped" as the system is scaled.
Architectural planning and development process
The components of a trading system and the frequency and volume requirements have already been discussed, but the system infrastructure has not yet been covered. Anyone who operates as a retail trader or works in a small fund will likely have to “wear many hats”. One has to deal with the alpha model, risk management and execution parameters, as well as the final implementation of the system. Before the individual languages are discussed, the design of an optimal system architecture will be discussed.
Separation of concerns
One of the most important decisions to make when starting out is how to separate the different parts of a trading system. In software development, this essentially means how to break down the various aspects of the trading system into separate modular components.
By exposing interfaces on each of the components, it is easy to swap parts of the system for other versions that improve performance, reliability, or maintainability without having to change the code for external dependencies. This is the "best practice" for such systems. Such practices are recommended for lower frequency strategies. For high-frequency trading, the rules may need to be ignored in order to optimize the system for even greater performance. A more closely coupled system might be desirable.
Creating a component plan for an algorithmic trading system is worth its own article. However, an optimal approach is to ensure that there are separate components for the historical and real-time market data inputs, data storage, data access API, backtester, strategy parameters, portfolio construction, risk management and automated execution systems.
For example, if the data store in use still underperforms even after significant optimization, it can be swapped out with minimal rewrites to the data ingestion or data access API. As far as the backtester and subsequent components are concerned, there is no difference.
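The idea of swapping a data store behind a stable interface can be sketched as follows. All class and function names here (PriceStore, InMemoryStore, CsvTextStore, average_close) are hypothetical and exist only to illustrate the pattern.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class PriceStore(ABC):
    """Interface the downstream components depend on; stores are interchangeable."""

    @abstractmethod
    def closes(self, symbol: str) -> List[float]:
        """Return the closing prices for a symbol, oldest first."""


class InMemoryStore(PriceStore):
    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data

    def closes(self, symbol: str) -> List[float]:
        return list(self._data[symbol])


class CsvTextStore(PriceStore):
    """Parses 'symbol,close' rows from CSV text (a stand-in for a disk-based store)."""

    def __init__(self, csv_text: str) -> None:
        self._rows = [line.split(",") for line in csv_text.strip().splitlines()]

    def closes(self, symbol: str) -> List[float]:
        return [float(close) for sym, close in self._rows if sym == symbol]


def average_close(store: PriceStore, symbol: str) -> float:
    """A downstream component: it only knows about the PriceStore interface."""
    prices = store.closes(symbol)
    return sum(prices) / len(prices)
```

Because average_close accepts any PriceStore, replacing one store with another requires no change to the code that consumes the data.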
Another advantage of the separate components is that a variety of programming languages can be used in the overall system. There is no need to limit yourself to a single language when the communication method of the components is language independent. This is the case if they communicate via TCP/IP, ZeroMQ or another language-independent protocol.
A concrete example is a backtesting system written in C++ to ensure "number crunching" performance, while the portfolio manager and execution systems are written in Python using SciPy and IBPy.
Performance considerations
Performance is an important factor for most trading strategies. For higher frequency strategies it is the most important factor. The term “performance” covers a wide range of aspects such as algorithm execution speed, network latency, bandwidth, data I/O, concurrency/parallelism, and scaling. Each of these areas is covered individually in comprehensive textbooks, so this article will only scratch the surface of each topic. Architecture and language choice are now discussed in terms of their impact on performance.
The prevailing wisdom, articulated by Donald Knuth, one of the fathers of computer science, is that "premature optimization is the root of all evil." This is almost always the case - except when developing a high frequency trading algorithm! For those interested in lower frequency strategies, a common approach is to build a system as simply as possible and only optimize when bottlenecks arise.
Profiling tools are used to determine where bottlenecks arise. Profiles can be created for all of the above factors, in either an MS Windows or Linux environment. There are numerous operating system and language tools and third-party utilities available for this purpose. The choice of language is now discussed in the context of performance.
C++, Java, Python, R and MatLab all have powerful libraries (either as part of their standard distribution or externally) for basic data structures and algorithmic work. C++ ships with the Standard Template Library, while Python has NumPy/SciPy available as external packages. Common mathematical tasks are covered by these libraries, and it is rarely beneficial to write a new implementation.
An exception is when a highly customized hardware architecture is required and an algorithm makes extensive use of proprietary extensions (e.g. custom caches). However, “reinventing the wheel” often wastes time that could be better spent developing and optimizing other parts of the trading infrastructure. Development time is extremely valuable, especially when it involves a single developer.
Latency is often an execution system issue since the research tools are usually on the same machine. In the former, latency can occur at multiple points along the execution path. Databases must be consulted (disk/network latency), signals must be generated (operating system, kernel message latency), trading signals must be sent (NIC latency), and orders must be processed (internal exchange systems latency).
For higher frequency operations, it is necessary to become intimately familiar with kernel optimization as well as optimization of network transmission. This is a deep area and is well beyond the scope of this article, but if you want a UHFT algorithm, you should be aware of the depth of knowledge required!
Data caching is a very useful part of a quantitative trading developer's toolkit. Caching refers to the concept of storing frequently accessed data in a manner that allows for higher-performance access at the expense of the data's potential staleness. A common use case in web development is transferring data from a relational database on disk to memory. Subsequent requests for the data no longer need to access the database, which can result in significant performance improvements.
For trading situations, caching can be extremely beneficial. For example, the current status of a strategy portfolio can be stored in a cache until it is rebalanced, so that the list does not have to be recreated with each loop of the trading algorithm. Such regeneration is likely to incur high CPU or disk I/O overhead.
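A minimal sketch of that idea: the portfolio state is loaded once, reused on every loop of the algorithm, and discarded only when a rebalance occurs. The class name and the loader callback are hypothetical.

```python
from typing import Callable, Dict, Optional


class PortfolioCache:
    """Caches the portfolio state until it is explicitly invalidated."""

    def __init__(self, loader: Callable[[], Dict[str, int]]) -> None:
        self._loader = loader          # expensive call (database, broker API, ...)
        self._positions: Optional[Dict[str, int]] = None
        self.loads = 0                 # counts how often the loader actually ran

    def positions(self) -> Dict[str, int]:
        if self._positions is None:    # cache miss: regenerate once
            self._positions = self._loader()
            self.loads += 1
        return self._positions

    def invalidate(self) -> None:
        """Call after a rebalance so the next read fetches fresh state."""
        self._positions = None
```

The trade-off is the one described above: reads are cheap, but the cached state is only as fresh as the last invalidation.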
However, caching is not without its problems. Because cache storage is volatile, cache data may all need to be regenerated at once, which can place significant demand on infrastructure. Another problem is dog-piling, where multiple regenerations of the same cache entry are triggered simultaneously under extremely high load, leading to cascading failure.
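One common mitigation for dog-piling, sketched here under the assumption of a single process with several threads, is to let only one caller regenerate the entry while the others wait for its result. The class name is hypothetical, and a distributed cache would need a different locking mechanism.

```python
import threading
from typing import Any, Callable


class SingleFlightCache:
    """Ensures the expensive loader runs once, even under concurrent access."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._value: Any = None
        self._loaded = False
        self.load_count = 0

    def get(self) -> Any:
        if self._loaded:               # fast path: value already cached
            return self._value
        with self._lock:               # only one thread may regenerate
            if not self._loaded:       # re-check after acquiring the lock
                self._value = self._loader()
                self.load_count += 1
                self._loaded = True
            return self._value
```

Threads that arrive while the loader is running block on the lock and then reuse the freshly cached value instead of triggering their own regeneration.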
Dynamic memory allocation is an expensive operation in software execution. It is therefore essential for higher-performance trading applications to know exactly how memory is allocated and deallocated during program execution. Newer languages such as Java, C# and Python all perform automatic garbage collection, i.e. they free dynamically allocated memory once objects go out of scope.
Garbage collection is extremely useful during development as it reduces errors and improves readability. However, it is often suboptimal for certain high-frequency trading strategies. For these cases, custom garbage collection is often desired. For example, in Java, it is possible to achieve high performance for HFT strategies by tuning the garbage collector and heap configuration.
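The same idea can be illustrated in Python rather than Java: the standard gc module allows the cyclic garbage collector to be switched off around a latency-sensitive section and restored afterwards. The helper name gc_paused is an assumption; note that in CPython this affects only the cycle collector, while reference counting continues as usual.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic garbage collector, then restore its state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
```

A latency-critical block would be wrapped in "with gc_paused():", with an explicit gc.collect() scheduled at a quiet moment. Whether this helps at all depends on the workload and should be confirmed by profiling.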
C++ does not provide a native garbage collector, so all memory allocation/deallocation must be handled as part of an object's implementation. Although this is potentially error-prone (it can lead to "dangling pointers"), for certain applications it is extremely useful to have fine-grained control over how objects appear on the heap. When choosing a language, make sure to study how its garbage collector works and whether it can be modified to optimize for a specific use case.
Many operations in algorithmic trading systems are suitable for parallelization. This refers to the concept of carrying out several programmatic operations simultaneously, i.e. in "parallel". So-called "embarrassingly parallel" algorithms involve steps that can be calculated completely independently of other steps. Certain statistical operations, such as Monte Carlo simulations, are a good example of embarrassingly parallel algorithms because each random draw and each subsequent path operation can be calculated without knowledge of other paths.
Other algorithms can only be partially parallelized. Fluid dynamics simulations are one such example where the computational domain can be divided, but ultimately these domains must communicate with each other so that the operations are partially sequential. Parallelizable algorithms are subject to Amdahl's Law, which sets a theoretical upper limit on the performance increase of a parallelized algorithm when it is run on N separate processes (e.g. on a CPU core or thread).
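Amdahl's Law can be written down in a couple of lines. If a fraction p of the run time can be parallelized across N workers, the speedup is 1 / ((1 - p) + p / N). The function name below is an assumption for illustration.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Theoretical speedup when `parallel_fraction` of the work runs on n_workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)
```

With 90% of the work parallelizable, ten workers yield a speedup of about 5.3, and no number of workers can push it beyond 10, since the serial 10% always remains.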
Parallelization has become increasingly important as an optimization tool since processor clock speeds have stagnated, as newer processors contain many cores that can perform parallel calculations. The emergence of consumer graphics hardware (particularly for video games) has led to the development of Graphics Processing Units (GPUs), which contain hundreds of "cores" for highly concurrent operations. Such GPUs are now very affordable. High-level frameworks such as Nvidia's CUDA have led to widespread use in science and finance.
Such GPU hardware is generally only suitable for the research aspect of quantitative finance, while other, more specialized hardware (including field-programmable gate arrays - FPGAs) is used for (U)HFT. Today, most modern languages support some level of concurrency/multithreading. Thus, it is easy to optimize a backtester since all calculations are generally independent of each other.
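Because each parameter combination in a backtest is independent, a sweep maps naturally onto a worker pool. The sketch below uses the standard concurrent.futures module; the evaluate function is a hypothetical stand-in for a real backtest run, with a made-up score that peaks at (3, 10).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, Iterable, Tuple


def evaluate(params: Tuple[int, int]) -> float:
    """Hypothetical stand-in for one backtest; returns a score for (fast, slow)."""
    fast, slow = params
    return float(-((fast - 3) ** 2) - ((slow - 10) ** 2))


def sweep(fast_values: Iterable[int], slow_values: Iterable[int],
          max_workers: int = 4) -> Dict[Tuple[int, int], float]:
    """Evaluate every parameter combination on a pool of workers."""
    grid = list(product(fast_values, slow_values))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(evaluate, grid))  # results arrive in grid order
    return dict(zip(grid, scores))
```

In CPython, threads do not speed up CPU-bound work because of the global interpreter lock. ProcessPoolExecutor offers the same map interface and is the usual choice for numerical sweeps; threads are used here only to keep the sketch simple.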
In software engineering and operations, scaling is the system's ability to handle ever-increasing loads in the form of more requests, higher processor utilization, and more memory allocation. In algorithmic trading, a strategy is scalable if it can absorb larger amounts of capital and still generate consistent returns. The trading technology stack is scalable if it can support larger trading volumes and higher latency without causing bottlenecks.
While systems need to be scalable, it is often difficult to predict where a bottleneck will occur. Rigorous logging, testing, profiling and monitoring go a long way in helping a system scale. Languages themselves are often described as "not scalable". This is usually the result of misinformation rather than hard facts. Not the language, but the entire technology stack should be tested for scalability. Of course, certain languages are more powerful than others in certain use cases, but one language is never "better" than another in every way.
One way to manage scale is to separate concerns, as mentioned above. Furthermore, to provide the ability to handle "spikes" in the system (i.e. sudden volatility that triggers a whole series of trades), it makes sense to create a "message queuing architecture". This simply means that a queuing system is placed between components so that orders are "stacked up" when a particular component is unable to process many requests.
Rather than being lost, the requests are simply kept in a stack until the message is handled. This is particularly useful for sending trades to an execution engine. If the engine is suffering under heavy latency, trades will back up. A queue between the trading signal generator and the execution API alleviates this problem at the expense of potential trade slippage. A well-respected open source message queue broker is RabbitMQ.
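The queuing pattern can be sketched in-process with Python's standard queue module. This is only a stand-in for a real broker such as RabbitMQ, and the function and parameter names (drain_orders, send_order) are hypothetical.

```python
import queue
import threading
from typing import Callable, List


def drain_orders(orders: List[dict], send_order: Callable[[dict], str]) -> List[str]:
    """Queue a burst of orders and let one worker forward them in FIFO order."""
    order_queue: queue.Queue = queue.Queue()
    confirmations: List[str] = []

    def worker() -> None:
        while True:
            order = order_queue.get()
            if order is None:          # sentinel: no more orders will arrive
                break
            confirmations.append(send_order(order))

    thread = threading.Thread(target=worker)
    thread.start()
    for order in orders:               # the signal generator never blocks here
        order_queue.put(order)
    order_queue.put(None)
    thread.join()
    return confirmations
```

The producer can emit a burst of orders as fast as it likes; the queue absorbs the spike and the worker forwards each order when the execution side is ready, at the cost of some delay.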
Hardware and operating systems
The hardware your strategy runs on can have a significant impact on the profitability of your algorithm. This is also not a problem limited to high-frequency traders. Poor hardware and operating system choices can cause your computer to crash or restart at the most inopportune moment. Therefore, you have to think about where to place your application. As a rule, you have the choice between a personal desktop computer, a remote server, a “cloud” provider or a server co-located at the exchange.
Desktop computers are easy to install and manage, especially with newer user-friendly operating systems such as Windows 7/8, Mac OSX and Ubuntu. However, desktop systems also have some significant disadvantages. The most important one is that the versions of operating systems designed for desktop computers are likely to require reboots/patches (and often at the most inconvenient times!). They also consume more computing resources because they require a graphical user interface (GUI).
Using hardware in a home (or local office) environment can lead to internet connectivity and power uptime problems. The main advantage of a desktop system is that significant computing power can be purchased for a fraction of the cost of a dedicated server (or cloud-based system) of comparable speed.
While a dedicated server or cloud-based machine is often more expensive than a desktop option, it allows for more extensive redundancy infrastructure, such as automatic data backups, the ability to more easily ensure uptime, and remote monitoring. Such machines are more difficult to administer because they require the ability to log in to the operating system remotely.
On Windows, this is generally done via the GUI Remote Desktop Protocol (RDP). In Unix-based systems, the Secure SHell (SSH) command line is used. Unix-based server infrastructures are almost always command line-based, which immediately renders GUI-based programming tools (like MatLab or Excel) unusable.
A co-located server, as the term is used in capital markets, is simply a dedicated server located within an exchange to reduce trading algorithm latency. This is absolutely necessary for certain high-frequency trading strategies that rely on low latency to generate alpha.
The final consideration when choosing hardware and programming language is platform independence. Does the code need to run on multiple different operating systems? Is the code designed for a specific processor architecture such as Intel x86/x64 or can it also run on RISC processors such as those from ARM? These questions depend largely on the frequency and type of strategy being implemented.
Resilience and testing
One of the best ways to lose a lot of money in algorithmic trading is to design a system that is not resilient. This refers to the durability of the system in the face of rare events such as broker bankruptcies, sudden excessive volatility, regional failure of a cloud server provider, or the accidental deletion of an entire trading database. Years of gains can be wiped out in seconds with poorly designed architecture. It is absolutely necessary to consider issues such as debugging, testing, logging, backups, high availability and monitoring as core components of your system.
It is likely that for any reasonably complicated custom quantitative trading application, at least 50% of the development time will be spent on debugging, testing, and maintenance.
Almost all programming languages either ship with a debugger or have proven third-party alternatives. Essentially, a debugger allows a program to be executed by inserting arbitrary breakpoints in the code path that temporarily halt execution to examine the state of the system. The main advantage of debugging is that it is possible to examine the behavior of code before a known crash point.
Debuggers are an essential part of the toolbox for analyzing programming errors. However, they are more commonly used in compiled languages such as C++ or Java, as interpreted languages such as Python are often easier to debug due to fewer LOC and less verbose statements. Despite this tendency, Python ships with pdb, a sophisticated debugging tool. The Microsoft Visual C++ IDE has extensive GUI debugging tools, while for Linux C++ programmers working on the command line, the gdb debugger is available.
Testing in software development refers to the process of applying known parameters and results to specific functions, methods, and objects within a code base to simulate behavior and evaluate multiple code paths to ensure that a system behaves as it should. A more recent paradigm is Test Driven Development (TDD), where test code is developed against a specified interface that has no implementation yet. Before the actual code base is completed, all tests will fail. As code is written to "fill in the blanks", the tests will eventually all pass, at which point development should stop.
TDD requires extensive upfront specification design and a healthy level of discipline to be carried out successfully. In C++, Boost provides a framework for unit testing. In Java, the JUnit library serves the same purpose. Python also has the unittest module as part of the standard library. Many other languages have unit testing frameworks, and there are often multiple options.
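As a small illustration using Python's unittest module, the tests below pin down the expected behavior of a position-sizing function. The function, its formula and the test values are hypothetical examples, not part of any particular trading library.

```python
import unittest


def position_size(capital: float, risk_fraction: float, stop_distance: float) -> float:
    """Units to trade so that hitting the stop loses capital * risk_fraction."""
    if stop_distance <= 0:
        raise ValueError("stop_distance must be positive")
    return capital * risk_fraction / stop_distance


class PositionSizeTest(unittest.TestCase):
    def test_known_result(self):
        # Risking 1% of 100,000 with a stop 2.0 away allows 500 units.
        self.assertAlmostEqual(position_size(100_000, 0.01, 2.0), 500.0)

    def test_rejects_non_positive_stop(self):
        with self.assertRaises(ValueError):
            position_size(100_000, 0.01, 0.0)
```

In a TDD workflow the two test methods would be written first, fail against an empty position_size, and pass once the body is filled in. Running "python -m unittest" in the project directory discovers and executes such tests.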
In a production environment, sophisticated logging is essential. Logging refers to the process of outputting messages of varying degrees of severity about the execution behavior of a system to a flat file or database. Logs are a "first line of attack" when looking for unexpected program runtime behavior. Unfortunately, the shortcomings of a logging system are often only discovered after the fact! As with the backups discussed below, a logging system should be considered BEFORE developing a system.
Both Microsoft Windows and Linux have extensive system logging capabilities, and programming languages typically ship with standard logging libraries that cover most use cases. It is often advisable to centralize the logging information for analysis at a later date, as it can often lead to ideas for improving performance or reducing errors, which will almost certainly have a positive impact on your trading returns.
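A minimal sketch with Python's standard logging module shows messages of different severity being written with timestamps. The logger name, the format string and the helper function are assumptions; in production the stream would typically be replaced by a logging.FileHandler or a centralized log collector.

```python
import logging
from typing import TextIO


def build_trading_logger(stream: TextIO, name: str = "trading") -> logging.Logger:
    """Return a logger that writes INFO and above to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)      # DEBUG messages are filtered out
    logger.handlers.clear()            # avoid duplicate handlers on re-configuration
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calls such as logger.warning("order %s rejected", order_id) then leave a timestamped, severity-tagged trail that can be searched after an incident.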
While logging a system provides insight into what has happened in the past, monitoring an application provides insight into what is happening now. All aspects of the system should be considered when monitoring. System-level metrics such as disk usage, available memory, network bandwidth, and CPU usage provide basic load information.
Trading metrics such as abnormal prices/volumes, sudden rapid declines, and account risk for different sectors/markets should also be continuously monitored. In addition, a threshold system should be established that provides notification when certain metrics are violated, with the notification method (email, SMS, automated phone call) varying depending on the severity of the metric.
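Such a threshold system can be sketched as a table of limits, each tied to a notification channel by severity. The metric names, limits and channel labels below are hypothetical, and the actual delivery of email or SMS is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Threshold:
    metric: str      # name of the monitored metric
    limit: float     # alert when the metric exceeds this value
    channel: str     # e.g. "email" for minor, "sms" for severe breaches


def check_metrics(metrics: Dict[str, float],
                  thresholds: List[Threshold]) -> List[Tuple[str, str]]:
    """Return (channel, message) pairs for every breached threshold."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is not None and value > t.limit:
            alerts.append((t.channel, f"{t.metric}={value} exceeds {t.limit}"))
    return alerts
```

A monitoring loop would call check_metrics at a fixed interval and hand each alert to the sender for its channel.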
System monitoring is often the domain of the system administrator or operations manager. However, as a sole trading developer, you need to establish these metrics as part of the overall design. There are many solutions for monitoring: proprietary, hosted, and open source solutions that allow extensive customization of metrics for a specific use case.
Backups and high availability should be a priority for a trading system. Consider the following two questions: 1) If an entire production database of market data and trading history were deleted (without backups), how would that affect the research and execution algorithm? 2) How would a prolonged downtime of the trading system (with open positions) affect the account balance and ongoing profitability? The answers to these two questions are often sobering!
It is imperative to have a system in place to back up the data and test the recovery of that data. Many people don't test a recovery strategy. If crash recovery has not been tested in a safe environment, what guarantees are there that worst case recovery is available?
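As one way to exercise a recovery path, the sketch below copies a SQLite database with the standard library's backup API and then checks that the copy holds the same number of rows. SQLite, the function name and the table name passed in are assumptions for illustration; a real deployment would apply the same idea to whatever database and backup tooling it uses.

```python
import sqlite3
from typing import Tuple


def backup_and_verify(source: sqlite3.Connection,
                      table: str) -> Tuple[sqlite3.Connection, bool]:
    """Copy the database into a fresh replica and compare row counts."""
    replica = sqlite3.connect(":memory:")
    source.backup(replica)                       # full online copy of the database
    count_sql = f"SELECT COUNT(*) FROM {table}"  # table name must be trusted input
    source_rows = source.execute(count_sql).fetchone()[0]
    replica_rows = replica.execute(count_sql).fetchone()[0]
    return replica, source_rows == replica_rows
```

A row count is only a coarse check; a fuller recovery test would also restart the trading system against the restored copy.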
Similarly, high availability must be “built in” from the start. Redundant infrastructures (even if they incur additional costs) must always be considered as the costs of downtime are likely to far exceed the ongoing maintenance costs of such systems. I won't delve too deeply into this topic as it is a large area, but make sure it is one of the first considerations you make for your trading system.
Choosing a language
The various factors involved in developing a customized, high-performance algorithmic trading system have now been described in detail. The next step will be to discuss how programming languages are generally categorized.
Type systems
When choosing a language for a trading stack, the type system must be taken into account. The languages of interest for algorithmic trading are either statically or dynamically typed. In a statically typed language, the types (e.g. integers, floating point numbers, user-defined classes, etc.) are checked during the compilation process. These languages include C++ and Java. A dynamically typed language does most of the type checking at runtime. These languages include Python, Perl and JavaScript.
For a highly numerical system such as an algorithmic trading engine, compile-time type checking can be extremely beneficial, as it can eliminate many bugs that would otherwise lead to numerical errors. However, type checking doesn't catch everything, and this is where exception handling comes in, because unexpected operations still need to be handled. "Dynamic" languages (i.e. those that are dynamically typed) can often lead to runtime errors that would otherwise be caught by compile-time type checking. This is why TDD (see above) and unit testing matter: when done correctly, they often provide more safety than compile-time checking alone.
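The point about dynamic typing can be illustrated with a small sketch. The functions `position_value` and `checked_position_value` are hypothetical helpers invented for this example. The unit tests show that a string slipping into a calculation may either fail at runtime or, worse, silently produce a wrong result, which is the kind of bug a compiler would reject in a statically typed language.

```python
import unittest
from numbers import Real


def position_value(quantity, price):
    """Return the market value of a position (quantity * price)."""
    return quantity * price


def checked_position_value(quantity, price):
    """Same calculation, but reject non-numeric inputs explicitly."""
    if not isinstance(quantity, Real) or not isinstance(price, Real):
        raise TypeError("quantity and price must be numeric")
    return quantity * price


class PositionValueTest(unittest.TestCase):
    def test_numeric_inputs(self):
        self.assertEqual(position_value(100, 2.5), 250.0)

    def test_string_quantity_is_silently_accepted(self):
        # No exception: str * int is sequence repetition in Python,
        # so "3" * 2 yields "33" rather than 6.
        self.assertEqual(position_value("3", 2), "33")

    def test_string_times_float_fails_only_at_runtime(self):
        with self.assertRaises(TypeError):
            position_value("3", 2.5)

    def test_checked_version_rejects_strings(self):
        with self.assertRaises(TypeError):
            checked_position_value("3", 2)
```

Saved to a file, the tests can be run with `python -m unittest <file>`. The second test is the instructive one: without it, the faulty value would propagate into later calculations unnoticed.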
Another advantage of statically typed languages is that the compiler can make many optimizations not available to a dynamically typed language, simply because the type (and hence the memory requirement) is known at compile time. In fact, part of the inefficiency of many dynamically typed languages comes from the fact that certain objects must be type checked at runtime, which incurs a performance penalty. Libraries for dynamic languages, such as NumPy/SciPy, mitigate this problem by enforcing a type within arrays.
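The idea of enforcing one type within an array can be sketched with Python's standard-library `array` module. It is used here only as a dependency-free stand-in for NumPy's typed arrays, and the helper `can_append` is invented for the example. A plain list accepts any object, whereas a typed array stores fixed-width values and rejects anything else when it is inserted.

```python
from array import array


def can_append(container, value) -> bool:
    """Try to append `value`; report whether the container accepted it."""
    try:
        container.append(value)
    except TypeError:
        return False
    return True


untyped = [101.5, 102.0]              # list of arbitrary Python objects
typed = array("d", [101.5, 102.0])    # contiguous buffer of C doubles

list_accepts_text = can_append(untyped, "102.25")   # lists take anything
array_accepts_text = can_append(typed, "102.25")    # rejected at insertion
array_accepts_float = can_append(typed, 102.25)     # matches the declared type

print(list_accepts_text, array_accepts_text, array_accepts_float)
print(typed.itemsize, len(typed))     # bytes per element, element count
```

Because every element has the same known width, numerical routines can loop over the buffer without checking the type of each item, which is where much of the speed-up comes from.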
Open source or proprietary?
One of the most important decisions an algorithmic trading system developer must make is whether to use proprietary (commercial) or open source technologies. Both approaches have their advantages and disadvantages. Things to consider include how well a language is supported, how active the community around a language is, how easy it is to install and maintain, how good the documentation is, and what the licensing and maintenance costs are.
Microsoft's .NET stack (including Visual C++ and Visual C#) and MathWorks' MatLab are two of the larger proprietary solutions for developing custom algorithmic trading software. Both tools have proven success in the financial industry, with the former forming the predominant software stack for trading infrastructure in investment banking and the latter being used extensively for quantitative trading research in mutual funds.
Both Microsoft and MathWorks provide extensive, high-quality documentation for their products. Additionally, the communities surrounding each tool are very large and there are active web forums for both. The .NET software enables coherent integration with multiple languages such as C++, C# and VB. MatLab also has many plugins/libraries (some free, some commercial) for almost every area of quantitative research.
There are also disadvantages. With either product, the cost for a lone trader is not insignificant (although Microsoft offers the entry-level version of Visual Studio for free). The Microsoft tools "play" well with each other, but integrate less well with external code. Visual Studio must also run on Microsoft Windows, which is arguably far less performant than an equivalent, optimally tuned Linux server.
Additionally, MatLab lacks some important plugins, such as a good wrapper for the Interactive Brokers API, one of the few brokers suitable for high-performance algorithmic trading. The main problem with proprietary products is the lack of source code availability. This means that these two tools are far less attractive when high performance is really required.
Open source tools have been industry standard for some time. In the area of alternative investments, open source Linux, MySQL/PostgreSQL, Python, R, C++ and Java are heavily used in production. However, they are far from limited to this area. Python and R, in particular, contain a wealth of extensive numerical libraries that can be used to perform almost any type of data analysis imaginable, often with execution speeds comparable to compiled languages, although with some limitations.
The main advantage of using interpreted languages is the reduced development time. Python and R require far fewer lines of code (LOC) to achieve similar functionality, largely due to the extensive libraries. Additionally, they often allow interactive console-based development, which greatly shortens the iterative development process.
Since a developer's time is extremely valuable and execution speed is often less so (except in the HFT space), it is worth fully considering an open source technology stack. Python and R have significant developer communities and are very well supported due to their popularity. The documentation is excellent, and bugs (at least for the core libraries) are rare.
Open source tools often suffer from the lack of a dedicated commercial support contract and run optimally on systems with less forgiving user interfaces. A typical Linux server (e.g. Ubuntu) is often completely command line oriented. Additionally, Python and R can be slow for certain execution tasks. There are mechanisms for integrating with C++ to improve execution speed, but this requires some experience in multi-language programming.
Although even proprietary software is not immune to dependency/version problems, it is much less common to have to deal with incorrect library versions in such environments. Open source operating systems like Linux can be more difficult to manage.
I will venture my personal opinion here and state that I build all of my trading tools with open source technologies. Specifically, I use Ubuntu, MySQL, Python, C++, and R. The maturity, community size, ability to dig deep when problems occur, and lower total cost of ownership (TCO) far outweigh the simplicity of proprietary interfaces and easier installations. That being said, Microsoft Visual Studio (especially for C++) is a fantastic integrated development environment (IDE) that I would also highly recommend.
Batteries included?
The heading of this section refers to the "out of the box" capabilities of the language: what libraries does it contain and how good are they? This is where mature languages have an advantage over newer ones. C++, Java, and Python all have extensive libraries for network programming, HTTP, operating system interaction, graphical user interfaces, regular expressions (regex), iteration, and basic algorithms.
C++ is famous for its Standard Template Library (STL), which contains a wealth of powerful data structures and algorithms "for free". Python is known for being able to communicate with almost any other type of system or protocol (especially the web), mostly through its own standard library. R has a wealth of statistical and econometric tools, while MatLab is extremely optimized for numerical linear algebra code (such as that used in portfolio optimization and derivatives pricing).
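As a small illustration of what "batteries included" means in practice, the sketch below uses only Python's standard library (`re` and `statistics`) to parse price lines and summarize them. The log format, the symbols and the prices are all made up for the example.

```python
import re
import statistics

# Hypothetical log format: "<ISO timestamp> <SYMBOL> <price>"
LINE = re.compile(r"^(?P<ts>\S+)\s+(?P<symbol>[A-Z]+)\s+(?P<price>\d+(?:\.\d+)?)$")


def parse_prices(lines, symbol):
    """Return the prices for `symbol`, skipping lines that do not match."""
    prices = []
    for line in lines:
        match = LINE.match(line.strip())
        if match and match.group("symbol") == symbol:
            prices.append(float(match.group("price")))
    return prices


sample = [
    "2024-01-02T09:30:00 XYZ 100.0",
    "2024-01-02T09:30:01 XYZ 101.0",
    "2024-01-02T09:30:02 ABC 50.0",
    "malformed line",
    "2024-01-02T09:30:03 XYZ 102.0",
]

prices = parse_prices(sample, "XYZ")
print(prices)
print(statistics.mean(prices), statistics.stdev(prices))
```

No third-party packages are needed for this kind of glue code, which is a large part of why such languages shorten development time.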
Outside of the standard libraries, C++ makes use of the Boost library, which supplements the "missing parts" of the standard library. In fact, many parts of Boost were included in the TR1 standard and are now available in the C++11 specification, including native support for lambda expressions and concurrency.
Python has the high-performance NumPy/SciPy/Pandas combination of libraries for data analysis, which has been widely adopted in algorithmic trading research. There are also powerful plugins for accessing major relational databases, such as MySQL++ (MySQL/C++), JDBC (Java/MatLab) and MySQLdb (MySQL/Python). Python can even communicate with R via the RPy plugin!
An often overlooked aspect of a trading system during the initial research and design phase is connectivity to a broker API. Most APIs natively support C++ and Java, but some also support C# and Python, either directly or via community-provided wrappers around the C++ APIs. Interactive Brokers, in particular, can be connected to via the IBPy plugin. If high performance is required, brokerages typically support the FIX protocol.
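Database access from Python usually goes through the DB-API (PEP 249) interface of connections and cursors, which MySQLdb also implements. The sketch below uses the standard-library `sqlite3` driver purely so that it runs without a database server. The table name, columns and rows are invented for illustration, and other drivers may use a different placeholder style than `?`.

```python
import sqlite3


def load_closes(connection, symbol):
    """Return (trade_date, close) rows for `symbol`, oldest first."""
    cursor = connection.cursor()
    cursor.execute(
        "SELECT trade_date, close FROM daily_bars "
        "WHERE symbol = ? ORDER BY trade_date",
        (symbol,),  # parameters are passed separately, never string-formatted
    )
    rows = cursor.fetchall()
    cursor.close()
    return rows


# In-memory database with a hypothetical schema and made-up prices.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE daily_bars (symbol TEXT, trade_date TEXT, close REAL)")
cursor.executemany(
    "INSERT INTO daily_bars VALUES (?, ?, ?)",
    [
        ("XYZ", "2024-01-03", 101.0),
        ("XYZ", "2024-01-02", 100.0),
        ("ABC", "2024-01-02", 50.0),
    ],
)
connection.commit()

xyz_closes = load_closes(connection, "XYZ")
print(xyz_closes)
```

Because the connection and cursor calls are shared across DB-API drivers, moving such code to a MySQL or PostgreSQL driver mostly involves changing the `connect` call and the placeholder syntax.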
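To give a feel for what speaking FIX involves, here is a minimal sketch of the tag=value wire format. Fields are separated by the SOH byte, the BodyLength field (tag 9) counts the bytes between itself and the trailer, and the CheckSum field (tag 10) is the byte sum modulo 256 written as three digits. The function names and the order field values are made up. A real session also needs logon, sequence number handling and a broker-specific field set, none of which is shown.

```python
from typing import Iterable, Tuple

SOH = "\x01"  # FIX field delimiter


def checksum(data: bytes) -> str:
    """FIX CheckSum: sum of all bytes modulo 256, zero-padded to 3 digits."""
    return f"{sum(data) % 256:03d}"


def build_fix_message(msg_type: str,
                      fields: Iterable[Tuple[int, str]],
                      begin_string: str = "FIX.4.2") -> bytes:
    """Assemble header (8, 9, 35), the given body fields, and trailer (10)."""
    body = f"35={msg_type}{SOH}" + "".join(
        f"{tag}={value}{SOH}" for tag, value in fields
    )
    head = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    partial = (head + body).encode("ascii")
    return partial + f"10={checksum(partial)}{SOH}".encode("ascii")


def is_checksum_valid(message: bytes) -> bool:
    """Recompute the checksum over everything before the trailer field."""
    head, separator, trailer = message.rpartition(b"\x0110=")
    if not separator:
        return False
    # The SOH preceding "10=" is part of the checksummed data.
    return trailer == checksum(head + b"\x01").encode("ascii") + b"\x01"


# Illustrative New Order Single (35=D); every value below is made up.
order = build_fix_message("D", [
    (49, "MYFIRM"),             # SenderCompID
    (56, "BROKER"),             # TargetCompID
    (34, "2"),                  # MsgSeqNum
    (52, "20240102-09:30:00"),  # SendingTime
    (11, "ORD0001"),            # ClOrdID
    (21, "1"),                  # HandlInst
    (55, "XYZ"),                # Symbol
    (54, "1"),                  # Side (1 = buy)
    (60, "20240102-09:30:00"),  # TransactTime
    (38, "100"),                # OrderQty
    (40, "1"),                  # OrdType (1 = market)
])

print(order.replace(b"\x01", b"|").decode("ascii"))
print(is_checksum_valid(order))
```

In practice most developers use an existing FIX engine rather than hand-rolling messages. The sketch is only meant to show why the protocol is simple enough to implement in almost any language.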
Conclusion
As is now clear, choosing the programming language(s) for an algorithmic trading system is not easy and requires careful consideration. The key considerations are performance, ease of development, resiliency and testing, separation of concerns, familiarity, maintenance, source code availability, licensing costs, and library maturity.
The advantage of a separated architecture is that languages can be "plugged in" for different aspects of a trading stack as requirements change. A trading system is a constantly evolving tool, and it is likely that the choice of language will evolve with it.