Why I encourage econ PhD students to learn Julia

Julia is a scientific computing language that an increasing number of economists are adopting (e.g., Tom Sargent, the NY FRB). It is a close substitute for Matlab, and the cost of switching from Matlab to Julia is somewhat modest since Julia syntax is quite similar to Matlab syntax after you change array references from parentheses to square brackets (e.g., “A(2, 2)” in Matlab is “A[2, 2]” in Julia and most other languages), though there are important differences. Julia also competes with Python, R, and C++, among other languages, as a computational tool.

I am now encouraging students to try Julia, which recently released version 1.0. I first installed Julia in the spring of 2016, when it was version 0.4. Julia’s advantages are that it is modern, elegant, open source, and often faster than Matlab. Its downside is that it is a young language, so its syntax is evolving, its user community is smaller, and some features are still in development.

A proper computer scientist would discuss Julia’s computational advantages in terms of concepts like multiple dispatch and typing of variables. For an unsophisticated economist like me, the proof of the pudding is in the eating. My story is quite similar to that of Bradley Setzler, whose structural model that took more than 24 hours to solve in Python took only 15 minutes using Julia. After hearing two of my computationally savvy Booth colleagues praise Julia, I tried it out when doing the numerical simulations in our “A Spatial Knowledge Economy” paper. I took my Matlab code, made a few modest syntax changes, and found that my Julia code solved for equilibrium in only one-sixth of the time that my Matlab code did. My code was likely inefficient in both cases, but that speed improvement persuaded me to use Julia for that project.

For a proper comparison of computational performance, you should look at papers by S. Boragan Aruoba and Jesus Fernandez-Villaverde and by Jon Danielsson and Jia Rong Fan. Aruoba and Fernandez-Villaverde have solved the stochastic neoclassical growth model in a dozen languages. Their 2018 update says “C++ is the fastest alternative, Julia offers a great balance of speed and ease of use, and Python is too slow.” Danielsson and Fan compared Matlab, R, Julia, and Python when implementing financial risk forecasting methods. While you should read their rich comparison, a brief summary of their assessment is that Julia excels in language features and speed but has considerable room for improvement in terms of data handling and libraries.

While I like Julia a lot, it is a young language, which comes at a cost. In March, I had to painfully convert a couple of research projects written in Julia 0.5 to version 0.6 after an upgrade of GitHub’s security standards meant that Julia 0.5 users could no longer easily install packages. My computations were fine, of course, but a replication package that required artisanally installed packages in a no-longer-supported environment wouldn’t have been very helpful to everyone else. I hope that Julia’s 1.0 release means that those who adopt the language now are less likely to face such growing pains, though it might be a couple of months before most packages support 1.0.

At this point, you probably should not use Julia for data cleaning. To be brief, Danielsson and Fan say that Julia is the worst of the four languages they considered for data handling. In our “How Segregated is Urban Consumption?” code, we did our data cleaning in Stata and our computation in Julia. Similarly, Michael Stepner’s health inequality code relies on Julia rather than Stata for a computation-intensive step, and Tom Wollmann split his JMP code between Stata and Julia. For now, I think most users would tell you to use Julia for computation, not data prep. (Caveat: I haven’t tried the JuliaDB package yet.)

If you want to get started in Julia, I found the “Lectures in Quantitative Economics” introduction to Julia by Tom Sargent and John Stachurski very helpful. Also look at Bradley Setzler’s Julia economics tutorials.

Trade economists might be interested in the Julia package FixedEffectModels.jl. It claims to be an order of magnitude faster than Stata when estimating two-way high-dimensional fixed-effects models, which is a bread-and-butter gravity regression. I plan to ask PhD students to explore these issues this fall and will report back after learning more.

One thought on “Why I encourage econ PhD students to learn Julia”

  1. Computational nerd economist

    Hi Jonathan, nice post!

    One thought: in recent years, Python has improved substantially as an environment for numerical computation, and I think that some people are too quick to dismiss it now because of bad experiences in the past. There have been two key changes:

    1. The introduction of the infix operator @ for matrix multiplication has eliminated a major pain point for easy numerical coding.

    2. The rapid development of Numba has made it possible to easily accelerate (and often parallelize) code so that it is as fast as Julia.
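
    To make point 1 concrete, here is a minimal sketch of the @ operator. It is a language-level hook (the `__matmul__` method, available since Python 3.5) that NumPy arrays implement. To stay dependency-free, the sketch uses a tiny hypothetical `Matrix` class, which is an illustration only and not part of any library.

```python
class Matrix:
    """Hypothetical toy matrix class, used only to illustrate the @ hook."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __matmul__(self, other):
        # Python calls this method when it evaluates `self @ other`.
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("inner dimensions do not match")
        cols = list(zip(*other.rows))  # columns of the right-hand matrix
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[0, 1], [1, 0]])
x = Matrix([[1], [1]])

# Chained products read like the math: A B x (evaluated left to right).
y = A @ B @ x
```

    With NumPy arrays, the same expression would be written `A @ B @ x` in place of nested `np.dot` calls.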

    Some people still complain about #1, not realizing that it’s been fixed. Many more people, however, have absolutely no idea about #2, or don’t understand how to use it in practice.

    In principle, Numba is trivially easy: you take a function that performs some bottleneck numerical computation, type @jit at the top of it, and boom, you have just-in-time compiled performance similar to Julia’s. In practice, it is not always quite so easy, because Numba only accelerates a limited subset of Python, which contains pretty much everything you need for most numerical computations, but can still get tricky if you’re trying to be too fancy.
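
    A minimal sketch of that workflow, assuming Numba is installed. It uses `njit`, Numba's shorthand for `jit(nopython=True)`. The function `discounted_utility` and its parameters are hypothetical, and the fallback decorator exists only so that the sketch still runs (without any speedup) when Numba is missing.

```python
import math

try:
    from numba import njit  # assumption: Numba is available
except ImportError:
    def njit(func):
        # Fallback: leave the function uncompiled so the sketch still runs.
        return func


@njit
def discounted_utility(beta, c0, growth, periods):
    """Sum of beta**t * log(c_t) along the path c_t = c0 * growth**t."""
    total = 0.0
    c = c0
    for t in range(periods):
        total += beta ** t * math.log(c)
        c *= growth
    return total


value = discounted_utility(0.95, 1.0, 1.02, 1000)
```

    The loop uses only scalars and `math` functions, which fall inside the subset of Python that Numba compiles. The first call includes compilation time, so a benchmark should time a second call.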

    You can code extremely well with Numba if you have a certain style: always be aware of what the key bottleneck inner loops of your code are, write those in a very simple way in their own functions, and then @jit them. It happens that I have *exactly* this style, because I came of age in the bad old days of Matlab before it had some just-in-time compilation built in, and I always had to identify my bottlenecks and then write them separately as C or Fortran mex files. Coming from this background, coding with Python and Numba is trivially simple and vastly more productive.

    On the other hand, I recognize that most economists don’t have this style: they want to write everything in a fairly straightforward imperative way, and the idea of isolating the key bottlenecks and specifically accelerating them does not come naturally. Now, I’d argue that everyone would *benefit* from having a bit more of this style – you should always know how to profile your code and be aware of what the bottlenecks are. But realistically, not everyone is going to think this way, and for them Julia might be a more natural fit. (Also, there are a few things that are inherently going to be easier to speed up in Julia than Numba – although I’m not sure that many economists actually do these things. My guess is that Bradley Setzler’s experience probably either predated Numba or failed to make full use of it.)
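
    For readers who have not profiled code before, here is a minimal sketch using `cProfile` and `pstats` from the Python standard library. The functions `solve_model`, `cheap_setup`, and `inner_loop` are hypothetical stand-ins for a real model, and `profile_report` is a helper written for this illustration.

```python
import cProfile
import io
import pstats


def inner_loop(n):
    """Hypothetical bottleneck: a plain Python double loop."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (i * j) % 7
    return total


def cheap_setup(n):
    """Hypothetical cheap step that should not dominate the runtime."""
    return list(range(n))


def solve_model(n=200):
    grid = cheap_setup(n)
    return inner_loop(len(grid))


def profile_report(func, *args, top=5):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the first `top` entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_report(solve_model, 200)
print(report)
```

    In the printed report, the function with a large `tottime` (here `inner_loop`) is the candidate to isolate in its own function and accelerate.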

    Another big advantage of Julia over Python, to which you allude, is that Julia intentionally has syntax quite similar to Matlab’s, so the transition costs are lower for most economists. At this point, I view the Matlab similarities as a disadvantage (I hate 1-based indexing, hate cluttered namespaces, etc.), but people who just want a faster Matlab are going to feel much more at home with Julia.

    Set against this is a big advantage of Python: much better data handling with Pandas, which you also mention. In this sense, Python is doing a much better job of solving the “two-language problem” (which is Julia’s raison d’être) than is Julia itself. It is conceivable and practical to do an entire project in Python – data wrangling, random scripting and general-purpose processing, simple numerical code, and accelerated numerical code with Numba – whereas this is not even close to being true for Julia. Another advantage of Python is that it is the de facto language of the rapidly growing AI and machine learning communities. These communities have almost zero overlap with economics for now – but who knows what will happen in 10 years?

    Beyond all this, I use Python for a separate set of reasons relating to its (currently) much greater utility as a general-purpose programming language. I like to design tools that make it easy to interface with an underlying solution engine at a high level, and this has been much more practical for me in Python (just like it was more practical for TensorFlow and now PyTorch in the deep learning communities). This puts me in a very, very, very tiny minority of economists, however, and some of the tool-building economists have found Julia useful for other reasons; the main Python tool that I can think of is the HARK macro toolkit by Chris Carroll and coauthors.

    With big forces on both sides – ease of switching from Matlab and writing fast numerical code with Julia, vs. a much larger data and machine learning ecosystem and more power as a language with Python – I suspect we will be seeing a lot of both over the next few decades. I currently encourage students, depending on their specialization and interests, to learn some mix of Python, Julia, and R, with Python as the default first choice thanks to its utility for both data and numerics.
