Teaching Machines to Code

Applying recent advances in Artificial Intelligence to synthesize programs from descriptions.

Hi everybody,

In this blog post we will outline some of our recent work on using neural networks and search techniques to synthesize programs from natural language spec and few examples from our paper.

Seq2Tree visualization

Code synthesis is a rather old field of research, that has been active since long before the deep learning took off. Because of that historically many code synthesis labs were concentrated on techniques that do not involve neural networks. For example, one of the most known successed of program synthesis to date FlashFill [1] uses search in a space of all programs in a restricted domain specific language using lots of hand crafted heuristics, no neural networks involved (this changed recently with RobustFill [2] and other recent work).

Since until the emergence of deep recurrent neural networks and attention mechanism natural language understanding was not working particularly great (and arguably it still isn’t), majority of advances in code synthesis until recently were based on input / output pairs as the specification of the desired program. While applicable in some cases (like the above mentioned FlashFill), generally it is desirable to be able to provide the specification in a form of a natural language request instead.

With natural language specifications come some challenges. First of all, natural language is very ambiguous. If you think about how we communicate what we want to engineers today, we rarely give them a task and walk away. More often than not we get engaged into a conversation, in which details of what we want are gradually figured out. If we are to build a system that synthesizes programs from natural language specification, we would need to somehow account for that – either by making sure the specifications we provide in our training set are absolutely unambiguous, or by providing some way to the network to disambiguate between different understandings.

Another major challenge is that in modern world the only real way to work with natural language is deep neural networks. Deep neural networks, however, are actually a pretty bad choice if we are looking for some model to produce code, at least in one-token-at-a-time manner. The reason is that deep nets are intrinsically probabilistic, meaning that if they are used to produce a program one token at a time, at each step there would be a small chance of producing an incorrect token. Errors compound very fast, and even for moderately long programs and very good model the chance of messing up at least one token are very high. In natural language translation this is not a big issue – couple words in a reverse order, or a slightly misused word generally can be tolerated. In code synthesis one misused token would result in a completely wrong program. This is where traditional code synthesis techniques, such as search, have a big advantage.

In this work we explore a possible combination of the two techniques. In particular, we first train a neural sequence to sequence neural network that reads a natural language description of a problem and outputs a program that solves it one token at time, and measure its accuracy. We then see how much we can improve upon the results by running search in the space of valid programs guided by this neural network instead of using the network directly to synthesize code.


As part of our work on Teaching Machines to Code, we are building Deep Learning models that are reading or writing code in a tree format. After trying to manage this complexity in TensorFlow, I’ve decided to give a try to PyTorch.

PyTorch is a framework built by Facebook AI researchers and has been growing in popularity in Natural Language and Reinforcment Learning research community. It’s main benefit is in dynamic graph building principle — compared to Tensorflow, where graph is built once and then “executed” many times, PyTorch allows to dynamically rebuild graph using simple Python logic, as if you were doing computation with numpy arrays.

PyTorch visualizatoin

This flexibility attracted people who work with complex input/output data [e.g. language, trees, graphs] or need to run some custom logic in the middle of the computation [e.g. Deep RL].

Here I want to talk about batching things. Even though PyTorch is fast by using GPU accelerators and in general pushing computation on C modules, if you are not batching your computation — you are still going to pay the toll.

Recursive neural network [TreeLSTM as an example] are especially hard to batch, as each example is a different tree.

We have built a library TorchFold to make it simple to batch anything, however complicated and dynamic architecture you may have.


Recently we have been exploring applications of code understanding technology we are building. Code search as an applicaiton naturally comes to mind.

Being developers ourselves, we use code search everyday for various types of questions. The span of questions ranges from asking for definition of specific function or API, looking up implementation of some functionality to a description of what we want to do. Currently one would use Google, StackOverflow and Github for all of these. Going from one website to another, using Github search for in-repo search.

As an alternative, we can index all the functions, classes and APIs in a single search index. We can embed the implementations of functions using neural network pre-trained on aligning of the text and code [from our generator network]. Then at retrieval time we are going to lookup tokens in regular index and encode text using the other part of neural network. Finding nearest neighbors in the encoded code space, can augment regular search techniques with additional “semantic” meaning.

We have a quick demo of such functionality available here:

Code Search Demo

Let us know if you have any comments at contact@near.ai.


Hi everyone,

Today we are going to show few demos of Sequence to Sequence models for code completion.

In the first demo we trained a seq2seq model, more on which below, on all the Accepted java solutions on CodeForces. The goal of the model is to predict the next token based on all the tokens seen so far. We then plug the output of the model into a code editor to see how it behaves when one actually solves a competitive programming problem. Here we are solving problem A from Codeforces Round 407:

Code Search Demo

Notice how early on the model nearly perfectly predicts all the tokens, which is not surprising, since most of the Java solutions begin with rather standard imports, and the solution class definition. Later it perfectly predicts the entire line that reads n, but doesn’t do that well predicting reading k, which is not surprising, since it is a rather rare name for the second variable to be read.

There are several interesting moments in the video. First, note how after int n = it predicts sc, understanding that n will probably be read from the scanner (and while not shown in the video, if the scanner name was in, the model would have properly predicted in after int n =), however when the line starts with int ans =, it then properly predicts 0, since ans is rarely read from the input.

The second interesting moment is what happens when we are printing the answer. At first when the line contains System.out.println(ans it predicts a semicolon (mistakenly) and the closing parenthesis as possible next tokens, but not - 1, however when we introduce the second parenthesis System.out.println((ans, it then properly predicts -1, closing parenthesis, and the division by two.

You can also notice a noticeable pause before the for loop is written. This is due to the fact that using such artificial intelligence suggestions completely turns off the natural intelligence the operator of the machine possesses :)

One concern with such autocomplete is that in the majority of cases most of the tokens are faster to type than to select from the list. To address it, in the second demo we introduce beam search that searches for the most likely sequences of tokens. Here’s what it looks like:

Code Search Demo

Here there are more rough edges, but notice how early on the model can correctly predict entire lines of code.

Currently we do not condition the predictions on the task. Partially because number of tasks available on the Internet is too small for a machine learning model to predict anything reasonable (so, please help us fix it by participating here: https://r-nn.com). Once we have a working model that is conditioned on the statement, we expect it to be able to predict variable names, snippets to read and write data and computing some basic logic.

Let’s review Seq2Seq models in the following section.


This post is a first post in a series of posts discussing some interesting advances in using machine learning for both writing and executing code. This particular post is about a machine learning model proposed early last year by Scott Reed from University of Michigan, then an intern at Google DeepMind, called Neural Programmer-Interpreters.

Neural Programmer-Interpreters

Neural programmer-interpreter, or NPI for short, is a machine learning model that learns to execute programs given their execution traces. This is different from some other models, such as Neural Turing Machines, that learn to execute programs only given example input-output pairs.

Internally NPI is a recurrent neural network. If you are unfamiliar with recurrent neural networks, I highly recommend reading this article by Andrej Karpathy, that shows some very impressive results of very simple recurrent neural networks: The Unreasonable Effectiveness of Recurrent Neural Networks.

The setting in which NPI operates consists of an environment in which a certain task needs to be executed, and some low level operations that can be performed in such environment. For example, a setting might comprise:

  • An environment as a grid in which each cell can contain a digit; and a cursor in each row of the grid pointing to one of the cells;

  • A task of adding up two multidigit numbers;

  • Low-level operations including “move cursor in one of the rows” and “write a digit”: