Automated Translation of Java to Python
I’ve written a tool to automatically translate Java source code to Python source code. The tool is useful, and it’s already working for me as I intended. It’s called java2python (clever, no?) and you can download it here.
Let me back up a bit and explain the motivation behind this. I’m the author and sole maintainer of IbPy, the Python port of the Interactive Brokers API. IB provides a default/reference Java implementation for UNIX and MacOS. This reference implementation is straightforward: it contains a Thread subclass that reads from a socket, an associated class for writing to the socket, plus a few other support classes. Conceptually it’s pretty simple, and the initial port was actually easy (once I figured out the difference between writing data to a socket in Java and writing data to a socket in Python).
That was 5 years ago. In those 5 years, I’ve refactored the Python code quite a bit, and IB has significantly enhanced both their code and their network protocol. I dropped out of trading for a while, and Life, the Universe, and Everything have conspired to keep me from tending to the port as much as I should. I have kept up with the code in spare moments, but haven’t been able to put together a tested release in quite a while. I recently devoted some time to working on the port and ran into trouble. I downloaded the latest release from IB and started to go through their source line by line, matching it against my code. In reading their source, I realized how complex and hopeless the whole thing had become. Complex is bad, but even worse, I would only ever finish through sheer force of will. That spells doom for a project, and it’s about as low as a software engineering effort can sink.
So I stepped back for a bit and thought about the problem. What was consuming so much time was translating code using my (increasingly older) noodle, and maintaining the port that way held no promise of ever getting easier. Now, the IB Java code is pretty reasonable. It’s not tricky and it’s fairly consistent, but it is a bit verbose for what it does (it’s Java, after all). And there are lots and lots and lots of conditional statements, and if any one of them isn’t translated perfectly, communication between the trading application (the client) and the trading platform (the server) breaks. I toyed with the idea of regex-ing the snot out of their source to produce something not unlike Python code, but ultimately rejected the idea as untenable and fraught with complexity. And complexity is exactly what I’m trying to avoid.
Then I took a serious look at ANTLR. I knew it had strong support for both Java and Python, and I believed that most (if not all) of the bits I needed would already be there. I read the documentation, examples, and the various articles I could find. As luck (or actually, the hard work of many other developers) would have it, the ANTLR distribution includes an example grammar for lexing and parsing Java source. Given this grammar, and the Python script examples, I was able to print out an abstract syntax tree of all the IB Java source. Half the work done, and I didn’t have to lift a finger!
But I had no earthly idea what to do with an AST. I could walk it recursively and print out its contents, and not much else. So I did what any good programmer does when he or she doesn’t know how to solve a problem: I tried to ignore it. It wouldn’t go away, of course, so I sought help from my good friend Bob. We chatted a bit about it, and he pointed me to an article about another feature of ANTLR: tree walker grammars. I had already skimmed the article, but I reread it for comprehension and found the answers I was looking for.
Like input (lexer and parser) grammars, a tree grammar describes some input and is used to generate a class that processes it. The difference is that the input to a tree grammar is an AST rather than a stream of characters or tokens, so the generated code walks the tree. The ANTLR implementation allows code (actions) to be specified directly in the tree grammar, which provides a way to hook into the AST walk and do interesting things.
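To make the idea of hooking into an AST walk a little more concrete, here is a minimal plain-Python sketch of the same pattern. It is not ANTLR’s runtime API and not how java2python is actually implemented; the `Node` class, the node kinds (`class`, `method`, `param`, `return`), and the `visit_<kind>` naming are all hypothetical. Each `visit_` method plays the role an embedded action plays in a tree grammar: it fires when the walker reaches a node of that kind and emits Python source lines. For simplicity it assumes every method is an instance method.

```python
from __future__ import annotations

from dataclasses import dataclass, field

INDENT = "    "


@dataclass
class Node:
    """Hypothetical AST node: a kind tag, some text, and child nodes."""
    kind: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


class Walker:
    """Recursive tree walker that emits Python source lines.

    Each visit_<kind> method is a hook, standing in for the actions
    an ANTLR tree grammar would run at that point in the walk.
    """

    def walk(self, node: Node, depth: int = 0) -> list[str]:
        # Dispatch on the node kind; unknown kinds just visit their children.
        handler = getattr(self, "visit_" + node.kind, self.generic)
        return handler(node, depth)

    def generic(self, node: Node, depth: int) -> list[str]:
        lines = []
        for child in node.children:
            lines.extend(self.walk(child, depth))
        return lines

    def _block(self, body: list[Node], depth: int) -> list[str]:
        # Walk a block body; an empty block needs "pass" to be valid Python.
        lines = []
        for child in body:
            lines.extend(self.walk(child, depth))
        return lines or [INDENT * depth + "pass"]

    def visit_class(self, node: Node, depth: int) -> list[str]:
        header = INDENT * depth + f"class {node.text}:"
        return [header] + self._block(node.children, depth + 1)

    def visit_method(self, node: Node, depth: int) -> list[str]:
        # Assumption: every method becomes an instance method taking self.
        params = ["self"] + [c.text for c in node.children if c.kind == "param"]
        body = [c for c in node.children if c.kind != "param"]
        header = INDENT * depth + f"def {node.text}({', '.join(params)}):"
        return [header] + self._block(body, depth + 1)

    def visit_return(self, node: Node, depth: int) -> list[str]:
        return [INDENT * depth + "return " + node.text]


if __name__ == "__main__":
    tree = Node("class", "Order", [
        Node("method", "getId", [Node("return", "self.orderId")]),
    ])
    print("\n".join(Walker().walk(tree)))
```

Dispatching on the node kind keeps each translation rule in one small method, which is roughly the separation a tree grammar gives you for free.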
That was two weeks ago, and now I have a tool that works. It’s not perfect, but it already translates the entire IB reference implementation into Python without syntax errors. I still have to tackle the problem of matching semantics between the two languages, but I think the more difficult problem has been solved. Most importantly, I have something that is repeatable and no more difficult to use than typing “make”. Take that, demons of late nights past!
I designed java2python to allow a high degree of customization of the generation process. It supports multiple, cumulative configuration modules, which means you can have a configuration for an entire translation project and also have configurations for individual modules.
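One way to picture cumulative configuration is as layers merged in order, with later layers overriding earlier ones. The sketch below assumes exactly that override rule; java2python’s real merging behavior and option names may differ, and `merge_configs`, `indent`, and `header_comment` are hypothetical. `SimpleNamespace` objects stand in for configuration modules here, but the same loop works on real modules loaded with `importlib.import_module`.

```python
from types import SimpleNamespace


def merge_configs(*configs) -> dict:
    """Merge public settings from several config objects, in order.

    Later configs override earlier ones, so a per-module config only
    needs to set the values that differ from the project config.
    """
    merged = {}
    for config in configs:
        for name in dir(config):
            # Skip dunder attributes and other private names.
            if not name.startswith("_"):
                merged[name] = getattr(config, name)
    return merged


# Hypothetical settings; these are not java2python's actual option names.
project_config = SimpleNamespace(indent="    ", header_comment="# Generated by java2python")
module_config = SimpleNamespace(indent="  ")

settings = merge_configs(project_config, module_config)
```

Here the module-level `indent` wins, while `header_comment` is inherited unchanged from the project-wide layer.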
Let me add a few more waffles before concluding. Yes, I know Python is not Java. Yes, I know that this tool doesn’t translate the meaning of the input source code. Yes, I know the tool doesn’t produce idiomatic Python. And yes, I know the tool isn’t even close to perfect. But even with all of those problems, I know this is better than what I was doing.
As always, I’m interested in your feedback if you use my code, or in this case, even if you don’t use my code. Feel free to drop me a note with any comments you have. You can reach me at troy@gci.net.