28
$\begingroup$

Are there programs that can 'translate' source code between any two languages (assuming the translator has access to the requisite libraries)?

If there are, how do they work (techniques used, knowledge required, etc)? How would they feasibly be constructed?

If there aren't, what are the restrictions preventing their development? Is this an AI-complete problem (natural language translation is listed as one)?

EDIT: Conversion is only expected when the two languages have the same expressive power, can solve the same kinds of problems, and the code to be converted can be expressed in the destination language (e.g., conversion from a shell script to MATLAB isn't expected).

$\endgroup$
14
  • 6
    $\begingroup$ en.wikipedia.org/wiki/Source-to-source_compiler $\endgroup$
    – Bergi
    Commented Dec 12, 2016 at 19:58
  • 15
    $\begingroup$ What do you mean by "any two languages"? There certainly are programs that can translate from one language to another. They are called "compilers". That's literally the definition of a compiler: a program that translates programs from one language to another. But "any two languages"? I don't think that's possible. The translator has to know both the source and the target language, and it usually is specific to a particular pair of languages. $\endgroup$ Commented Dec 12, 2016 at 21:49
  • $\begingroup$ The program is provided the source and target languages. I'm thinking of writing a program in C++ and translating it to Java, Python, Perl, Ruby, Go, etc. There may be some restrictions (I don't expect it to convert your shell script to MATLAB, for example). $\endgroup$ Commented Dec 13, 2016 at 0:23
  • 5
    $\begingroup$ Yes, they're called compilers, they work like compilers and they can be constructed like compilers. $\endgroup$ Commented Dec 13, 2016 at 6:30
  • 1
    $\begingroup$ If by "any two languages" you literally mean that the (finite) program should be able to read and understand an infinite number of input languages, the answer is trivially no. However, take a finite set of input languages and you can find a compiler for all those languages. $\endgroup$
    – Bakuriu
    Commented Dec 13, 2016 at 15:26

6 Answers

57
$\begingroup$

TL;DR: this is possible but not practical.

(assuming the translator has access to the requisite libraries)?

This ends up being the tricky bit, and is part of why things like this don't end up being used in practice.

  1. All compilers are translators. Translating from one language to another is definitely possible, and this is literally all a compiler is doing. The language that a compiler produces as output is generally machine code or assembly, but this is just another language, and there are compilers (sometimes called transpilers or transcompilers) which translate between two high-level languages. For example, there's a gamut of compile-to-JavaScript languages like PureScript, Elm, ClojureScript, etc.

  2. Translating between any two Turing Complete languages is always possible. Ignoring things like library calls and FFI and other nasty practical bits that get in the way, that is. If a language is Turing Complete, then you have:

    • A translation that converts a Turing Machine to code in this language
    • A translation from this language into a Turing Machine

    So to translate from language A to language B, you convert the A code into a Turing Machine, then convert that machine into B code.

    Of course, in practice, the practical bits get in the way, and this also requires that both translations actually be available to you. They exist in principle for basically every language, but that doesn't mean someone has taken the time to write them out.

  3. Doing this translation efficiently is hard. Different languages prioritize different things. For example, if you translate from C to Python, you're probably going to end up simulating C's memory as a Python dictionary, so that you can do pointer arithmetic. There will be overhead associated with this, because you're no longer accessing the bare-metal memory instructions.

    Different languages have different performance priorities, so something that one language optimizes (or rather, that an implementation of one language optimizes) might be impossible to do quickly in another language. For example, code from a functional language with proper tail calls will slow down if you translate it into a language without proper tail calls.

  4. Doing this translation doesn't make the code readable. It's easy to get a piece of code in language B that behaves the same as the code from language A. It's hard to make it look like code a human would have written in B, for a number of reasons. A and B might have different abstraction tools, and the computer has no idea what makes code readable. This will be particularly true if you end up using the Turing Machine translation I described earlier.

    This raises the question: what's the point of such a translation? If all you get at the end is a block of slow, unreadable code, why not just compile it to machine code and use some kind of FFI or inter-process communication to link the pieces together?

    There are some exceptions to this. Sometimes you need things in a certain language (like JavaScript). Sometimes languages are similar, and a sensible translation is easy. Sometimes a language is not meant to be run, but to have its code extracted into another language (such as Coq).

    But in general, it's not a very practical thing.
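The memory simulation mentioned in point 3 can be sketched roughly as follows. This is only an illustration, not the output of any real translator: the `Memory` class and `sum_array` function are made-up names, and addresses count whole `int`-sized cells rather than bytes to keep things short.

```python
class Memory:
    """Toy model of C memory: integer address -> stored value (hypothetical helper)."""

    def __init__(self):
        self.cells = {}
        self.next_free = 1000  # arbitrary non-zero base so 0 can stand in for NULL

    def alloc(self, count):
        """Reserve `count` consecutive cells and return the address of the first."""
        base = self.next_free
        self.next_free += count
        for offset in range(count):
            self.cells[base + offset] = 0
        return base

    def load(self, addr):
        return self.cells[addr]  # models `*addr`

    def store(self, addr, value):
        self.cells[addr] = value  # models `*addr = value`


def sum_array(mem, start, length):
    # C original:
    #   int total = 0;
    #   for (int *p = start; p < start + length; p++) total += *p;
    total = 0
    p = start
    while p < start + length:
        total += mem.load(p)
        p += 1  # pointer arithmetic becomes plain integer arithmetic
    return total


mem = Memory()
a = mem.alloc(3)                  # int a[3];
for i, v in enumerate([10, 20, 30]):
    mem.store(a + i, v)           # a[i] = v;
p = a + 2                         # int *p = a + 2;
mem.store(p, 99)                  # *p = 99;
result = sum_array(mem, a, 3)
```

Every `*p` in the C code turns into a dictionary lookup plus a method call, which is where the overhead comes from, and the result does not look like Python anyone would write by hand.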

$\endgroup$
6
  • 5
    $\begingroup$ One example for Point 4 is asm.js. Today, it is possible to make it sorta readable, using Javascript Source Maps and the Element Inspector, but no one will want to do that... $\endgroup$ Commented Dec 13, 2016 at 0:44
  • 1
    $\begingroup$ Modelica is another example of a language designed for compilation into another language (in this case C). $\endgroup$ Commented Dec 13, 2016 at 12:00
  • $\begingroup$ Webassembly translating from C++ to javascript. $\endgroup$
    – Surt
    Commented Dec 13, 2016 at 22:50
  • $\begingroup$ There are numerous examples of transpilers from X to Y, but that's different from a universal anything to anything compiler. There are obviously cases where transpiling makes sense. $\endgroup$ Commented Dec 13, 2016 at 23:01
  • $\begingroup$ One important exception missing IMO: compiling to C. The reason is that many uncommon systems have an existing C compiler, which generally can emit quite reasonable machine code. Hence, by compiling a language to C, you don't need to have backends for those rare architectures. $\endgroup$
    – MSalters
    Commented Dec 14, 2016 at 0:00
2
$\begingroup$

There are such programs. For example, Lisp-to-Fortran translators were widely used in their time. Some Lisp compilers don't compile Lisp directly but generate C code instead, which is then compiled by a regular C compiler. Another example is Vala, which isn't compiled directly but is first translated to C before the C code is compiled. Qt code is written in C++ with a few extensions that a tool called moc turns into ordinary C++ before compilation (but as this is just C++ with a few additional constructs, one can argue whether it really should be called a "new language"), and before there were C++ compilers there were C++-to-C translators. Some projects were written in Pascal and then translated to C. Clang and the Java compiler are also somewhat of this kind, as they translate C++ and Java code to an intermediate language (LLVM IR and JVM bytecode, respectively) that can then be processed further.
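To give a feel for how the "generate C and let a C compiler do the rest" approach works, here is a deliberately tiny sketch. It is not how any of the tools above are implemented: `translate_function` and `expr_to_c` are hypothetical names, and the sketch only handles a single Python function whose body is one `return` of an integer expression using `+`, `-` and `*`. It uses Python's standard `ast` module to parse the source and then walks the syntax tree to emit C text.

```python
import ast

# Python operator node types mapped to their C spelling (only these are supported).
_C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def expr_to_c(node):
    """Recursively turn a Python expression AST into a C expression string."""
    if isinstance(node, ast.BinOp) and type(node.op) in _C_OPS:
        left = expr_to_c(node.left)
        right = expr_to_c(node.right)
        return f"({left} {_C_OPS[type(node.op)]} {right})"
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        return str(node.value)
    raise NotImplementedError(f"unsupported construct: {type(node).__name__}")


def translate_function(source):
    """Translate `def name(args): return <expr>` into a C function over `long`."""
    func = ast.parse(source).body[0]
    if (not isinstance(func, ast.FunctionDef)
            or len(func.body) != 1
            or not isinstance(func.body[0], ast.Return)):
        raise NotImplementedError("only single-return functions are supported")
    params = ", ".join(f"long {arg.arg}" for arg in func.args.args)
    body = expr_to_c(func.body[0].value)
    return f"long {func.name}({params}) {{ return {body}; }}"


c_source = translate_function("def f(x, y):\n    return x * x + 2 * y\n")
print(c_source)
# prints: long f(long x, long y) { return ((x * x) + (2 * y)); }
```

Even this toy shows both problems at once: the output is over-parenthesized in a way no human would write, and it is not quite equivalent, since Python integers never overflow while a C `long` does.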

What you cannot expect of the output of a language translator is that the result makes any sense to a human reader. The program's task is to write code that results in a program doing the same thing as the original code (which in my experience might or might not work, depending on which features of the language and which external libraries you were using). But since it doesn't know the purpose the code serves, much of the rest of the program's meaning may be lost.

$\endgroup$
0
$\begingroup$

Not a direct answer, but there is a tool called ILSpy, which was written for the .Net Framework and allows you to decompile a .Net assembly into C# or VB.Net.

If you're unfamiliar with the nature of .Net, you can write .Net code in many languages, but primarily C# or VB.Net. When the compiler compiles the application, it translates the code to "Intermediate Language" (or IL for short) code, which is what the resulting .Net binaries contain.

Since .Net applications are binaries built from IL code, ILSpy can take the .Net application, turn it back into IL code and, subsequently, take it one step further and reverse it back to C# or VB.Net.

Using this tool, all you have to do is compile an application, and then you can browse the compiled files as IL, C# or VB.Net code. To be clear, it doesn't matter what language the code was initially written in. So long as the binary is a .Net assembly, it can reverse-engineer the compiled files and output the content as any of these three languages.

I know this isn't exactly a compiler, but it is a tool that offers an end result similar to what you are looking for and, in fact, I've used it to "translate" VB.Net projects into something a little more familiar to me: C#.

$\endgroup$
0
$\begingroup$

For your use-case (based on comments), it sounds like SWIG might be useful.

SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages. SWIG is used with different types of target languages including common scripting languages such as Javascript, Perl, PHP, Python, Tcl and Ruby. The list of supported languages also includes non-scripting languages such as C#, Common Lisp (CLISP, Allegro CL, CFFI, UFFI), D, Go language, Java including Android, Lua, Modula-3, OCAML, Octave, Scilab and R. Also several interpreted and compiled Scheme implementations (Guile, MzScheme/Racket, Chicken) are supported.

$\endgroup$
0
$\begingroup$

I recall the venerable f2c, which does source-to-source translation from Fortran 77 to C.

It was (and sometimes still is) used mainly to translate decades-old numerical code without having to integrate a Fortran compiler into your toolchain.

$\endgroup$
0
$\begingroup$

The piece of theory that tells you that such programs exist, in principle, is called admissible numberings. We can prove that there are computable compilers between any two such numberings, and every Turing-complete formalism (or programming language) is, in essence, one.
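Stated a little more precisely (this is the standard textbook formulation, usually credited to Rogers, and the symbols below are my own notation rather than anything from the question): if $\varphi$ and $\psi$ are two admissible numberings of the partial computable functions, then there is a "compiler" $c$ from $\varphi$-programs to $\psi$-programs,

$$
\exists\, c : \mathbb{N} \to \mathbb{N} \ \text{total computable such that}\ \forall e \in \mathbb{N} :\ \varphi_e = \psi_{c(e)} .
$$

Here $e$ plays the role of a program in the source language, $c(e)$ is its translation, and $\varphi_e = \psi_{c(e)}$ says both compute the same partial function. Note that this only guarantees that the input/output behaviour is preserved; it says nothing about the efficiency or readability of $c(e)$.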

$\endgroup$
