
According to my experience, Wikipedia and prior answers, a scripting language is a vague category of languages that are high-level (no manual memory management) and interpreted. Popular examples are Python, Ruby, Perl and Tcl.

Some scripting languages can be "embedded". For example:

  • Lua is frequently embedded in video game applications.
  • Tcl is embedded in the Fossil version control system.

It is sometimes said that Lua is more easily embedded than Python, or that JavaScript is difficult to embed, because of the size of the interpreter. Similarly, Wren is "intended for embedding in applications".

What factors make a language embeddable? Is it solely the size and speed of the base interpreter or do other factors come into play?

2
  • The statement about TypeScript (which actually says that this is due to JavaScript) is quite misleading. JavaScript is embedded in many applications, including all modern web browsers, and you do not need to include "web browser internals" to embed JS: Node.js is enough unless you need some features from those browsers (the DOM, a UI, HTML, CSS, media, etc.), and you can even embed the V8 engine itself (it's the actual JS engine in both Chrome/Chromium and Node.js).
    – jcaron
    Commented Jan 17, 2020 at 17:50
  • @jcaron whoops, edited the question
    – Seanny123
    Commented May 3, 2021 at 17:15

5 Answers


Embedding a language (I'll avoid characterizing it as "scripting") means that the following has been done:

  • The interpreter and runtime are running in the same process as the host application
  • Enough of the standard types and the standard library are also available from within that runtime
  • Usually, the host application also makes its own library available to the embedded runtime

The first bullet is literally the definition of embedding. The main reason to embed a language into an application is to provide an easy means of extending the functionality of the application. Reasons include:

  • Creating macros to perform complex steps repeatably as fast as possible (e.g. Photoshop, Gimp)
  • Programming game elements by less technical people (many games have some level of embedded language to create mods, characters, etc.)
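
As a minimal sketch of the three bullets at the start of this answer, here is a Python host "embedding" Python macro scripts through the built-in `exec`: the script runs in the host's own process, sees a curated subset of the standard builtins, and gets the host's library injected under the name `app`. The `Editor` class and its methods are hypothetical stand-ins for an application like an image editor. Note that a restricted `__builtins__` dict is not a security sandbox in CPython; it only controls which names a well-behaved script sees.

```python
class Editor:
    """Hypothetical host object that macros are allowed to drive."""

    def __init__(self) -> None:
        self.log = []  # records the operations a macro performed

    def brightness(self, amount: int) -> None:
        self.log.append(("brightness", amount))

    def blur(self, radius: int) -> None:
        self.log.append(("blur", radius))


def run_macro(source: str, editor: Editor) -> dict:
    """Run a macro script inside the host process (bullet 1)."""
    # Bullet 2: a small, curated subset of the standard builtins.
    builtins_subset = {"range": range, "len": len, "min": min, "max": max}
    # Bullet 3: the host application's own library, exposed as `app`.
    namespace = {"__builtins__": builtins_subset, "app": editor}
    exec(compile(source, "<macro>", "exec"), namespace)
    return namespace


MACRO = """
for _ in range(3):
    app.brightness(10)
app.blur(max(1, 2))
"""

editor = Editor()
run_macro(MACRO, editor)
print(editor.log)
# [('brightness', 10), ('brightness', 10), ('brightness', 10), ('blur', 2)]
```

The macro repeats a sequence of host operations, which is exactly the "complex steps repeatably" use case from the first reason above.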

So the big question is then, what factors simplify embedding?

  • Complexity of the interpreter and/or runtime environment (simpler is easier)
  • Size of the standard library (smaller is easier)
  • Layers of indirection (fewer is better: TypeScript compiles down to JavaScript, much as early C++ compilers translated C++ into C, and there is no native TypeScript environment)
  • Compatibility of the underlying architecture (several languages are implemented on the Java or .NET runtime, which makes them easier to embed in applications built on the same runtime)

The bottom line is that it is possible to embed a wide range of languages into another application. In some cases, the hard work has already been done for you and you simply need to include the language in your app. For example, IronPython is built on .NET and Jython is built on Java, allowing you to easily embed Python into applications built on those platforms.

As for how robust or complete a given implementation is, you will get mixed results. Some projects are more mature than others, and some languages are simply easier to implement (there is a reason why Lisp was one of the first embedded languages).

13
  • It may be worthwhile to mention that generally nothing about a language forces it to be compiled or interpreted: the language is the semantics/syntax/behavior, and anything which has a compiled implementation could have an interpreted implementation and vice versa. (I.e. it should be possible to implement a C++ interpreter and embed C++ as a "scripting language" into your game or whatnot, however unhelpful that may be.)
    – Delioth
    Commented Jan 16, 2020 at 21:23
  • This answer seems to be ignoring that embedded languages are linked to embedded hardware/systems.
    – Mast
    Commented Jan 17, 2020 at 8:24
  • One risk with embedding: if you don't write your own interpreter or very explicitly limit which libraries are available, the embedded scripting language becomes a potential exploit vector. This isn't just a problem with games; it's why browsers nowadays are heavily sandboxed (with all the memory usage issues that entails). Commented Jan 17, 2020 at 9:07
  • @Mast: if I've understood correctly, "embedded" in this context is being used in a completely unrelated sense to that used in embedded systems. (This is not the first time that confusion has arisen here over the word!) Commented Jan 17, 2020 at 10:23
  • Re: running in the same process. Considering that Chrome, Edge, Visual Studio, etc. split the work into several processes, "same process" can be relaxed a bit.
    – Pablo H
    Commented Jan 17, 2020 at 12:28
10

The main factor is typically the API that host applications use to access the language libraries. Languages like Lua are designed to be easily 'connected to' from host applications: the language is available in library form, and its API is easily callable from other languages (generally a plain C API). The API usually provides functions to run a script, to set up callbacks that respond to certain situations (like undefined variables), and to access the host application's resources/GUI. Languages whose APIs let you do that easily are more "embeddable" than those whose APIs don't.
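
To make the shape of such an API concrete, here is a minimal sketch in Python. A real host would call a C API instead; the tiny Tcl-like language and the names `MiniInterp`, `register`, `run`, `on_undefined` and `ScriptError` are hypothetical, chosen to mirror the capabilities listed above: running a script, a callback for undefined variables, and host functions that scripts can call.

```python
import shlex
from typing import Callable, Dict, List, Optional


class ScriptError(Exception):
    """Raised for errors in the guest script, for the host to handle."""


class MiniInterp:
    """Hypothetical embeddable interpreter for a tiny Tcl-like language.

    Each line is `command arg ...`; a word of the form `$name` is replaced
    by the value of variable `name` before the command runs.
    """

    def __init__(self) -> None:
        self.vars: Dict[str, str] = {}
        self.commands: Dict[str, Callable[..., str]] = {"set": self._set}
        # Host callback consulted when a script reads an undefined variable.
        self.on_undefined: Optional[Callable[[str], str]] = None

    # ---- host-facing API ----------------------------------------------
    def register(self, name: str, fn: Callable[..., str]) -> None:
        """Expose a host function to scripts as a command."""
        self.commands[name] = fn

    def run(self, script: str) -> str:
        """Run a script and return the result of its last command."""
        result = ""
        for line in script.splitlines():
            words = [self._subst(w) for w in shlex.split(line)]
            if not words:
                continue
            name, args = words[0], words[1:]
            if name not in self.commands:
                raise ScriptError(f"unknown command {name!r}")
            result = self.commands[name](*args)
        return result

    # ---- internals ----------------------------------------------------
    def _set(self, name: str, value: str) -> str:
        self.vars[name] = value
        return value

    def _subst(self, word: str) -> str:
        if not word.startswith("$"):
            return word
        name = word[1:]
        if name in self.vars:
            return self.vars[name]
        if self.on_undefined is not None:
            return self.on_undefined(name)
        raise ScriptError(f"no such variable {name!r}")


# Host side: configure the interpreter, then run a user script.
greeted: List[str] = []


def greet(who: str) -> str:
    greeted.append(who)
    return f"hi {who}"


interp = MiniInterp()
interp.register("greet", greet)
interp.on_undefined = lambda name: {"user": "alice"}.get(name, "")
print(interp.run("set count 3\ngreet $user"))  # hi alice
```

Here `$user` is never set by the script, so the host's `on_undefined` callback supplies it; that is the kind of hook that lets the application, not the language, decide what the script can see.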

10

In theory any language can be embedded, and if there are no constraints on the solution, that is actually the case. It's a natural consequence of Turing completeness: you can always build an emulator.

What I think you are asking is "what makes a language practical for this purpose?" One of the main things that makes a language a good choice is that it is defined in terms of behaviors rather than implementations. If the language in question has very specific rules about how int values must be represented in memory, for example, that creates challenges when running on top of another application whose ideas about integers are not exactly the same.

A good example of this is Python, which is defined in terms of how it behaves and has little to say about how it is implemented. This means you can write a fully functional Python interpreter that acts as a facade to Java (or C#, etc.) types, so not only can you run Python scripts, you can also use them to interact with the parts of the application written in Java.
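
A minimal sketch of that facade idea, written entirely in Python so it runs standalone: `HostWindow` plays the part of an object from the host language (its Java-style `getX`/`setX` naming is an assumption), and the hypothetical `Facade` uses Python's attribute and `len()` protocols so scripts can treat it as an ordinary Python object. Real bridges such as Jython are far more complete; this only shows why a behavior-defined language makes the mapping possible.

```python
class HostWindow:
    """Stand-in for a host-language object with getter/setter methods."""

    def __init__(self, title: str) -> None:
        self._title = title
        self._children: list = []

    def getTitle(self) -> str:
        return self._title

    def setTitle(self, value: str) -> None:
        self._title = value

    def getChildCount(self) -> int:
        return len(self._children)

    def addChild(self, child: object) -> None:
        self._children.append(child)


def _accessor(prefix: str, name: str) -> str:
    # "title" -> "getTitle" / "setTitle"
    return prefix + name[:1].upper() + name[1:]


class Facade:
    """Makes a host object behave like an ordinary Python object."""

    def __init__(self, target: object) -> None:
        # Bypass our own __setattr__, which forwards to the host object.
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name: str):
        # Called only when normal lookup fails: map `obj.title` to getTitle().
        getter = getattr(self._target, _accessor("get", name), None)
        if getter is not None:
            return getter()
        return getattr(self._target, name)  # plain methods pass through

    def __setattr__(self, name: str, value) -> None:
        setter = getattr(self._target, _accessor("set", name), None)
        if setter is None:
            raise AttributeError(f"{name!r} is read-only or does not exist")
        setter(value)

    def __len__(self) -> int:
        # Python's len() protocol backed by a host method.
        return self._target.getChildCount()


window = Facade(HostWindow("Untitled"))
window.title = window.title + " (draft)"
window.addChild("OK button")
print(window.title, len(window))  # Untitled (draft) 1
```

Because Python only promises that `obj.attr` and `len(obj)` go through these protocol methods, nothing in the script reveals that `window` is really a host object.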

Another factor is simplicity in the semantics of the language. The more complex the language, the more difficult it tends to be to build an interpreter for that language for obvious reasons.

7
  • Embedding implies that code in that language can interact with the host program, e.g. to script it. To safely embed an "unsafe" language like C or assembly (where code can create arbitrary pointers) your sandbox would have to be very isolated so it would be non-trivial to even provide an API that C could use without making it possible for C to do stuff you don't want to let it do. (e.g. a game where you don't want to let users cheat, just automate stuff they were already allowed to do.) Commented Jan 18, 2020 at 6:31
  • But I guess you could emulate an API/ABI like modern kernels that defend themselves from user-space, so as you say Turing completeness is sufficient if you don't care about practicality. Commented Jan 18, 2020 at 6:32
  • @PeterCordes: There is an example of a spacecraft (I forgot which one, it might have been one of the Mars rovers) that was saved by executing some code in a C interpreter that was embedded into the OS. Commented Jan 18, 2020 at 11:24
  • @JörgWMittag: The case I was inventing was one where you wanted to set strict limits on what "guest" code could do, e.g. because "untrusted" users could supply code. That wouldn't be the case for a spacecraft, where the intended use-case for the interpreter is probably to let uploaded code patch the OS or do basically any unforeseen rescue activity. Commented Jan 18, 2020 at 20:13
  • Mostly agree with this answer, but Python is actually surprisingly difficult to embed. The GIL, the metaclasses, the difficulty of proper sandboxing, etc. Something like Lua would have been a much better example. Commented Jan 18, 2020 at 21:39
6

There are a couple of factors:

  • Whether the language supports an embedding API. Some scripting languages, like Python and Lua, have officially supported APIs specifically designed for embedding them into a host application. This includes specifying how the language interacts with the foreign function interface, foreign object handles, foreign classes, etc., and specifying an API for those foreign languages to call into and work with objects in the language. Languages designed for embedding can make these foreign objects look and behave just like regular classes and objects without complicated wrapper classes. Protocol-based languages like Python tend to be very good at this.

  • Language implementations designed for embedding are often designed to share a thread with the main application. This is because UI elements can usually only be updated from the UI thread, so interpreters for embedded languages need to be able to run, yield to the main thread, call UI updates, and resume execution without taking over the main thread completely. Language implementations that aren't designed for embedding might require the interpreter to run in a separate thread or process and communicate with the application's UI thread only through an RPC/message-queue mechanism, which comes at a significant performance cost.

  • Memory safety. Memory-safe languages and languages without direct memory access are easier to embed, because code written in the scripting language cannot crash the main application through direct memory access.

  • How big the language's runtime support is. Languages with big standard libraries tend to be at a disadvantage for embedding, because they bloat the application size. On the other hand, there are many applications where the huge standard library is the reason a specific scripting language is chosen, so that scripters can access functionality that the main application itself is unwilling to provide directly.

  • Additional layers. TypeScript, for example, can only be embedded by embedding a JavaScript interpreter, so you take on the challenge of embedding a JavaScript interpreter even if you only care about TypeScript.
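
The run/yield/resume pattern from the second bullet can be sketched with a Python generator standing in for the embedded interpreter. The stack-machine instructions, `interpret` and `host_main_loop` are hypothetical; real embeddable runtimes expose the same idea through their own APIs (Lua's coroutines, for instance). The host gives the script a fixed budget of instructions per frame, so the UI thread is never blocked for long.

```python
from typing import Iterator, List, Tuple

Instr = Tuple[str, int]


def interpret(program: List[Instr], out: List[int]) -> Iterator[None]:
    """Hypothetical stack-machine interpreter written as a generator.

    It yields after every instruction, so the host decides when it resumes.
    """
    stack: List[int] = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "emit":
            out.append(stack.pop())  # stands in for a call into host UI code
        else:
            raise ValueError(f"unknown instruction {op!r}")
        yield  # hand control back to the host's main loop


def host_main_loop(program: List[Instr], budget_per_frame: int = 2) -> Tuple[List[int], int]:
    """Simulated UI loop: run a slice of the script each frame until done."""
    out: List[int] = []
    script = interpret(program, out)
    frames = 0
    finished = False
    while not finished:
        for _ in range(budget_per_frame):
            try:
                next(script)
            except StopIteration:
                finished = True
                break
        frames += 1  # a real host would handle input and repaint here
    return out, frames


PROGRAM = [("push", 2), ("push", 3), ("add", 0), ("emit", 0), ("push", 10), ("emit", 0)]
out, frames = host_main_loop(PROGRAM)
print(out, frames)  # [5, 10] 4
```

The per-frame budget is the design knob: a larger one finishes scripts in fewer frames, a smaller one keeps the UI more responsive.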

3

Languages that are designed to be embeddable try to provide features that ease access for the host application. There are two layers to this: the language's syntax and semantics, and the runtime implementation of the language you want to embed.

Take, for example, Python and Tcl, which are both labeled as embeddable. In my experience, Python is much harder to embed than Tcl (I did both, in multiple contexts).

Why is that so?

Python is opinionated: it assumes the world looks like a POSIX setup. Its filesystem, console and network APIs do not abstract much; they are mostly direct wrappers around the POSIX C APIs. Tcl isn't that close to the hardware; it tries to abstract most APIs and does not expose many low-level APIs to the script layer. So if you embed Python, you must provide a POSIX-like file abstraction. With Tcl you do not need to do anything if you do not care about files. Less work, easier to embed.

(C)Python is basically single-threaded with a global interpreter lock; Tcl has no global lock. So if your host application is multi-threaded and you embed Python for critical work, you have just added a global lock to your application. That makes embedding Python in multi-threaded programs much more painful.

Python's module system by default maps to a filesystem: module names and file names are linked. So your language is limited by the filesystem you provide, and it can break suddenly when you port it to a filesystem that is case-insensitive. Tcl did not tie its module system to any filesystem layout, so the filesystem doesn't change the semantics of your language.
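
Working around this is extra work for the host. In CPython, for example, the host can put its own finder on `sys.meta_path` so that module names resolve against sources it supplies instead of files on disk. A minimal sketch using the standard `importlib` hooks; the `InMemoryFinder` class and the `greetings` module are illustrative only.

```python
import importlib.abc
import importlib.util
import sys
from typing import Dict


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Resolves module names against a host-provided dict of sources."""

    def __init__(self, sources: Dict[str, str]) -> None:
        self.sources = sources

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.sources:
            return None  # let the remaining finders (e.g. the filesystem) try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use Python's default module object

    def exec_module(self, module) -> None:
        source = self.sources[module.__name__]
        code = compile(source, f"<host:{module.__name__}>", "exec")
        exec(code, module.__dict__)


host_modules = {"greetings": "def hello(name):\n    return 'hello ' + name\n"}
sys.meta_path.insert(0, InMemoryFinder(host_modules))

greetings = importlib.import_module("greetings")  # found in host memory, not on disk
print(greetings.hello("embedder"))  # hello embedder
```

Module names are now plain dictionary keys, so case sensitivity is whatever the host's dict does, not whatever the filesystem does.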

Python assumes the world is blocking and synchronous (like POSIX) and is only slowly adopting async APIs. Tcl tries harder to provide non-blocking APIs and callbacks. As it is much easier to simulate blocking APIs on top of async ones than the other way round, you usually have less work to do.

Tcl is rather minimal, and you can easily strip out anything you do not need: filesystem APIs, process control, regular expressions. Python has a ton of stuff in its builtin namespace and is nearly impossible to lock down and secure against untrusted code, due to write access to the bytecode. It is not even considered a bug that pure Python code can crash the process (e.g. Python bytecode sometimes just uses a raw pointer and writes to it). So Python is harder to embed if you need to run untrusted user code.

Python's standard library often assumes it is in control of the world; Tcl's does not. For example, Python often blows up and kills your process when it runs out of memory. Some Python API calls might even dump critical errors to stderr (which might not exist in an embedding situation, e.g. in a Windows service context) and kill your application, while Tcl usually tries very hard to give control back to the application without crashing or exiting. Being a good guest is important.

So what makes a language easy to embed is behaving like a good guest:

  • Do not assume too much about your hosting environment. You have no filesystem. You have no stdio. You have no environment variables. Check and minimize your assumptions about the world.
  • Clean up after yourself. Be able to reinitialize yourself multiple times.
  • Do not get in the way of threads in your host.
  • Allow the host to customize the available feature set to the problem domain.
  • Be reasonably safe even when running untrusted input.
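
The checklist can be made concrete with a small sketch. `GuestInterp`, its commands and its `Result` type are invented for illustration; they show one way an interpreter can keep all state per instance, write output only through a host callback, report every failure as a value instead of raising or exiting, let the host choose which command groups to install, bound the work of a single call, and be reset and reused. The line-based step limit is a simplification: a real interpreter would count instructions or loop iterations, and a step limit alone is not a complete sandbox.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Result:
    """Outcome of one eval call; failures are values, never exceptions."""
    ok: bool
    value: str = ""
    error: str = ""


class GuestInterp:
    """Hypothetical Tcl-flavoured interpreter that tries to be a good guest."""

    def __init__(self, write: Callable[[str], None],
                 features: Iterable[str] = ("core",), max_steps: int = 1000) -> None:
        self._write = write               # host-provided output: no stdio assumed
        self._features = tuple(features)  # host picks the available feature set
        self._max_steps = max_steps       # bounds the work of one eval call
        self.reset()

    def reset(self) -> None:
        """Drop all state so the host can reinitialize and reuse the instance."""
        self.vars: Dict[str, str] = {}
        self.commands: Dict[str, Callable[..., str]] = {}
        if "core" in self._features:
            self.commands["set"] = self._cmd_set
            self.commands["puts"] = self._cmd_puts
        if "string" in self._features:
            self.commands["upper"] = str.upper

    def eval(self, script: str) -> Result:
        """Run a script; never raises into the host and never exits."""
        value = ""
        try:
            for step, line in enumerate(script.splitlines(), start=1):
                if step > self._max_steps:
                    return Result(False, error="step limit exceeded")
                words = [self._subst(w) for w in line.split()]
                if not words:
                    continue
                command = self.commands.get(words[0])
                if command is None:
                    return Result(False, error=f"unknown command {words[0]!r}")
                value = command(*words[1:])
        except Exception as exc:  # script errors must not escape into the host
            return Result(False, error=str(exc))
        return Result(True, value=value)

    def _subst(self, word: str) -> str:
        if word.startswith("$"):
            if word[1:] not in self.vars:
                raise ValueError(f"no such variable {word[1:]!r}")
            return self.vars[word[1:]]
        return word

    def _cmd_set(self, name: str, value: str) -> str:
        self.vars[name] = value
        return value

    def _cmd_puts(self, text: str) -> str:
        self._write(text)
        return ""


captured: List[str] = []
interp = GuestInterp(captured.append, features=("core",), max_steps=10)
print(interp.eval("set who world\nputs $who"))  # Result(ok=True, value='', error='')
print(interp.eval("upper $who"))  # fails: the "string" feature was not installed
interp.reset()
print(interp.eval("puts $who"))   # fails: state is gone after reset
print(captured)                   # ['world']
```

The host decides everything the guest can touch: where output goes, which commands exist, how much work one call may do, and when the state is thrown away.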
9
  • @PeterCordes: Embedding languages like Tcl/Lua and languages like Python follow two very different philosophies; I often call them thin and fat embedding. The reason you want to embed a thin language is to provide a minimal, Turing-complete, interpreted language for adding custom logic to the main application; the reason you want to embed a fat one is to give users access to the underlying system and integration with things outside the main application. For the former use case, the language should only access what the main app provides, but for the latter, sandboxing is counter to the point.
    – Lie Ryan
    Commented Jan 19, 2020 at 1:21
  • "Python ... The filesystem APIs, console APIs, network APIs all do not abstract much ... direct wrappers around POSIX C-APIs. Tcl ... tries to abstract most APIs" This is a bit of a weird statement. POSIX is already an abstraction of the underlying system, so why would you want to define another layer of abstraction? If you want to write portable code, generally you will have to write POSIX-compliant code. This is true even if one of the platforms you're targeting is Windows (which also supports POSIX APIs).
    – Lie Ryan
    Commented Jan 19, 2020 at 2:00
  • @LieRyan: It's abstractions all the way down, of course. Portability isn't necessarily a goal for embedding, so trying too hard to follow POSIX ideas can be a burden instead of a good idea (just look at the horrible mess POSIX AIO is compared to Windows I/O completion ports, or the insane filename encoding for POSIX file APIs). And this gets in the way of embedding. E.g. look at how complex the Python embedding in lldb is (due to the threading and console handling enforced by Python), and how much easier it would have been with Lua or Tcl.
    – schlenk
    Commented Jan 19, 2020 at 2:12
  • Python does add syntactic sugar around many POSIX calls to make the API look more Pythonic, not to change the semantics of the API. This is a good thing, because most people who are already familiar with how POSIX abstracts the native system calls don't have to learn yet another layer of abstraction.
    – Lie Ryan
    Commented Jan 19, 2020 at 2:13
  • It is a good thing on a POSIX platform. On non-POSIX platforms (Windows, most often) it just gets in the way or wastes time trying to emulate foreign behaviour. Python has that in the core language, and it forces an embedding host application to also handle the POSIX expectations, which it would not need to do otherwise. Just look at posixmodule.c in Python and count how many lines are code to work around Windows issues caused by the abstractions not matching the platform. That's all useless code when embedding, but it's part of the core language.
    – schlenk
    Commented Jan 19, 2020 at 2:31
