
Creating a golfing language can be hard. Let's help budding golfing language creators out and provide some helpful tips on how to create one.

I'm looking for tips on:

  1. The design process
  2. Implementation
  3. Community engagement

One tip per answer would be appreciated, and tips like "don't create a new language, use an existing one" probably aren't very helpful.

You don't need to have created a golfing language to answer this question. Feel free to answer from personal experiences or from general observations.

See also Tips for launching a language for when you are done creating your language.

  • The community engagement process is a dupe of this.
    – user85052
    Commented Jan 24, 2020 at 10:45
  • @a'_', the launching part is, but I'm also interested in how to keep a language going after it has been launched.
    – lyxal
    Commented Jan 24, 2020 at 11:19
  • @a'_' Oh wow, that first answer by DJMcMayhem♦ is almost exactly what I wrote in my CW answer below regarding point 3, only much more elaborate. I'd never seen that meta answer before, so it's funny how similar they are. :D
    Commented Jan 24, 2020 at 12:56

15 Answers


Here are some suggestions. Sorry that this partially overlaps with other answers, which were posted while I was writing it.

Design process

  1. One possibility (by no means the only one) for deciding which features (functions, data types, etc.) your language L should have is to base it on another language B that you have been using for a long time. That way you already have a good idea of which functions of B are used most often and should therefore be included in L, and you can give the functions most commonly used in B shorter names in L.

Examples:

  • L = MATL: B = MATLAB/Octave.
  • L = Pyth: B = Python.
  • L = Japt: B = JavaScript.
  • L = Brachylog: B = Prolog.
  • L = Husk: B = Haskell.
  • L = ShortC: B = C.
  • L = V: B = Vim.
  2. Decide which "paradigm" your language will use. Some examples are:
  • Stack-based (CJam, MATL, 05AB1E);
  • Tacit (Jelly);
  • Prefix notation (Pyth);
  • Infix notation (Pip, Japt);
  • Fixed arity (Pyth);
  • Variable arity (CJam, MATL, Japt).

These decisions are quite independent of item 1. For example, MATL's function f has the same functionality as MATLAB's find, but it is used differently: it pops its inputs from the stack and pushes its outputs onto it, whereas MATLAB uses normal function arguments that can be stored in variables.

  3. Incorporate functions from other golfing languages that you find useful. Don't let item 1 (if applicable) limit the definition of L.

  4. Be prepared to add functions in the future. As you (or others) use the language, you will find things that would be nice to add, so reserve some of the "namespace" for future expansion. For example, don't use up all your single-character names at once.

Implementation

  1. Write the compiler (or interpreter) for your language L in a language C that you know well (often C = B).
  2. The compiler is most often actually a transpiler into another language T (which can be the same as B or C). Language T should have a free implementation. That way it is viable to have an online compiler for your language. The compiler will be a program in C that takes source code in L to produce transpiled code in T, and then calls T's compiler/interpreter to run the transpiled code.

A natural choice is C = T. Examples:

  • L = Japt: B = C = T = JavaScript.
  • L = Jelly: C = T = Python (and Jelly is inspired by J; but perhaps not so much as to claim that B = J).
  • L = MATL: B = C = T = MATLAB/Octave.
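A minimal Python sketch of that pipeline (C = T = Python), using a made-up toy language L rather than any real golfing language: `transpile` turns L source into Python source, and `run` hands that source to T's own interpreter, here the current CPython via `subprocess`. The command set and names are assumptions for illustration only.

```python
import subprocess
import sys

# Hypothetical toy language L (not any real golfing language):
#   digits 0-9 push themselves, '+' and '*' pop two values and push the result,
#   ':' duplicates the top of the stack.
TEMPLATES = {
    "+": "b = s.pop(); a = s.pop(); s.append(a + b)",
    "*": "b = s.pop(); a = s.pop(); s.append(a * b)",
    ":": "s.append(s[-1])",
}


def transpile(program: str) -> str:
    """Translate an L program into equivalent Python source (the target language T)."""
    lines = ["s = []"]
    for ch in program:
        if ch in "0123456789":
            lines.append(f"s.append({ch})")
        elif ch in TEMPLATES:
            lines.append(TEMPLATES[ch])
        else:
            raise SyntaxError(f"unknown command {ch!r}")
    # Implicit output: print the top of the stack, if any.
    lines.append("print(s[-1] if s else '')")
    return "\n".join(lines)


def run(program: str) -> str:
    """Transpile, then hand the generated code to T's own interpreter (here CPython)."""
    result = subprocess.run(
        [sys.executable, "-c", transpile(program)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(transpile("34+"))
print(run("34+2*"))  # prints 14
```

Because the generated code is plain Python, anything that can run Python (including an online interpreter) can run L programs.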

Community engagement

  1. Host your compiler/interpreter in a public repository such as GitHub, so people can easily download it, create bug reports, suggest new features or even contribute with code.
  2. Write good documentation. That helps users understand your language better. Besides, I found that task more rewarding than I expected. I recommend writing the specification while you are designing the language, not at the end. That way consistency between language behaviour and its specification is ensured.
  3. Ideally the documentation should include some quick reference (table or summary), so experienced users don't have to go to the full documentation simply because they have forgotten the name or the syntax of a function they know.
  4. Create an esolangs page with basic information about your language.
  5. Create a chat room where people can ask questions and talk about the language. Visit it often.
  6. Answer questions in your language, with explanations about how the code works. You'll want to do that anyway (it's your language, you will find it fun to use), but this also helps get people curious about your language, and shows some of the language's properties, which might get people interested.
  7. If you chose to base your language L on a language B (see item 1) that is general-purpose and well known, users of B will find it easy to switch to L, which will allow them to provide short answers (in L) with minimal effort (coming from B).
  • @Λ̸̸ I think you can use that notation, yes. But an interpreter is required for it to be considered a language on CGCC.
    – Luis Mendo
    Commented May 15, 2020 at 11:53
  • This has by far the most useful information, so I've marked it as accepted.
    – lyxal
    Commented May 15, 2020 at 12:00
  • I like how everything is written in C ;) (even if C represents a different language)
    – Wezl
    Commented May 15, 2020 at 16:44
  • @Wezl Write in C, write in C...
    – Luis Mendo
    Commented May 15, 2020 at 17:37
  • I don't think you should just steal a builtin directly from another golfing language. You should consider the existing structure of builtins in your language and, if applicable, express it with more general operations that already exist in your language.
    – asdf
    Commented Oct 28, 2022 at 13:48

Here are a couple of random hints.

Choose your built-in commands and values carefully

Especially when your language is new, it's tempting to add lots of functionality that you think may be useful at some point. If that turns out to be false, you'll have a mostly useless command taking up a valuable spot in your language's namespace. Rebinding the symbol breaks backward compatibility and may invalidate existing answers on this site. That's not the end of the world since we're talking about weird recreational languages, but if it happens frequently, your users will have a hard time keeping up with the changes.

As a hypothetical example, testing whether a given number is prime is probably a good choice for a one-symbol function, since primes come up a lot in golf challenges. Testing whether it's a square or a power of 2 can be useful, but not as often. Testing whether it's a perfect number most likely doesn't warrant a one-symbol command. You'll get a sense of what generally useful things the language is missing by using it.

Minimize the number of incorrect programs

By an incorrect program I mean one that produces any kind of error: syntax error, type mismatch, division by zero etc. Every error is an opportunity for your language to do something useful instead. Syntax error due to a missing parenthesis? Allow it to be implicit in some way (like at the beginning or end of line). Division by zero throws an error? Consider having it return infinity or NaN instead, or allow errors to be caught and used for flow control.

A type mismatch is usually a sign of a function that's not generic enough. Ideally, every function should accept every type of input and do something useful with it, unless there's a good reason not to. There are several common tactics for this:

  • Overloading. For example, + can be addition on numbers and concatenation on lists. This seems to be the most popular option.
  • Vectorization. If + is given lists, it can perform addition element-wise. Jelly uses vectorization extensively.
  • Coercion. If + is given non-numbers, it can convert them into numbers implicitly. For example, 05AB1E allows numbers and strings to be used mostly interchangeably.
  • Keep the error if you have a good use for it. In Brachylog, arithmetic operations only work on numeric types. Since it's a logic programming language based on Prolog, they can be used backwards or to generate input-output pairs, and in those cases it's essential that the input can't suddenly be a list or string.
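As a rough Python sketch of how these tactics can combine into one generic `+`: lists vectorize, two strings concatenate (overloading), and a numeric-looking string mixed with a number is coerced. The particular precedence between the tactics is an arbitrary choice for this sketch; a real language would pick whatever rules suit it.

```python
def _as_number(x):
    """Coerce x to an int or float, or return None if it doesn't look numeric."""
    if isinstance(x, (int, float)):
        return x
    try:
        return int(x)
    except ValueError:
        pass
    try:
        return float(x)
    except ValueError:
        return None


def plus(a, b):
    """A '+' that avoids type errors by vectorizing, overloading, and coercing."""
    # Vectorization: lists are processed element-wise (zip stops at the shorter list).
    if isinstance(a, list) and isinstance(b, list):
        return [plus(x, y) for x, y in zip(a, b)]
    if isinstance(a, list):
        return [plus(x, b) for x in a]
    if isinstance(b, list):
        return [plus(a, y) for y in b]
    # Overloading: two strings concatenate.
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    # Coercion: a numeric-looking string mixed with a number is treated as a number.
    na, nb = _as_number(a), _as_number(b)
    if na is None or nb is None:
        # Last resort: still do something useful instead of raising an error.
        return f"{a}{b}"
    return na + nb
```

For example, `plus([1, 2], 1)` gives `[2, 3]` and `plus("3", 4)` gives `7`; only inputs that no rule covers fall back to string concatenation.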

I've never created a language myself, but I can partially answer number 3 (I'll make this a community wiki, so feel free to add more).

Some tips after the core part of your language is done:

  • Have a GitHub/GitLab project where you can link to, including a wiki-page/README of your language in general (what it's made for, how to compile and run it locally, etc.), and a short description of each of its builtins.
  • Ask @Dennis in talk.tryitonline.net chat to add your language to the online compiler TIO.
  • Add your language to the Showcase of Languages to expose it to the community.
  • Possibly create a chat room so people have a place to report bugs, discuss improvements, or talk about golfing strategies in your language.
  • Create a page for your language, and perhaps post a few tips as answers to get it started.
  • Start answering challenges here on CGCC in your language, so people seeing it for the first time might become interested. Well-known challenges like Hello World, "Never Gonna Give You Up", Default Quine, Prime checker, Fizz-Buzz sequence, Odd or Even, etc. are all great challenges for introducing your new language, and they let people get a general feel for what's possible in it.
    Apart from those and some other popular challenges, just answering any new challenge is also a great way to expose your language, since people tend to look at new challenges more often than existing ones.
  • You didn't have to make this CW. I realised that I should have added a part saying You don't necessarily have to have created a golfing language to answer this question.
    – lyxal
    Commented Jan 24, 2020 at 9:44
  • @Lyxal I made it a CW so everyone can edit it freely and add more points for no. 3. I don't think each of the tips above needs a separate answer, nor would any similar ones I might have forgotten. I am very curious about nos. 1 and 2 of your question, though. Some things that come to mind are: what underlying compiler/language is used to compile/run your language, and why did you choose that one? What did you do to ensure good performance? Etc. I'm curious what kind of answers we will get. I'll probably tag some people in chat who I know created a language (after a while). :)
    Commented Jan 24, 2020 at 9:47
  • I'm thinking of trying to ask Mego and DJMcMayhem for tips on Discord
    – lyxal
    Commented Jan 24, 2020 at 9:50
  • Sounds like a good idea. I thought about tagging Adnan, although unfortunately he isn't very active anymore. He started with 05AB1E (written in Python); then derived 2sable for more flexible implicit input; later added this implicit input to 05AB1E as well, making 2sable rather obsolete; and in mid-2018 he 'released' a completely rewritten 05AB1E (written in Elixir), with new and changed builtins and support for infinite lists (the last two are available on TIO). So there's lots of experience and history to contribute here.
    Commented Jan 24, 2020 at 9:57

It strikes me that the most important design decision is what the underlying paradigm of the golfing language is.

Here are some possible types of language:

  • Stack based
  • Array based
  • Object based
  • Functional
  • Imperative
  • Declarative

Indeed you might even have a mixture of these, or something else entirely like a two dimensional language, automata, regex, machine language or a Turing machine and so on.

This is important because it will greatly affect the syntax of the language and in some ways how concisely code can be written.


Edit: Follow Up

I thought I would show how the design and implementation phases of creating a language actually work in practice. I realise this answer contains more than one tip and is lengthy; I hope that's OK.

Design Considerations

I wanted the following things out of my new language:

  • Quick to implement (or prototype)
  • Flexible and concise syntax, perhaps with a view to Golfing
  • Can do useful things, not just a toy language

I plumped for some version of Forth, because it would satisfy the first two criteria. It would also have to be interpreted, at least for prototyping; compiling was out because it would take too much development time.

In terms of it doing useful things, it absolutely had to have the following traits:

  • Ability to call functions
  • Arithmetic and logic operations
  • Manipulation of numbers and strings
  • Loops and conditional structures
  • Stack manipulation

And because it's Forth I thought the following would be kind of cool:

  • Multiple data stacks
  • Ability to 'pass in' function literals to be executed inside functions (kind of functional style)

Implementation Details

Firstly, I would leverage my knowledge and use a language I know well for the interpreter: PHP - at least for the prototyping stage.

Next I needed the interpreter to be able to recognise (tokenise) these four things:

  • String literals: 'hello world'
  • Numeric literals: 3.1415
  • Labels (for function names): drop
  • Symbols (representing atomic actions): #@$^

The simplest solution for tokenising the program is to use regex. Also, because I wanted concise syntax, labels would be strictly alphabetic. I would also need some sort of separator to remove ambiguity in tokenising; I left space and comma free for that.

So with all that in mind I could create a fat-free Forth syntax. For example:

1.5,2.7,3,4add add add;

would push four numbers onto the stack, call the function add three times, and then return (;).
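The tokenising step can be sketched in Python (instead of the PHP the prototype used) as one regex with a named alternative per token kind. The group names and the exact patterns are my assumptions based on the four token types listed above:

```python
import re

# One alternative per token kind; order matters, because the regex engine
# tries the alternatives left to right at each position.
TOKEN_RE = re.compile(
    r"(?P<string>'[^']*')"         # string literal: 'hello world'
    r"|(?P<number>\d+(?:\.\d+)?)"  # numeric literal: 3.1415
    r"|(?P<label>[A-Za-z]+)"       # label (strictly alphabetic): drop
    r"|(?P<sep>[ ,])"              # separators, discarded below
    r"|(?P<symbol>\S)"             # any other character is a one-char symbol
)


def tokenize(code):
    """Split source code into (kind, text) pairs, dropping separators."""
    return [
        (m.lastgroup, m.group())
        for m in TOKEN_RE.finditer(code)
        if m.lastgroup != "sep"
    ]


print(tokenize("1.5,2.7,3,4add add add;"))
```

Note how 4add splits into a number and a label with no separator, because labels can't contain digits.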

Once tokenised the interpreter can then consume tokens one by one and act accordingly.

One consequence of using regex is that it can't handle nested syntax, so I would need to manage nested loops in some other way. The way to do this is to look for the start of a loop (a token), find the corresponding end, and record in the start token where the end is and vice versa. That way the interpreter can jump around loops and conditionals very easily. A stack is needed to know which token closes which opening token.
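A minimal Python sketch of that bracket-matching pass, assuming tokens are plain strings and that either [ or ( can be closed by either ] or ), as the four loop forms described further down require:

```python
OPENERS = {"[", "("}
CLOSERS = {"]", ")"}


def match_brackets(tokens):
    """Map each bracket's index to the index of its partner.

    Any opener pairs with any closer, so [..], [..), (..] and (..) all work.
    """
    jump = {}
    open_stack = []  # indices of openers still waiting for a closer
    for i, tok in enumerate(tokens):
        if tok in OPENERS:
            open_stack.append(i)
        elif tok in CLOSERS:
            if not open_stack:
                raise SyntaxError(f"unmatched {tok!r} at token {i}")
            j = open_stack.pop()
            jump[i], jump[j] = j, i
    if open_stack:
        raise SyntaxError(f"unclosed bracket at token {open_stack[-1]}")
    return jump


print(match_brackets(list("[1(2)3]")))
```

The resulting table lets the interpreter jump from an opening token straight to its closing token (and back) without rescanning.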

Functions I would just manage as simply labels, or named positions in the token stream. When a function is executed, the position is looked up and interpretation continues from there. I would return from the function using a ; token. This would also require a function call stack to handle nested function calls - and return back to the calling position in the token stream.

For the fancy stuff, such as passing in function literals, the idea would be to push a string literal containing the code fragment onto the data stack, e.g.:

'2+;'apply;apply:`;

So to break the above down, '2+;' is a string literal containing the code fragment (push 2 and add it to the top item on the data stack). A function called apply is then called. The function definition begins with apply:, and a backtick pops the string literal and executes it in its own brand-new context. Once the fragment has been executed and returns, the function continues.

The interpreter would handle this by separating the parsing and tokenisation from the actual execution. That way the literal code fragment can be parsed when it is pulled off the stack, and the new context passed into the executing function, using PHP's scoping to handle the new context's scope. The only fly in the ointment is that, to be able to call a function from the code fragment, the fragment needs access to the parent's context. For example:

'dotproduct;'walkarray;walkarray:...`...;dotproduct:...;
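To make this execution model concrete, here is a rough Python sketch (not the author's PHP prototype). Assumptions: only numbers, string literals, +, function calls, ; and the backtick are supported; a function is stored as a (token list, position) pair; and the main program must end with ; before any definitions, as in the examples above. A backtick runs the popped fragment in a fresh context that starts with a copy of the caller's functions, which is one way around the fly in the ointment.

```python
import re

TOKEN_RE = re.compile(r"'[^']*'|\d+(?:\.\d+)?|[A-Za-z]+|[^\s,]")


def execute(code, stack, parent_labels=None):
    """Run `code` on the shared data stack until a top-level ';' (or the end)."""
    tokens = TOKEN_RE.findall(code)
    # A function is just a named position: (token list, index after "name:").
    # Start from the caller's functions so a fragment can still call them.
    labels = dict(parent_labels or {})
    for i in range(len(tokens) - 1):
        if tokens[i].isalpha() and tokens[i + 1] == ":":
            labels[tokens[i]] = (tokens, i + 2)

    calls = []  # return addresses for nested function calls
    cur, pc = tokens, 0
    while pc < len(cur):
        tok = cur[pc]
        pc += 1
        if tok == ";":
            if not calls:
                return stack  # returning from the top level ends this context
            cur, pc = calls.pop()
        elif tok == "`":
            # Pop a code fragment and run it in a brand-new context.
            execute(stack.pop(), stack, labels)
        elif tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif tok.startswith("'"):
            stack.append(tok[1:-1])
        elif tok.isalpha():
            calls.append((cur, pc))
            cur, pc = labels[tok]
        else:
            stack.append(float(tok))
    return stack


print(execute("'2+;'apply;apply:`;", [5.0]))
```

The second case from the text works too: in `'addtwo;'run;run:`;addtwo:2+;` the fragment calls addtwo, which lives in the parent's token stream, and the call stack brings control back into the fragment afterwards.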

Next, multiple data stacks would be easy. I would have just one main data stack where all the action happens and provide atomic actions that push to and pull from other named data stacks. That should greatly simplify operations on vectors or arrays of numbers.

I also wanted conditional loops with the condition either at the beginning or end or just infinite loops. So I chose [...] for an if condition, [...) and (...] for conditional loops and (...) for infinite loop.

Lastly, some features are missing: general mathematical functions, extensive string handling, goto, and breaking out of loops and conditions. However, it is possible to break out by returning from a function, using func:(...[;]...).

Anyway, here's the semi-golfed and hopefully semi-readable prototype. Enjoy!

Try it online!

  • If you want a beginner-friendly golfing language, the stack-based paradigm will suit you most. (Stack-based languages are famously easy to write in.) Also it's the most common paradigm of golfing languages and the easiest paradigm to implement.
    – user85052
    Commented Jan 24, 2020 at 10:58
  • Stack-based is good because you don't need extra syntax to override precedence rules. But stack-based languages require a fair amount of pure stack manipulation, and this adds overhead.
    Commented Jan 24, 2020 at 11:01
  • Stack-based is also the least original paradigm for golfing, and will therefore probably lead to less adoption by other people, unless it is truly groundbreaking compared to the plethora of other stack-based languages.
    – Fatalize
    Commented Jan 24, 2020 at 13:11
  • @Fatalize Well, there aren't as many good stack-based golfing languages as you think. Most of them were terribly designed (e.g. Neim, Ohm, etc.). I think only 05AB1E and Vyxal have stood the test of time. So, if you design your golflang carefully, it will be significantly better than a ton of existing stack-based golflangs.
    – user117658
    Commented May 2, 2023 at 13:24

Remember to add implicit input/output

In my opinion, a golfing language (whether successful or not) should always have some kind of implicit input/output. This applies to Jelly, 05AB1E, and pretty much every competitive golfing language.

I'll take CJam as an example. CJam doesn't have an all-purpose implicit input; it has input-reading instructions like l and q, and if the input isn't just a string, they must be followed by a ~. That's how Pyth gained the upper hand over CJam.

If you are trying to make a competitive golfing language, try to avoid making your input type-dependent, otherwise you'll be needing type conversion every time you take input (which is pretty wasteful).

Types of implicit input

For my purpose I could only think of two types of implicit input: Taking the whole input as an argument (e.g. GolfScript & Pyth), and cycling the implicit inputs for the operators of the program (e.g. 05AB1E and Jelly). The former is a basic form of implicit input, but still allows you to win some challenges. For the latter, you need to think carefully about how this system works, otherwise it would not help programmers.
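A minimal Python sketch of the second (cycling) kind, assuming a stack-based language: when the program pops from an empty stack, the next input is used instead, wrapping around once every input has been consumed. The class and method names are made up for this example.

```python
from itertools import cycle


class Stack:
    """A data stack that falls back to the program's inputs when it runs dry."""

    def __init__(self, inputs):
        self.items = []
        # Cycle through the inputs, so a program that needs more values than
        # were supplied wraps around to the first input again.
        self._inputs = cycle(inputs) if inputs else None

    def push(self, value):
        self.items.append(value)

    def pop(self):
        if self.items:
            return self.items.pop()
        if self._inputs is None:
            raise IndexError("pop from empty stack and no input")
        return next(self._inputs)


s = Stack([3, 4])
print(s.pop() + s.pop())  # an addition with no explicit input reads 3 and 4
```

With this in place, a program that adds its two inputs is a single + with no input-reading commands at all.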

Implicit output

Implicit output is another very important design feature of a golfing language. For example, Element isn't very well-designed, as you need a backtick (output the whole stack) every time at the end of the program.

Currently there are only two types of implicit output: full implicit output (the kind Jelly and GolfScript/CJam use) and top implicit output (the kind Pyth and 05AB1E use). Both are perfectly competitive; however, in my very unscientific testing, top implicit output languages seem to require extra joins at the end.
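The difference between the two styles boils down to what is printed when the program ends. A small Python sketch, assuming the final stack is a list with its top at the end and that full output joins items with no separator:

```python
def implicit_output(stack, mode="top"):
    """Text printed when the program ends without an explicit output command."""
    if not stack:
        return ""
    if mode == "top":   # print only the top of the stack
        return str(stack[-1])
    if mode == "full":  # print the whole stack, joined with no separator
        return "".join(str(item) for item in stack)
    raise ValueError(f"unknown output mode: {mode!r}")


print(implicit_output([1, 2, 3], "top"))
print(implicit_output([1, 2, 3], "full"))
```

With full output, leftover intermediate values end up in the output, which is why stacks sometimes need cleaning up first; with top output, they are simply ignored.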


  • That's how Pyth got shorter than CJam You may be over-simplifying a bit. Pyth programs can be shorter or longer than CJam programs for many different reasons
    – Luis Mendo
    Commented Jan 24, 2020 at 12:47
  • I never used Jelly/GolfScript/CJam, but when I compare 05AB1E's implicit top output and MathGolf's implicit full joined output, I personally prefer top output, tbh. Many times I have to clean up the stack in MathGolf if I only want to output a single item or a single list. Here is an example answer. Although I certainly see your point, and in some cases it's indeed useful. Maybe I'm just too used to output top after starting with MathGolf, but I personally prefer just implicit top in most cases.
    Commented Jan 24, 2020 at 13:10

Be a polyglot programmer

If you know 1 language, you can steal the good stuff from 1 language.

If you know 2 languages, you can steal the good stuff from 2 languages.

Likewise, if you know many languages, you can steal the good stuff from many languages.

Therefore, while you are designing a golfing language, it's beneficial to learn a lot of programming languages (which will help you create a good golfing language).


Useful integer constants

Of course your golfing language has syntax that can represent any integer constant, but some integers are more useful than others. You want the useful ones to be available as a single byte. A couple of obstacles to this goal:

  • With a traditional decimal format, only the numbers 0 through 9 can be represented in one byte. Some non-single-digit numbers, particularly 10 and -1, are very useful.
  • If you have to put multiple integer literals next to each other in your code, you'll probably need an extra byte for a separator (which doesn't carry any meaning and is therefore a Bad Thing™). I think this problem occurs more often in prefix languages (like Pyth and Charcoal), but I imagine stack-based languages would suffer from it too. In Pip, it crops up when defining a list of numbers.

There are a few strategies to address these issues. Jelly, for example, represents -1 as - and interprets leading 0s as separate numbers. You could even write your integer literals in duodecimal with a signed digit system, which would make -1 and 10 into single-digit numbers. But the most straightforward approach is to have dedicated one-byte builtins for the most useful integer constants. Even some of the single-digit integers can use such aliases, due to the second point above.
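A small Python sketch combining two of those ideas: a lexer in which a 0 is always a complete literal on its own (so nothing after a 0 needs a separator), plus one-byte aliases for 10 and -1. The alias characters T and N are hypothetical.

```python
# Hypothetical one-byte aliases for constants that are awkward as decimal literals.
ALIASES = {"T": 10, "N": -1}


def lex_numbers(src):
    """Split a run of numeric literals without needing separators.

    A '0' is always a complete literal, so "05" lexes as 0 then 5, while
    "10" and "105" still lex as single numbers because they don't start with 0.
    """
    out = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "0":
            out.append(0)
            i += 1
        elif ch in "123456789":
            j = i + 1
            while j < len(src) and src[j] in "0123456789":
                j += 1
            out.append(int(src[i:j]))
            i = j
        elif ch in ALIASES:
            out.append(ALIASES[ch])
            i += 1
        else:
            i += 1  # anything else is not part of a number in this sketch
    return out


print(lex_numbers("05T10N"))
```

With these rules, 05T10N lexes to 0, 5, 10, 10, -1 without a single separator byte.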

Some data

I did a survey of 150 of my Pip golf submissions to see which integer literals or builtins were used most often. Here are the numbers that appeared at least twice:

0    |||||||||||||||||||||||||||||||||||||||||||||||  47
1    |||||||||||||||||||||||||||||||||  33
2    ||||||||||||||||||||||||||||  28
10   ||||||||||||||  14
-1   ||||||||||||  12
3    ||||||||||||  12
8    |||||||  7
9    |||||||  7
4    |||||  5
5    |||||  5
6    ||||  4
26   ||||  4
100  ||||  4
7    |||  3
32   |||  3
64   |||  3
1000 |||  3
16   ||  2
50   ||  2
127  ||  2

This list fits pretty closely with what I would expect (though I didn't think 3 would be so common). It's possible the figures will skew a bit differently depending on the language (for example, 2 will be more common if you need it for operations like "convert to base 2," "modulo 2," etc., and less important if you have builtins for those operations); but I think the overall ranking will stay pretty consistent. So:

Suggestions

  • These numbers should be representable by some one-byte syntax: -1, 10
  • Consider adding one-byte aliases for these numbers if numeric literals are likely to occur next to each other: 0, 1, 2, and maybe 3, 8, 9
  • Consider adding shorter aliases for these numbers if you have codepage space to spare: 26, 100, 1000, and maybe 16, 32, 64, 127

Again, there are a few ways of doing aliases. You can simply have a constant that always evaluates to that number. Pip has variables preinitialized to useful values, which can be updated through the program; for example, i starts at 0, which makes ++i useful for counting things. Jelly has atoms that evaluate to the third, fourth, fifth (etc.) command-line arguments, but if that many arguments aren't provided, they evaluate to useful constants instead.

What about other types?

I've only run the numbers on integers, but here are some gut feelings about strings and lists that deserve golfy syntax:

  • Should have single-byte builtins: empty string, empty list, space, and probably newline
  • Should have one- or two-byte syntax: uppercase alphabet, lowercase alphabet, common regex patterns if your language has regex, and maybe some common lists like [0], [1], [0, 1], [1, 0], and [-1, 1]
  • I've generated a much larger list based on all JS submissions on the site: gist.github.com/Radvylf/1bebd9050ee2d530007bf67d2f325843 (note that it includes negative and floating point numbers)
    Commented Aug 18, 2021 at 16:26

Be patient while implementing your golfing language

This tip might seem too obvious to some people, but I think it's worth posting, since I'm surprised to see how many golfing language implementors abandon their language long before it is complete.

If you want to make a good golfing language, you should be patient while you are implementing it.

It usually takes 3-4 years to make a good golfing language, adding 500 or so built-ins with lots of trial and error. During this period, you must be persistent and resilient and keep implementing the language anyway. Nothing great happens overnight. You always need a bit of patience while you are implementing your golfing language.

Trust me, anyone can make a good golfing language; it just takes a lot of patience. Besides, other CGCC users will always be excited to learn your golfing language, so don't be discouraged if you feel your language isn't that good yet! It's going to get better as you implement it, so keep going nonetheless. :P And it's perfectly fine if the language doesn't turn out that good; at least you had a good time implementing it.

Furthermore, after you complete this 3-4 year period, you may still continue maintaining/developing your golfing language, depending on how good you want to make your golfing language.

  • Patience isn't all that's required. You still need to be a good programmer if you are to make a good golfing language. That way, you will find it easier to factor the task down into a small set of highly generalized builtins.
    – asdf
    Commented Oct 28, 2022 at 13:44

This relates to the implementation phase

Decide Upon How the Language Will Be Executed

The two most common options to implement your language are a) interpretation and b) transpilation.

Interpretation is where your language is executed step by step. In other words, you write all the processing logic behind the language.

Transpilation is when you turn your language into another language. This is like translating it into another language.

Transpilation is much easier to implement than interpretation, but interpretation allows for more flexibility.
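For comparison with transpiling, a minimal Python interpreter for a made-up toy stack language: the loop you write executes each command itself, so every detail of evaluation is under your control. The command set is an assumption for illustration.

```python
# Hypothetical toy stack language: digits push themselves; '+', '-', '*' pop two
# values and push the result; ':' duplicates the top of the stack.
OPS = {
    # s.pop(-2) runs first, so it removes the second-from-top value (the left operand).
    "+": lambda s: s.append(s.pop(-2) + s.pop()),
    "-": lambda s: s.append(s.pop(-2) - s.pop()),
    "*": lambda s: s.append(s.pop(-2) * s.pop()),
    ":": lambda s: s.append(s[-1]),
}


def interpret(program, stack=None):
    """Execute the program one command at a time and return the final stack."""
    stack = [] if stack is None else stack
    for pos, ch in enumerate(program):
        if ch in "0123456789":
            stack.append(int(ch))
        elif ch in OPS:
            OPS[ch](stack)
        else:
            raise SyntaxError(f"unknown command {ch!r} at position {pos}")
    return stack


print(interpret("34+2*"))
```

Adding a command means adding an entry to OPS; a transpiler would instead need a snippet of target-language code for each command.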

  • Can you give an example of interpretation allowing more flexibility?
    – user85052
    Commented Jan 24, 2020 at 11:38
  • In what world is compilation easier to implement? It's almost universally more difficult
    – mousetail
    Commented Aug 31, 2022 at 13:25
  • Ok, it makes a bit more sense after the edit, but I'd still say interpreting is generally easier, especially for stack-based languages, as most golfing langs are.
    – mousetail
    Commented Aug 31, 2022 at 13:35

Making your language high-level (e.g. starting from Mathematical notation) is probably a good starting point.

Have you heard of APL (which is based on matrices and their operations)? It was an early target for code golfers, and it's quite concise.

Mathematics is a formalized system to express stuff in a high-level way. So if you want your language to be easy to use (attracting code golfers) and concise, you should probably find an existing mathematical notation to start with. Also, it will (probably) help users to be engaged with your language (since it's high-level, easy to learn and easy to use).

Examples of golfing languages

For example, Husk is a language that incorporates a lot of Haskell elements, like lazy evaluation and function composition; Haskell is itself a kind of mathematical notation that evolved from lambda calculus. Husk is very concise and keeps up with or beats other concise golfing languages, using just a set of 256 instructions.

(Jelly is a language based off APL, and it's quite competitive; I guess it's fairly obvious here.)

Would making the language high-level always attract users?

A counterexample to high-level languages attracting more users is probably Prolog. I don't know exactly why people don't use Prolog, but it might be because people are too used to mainstream languages and refuse to switch. And Prolog is quite concise, due to its declarative nature.

The implementation process

Being high-level usually means that you have to spend a lot more time implementing it: the higher-level a language becomes, the higher the abstraction level goes, and that usually means a lot of code needs to be written for the abstraction.

If you are lazy and don't feel like writing a lot of code just for making your golfing language, you don't have to. Just implement a lower-level language.

How do I design the language though?

You have a plethora of methods of parsing instructions. But the usual method is to parse it as infix/tacit, since Traditional Mathematical Notation is infix.


Make the syntax of your constants short

For strings, you can have implicit quotes or compressed strings. You can even make 2-character compressed strings, or directly index into your golfing language's built-in list of common English words.

Built-ins for common strings (including the empty string) may also be feasible.

For integer arrays, you can create a packed array form (btw Stax has a form like this). For string arrays, you can introduce a special form (Like in Jelly) that introduces compressed strings next to each other.

For numbers, you can do something like 05AB1E's Ƶ, which provides a shorthand for a lot of small integers. Or, as in @DLosc's tip, you can add built-ins for common numbers.

The above are merely suggestions.
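As a sketch of the compressed-number idea, assuming a hypothetical 64-character code page: an integer is written in base 64 with code-page characters as digits, so any number below 4096 fits in two characters. A real language would also need a marker telling the parser where a compressed literal starts and how long it is (similar in spirit to 05AB1E's Ƶ), and would usually use its whole code page as the digit set.

```python
# Hypothetical 64-character code page, for illustration only; real golfing
# languages usually have their own 256-character code pages.
CODEPAGE = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_."
BASE = len(CODEPAGE)  # 64


def compress(n):
    """Write a non-negative integer in base 64 using the code page as digits."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = []
    while True:
        n, r = divmod(n, BASE)
        digits.append(CODEPAGE[r])
        if n == 0:
            break
    return "".join(reversed(digits))


def decompress(s):
    """Inverse of compress()."""
    n = 0
    for ch in s:
        n = n * BASE + CODEPAGE.index(ch)
    return n


print(compress(1000))  # 2 characters instead of 4 decimal digits
```

Even with a prefix byte, 1000 then costs 3 bytes instead of 4, and the savings grow with larger numbers.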

Some Recommendations

  • Make sure that the syntactical shortening is appropriate for your language.
    • For example, the implicit trailing quote is somewhat useful in Jelly. However, in 05AB1E, it is only useful when you print a constant string at the end of execution. SOGL improves on this: since it has implicit leading quotes, you can do a lot more things with one fewer byte.
  • A note on blocks: most modern stack-based golfing languages already merge them with the higher-order commands that take one or two blocks as arguments, so this can also be seen as an example of shortening the syntax of your constants.
    • You can put the merged command before the block (like 05AB1E) or after it (like Stax), and use implicit braces at the end or the start of the program respectively. Which form do you think will be useful in more tasks? ;p
    • You can also try to shorten the block syntax itself. For example, MathGolf has shortened block forms covering lengths from 1 all the way up to 8. You can also try to group your blocks implicitly (like in Husk, where functions are implicitly grouped into functions with the correct types during parsing).
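
To make the implicit-brace idea concrete, here is a Python sketch of a parser for a hypothetical language in which the higher-order commands M and F come before their block (as in the 05AB1E-style form above) and each block ends at } or, implicitly, at the end of the program.

```python
# Hypothetical higher-order commands; each one opens a block (e.g. map, filter).
BLOCK_OPENERS = {"M", "F"}
CLOSE = "}"


def parse(program: str, pos: int = 0) -> tuple[list, int]:
    """Parse into nested lists; a block ends at '}' or implicitly at end of program.

    In this sketch a stray '}' at top level simply ends parsing.
    """
    nodes = []
    while pos < len(program):
        ch = program[pos]
        if ch == CLOSE:
            return nodes, pos + 1  # explicit close: back to the enclosing block
        if ch in BLOCK_OPENERS:
            body, pos = parse(program, pos + 1)
            nodes.append((ch, body))
        else:
            nodes.append(ch)
            pos += 1
    return nodes, pos  # end of program implicitly closes every open block


print(parse("M2*")[0])  # [('M', ['2', '*'])]
```

So M2* and M2*} parse identically, and the trailing brace is never needed.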
\$\endgroup\$
  • \$\begingroup\$ Welcome to Code Golf, and nice tip! \$\endgroup\$ Commented Oct 20, 2022 at 13:02
\$\begingroup\$

Use the corpus

Credit goes to @Lynn for creating this method.

Basically, if there are enough answers in your golfing language on SE, you can use them as a corpus to analyze your language and improve its golfiness. Collect the answers with a SEDE (Stack Exchange Data Explorer) query, then analyze the results with a script.

A couple of things to note:

  • 1-grams reflect the general usefulness of the 1-byte commands in your language: the higher the frequency, the more useful the command. If a 1-byte builtin is rarely or never used here, it is better off as a 2-byte builtin. A frequency of at most 15 is a good threshold for demoting a builtin to 2 bytes.
    • Keeping enough 1-byte slots free leaves room to add more useful builtins in the future.
  • 2-grams are golfing opportunities: pairs of commands that you could merge into 1 byte. Again, if one occurs frequently enough (more than 15 occurrences, for example), it is worth adding as a 1-byte builtin.
  • 3-grams and 4-grams show where most people are wasting bytes. If such a pattern occurs frequently enough, it would make a great 2-byte builtin.
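
A minimal Python sketch of the analysis script, assuming the answer code has already been extracted from the SEDE results into a list of strings and that each character is one command (true for single-byte codepages; multi-character builtins would need a proper tokenizer). The sample corpus is made up.

```python
from collections import Counter


def ngram_counts(programs: list[str], n: int) -> Counter:
    """Count every length-n run of characters across a corpus of programs."""
    counts = Counter()
    for code in programs:
        for i in range(len(code) - n + 1):
            counts[code[i:i + n]] += 1
    return counts


def candidates(programs: list[str], n: int, threshold: int) -> list[tuple[str, int]]:
    """n-grams seen more than `threshold` times, most frequent first."""
    return [(gram, count) for gram, count in ngram_counts(programs, n).most_common()
            if count > threshold]


corpus = ["1+2*", "+2*3", "2*+"]  # hypothetical answers
print(candidates(corpus, n=2, threshold=2))  # [('2*', 3)]
```

Running it with n = 1 through 4 gives the four lists discussed above.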

Tips on using the corpus

  • Investigate thoroughly. When in doubt, work out exactly why a pattern occurs so frequently by reading the SE answers that contain it.
  • Use the 1-gram list to help interpret the 2-gram occurrences.
  • The number 15 used here is just an example. Determine the actual threshold yourself, since some languages are used far less often on SE than others.
  • Any command you add should make sense to you, i.e. it should not look like a random combination that you would only use in specific circumstances.
\$\endgroup\$
\$\begingroup\$

Tips for overloading

  • Try not to use up all of your overloading space right away (unless you're really good at overloading). Be prepared to add new overloads to your language later.
  • Don't overload if vectorization makes sense. Vectorization is a really common operation: it is the most common 1-byter in both Jelly and Vyxal, and these two are very different languages. This shows how prominent vectorization is in golfing languages.
    • For example, + should definitely be vectorized, because element-wise addition is the natural behavior on lists. If you overload it with something else instead, you end up wasting bytes in code golf on an unnecessary vectorization prefix.
  • It can be hard to keep track of your overloads, especially if you have a lot of them. It can help to make a table of all of your overloads, to see which instructions could still be overloaded, which operations are duplicated, etc.
    • Another trick for when you're designing the language: first list every instruction you want to overload (without actually introducing overloads). Then try to merge all of these commands into fewer operations, in a way that makes sense to you.
  • When you don't know what to overload in your language, generating a 1-gram corpus of a popular golfing language can give you some interesting ideas.
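
The overload table and the vectorization advice can be combined in one small Python sketch. The commands, types and function names are hypothetical: apply dispatches on the argument types and falls back to vectorizing over lists when no scalar overload matches, and free_slots lists the type combinations a command does not use yet.

```python
from typing import Callable

# Hypothetical overload table: (command, type of a, type of b) -> implementation.
OVERLOADS: dict[tuple[str, type, type], Callable] = {
    ("+", int, int): lambda a, b: a + b,
    ("+", str, str): lambda a, b: a + b,  # concatenation
    ("*", int, int): lambda a, b: a * b,
    ("*", str, int): lambda a, b: a * b,  # repetition
}


def apply(cmd: str, a, b):
    """Dispatch on argument types; vectorize over lists when no overload matches."""
    key = (cmd, type(a), type(b))
    if key in OVERLOADS:
        return OVERLOADS[key](a, b)
    if isinstance(a, list) and isinstance(b, list):
        return [apply(cmd, x, y) for x, y in zip(a, b)]  # zip truncates to shorter
    if isinstance(a, list):
        return [apply(cmd, x, b) for x in a]
    if isinstance(b, list):
        return [apply(cmd, a, y) for y in b]
    raise TypeError(f"no overload for {cmd!r} on {type(a).__name__}, {type(b).__name__}")


def free_slots(cmd: str, types=(int, str)) -> list[tuple[type, type]]:
    """Scalar type pairs that `cmd` does not use yet: room for future overloads."""
    return [(x, y) for x in types for y in types if (cmd, x, y) not in OVERLOADS]


print(apply("+", [1, 2, 3], 10))  # [11, 12, 13]
```

Because + has no list overload here, [1, 2, 3] + 10 vectorizes for free, while free_slots("*") shows that (int, str) and (str, str) are still open.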
\$\endgroup\$
  • \$\begingroup\$ Well, I don't think vectorization is that useful except for addition, multiplication, or subtraction. Vectorizing stuff like equality comparison, for example, might warrant a separate builtin for 'exactly equal'. Besides, zipping two lists is not as common as transposing a list, so you may find vectorizing less commonly used builtins cumbersome anyway. \$\endgroup\$
    – user117658
    Commented May 2, 2023 at 2:32
  • \$\begingroup\$ However, vectorization has another definition, which is generating a list of base 10 digits or a 1-range for numbers (like in Jelly). In this case, vectorization would be quite useful, since 1-range and digits are very useful operations. \$\endgroup\$
    – user117658
    Commented May 2, 2023 at 12:59
\$\begingroup\$

Length of the builtins: 1 byte vs. 2 bytes

Here, I attempt to give some concrete suggestions, building on @Zgarb's answer, for deciding whether a builtin should be 1 byte or 2 bytes.

Keeping a fairly large amount of 1-byte space free is very important. If you run out of 1-byte opcodes, you might find it difficult to add new golfy syntax forms to your language, or 1-byte builtins for frequently used 2-byters.

Here are my observations on this topic:

  • If it is a fundamental builtin that is commonly associated with a basic data type in your language, a 1-byte builtin would be helpful.
    • For example, a built-in that maps an input list by a block should definitely be a 1-byte builtin.
    • Similarly, common mathematical operations, like modulus and addition, should be 1-byte as well.
    • Same thing with list concatenation.
  • If you find yourself unable to solve some challenges without this builtin, or you find them cumbersome without it, consider adding it as a 2-byte builtin, or as a 1-byte builtin if you use it frequently enough.
  • If it is something you would not normally do with the builtin's argument types, it should definitely be a 2-byte builtin. You can always make it a 1-byte builtin later, if it turns out to be used frequently enough.
    • For example, converting the input from a unix timestamp to a formatted string probably deserves a 2-byte builtin.

Ideas

  • If some substring comes up often in a lot of answers, consider first making it a 2-byte builtin.
    • If it is already 2 bytes, make it 1 byte. The divmod builtin in Jelly is a good example: it combines a pretty common pattern, taking both the quotient and the remainder, into a single builtin.
  • If your language uses an ASCII codepage, you need to be even more careful with builtins, since there are far fewer 1-byte slots than in a full 256-character codepage. Consider heavily overloading one-byte builtins, to pack in as many fundamental builtins as possible.
  • A fairly effective method, as mentioned by @mousetail below, is to analyze the frequencies of the builtins of an existing golfing language, and then decide the length of each builtin based on its frequency. This ensures that your builtins are packed efficiently according to how often they are used.
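
A simple Python sketch of the frequency-based approach: rank builtins by how often they appear in a corpus, give the top ones 1-byte names and the rest 2-byte names, then compute the total cost. The counts and builtin names are hypothetical, and the model ignores that 2-byte names themselves use up prefix bytes from the 1-byte space. Under this model, giving the shortest names to the most frequent builtins minimizes the total.

```python
def assign_lengths(freq: dict[str, int], one_byte_slots: int) -> dict[str, int]:
    """Give the most frequent builtins 1-byte names and the rest 2-byte names."""
    ranked = sorted(freq, key=freq.get, reverse=True)
    return {name: (1 if i < one_byte_slots else 2) for i, name in enumerate(ranked)}


def corpus_cost(freq: dict[str, int], lengths: dict[str, int]) -> int:
    """Total bytes the corpus would take under a given length assignment."""
    return sum(freq[name] * lengths[name] for name in freq)


freq = {"map": 120, "sum": 90, "divmod": 40, "timestamp": 2}  # hypothetical counts
lengths = assign_lengths(freq, one_byte_slots=2)
print(lengths)  # {'map': 1, 'sum': 1, 'divmod': 2, 'timestamp': 2}
print(corpus_cost(freq, lengths))  # 120 + 90 + 80 + 4 = 294
```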
\$\endgroup\$
  • \$\begingroup\$ The length of every builtin should ideally be log₂₅₆(1/p) bytes, where p is its relative frequency, if you use full bytes. This can be proved to be optimally efficient. This means that sometimes even longer builtins may be worthwhile. \$\endgroup\$
    – mousetail
    Commented Oct 28, 2022 at 13:40
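
The optimal-length rule in the comment above comes from Shannon's source coding theorem. Written out for byte-sized symbols (256 possible values per byte), with f_i the number of uses of builtin i in the corpus:

```latex
p_i = \frac{f_i}{\sum_j f_j}, \qquad
\ell_i = \log_{256}\frac{1}{p_i} = \frac{\log_2(1/p_i)}{8}\ \text{bytes}

% A builtin used once in 256 uses ideally costs 1 byte; once in 65536, 2 bytes:
p_i = 2^{-8} \Rightarrow \ell_i = 1, \qquad p_i = 2^{-16} \Rightarrow \ell_i = 2

% For any prefix code over bytes with lengths L_i, the average length is bounded below by the entropy:
\sum_i p_i L_i \ \ge\ H_{256} = \sum_i p_i \log_{256}\frac{1}{p_i}
```

Real builtins need whole bytes, so the ideal lengths get rounded up to \lceil \ell_i \rceil; by the Kraft inequality a prefix code with those lengths always exists, and its average length stays below H_256 + 1.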
\$\begingroup\$

Code it using another programming language you already know very well

You can then write an interpreter in that language that runs programs written in the golfing language you invented.
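
As a starting point, here is a minimal interpreter sketch in Python for a hypothetical stack-based golfing language. The five commands are invented for illustration, and the top of the stack is printed implicitly at the end, as many golfing languages do.

```python
import sys
from typing import Optional


def run(program: str, stack: Optional[list] = None) -> list:
    """Interpret a toy stack-based golfing language (hypothetical command set)."""
    stack = [] if stack is None else stack
    for ch in program:
        if ch in "0123456789":
            stack.append(int(ch))  # digits push themselves
        elif ch == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif ch == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif ch == ":":
            stack.append(stack[-1])  # duplicate the top of the stack
        elif ch == "r":
            stack.append(list(range(1, stack.pop() + 1)))  # 1-range
        else:
            raise ValueError(f"unknown command {ch!r}")
    return stack


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python toy.py '34+:*'   (implicitly prints the top of the stack)
    result = run(sys.argv[1])
    print(result[-1] if result else "")
```

For example, run("34+:*") returns [49]: 3 and 4 are added, the 7 is duplicated, and the two copies are multiplied.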

\$\endgroup\$
