\$\begingroup\$

There is a follow-up question here.

I've undertaken the project of creating my own programming language, Linny. It's a very, very simple language, with only variable creation, variable changing, and outputting to the console, but I'm very proud of it. It's an interpreted language, with the interpreter written in Python 3. The interpreter is really a hybrid of a compiler and an interpreter. I can't really say where I got the idea for the syntax; I just meshed a bunch of ideas from a wide range of languages and that's what I got. I want as much criticism and nitpicking as you can find. From performance to readability to bugs, anything.

How it works

You write a program in Linny, with the file extension .linny. You set the path to the file in the source code, and you run it. You can also uncomment the bottom part in the main guard, comment out the for line in lines: interpret(line) part, and you'll be able to input line by line commands like Python.

Syntax (sample file, script.linny)

string text = "Hello" ; // semicolon MUST be one space away from ending
text = "Hello_There!" ;

out text ; // "out" outputs the variable to the screen
type text ; // "type" returns the type of variable (integer, string, etc)

boolean food = false ;
out food ;
type food ;

integer num = 16 ;
out num ;
type num ;

float f = 14.2 ;
out f ;
type f ;

The Interpreter

"""
This program compiles and interprets programs written in `Linny`
"""

def interpret(line_of_code):
    """Interprets user inputed Linny code """

    words = line_of_code

    if isinstance(words, str):
        words = words.split()

    #Basic empty line check
    if words == []:
        return

    #Comment check
    if "//" in words[0]:
        return

    #Interpret mode begin

    #If user wants to output a value
    if len(words) == 3 and \
       words[0] == "out" and \
       already_defined(words[1]) and \
       words[2] == ";":
        print(VARIABLES[words[1]]['value'])
        return

    #If user wants to get the type of value
    if len(words) == 3 and \
       already_defined(words[1]) and \
       words[0] in MISC_KEYWORDS and \
       words[2] == ";":
        if words[0] == "type":
            print(VARIABLES[words[1]]['data_type'])
            return

    #If user wants to create a value
    if len(words) == 5 and words[4] == ";":
        add_to_variables(
            name=words[1],
            value=words[3],
            data_type=words[0],
            line_number=0
        )
        return

    #If user wants to edit a value
    if len(words) == 4 and words[3] == ";":
        change_value(words[0], words[2])
        return
    #Interpret mode end

def change_value(variable, new_value):
    """ Changes the value of the variable to the `new_value` """
    data_type = VARIABLES[variable]['data_type']
    if data_type == "integer":
        VARIABLES[variable]['value'] = int(new_value)
    elif data_type == "string":
        VARIABLES[variable]['value'] = str(new_value)
    elif data_type == "float":
        VARIABLES[variable]['value'] = float(new_value)
    elif data_type == "boolean":
        if new_value == "true":
            VARIABLES[variable]['value'] = True
        elif new_value == "false":
            VARIABLES[variable]['value'] = False
        else:
            exit(f"Cannot assign boolean value to {new_value}")
    elif data_type == "char":
        if len(new_value) == 1:
            VARIABLES[variable]['value'] = chr(new_value)
        else:
            exit(f"char can only be one character long, not {new_value}!")
    else:
        exit(f"Not a data type")

def add_to_variables(name, value, data_type, line_number):
    """ Checks `data_type` of passed variable, and adds it to list of variables """
    if data_type == "integer":
        VARIABLES[name] = {'value': int(value), 'data_type': data_type}
    elif data_type == "string":
        VARIABLES[name] = {'value': value, 'data_type': data_type}
    elif data_type == "float":
        VARIABLES[name] = {'value': float(value), 'data_type': data_type}
    elif data_type == "boolean":
        if value == "true":
            VARIABLES[name] = {'value': True, 'data_type': data_type}
        elif value == "false":
            VARIABLES[name] = {'value': False, 'data_type': data_type}
        else:
            exit(f"SyntaxError: Expected boolean true/false on line {line_number}")
    elif data_type == "char":
        VARIABLES[name] = {'value': chr(value), 'data_type': data_type}
    else:
        exit(f"SyntaxError: {data_type} is not a valid data type on line {line_number}")

def variable_syntax_check(line_number, line):
    """ Returns if the syntax is correct in the passed `line` """

    words = line.split()

    if words == []:
        return

    if words[0] in list(VARIABLES.keys()):
        #Check if next word is =
        if words[1] == "=":
            #Check if last index that holds ; exists
            #try:
            #    words[len(words - 1)] = words[len(words - 1)]
            #except IndexError:
            #    exit(f"SyntaxError: Expected ; at end of line {line_number}")
            if words[3] == ";":
                add_to_variables(
                    name=words[0],
                    value=words[2],
                    data_type=VARIABLES[words[0]['data_type']],
                    line_number=line_number
                )
            else:
                exit(f"SyntaxError: Expected ; at end of line {line_number}")

    #Check if keyword is first argument, or variable has already been defined
    if words[0] in VARIABLE_KEYWORDS:
        #Check if variable hasn't already been defined
        if words[1] not in VARIABLES.keys():
            #Check if next word is '='
            if words[2] == "=":
                #Check if ending is ;
                try:
                    words[4] = words[4]
                except IndexError:
                    exit(f"""SyntaxError: Excepted ; at end of line {line_number}""")
                if words[4] == ";":
                    #Call method and pass relevent information to add to variables
                    add_to_variables(
                        name=words[1],
                        value=words[3],
                        data_type=words[0],
                        line_number=line_number
                    )
                else:
                    exit(f"SyntaxError: Excepted ; at end of line {line_number}")
            else:
                exit(f"SyntaxError: Expected '=' on line {line_number}")
        else:
            exit(f"SyntaxError: Variable {words[1]} has already been defined.")
    else:
        exit(f"SyntaxError: Variable {words[0]} has not been defined.")

def if_logic_syntax_check(statement):
    """ Determines if the syntax is correct for the if statement """
    expression = statement[0].split()

    #Determine is logic statements are correct
    if expression[0] in LOGIC_KEYWORDS and \
       expression[2] in LOGIC_KEYWORDS and \
       expression[4] in LOGIC_KEYWORDS:
        #Now check if variable names are correct
        if already_defined(expression[1]) and already_defined(expression[3]):
            return
        else:
            if not already_defined(expression[1]) and already_defined(expression[3]):
                exit(f"SyntaxError: {expression[1]} has not been defined yet.")
            if already_defined(expression[1]) and not already_defined(expression[3]):
                exit(f"SyntaxError: {expression[3]} has not been defined yet.")
            if not already_defined(expression[1]) and not already_defined(expression[3]):
                exit(f"SyntaxError: {expression[1]} and {expression[3]} have not been defined.")
    else:
        exit(f"SyntaxError: Logic keyword not spelled correctly / not included.")

    #Now check the body
    del statement[0], statement[len(statement) - 1]

    for i in range(len(statement)):
        if not statement[i][:1] == "\t":
            exit(f"SyntaxError: Inconsistent Tabbing")

def parse_if(index, lines):
    """ Returns the if statement at the place in the file """
    statement = []
    for i in range(index, len(lines)):
        if lines[i][0] != "endif":
            statement.append(lines[i])
        else:
            break
    return statement

def to_list(file):
    """ Converts the lines in the source file to a list"""
    lines = []
    with open(file, "r") as file_:
        for line in file_:
            if line[len(line) - 1] == "\n":
                lines.append(line[:len(line) - 1])
            else:
                lines.append(line)
        return lines

def compile_file(source_file):
    """ Starts compiling process """
    lines = to_list(source_file)
    for line_number, line in enumerate(lines):
        if line != "":
            if is_variable(line.split()[0]):
                variable_syntax_check(line_number + 1, line)
            if line.split()[0] == "if":
                if_logic_syntax_check(parse_if(line_number, lines))
    print("Code compiles!")

def is_variable(word):
    """ Determines if the passed word is a/possibly can be a variable """
    return word in VARIABLE_KEYWORDS and word not in LOGIC_KEYWORDS and word not in FUNC_KEYWORDS

def already_defined(variable):
    """ Returns if the variable has already been defined """
    return variable in list(VARIABLES.keys())


if __name__ == '__main__':

    #Dict of variables that have been initialized in the program
    VARIABLES = {}
    FUNCTIONS = {}

    VARIABLE_KEYWORDS = ["integer", "string", "float", "boolean", "char"]
    LOGIC_KEYWORDS = ["if", "endif", "else", "while", "for", "then", "equals", "greaterthan", "lessthan"]
    FUNC_KEYWORDS = ["func", "endfunc"]
    MISC_KEYWORDS = ["type"]

    ALL_KEYWORDS = VARIABLE_KEYWORDS + LOGIC_KEYWORDS + FUNC_KEYWORDS + MISC_KEYWORDS

    SOURCE_FILE = "Code/Python/Linny/script.linny"
    lines = to_list(SOURCE_FILE)

    for line in lines:
        interpret(line)

    """
    print("[Linny Interpreter]")
    print("Enter in one line of code at a time!")
    while True:
        code = input(">>> ")
        variable_syntax_check(0, code)
    """
\$\endgroup\$
2
  • 2
    \$\begingroup\$ I see some keywords reserved for flow control and function definition, but don't see them used anywhere. It seems like this language isn't quite ready to "program" anything. A good test of readiness might be an implementation of a Turing machine -- if you can implement one, then you can theoretically compute "anything." Beyond that, it might be good to identify scenarios where your language is particularly useful (that is, goal of language design shouldn't be ability to construct a TM; focus could be on solving a class of problems). \$\endgroup\$ Commented Jul 25, 2019 at 18:26
  • 1
    \$\begingroup\$ You may want to indicate in this question that there is a follow-up (with link). -> two-way question reference \$\endgroup\$
    – dfhwze
    Commented Jul 27, 2019 at 18:06

1 Answer 1

\$\begingroup\$

I'm just going to take a look at the interpret function for now at least. I'm also up for suggestions to improve the review as I've not had a lot of time to go through it.

The Interpret Function

To start off, the function is doing two things: it splits line_of_code into tokens (rather strictly for a programming language) and then interprets them. This function should probably be split into two, a tokenizing function and the actual interpreter; I'll elaborate later.

As a bit of a tangent, most programming languages would, after tokenization, build what's called an Abstract Syntax Tree (AST). This is done to validate the code, and also because things like an if statement can have a "body": code nested inside of it, which makes the program a tree. Python marks this nesting with indentation. Linny does not appear to have a tree structure, though, and that limits the language, so this would be a good place to begin if you expand it.
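To make the tree idea concrete, here is a minimal sketch of what AST nodes for Linny might look like. The class names (Declare, Out, If) and the layout of the if node are my own assumptions, not something in your code; the only point is that an If node holds a list of child statements, which is what makes the program a tree.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Declare:
    """`integer num = 16 ;` would become Declare("integer", "num", "16")."""
    data_type: str
    name: str
    value: str


@dataclass
class Out:
    """`out num ;` would become Out("num")."""
    name: str


@dataclass
class If:
    """A hypothetical `if` node; its body is a list of nested statements."""
    left: str
    operator: str
    right: str
    body: List["Node"] = field(default_factory=list)


Node = Union[Declare, Out, If]


def depth(node: Node) -> int:
    """Returns how many levels of statements are nested under `node`."""
    if isinstance(node, If):
        return 1 + max((depth(child) for child in node.body), default=0)
    return 1


# A two-statement program; the second statement has a nested body.
program = [
    Declare("integer", "num", "16"),
    If("num", "equals", "num", body=[Out("num")]),
]
```

An interpreter written against nodes like these can branch on the node's class rather than on token counts.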

In general, your interpret function is much too permissive in several places because it doesn't check every token, and the function begins with checks that are subtly wrong:

  • words is a misleading name; for a programming language they are more like tokens, which Linny seems to guarantee are delimited by spaces (most languages, like Python, do not).

  • words is not guaranteed to be a list by the time you check words == []; that only holds if it was passed as a string or already is a list. You'll likely want to check that a string was passed and raise an exception if not, or simply use type hints instead. Note that type hints aren't automatically enforced; they're there to explain to a user what the function expects, e.g. def interpret(line_of_code: str) tells a user that the code will probably error if it isn't given a string.

  • "//" in words[0] will treat text like foo//bar as entirely a comment (i.e. foo would be assumed to be part of the comment, not only bar) because in searches the whole string. You probably want words[0].startswith("//") for a naïve approach, but if comments are allowed without whitespace before them, as the foo//bar example shows, more work would have to be done.

Note: All of the above code I've covered should probably be put into a function such as tokenize. This is so that you can create more advanced logic later and leave the interpret function with a single responsibility.
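As a rough sketch of the "more work" for comments, this helper cuts a line at the first // that is not inside a string. strip_comment is a name I made up, and it assumes Linny string literals use double quotes and never contain escaped quotes; adjust it if the language ends up working differently.

```python
def strip_comment(line_of_code: str) -> str:
    """Returns the line with any trailing `//` comment removed.

    Assumes string literals use double quotes and contain no escaped quotes.
    """
    in_string = False
    for index, char in enumerate(line_of_code):
        if char == '"':
            # Entering or leaving a string literal.
            in_string = not in_string
        elif not in_string and line_of_code.startswith("//", index):
            # First `//` outside a string: everything after it is a comment.
            return line_of_code[:index].rstrip()
    return line_of_code.rstrip()
```

Calling this before split() means the rest of the tokenizer never sees comment text, whether or not there was whitespace in front of the //.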

The next component, the actual interpreting, also has a few flaws. The broadest is that it is a bit hard to read and understand because of the lack of an AST. Passing an AST to interpret, instead of working with raw tokens, would simplify the logic for deciding which command is being run. Overall this seems to be a recurring theme.

The out command, annotated:

# You commented the next line. It'd probably be better as "out command" or something more descriptive.
# You should also add a space after the "#"; it's the typical Python style.
# (The condition is wrapped in parentheses here only so that it can carry inline comments.)
#If user wants to output a value
if (
    len(words) == 3  # This is probably too strict, unless you enforce one command per line.
                     # Secondly, this could be added to a tokenize function.
    and words[0] == "out"  # NOTE: In an AST this would be the node name.
    and already_defined(words[1])  # No error happens if it isn't already defined.
    and words[2] == ";"  # The tokenize function could handle this; expect a semicolon and strip it off.
):
    print(VARIABLES[words[1]]['value'])
    return

These notes apply to most of the commands, but now for the points specific to each one:

For the type command, you have the checks in a bit of a weird order. You should check the tokens in index order. Also, your nested check words[0] == "type" makes your words[0] in MISC_KEYWORDS check redundant; you should just use words[0] == "type", because if words[0] == "type" then words[0] must be in MISC_KEYWORDS: it's a constant (by convention) and "type" is in MISC_KEYWORDS; in fact it's the only item. Those constants, such as MISC_KEYWORDS, do actually seem to be a start towards a more versatile AST or language grammar, which is great.

Your set command is very flawed in its check. It only verifies that it has 5 tokens and ends with a semicolon; foo bar lorem ipsum ; would make your program think it's a set command. There may be checking in add_to_variables, but that sort of check should go in a tokenizer anyway. Then you could be passed something like command and check command.name instead.
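One hedged way to get that command.name shape is to have the parsing step return a small record. Command and parse_command are hypothetical names of mine, VARIABLE_KEYWORDS is copied from your code, and the sketch only covers the four statements Linny has today. It checks the shape of a statement, not whether the variable name or value is valid.

```python
from typing import List, NamedTuple, Optional

VARIABLE_KEYWORDS = ["integer", "string", "float", "boolean", "char"]


class Command(NamedTuple):
    name: str                         # "out", "type", "create" or "edit"
    target: str                       # the variable the command acts on
    value: Optional[str] = None       # raw value token, if any
    data_type: Optional[str] = None   # declared type, for "create" only


def parse_command(tokens: List[str]) -> Optional[Command]:
    """Returns a Command for a recognised statement shape, or None otherwise."""
    if not tokens or tokens[-1] != ";":
        return None
    body = tokens[:-1]  # strip the semicolon once, up front
    if len(body) == 2 and body[0] in ("out", "type"):
        return Command(name=body[0], target=body[1])
    if len(body) == 4 and body[0] in VARIABLE_KEYWORDS and body[2] == "=":
        return Command("create", body[1], value=body[3], data_type=body[0])
    if len(body) == 3 and body[1] == "=":
        return Command("edit", body[0], value=body[2])
    return None
```

With this in place, foo bar lorem ipsum ; no longer looks like a set command, and interpret can branch on command.name instead of counting tokens.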

Your next command, edit, has a similar issue; it doesn't check anything except the semicolon before trying to use the tokens. If you ever expand your program this will be an issue, because anything with 5 or 4 tokens will be treated as a set or edit command (as I've dubbed them).

Lastly... your program just ends after this. If I give it foobar lorem ipsum//this is incredibly invalid ; 12fasdf the interpret function will just do nothing with it; at minimum a user would expect feedback that "this is invalid". This'd be something to catch at the tokenization stage; nothing invalid should ever reach the interpret function unless it is run directly (which it shouldn't be).
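A minimal sketch of that feedback, assuming a custom exception that is raised whenever a line doesn't look like any statement. LinnySyntaxError and require_known_command are my names, not something Linny defines, and the check is deliberately coarse; it stands in for real validation.

```python
class LinnySyntaxError(Exception):
    """Raised when a line is not a valid Linny statement."""


def require_known_command(tokens, line_number):
    """Raises LinnySyntaxError unless `tokens` has the rough shape of a statement."""
    known_starts = {"out", "type", "integer", "string", "float", "boolean", "char"}
    looks_like_edit = len(tokens) == 4 and tokens[1] == "="
    if not tokens or tokens[-1] != ";":
        raise LinnySyntaxError(f"line {line_number}: expected ';' at end of statement")
    if tokens[0] not in known_starts and not looks_like_edit:
        raise LinnySyntaxError(f"line {line_number}: unknown statement {' '.join(tokens)!r}")
```

Raising an exception rather than calling exit() leaves the caller free to decide what happens next: a file run can stop, while an interactive session can print the message and carry on.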

Here's what this looks like all together, plus a little bit.

def tokenize(line_of_code: str):
    """Tokenizes Linny code """

    # Now it's more obvious what you're doing; you're checking for an empty line.
    # Returning an empty list lets interpret fall through all of its length checks.
    if line_of_code == "":
        return []

    # Everything is a comment.
    if line_of_code.startswith("//"):
        return []

    # They were warned with the type hint; you can always be more friendly and type check though.
    tokens = line_of_code.split()
    # There is *way* more you could do here though, like most of the command checking etc. It's just a lot more to change so I didn't.

    return tokens


def interpret(tokens):
    """Interprets Linny tokens"""

    # Out command
    if len(tokens) == 3 and \
       tokens[0] == "out" and \
       already_defined(tokens[1]) and \
       tokens[2] == ";":
        print(VARIABLES[tokens[1]]['value'])
        return

    # Type command
    if len(tokens) == 3 and \
       tokens[0] == "type" and \
       already_defined(tokens[1]) and \
       tokens[2] == ";":
        print(VARIABLES[tokens[1]]['data_type'])
        return

    # Create a variable
    # No check for the name (seemingly) needed.
    if len(tokens) == 5 and \
       tokens[0] in VARIABLE_KEYWORDS and \
       tokens[2] == "=" and \
       tokens[4] == ";":
        add_to_variables(
            name=tokens[1],
            value=tokens[3],
            data_type=tokens[0],
            line_number=0 # The line number probably shouldn't always be zero, or be in the function either way.
        )
        return

    # Edit a variable
    # Is the new value valid? That is still unchecked here.
    if len(tokens) == 4 and \
       already_defined(tokens[0]) and \
       tokens[1] == "=" and \
       tokens[3] == ";":
        change_value(tokens[0], tokens[2])
        return

    # No valid commands... what should *you* do?

Note: Writing a whole language is a complicated beast. I have suggested some (simplified) tips that real languages follow, but this review could spiral into minute details that go beyond the expected level of a response. I'd suggest finding some good books or articles on programming languages if you're interested in making a more complete one, but acquiring more programming skills would also be valuable to do first.

P.S. The type-things-in-and-get-a-result-back style of coding that you describe is called a read-eval-print loop, or REPL; that's (mostly) what you've made in your code.
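For reference, a bare-bones REPL could look something like the sketch below. The repl function, its evaluate/read/write parameters, and the quit command are all assumptions of mine; evaluate stands in for whatever function ends up running one line of Linny.

```python
def repl(evaluate, read=input, write=print):
    """Reads lines until `quit` or end of input, passing each one to `evaluate`."""
    while True:
        try:
            line = read(">>> ")
        except EOFError:
            # End of input (for example Ctrl-D) ends the session.
            break
        if line.strip() == "quit":
            break
        try:
            evaluate(line)
        except Exception as error:
            # Report the problem and keep the session alive.
            write(f"Error: {error}")
```

Note that the exit(...) calls in your code raise SystemExit, which is not a subclass of Exception, so this loop would still terminate on them; raising your own exception type for syntax errors is what allows the session to continue.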

P.P.S. A formatter and a linter wouldn't hurt if you don't already have one.

\$\endgroup\$
