125

I have a string in Lua and want to iterate over the individual characters in it. But no code I've tried works, and the official manual only shows how to find and replace substrings :(

str = "abcd"
for char in str do -- error
  print( char )
end

for i = 1, str:len() do
  print( str[ i ] ) -- nil
end

6 Answers

177

In Lua 5.1, you can iterate over the characters of a string in a couple of ways.

The basic loop would be:

for i = 1, #str do
    local c = str:sub(i,i)
    -- do something with c
end

But it may be more efficient to use a pattern with string.gmatch() to get an iterator over the characters:

for c in str:gmatch"." do
    -- do something with c
end

Or even to use string.gsub() to call a function for each char:

str:gsub(".", function(c)
    -- do something with c
end)

In all of the above, I've taken advantage of the fact that the string module is set as the __index of the metatable shared by all string values, so its functions can be called as members using the : notation. I've also used the (new to 5.1, IIRC) # operator to get the string length.
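To make the metatable point concrete, here is a minimal sketch. It assumes a stock Lua 5.1 or later where the string metatable has not been modified; the variable names are only for illustration. It also shows why the question's `str[ i ]` attempt yields nil: a numeric index is looked up in the string library table, which has no such key.

```lua
local str = "abcd"

-- Method-call syntax is sugar for calling the library function with str as first argument.
local via_method = str:sub(2, 2)
local via_library = string.sub(str, 2, 2)
print(via_method, via_library)

-- Indexing a string with a number goes through the same __index lookup,
-- so it searches the string library table for key 1 and finds nothing.
local indexed = str[1]
print(indexed)

-- The # operator and the len function report the same length in bytes.
local length_op, length_fn = #str, str:len()
print(length_op, length_fn)
```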

The best answer for your application depends on a lot of factors, and benchmarks are your friend if performance is going to matter.

You might want to evaluate why you need to iterate over the characters, and look at one of the regular expression modules that have been bound to Lua. For a more modern approach, look into Roberto's lpeg module, which implements Parsing Expression Grammars for Lua.
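Before reaching for an external module, it may be worth checking whether Lua's built-in patterns already cover the task. The sketch below is an illustration, not part of lpeg: the helper name words_with_positions and the identifier pattern are assumptions. It uses the empty capture (), which returns a position instead of text, to collect identifier-like words together with where they start and end.

```lua
-- Hypothetical helper: collect identifier-like words and their byte positions.
local function words_with_positions(text)
    local tokens = {}
    -- "()" is a position capture: it yields the current index in the subject string.
    for first, word, after in text:gmatch("()([%a_][%w_]*)()") do
        tokens[#tokens + 1] = { text = word, first = first, last = after - 1 }
    end
    return tokens
end

local tokens = words_with_positions("local x = foo(bar)")
for _, t in ipairs(tokens) do
    print(t.first, t.last, t.text)
end
```

For anything beyond simple word-level scanning (nesting, alternatives, recursion), a real grammar tool is likely the better fit.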

4
  • Thanks. About lpeg module you have mentioned - does it save tokens positions in original text after tokenization? The task i need to perform is to syntax highlight specific simple language in scite via lua (with no compiled c++ parser). Also, how to install lpeg? Seems it has .c source in distribution - does it need to be compiled alongside lua?
    – grigoryvp
    Commented May 7, 2009 at 9:25
  • Building lpeg will produce a DLL (or .so) that should be stored where require can find it (i.e. somewhere identified by the content of the global package.cpath in your Lua installation). You also need to install its companion module re.lua if you want to use its simplified syntax. From an lpeg grammar, you can get callbacks and capture text in a number of ways, and it is certainly possible to use captures to simply store the location of a match for later use. If syntax highlighting is the goal, then a PEG is not a bad choice of tool.
    – RBerteig
    Commented May 7, 2009 at 21:32
  • 3
    Not to mention the latest releases of SciTE (since 2.22) include Scintillua, an LPEG-based lexer, meaning it can work right out of the box, no re-compilation required. Commented May 16, 2011 at 18:21
  • 1
    None of these work with non-ASCII characters.
    – Ivan Black
    Commented Dec 1, 2021 at 21:31
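
On the non-ASCII point: all of the loops above step through bytes, so a multi-byte UTF-8 character gets split into pieces. One common workaround, sketched here under the assumption that the input is valid UTF-8, is to match a lead byte followed by any continuation bytes. The helper name utf8_chars is made up for this example; Lua 5.3 and later also ship a utf8 library for this purpose.

```lua
-- Hypothetical helper; assumes s contains valid UTF-8.
local function utf8_chars(s)
    -- One byte that is not a continuation byte (0x80-0xBF),
    -- followed by zero or more continuation bytes.
    return s:gmatch("[^\128-\191][\128-\191]*")
end

-- "h\195\169llo" is "héllo" with the é spelled out as its two UTF-8 bytes.
local pieces = {}
for ch in utf8_chars("h\195\169llo") do
    pieces[#pieces + 1] = ch
end
print(#pieces)
```

The string is 6 bytes long, but the loop yields 5 characters, with the second one being the two-byte sequence.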
17

Depending on the task at hand, it might be easier to use string.byte. It is also the fastest way, because it avoids creating new substrings, which happens to be pretty expensive in Lua: each new string is hashed and checked against the strings that are already known. You can pre-calculate the codes of the symbols you are looking for with the same string.byte to maintain readability and portability.

local str = "ab/cd/ef"
local target = string.byte("/")
for idx = 1, #str do
   if str:byte(idx) == target then
      print("Target found at:", idx)
   end
end
14

If you're using Lua 5, try:

for i = 1, string.len(str) do
    print( string.sub(str, i, i) )
end
10

There are already a lot of good approaches in the provided answers (here, here and here). If speed is what you are primarily looking for, you should definitely consider doing the job through Lua's C API, which is many times faster than raw Lua code. When working with preloaded chunks (e.g. the load function), the difference is not that big, but still considerable.

As for the pure Lua solutions, let me share this small benchmark I've made. It covers every answer provided to date and adds a few optimizations. Still, the basic thing to consider is:

How many times will you need to iterate over the characters in the string?

  • If the answer is "once", then you should look at the first part of the benchmark ("raw speed").
  • Otherwise, the second part will provide a more precise estimate, because it parses the string into a table, which is much faster to iterate over. You should also consider writing a simple function for this, like @Jarriz suggested.

Here is the full code:

-- Setup locals
local str = "Hello World!"
local attempts = 5000000
local reuses = 10 -- For the second part of benchmark: Table values are reused 10 times. Change this according to your needs.
local x, c, elapsed, tbl
-- "Localize" funcs to minimize lookup overhead
local stringbyte, stringchar, stringsub, stringgsub, stringgmatch = string.byte, string.char, string.sub, string.gsub, string.gmatch

print("-----------------------")
print("Raw speed:")
print("-----------------------")

-- Version 1 - string.sub in loop
x = os.clock()
for j = 1, attempts do
    for i = 1, #str do
        c = stringsub(str, i, i) -- pass i twice to get one character, not the rest of the string
    end
end
elapsed = os.clock() - x
print(string.format("V1: elapsed time: %.3f", elapsed))

-- Version 2 - string.gmatch loop
x = os.clock()
for j = 1, attempts do
    for c in stringgmatch(str, ".") do end
end
elapsed = os.clock() - x
print(string.format("V2: elapsed time: %.3f", elapsed))

-- Version 3 - string.gsub callback
x = os.clock()
for j = 1, attempts do
    stringgsub(str, ".", function(c) end)
end
elapsed = os.clock() - x
print(string.format("V3: elapsed time: %.3f", elapsed))

-- For version 4
local str2table = function(str)
    local ret = {}
    for i = 1, #str do
        ret[i] = stringsub(str, i, i) -- Note: This is a lot faster than using table.insert
    end
    return ret
end

-- Version 4 - function str2table
x = os.clock()
for j = 1, attempts do
    tbl = str2table(str)
    for i = 1, #tbl do -- Note: This type of loop is a lot faster than "pairs" loop.
        c = tbl[i]
    end
end
elapsed = os.clock() - x
print(string.format("V4: elapsed time: %.3f", elapsed))

-- Version 5 - string.byte
x = os.clock()
for j = 1, attempts do
    tbl = {stringbyte(str, 1, #str)} -- Note: This is about 15% faster than calling string.byte for every character.
    for i = 1, #tbl do
        c = tbl[i] -- Note: produces char codes instead of chars.
    end
end
elapsed = os.clock() - x
print(string.format("V5: elapsed time: %.3f", elapsed))

-- Version 5b - string.byte + conversion back to chars
x = os.clock()
for j = 1, attempts do
    tbl = {stringbyte(str, 1, #str)} -- Note: This is about 15% faster than calling string.byte for every character.
    for i = 1, #tbl do
        c = stringchar(tbl[i])
    end
end
elapsed = os.clock() - x
print(string.format("V5b: elapsed time: %.3f", elapsed))

print("-----------------------")
print("Creating cache table ("..reuses.." reuses):")
print("-----------------------")

-- Version 1 - string.sub in loop
x = os.clock()
for k = 1, attempts do
    tbl = {}
    for i = 1, #str do
        tbl[i] = stringsub(str, i, i) -- Note: This is a lot faster than using table.insert
    end
    for j = 1, reuses do
        for i = 1, #tbl do
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V1: elapsed time: %.3f", elapsed))

-- Version 2 - string.gmatch loop
x = os.clock()
for k = 1, attempts do
    tbl = {}
    local tblc = 1 -- Note: This is faster than table.insert
    for c in stringgmatch(str, ".") do
        tbl[tblc] = c
        tblc = tblc + 1
    end
    for j = 1, reuses do
        for i = 1, #tbl do
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V2: elapsed time: %.3f", elapsed))

-- Version 3 - string.gsub callback
x = os.clock()
for k = 1, attempts do
    tbl = {}
    local tblc = 1 -- Note: This is faster than table.insert
    stringgsub(str, ".", function(c)
        tbl[tblc] = c
        tblc = tblc + 1
    end)
    for j = 1, reuses do
        for i = 1, #tbl do
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V3: elapsed time: %.3f", elapsed))

-- Version 4 - str2table func before loop
x = os.clock()
for k = 1, attempts do
    tbl = str2table(str)
    for j = 1, reuses do
        for i = 1, #tbl do -- Note: This type of loop is a lot faster than "pairs" loop.
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V4: elapsed time: %.3f", elapsed))

-- Version 5 - string.byte to create table
x = os.clock()
for k = 1, attempts do
    tbl = {stringbyte(str,1,#str)}
    for j = 1, reuses do
        for i = 1, #tbl do
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V5: elapsed time: %.3f", elapsed))

-- Version 5b - string.byte to create table + string.char loop to convert bytes to chars
x = os.clock()
for k = 1, attempts do
    tbl = {stringbyte(str, 1, #str)}
    for i = 1, #tbl do
        tbl[i] = stringchar(tbl[i])
    end
    for j = 1, reuses do
        for i = 1, #tbl do
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V5b: elapsed time: %.3f", elapsed))

Example output (Lua 5.3.4, Windows):

-----------------------
Raw speed:
-----------------------
V1: elapsed time: 3.713
V2: elapsed time: 5.089
V3: elapsed time: 5.222
V4: elapsed time: 4.066
V5: elapsed time: 2.627
V5b: elapsed time: 3.627
-----------------------
Creating cache table (10 reuses):
-----------------------
V1: elapsed time: 20.381
V2: elapsed time: 23.913
V3: elapsed time: 25.221
V4: elapsed time: 20.551
V5: elapsed time: 13.473
V5b: elapsed time: 18.046

Result:

In my case, string.byte and string.sub were the fastest in terms of raw speed. When using a cache table and reusing it 10 times per loop, the string.byte version was the fastest, even when converting char codes back to chars (which isn't always necessary and depends on usage).

As you have probably noticed, I've made some assumptions based on my previous benchmarks and applied them to the code:

  1. Library functions should always be localized if used inside loops, because it is a lot faster.
  2. Inserting a new element into a Lua table is much faster using tbl[idx] = value than table.insert(tbl, value).
  3. Looping through a table using for i = 1, #tbl is a bit faster than for k, v in pairs(tbl).
  4. Always prefer the version with fewer function calls, because the call itself adds a little bit to the execution time.
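
If you want to check assumptions 1 and 2 on your own machine, a minimal sketch could look like the one below. The names fill_by_index, fill_by_insert and time_it, and the element count N, are made up for this example; timings will vary between Lua versions and machines, so treat the printed numbers as indicative only.

```lua
-- Localize library functions once, outside any loop.
local insert, clock = table.insert, os.clock
local N = 100000 -- hypothetical element count; raise it for steadier timings

-- Append by direct index assignment.
local function fill_by_index(n)
    local t = {}
    for i = 1, n do
        t[i] = i
    end
    return t
end

-- Append through table.insert (one extra function call per element).
local function fill_by_insert(n)
    local t = {}
    for i = 1, n do
        insert(t, i)
    end
    return t
end

-- Returns elapsed CPU time and the function's result.
local function time_it(f, n)
    local start = clock()
    local result = f(n)
    return clock() - start, result
end

local t_index, by_index = time_it(fill_by_index, N)
local t_insert, by_insert = time_it(fill_by_insert, N)
print(string.format("index: %.3f  insert: %.3f", t_index, t_insert))
```

Both functions build the same table; only the way each element is appended differs.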

Hope it helps.

2
  • The elapsed = os.clock() - x adds one global table fetch into the mix. Recommended to take os.clock into a local variable. There is also doing - x on it which may or may not affect the time. Noticeable? Probably not until you run these tests hundreds of times to secure the average, min, and max run times. Commented Jan 25, 2023 at 22:36
  • I did some testing with long strings; these were the results ordered by speed: (Strlen 160k V4: 41.544) (Strlen 400k V1: 9.121) (Strlen 800k V3: 0.064, V2: 0.048, V5b: 0.048, V5: 0.018) NOTES: [v4] runs out of memory on strings longer than around 160k, at least on my machine. [v5/5b] byte() only handles slices up to 7997 chars long, so I had to wrap both in an extra loop to process the string in chunks of that size.
    – dt192
    Commented Jun 17, 2023 at 12:14
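
The chunking workaround mentioned in the last comment can be sketched roughly as follows. The chunk size of 4096 and the helper name each_byte are assumptions for this example; the actual limit on how many values string.byte can return at once depends on the Lua build.

```lua
-- Hypothetical chunk size, chosen to stay well below the limit reported above.
local CHUNK = 4096

-- Calls fn(index, byte) for every byte of s, fetching the bytes in slices.
local function each_byte(s, fn)
    local len = #s
    for first = 1, len, CHUNK do
        local last = math.min(first + CHUNK - 1, len)
        local bytes = { s:byte(first, last) }
        for i = 1, #bytes do
            fn(first + i - 1, bytes[i])
        end
    end
end

-- Count the slashes in a 30000-byte string.
local long_str = string.rep("ab/", 10000)
local slash = ("/"):byte()
local slash_count = 0
each_byte(long_str, function(_, b)
    if b == slash then
        slash_count = slash_count + 1
    end
end)
print(slash_count)
```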
1

Iterating to construct a string and returning this string as a table with load()...

itab = function(char)
    local result
    for i = 1, #char do
        if i == 1 then
            result = string.format('%s', '{')
        end
        result = result .. string.format('\'%s\'', char:sub(i, i))
        if i ~= #char then
            result = result .. string.format('%s', ',')
        end
        if i == #char then
            result = result .. string.format('%s', '}')
        end
    end
    return load('return ' .. result)()
end

dump = function(dump)
    for key, value in pairs(dump) do
        io.write(string.format("%s=%s=%s\n", key, type(value), value))
    end
end

res=itab('KOYAANISQATSI')

dump(res)

Puts out...

1=string=K
2=string=O
3=string=Y
4=string=A
5=string=A
6=string=N
7=string=I
8=string=S
9=string=Q
10=string=A
11=string=T
12=string=S
13=string=I
-1

Everyone suggests a less optimal method.

This will be best:

    function chars(str)
        local strc = {} -- keep the result table local instead of leaking a global
        for i = 1, #str do
            table.insert(strc, string.sub(str, i, i))
        end
        return strc
    end

    str = "Hello world!"
    char = chars(str)
    print("Char 2: "..char[2]) -- prints the char 'e'
    print("-------------------\n")
    for i = 1, #str do -- testing printing all the chars
        if (char[i] == " ") then
            print("Char "..i..": [[space]]")
        else
            print("Char "..i..": "..char[i])
        end
    end
2
  • 1
    "Less optimal" for what task? "Best" for what task? Commented Jul 10, 2017 at 12:28
  • 2
    This is the less optimal method. Commented May 6, 2022 at 12:29
