This seems a job for expl3. Let's say we want to split a string of characters into its constituents, for later processing. So we define a macro that takes two arguments: the macro for the processing and the string:
\documentclass{article}
\ExplSyntaxOn
\NewDocumentCommand{\stringprocess}{ m m }
{
\egreg_string_process:nn { #1 } { #2 }
}
\cs_new_protected:Npn \egreg_string_process:nn #1 #2
{
\text_map_inline:nn { #2 } { #1 { ##1 } }
}
\ExplSyntaxOff
\newcommand{\boxchar}[1]{\fbox{\strut#1} } % leave a space after the box
\begin{document}
\stringprocess{\boxchar}{abcdef}
\stringprocess{\boxchar}{ábcdefß}
\end{document}
![enter image description here](https://cdn.statically.io/img/i.sstatic.net/9TrMZ.png)
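Any one-argument macro can play the role of the processing macro. As a minimal sketch (the name \parenchar is hypothetical and only for illustration, and the definition of \stringprocess is condensed into a single step), assuming a LaTeX release recent enough to provide \text_map_inline:nn as the example above already requires:

```latex
\documentclass{article}
\ExplSyntaxOn
% Same idea as above, condensed: apply macro #1 to each character of #2
\NewDocumentCommand{\stringprocess}{ m m }
 {
  \text_map_inline:nn { #2 } { #1 { ##1 } }
 }
\ExplSyntaxOff
% Hypothetical processing macro: wrap one character in parentheses
\newcommand{\parenchar}[1]{(#1)}
\begin{document}
\stringprocess{\parenchar}{abc} % typesets (a)(b)(c)
\end{document}
```

The mapping expands to \parenchar{a}\parenchar{b}\parenchar{c}, so whatever the processing macro does with its single argument is applied character by character.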
A different toy problem, but here “complex” characters are more problematic, so I'll stick with ASCII. We want to input a string and get as a result a token list that contains only the digits in the string, separated by commas. We assume that the input is controlled, so it contains only alphanumeric characters.
All we need is to define a suitable auxiliary function, instead of the simple \boxchar used before. However, it's better to use sequences instead of token lists, so I'll rework the solution from the start.
\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\seq_new:N \l__egreg_input_string_seq
\seq_new:N \l__egreg_output_string_seq
\cs_new_protected:Npn \egreg_string_process:nnnn #1 #2 #3 #4
% #1 = preprocess macro
% #2 = postprocess macro
% #3 = separator
% #4 = string
{
\seq_clear:N \l__egreg_output_string_seq
\seq_set_split:Nnn \l__egreg_input_string_seq { #3 } { #4 }
\seq_map_inline:Nn \l__egreg_input_string_seq
{ #1 { ##1 } }
#2
}
\NewDocumentCommand{\boxchars}{ m }
{
\egreg_boxchars:n { #1 }
}
\cs_new_protected:Npn \egreg_boxchars:n #1
{
\egreg_string_process:nnnn
{ \egreg_fbox_strut:n }
{ \seq_use:Nnnn \l__egreg_output_string_seq { ~ } { ~ } { ~ } }
{ }
{ #1 }
}
\cs_new_protected:Npn \egreg_fbox_strut:n #1
{
\seq_put_right:Nn \l__egreg_output_string_seq { \fbox { \strut #1 } }
}
\ExplSyntaxOff
\begin{document}
\boxchars{abcdef}
\end{document}
This gives the same result as before, but without a trailing space after the last box, so \unskip isn't necessary.
The string passed as fourth argument to \egreg_string_process:nnnn is split into its components; the third argument is the delimiter of the components, which can also be empty; an auxiliary "output" sequence is cleared for possible subsequent use by the preprocessing or postprocessing macros.
Each element of the sequence is passed to the "preprocessing macro", which should be a one-argument function.
The "postprocess" macro is applied.
In the example, the preprocess macro stores each item, wrapped in \fbox together with a \strut, in the output sequence; the postprocess macro just produces the items in the output sequence, separated by spaces.
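The three steps can be tried with a different preprocess/postprocess pair. The following is only a sketch: \egreg_store_item:n and \joinhyphen are hypothetical names introduced for illustration, and the four-argument \egreg_string_process:nnnn is repeated so that the file compiles on its own.

```latex
\documentclass{article}
\ExplSyntaxOn
\seq_new:N \l__egreg_input_string_seq
\seq_new:N \l__egreg_output_string_seq
\cs_new_protected:Npn \egreg_string_process:nnnn #1 #2 #3 #4
 {
  \seq_clear:N \l__egreg_output_string_seq
  \seq_set_split:Nnn \l__egreg_input_string_seq { #3 } { #4 }
  \seq_map_inline:Nn \l__egreg_input_string_seq { #1 { ##1 } }
  #2
 }
% Hypothetical preprocess function: store each item unchanged
\cs_new_protected:Npn \egreg_store_item:n #1
 {
  \seq_put_right:Nn \l__egreg_output_string_seq { #1 }
 }
% Hypothetical user command: optional separator, items joined by hyphens
\NewDocumentCommand{\joinhyphen}{ O{} m }
 {
  \egreg_string_process:nnnn
   { \egreg_store_item:n }
   { \seq_use:Nn \l__egreg_output_string_seq { - } }
   { #1 }
   { #2 }
 }
\ExplSyntaxOff
\begin{document}
\joinhyphen{abc} % typesets a-b-c

\joinhyphen[,]{x,y,z} % typesets x-y-z
\end{document}
```

With an empty separator the input is split token by token; with a comma as separator the items are the comma-delimited pieces, which is what the third argument is for.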
What about the toy problem? The preprocess macro should test whether the item is a digit and, if so, add it to the output sequence. Let's add this code before \ExplSyntaxOff:
\cs_new_protected:Npn \egreg_store_digit:n #1
{
\bool_if:nT
{
\int_compare_p:n { `#1 >= `0 } && \int_compare_p:n { `#1 <= `9 }
}
{
\seq_put_right:Nn \l__egreg_output_string_seq { #1 }
}
}
\cs_new:Npn \egreg_print_list_commas:
{
\seq_use:Nnnn \l__egreg_output_string_seq { , } { , } { , }
}
\NewDocumentCommand{\extractdigits}{ m }
{
\egreg_string_process:nnnn
{ \egreg_store_digit:n }
{ \egreg_print_list_commas: }
{ }
{ #1 }
}
and try with
\begin{document}
\extractdigits{a1b2c3}
\end{document}
to get
1,2,3
Complete code for the toy problem:
\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\seq_new:N \l__egreg_input_string_seq
\seq_new:N \l__egreg_output_string_seq
\cs_new_protected:Npn \egreg_string_process:nnnn #1 #2 #3 #4
% #1 = preprocess macro
% #2 = postprocess macro
% #3 = separator
% #4 = string
{
\seq_clear:N \l__egreg_output_string_seq
\seq_set_split:Nnn \l__egreg_input_string_seq { #3 } { #4 }
\seq_map_inline:Nn \l__egreg_input_string_seq
{ #1 { ##1 } }
#2
}
\cs_new_protected:Npn \egreg_store_digit:n #1
{
\bool_if:nT
{
\int_compare_p:n { `#1 >= `0 } && \int_compare_p:n { `#1 <= `9 }
}
{
\seq_put_right:Nn \l__egreg_output_string_seq { #1 }
}
}
\cs_new:Npn \egreg_print_list_commas:
{
\seq_use:Nnnn \l__egreg_output_string_seq { , } { , } { , }
}
\NewDocumentCommand{\extractdigits}{ O{} m }
{
\egreg_string_process:nnnn
{ \egreg_store_digit:n }
{ \egreg_print_list_commas: }
{ #1 }
{ #2 }
}
\ExplSyntaxOff
\begin{document}
\extractdigits{a1b2c3d}
\end{document}
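As a variation, here is a sketch (not the code above, just an alternative under the same assumptions about the input) that replaces the two-part boolean test with a chained \int_compare:nT, puts \seq_use:Nn directly in the postprocess slot, and also exercises the optional separator argument of \extractdigits:

```latex
\documentclass{article}
\ExplSyntaxOn
\seq_new:N \l__egreg_input_string_seq
\seq_new:N \l__egreg_output_string_seq
\cs_new_protected:Npn \egreg_string_process:nnnn #1 #2 #3 #4
 {
  \seq_clear:N \l__egreg_output_string_seq
  \seq_set_split:Nnn \l__egreg_input_string_seq { #3 } { #4 }
  \seq_map_inline:Nn \l__egreg_input_string_seq { #1 { ##1 } }
  #2
 }
\cs_new_protected:Npn \egreg_store_digit:n #1
 {
  % Chained comparison: keep the item only when its character code
  % lies between the codes of 0 and 9 (single-character items assumed)
  \int_compare:nT { `0 <= `#1 <= `9 }
   { \seq_put_right:Nn \l__egreg_output_string_seq { #1 } }
 }
\NewDocumentCommand{\extractdigits}{ O{} m }
 {
  \egreg_string_process:nnnn
   { \egreg_store_digit:n }
   { \seq_use:Nn \l__egreg_output_string_seq { , } }
   { #1 }
   { #2 }
 }
\ExplSyntaxOff
\begin{document}
\extractdigits{a1b2c3d} % typesets 1,2,3

\extractdigits[,]{a,1,b,2} % typesets 1,2
\end{document}
```

The test looks only at a single character code, so with a nonempty separator each item should still be one character; multi-character or empty items are outside what this sketch handles.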