12

I'm trying to teach myself some C++ from scratch at the moment.
I'm well-versed in Python, Perl and JavaScript, but have only encountered C++ briefly, in a classroom setting in the past. Please excuse the naivete of my question.

I would like to split a string using a regular expression but have not had much luck finding a clear, definitive, efficient and complete example of how to do this in C++.

In Perl this action is common, and thus can be accomplished trivially:

/home/me$ cat test.txt
this is  aXstringYwith, some problems
and anotherXY line with   similar issues

/home/me$ cat test.txt | perl -e'
> while(<>){
>   my @toks = split(/[\sXY,]+/);
>   print join(" ",@toks)."\n";
> }'
this is a string with some problems
and another line with similar issues

I'd like to know how best to accomplish the equivalent in C++.

EDIT:
I think I found what I was looking for in the Boost library, as mentioned below.

boost regex-token-iterator (why don't underscores work?)

I guess I didn't know what to search for.


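#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

// main must take either no parameters or (int, char**); argv is unused here
int main(int argc, char**)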
{
  string s;
  do{
    if(argc == 1)
      {
        cout << "Enter text to split (or \"quit\" to exit): ";
        getline(cin, s);
        if(s == "quit") break;
      }
    else
      s = "This is a string of tokens";

    boost::regex re("\\s+");
    boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
    boost::sregex_token_iterator j;

    unsigned count = 0;
    while(i != j)
      {
        cout << *i++ << endl;
        count++;
      }
    cout << "There were " << count << " tokens found." << endl;

  }while(argc == 1);
  return 0;
}

2
  • Check out Boost.Regex. I think you can find your answer here: stackoverflow.com/questions/181624/… Commented Jun 14, 2009 at 4:49
  • 1
    You should add the "found on my own" part as an answer to your own question instead of having it be part of your question, although do mention that you found it and posted the answer. If someone else comes along and finds this question useful, they'll want to see the community-selected answer along with your chosen one. Your answer might not be the community's best choice.
    – Ape-inago
    Commented Jun 14, 2009 at 5:27

4 Answers

17

The Boost libraries are usually a good choice; in this case, Boost.Regex. There is even an example for splitting a string into tokens that already does what you want. Basically, it comes down to something like this:

boost::regex re("[\\sXY,]+");  // same separator class as the Perl example, comma included
std::string s;

while (std::getline(std::cin, s)) {
  boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
  boost::sregex_token_iterator j;
  while (i != j) {
     std::cout << *i++ << " ";
  }
  std::cout << std::endl;
}
1
  • Although I found my own way to regex_token_iterator from oberoi's post, I chose this as the answer because it gives a concise, working example and includes the link to the appropriate Boost page. Cheers.
    – Anonymous
    Commented Jun 14, 2009 at 5:14
2

If you want to minimize use of iterators, and pithify your code, the following should work:

#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main()
{
  const boost::regex re("[\\sXY,]+");

  for (std::string s; std::getline(std::cin, s); )
  {
    std::cout << boost::regex_replace(s, re, " ") << std::endl;
  }

}
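
The same idea carries over to the C++11 standard library, assuming a compiler with a working <regex> (GCC 4.9 or later, for instance). The sketch below swaps boost::regex_replace for std::regex_replace; squeeze_separators is a hypothetical helper name, not something from the answer above. One difference from a true split is worth noting: a run of separators at the very start of a line is replaced by a space rather than dropped.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name chosen for illustration): collapse every run of
// separator characters (whitespace, 'X', 'Y' or ',') into a single space.
std::string squeeze_separators(const std::string& line)
{
    // Compiled once; same character class as the Perl example in the question.
    static const std::regex sep("[\\sXY,]+");
    return std::regex_replace(line, sep, " ");
}
```

In a main like the one above, `std::cout << squeeze_separators(s) << '\n';` would take the place of the regex_replace call.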
1

Unlike in Perl, regular expressions are not built into C++.

You need to use an external library, such as PCRE.

2
  • Does this also contain a 'split' function? Python contains a default regular expression module, 're', which provides string-splitting convenience functions. I wonder if this works the same way?
    – Anonymous
    Commented Jun 14, 2009 at 4:46
  • 1
    This answer was true when submitted, but is no longer true with the availability of C++11. #include <regex>
    – Justin
    Commented Feb 15, 2017 at 17:17
1

Regular expressions are part of TR1, which is included in Visual C++ 2008 SP1 (including the Express edition) and G++ 4.3.

The header is <regex> and the namespace is std::tr1. It works well with the STL.

Getting started with C++ TR1 regular expressions

Visual C++ Standard Library : TR1 Regular Expressions
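
Since C++11 the same facilities live directly in namespace std, so neither TR1 nor Boost is required. Below is a minimal sketch of a Perl-style split built on std::sregex_token_iterator, assuming a compiler with a working C++11 <regex>. The name split_regex is hypothetical, and dropping empty tokens is a choice made here for convenience (Perl's split would keep a leading empty field).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the answers above): split `text` on every
// match of `sep` and return the non-empty pieces in between.
std::vector<std::string> split_regex(const std::string& text, const std::regex& sep)
{
    std::vector<std::string> tokens;
    // Submatch index -1 selects the text between matches, not the matches themselves.
    std::sregex_token_iterator it(text.begin(), text.end(), sep, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        // A separator at the start of the input yields an empty first token; skip it.
        if (it->length() > 0) {
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```

For the first line of test.txt in the question, `split_regex(line, std::regex("[\\sXY,]+"))` should return the seven tokens this, is, a, string, with, some, problems.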