
When I need to scan in values from a bunch of strings, I often find myself falling back to C's sscanf() strictly because of its simplicity and ease of use. For example, I can very succinctly pull a couple of double values out of a string with:

#include <cstdio>
#include <string>

std::string str;
double val1, val2;
if (std::sscanf(str.c_str(), "(%lf,%lf)", &val1, &val2) == 2)
{
    // got them!
}

This obviously isn't very C++. I don't necessarily consider that an abomination, but I'm always looking for a better way to do a common task. I understand that the "C++ way" to read strings is istringstream, but the extra typing required to handle the parentheses and comma in the format string above just makes it too cumbersome for me to want to use it.
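
For comparison, a rough sketch of the plain istringstream version (the helper name parse_pair_stream is hypothetical): every literal character has to be extracted into a dummy char and checked by hand, which is the extra typing described above.

```cpp
#include <sstream>
#include <string>

// Plain istringstream equivalent of the sscanf call: each literal character
// is read into a dummy char and compared by hand.
// parse_pair_stream is a hypothetical helper name.
bool parse_pair_stream(const std::string& str, double& val1, double& val2)
{
    std::istringstream in(str);
    char open = 0, comma = 0, close = 0;
    return (in >> open >> val1 >> comma >> val2 >> close)
        && open == '(' && comma == ',' && close == ')';
}
```

Because operator>> skips leading whitespace for every item, this version is slightly more lenient than the sscanf format about spaces around the numbers.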

Is there a good way to bend the built-in facilities to my will in a way similar to the above, or is there a good C++ library that does the above in a more type-safe way? It looks like Boost.Format has solved the output problem nicely, but I haven't found anything similarly succinct for input.

  • Huh, I would have really expected Boost to have something here. Now my fingers are itching to make a library of my own for it... Commented Mar 22, 2012 at 16:03
  • FWIW, I consider sscanf just as "C++" as anything else; it is just limited in ability (but not as syntactically awful as iostreams). I've seen proposals to implement the C formatting functions in terms of variadic templates (hence C++11 only). This would be a huge improvement if it could be made performant. A good little project; let me know when you have it finished. ;^)
    – mcmcc
    Commented Mar 22, 2012 at 16:08
  • @mcmcc: actually, implementing printf with variadic templates is pretty easy, apart from positional arguments. I would expect the same issue with sscanf. Apart from that, I do not see an issue in performance. If anything, partial inlining could really be beneficial here. Commented Mar 22, 2012 at 16:48
  • @JasonR: I have yet to find a simpler way too. Boost.Format already does the equivalent of printf, though with an awkward syntax because it predates variadic templates, but I don't know of a scanning library. Commented Mar 22, 2012 at 16:54
  • In this particular case, operator>> for std::complex<double> will read exactly this format.
    – Bo Persson
    Commented Mar 22, 2012 at 16:56
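
Following Bo Persson's suggestion, a minimal sketch that lets std::complex<double> do the parsing (parse_pair_complex is a hypothetical helper name). The standard extractor accepts re, (re) and (re,im), so unlike the sscanf version it also succeeds on input holding only one number, leaving the imaginary part at 0.

```cpp
#include <complex>
#include <sstream>
#include <string>

// Let std::complex<double>'s operator>> parse "(val1,val2)".
// parse_pair_complex is a hypothetical helper name.
bool parse_pair_complex(const std::string& str, double& val1, double& val2)
{
    std::istringstream in(str);
    std::complex<double> z;
    if (!(in >> z))      // accepts "re", "(re)" or "(re,im)"
        return false;
    val1 = z.real();
    val2 = z.imag();     // 0 if the input held only one number
    return true;
}
```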

3 Answers

I wrote a bit of code that can read in string and character literals. Like normal stream reads, if it gets invalid data it sets the failbit of the stream. This should work for all types of streams, including wide streams. Stick this bit in a new header:

#pragma once
#include <array>
#include <cstddef>
#include <ios>
#include <istream>

//match a string literal: skip leading whitespace, then require the exact characters
template<class e, class t, std::size_t N>
std::basic_istream<e,t>& operator>>(std::basic_istream<e,t>& in, const e(&sliteral)[N]) {
        if (N <= 1) //empty literal "": nothing to match
                return in;
        std::array<e, N-1> buffer{}; //holds the characters actually read
        in >> buffer[0]; //formatted read: skips whitespace
        if (N>2)
                in.read(&buffer[1], static_cast<std::streamsize>(N-2)); //read the rest verbatim
        if (!in || t::compare(buffer.data(), sliteral, N-1) != 0) //short read or mismatch
                in.setstate(std::ios_base::failbit); //set the state
        return in;
}
//match a single character literal, skipping leading whitespace
template<class e, class t>
std::basic_istream<e,t>& operator>>(std::basic_istream<e,t>& in, const e& cliteral) {
        e buffer{};  //get buffer
        in >> buffer; //read data (uses the standard charT& overload)
        if (!in || !t::eq(buffer, cliteral)) //short read or mismatch
                in.setstate(std::ios_base::failbit); //set the state
        return in;
}
//redirect mutable char arrays to their normal function
template<class e, class t, std::size_t N>
std::basic_istream<e,t>& operator>>(std::basic_istream<e,t>& in, e(&carray)[N]) {
        return std::operator>>(in, carray);
}

With that header included, reading literal characters becomes very easy:

std::istringstream input(str); // str holds the text to parse, as in the question
double val1, val2;
if (input >> '(' >> val1 >> ',' >> val2 >> ')') // fewer characters than sscanf, I think
{
    // got them!
}

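PROOF OF CONCEPT. Now you can read string and character literals from any input stream (std::cin included), and if the input is not an exact match, the stream fails just like it does for any other type that fails to parse. Note that the first character of a string literal is read with a formatted extraction, which skips leading whitespace, so whitespace inside a string literal only matches when it is not the first character. It's only three functions, all of which are brain-dead simple.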

EDIT

Parsing with streams is a bad idea. Use a regex.

  • @JasonR There, I got the overload resolution to work out, so now it uses >> like all other input. Commented Mar 23, 2012 at 3:54
  • I like it. While I would prefer something that allows me to specify the format string and arguments separately (like sscanf or Boost.Format), this is definitely the best-available solution I've seen. Nice job.
    – Jason R
    Commented Mar 23, 2012 at 13:00
  • @JasonR: Actually that's what I set out to do, but I was wondering how to bypass the string parsing, and realized I could split the strings... which eventually led to this very simple solution. As you say, formatting things a specific way is still tricky. I wonder if I can address that. If I think of something I'll comment again. Commented Mar 23, 2012 at 16:21
  • @MooingDuck This is an awesome snippet; I plan on using it to replace some of the older C code in my project.
    – Drise
    Commented Jun 14, 2012 at 17:00
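
The comments ask for something closer to sscanf itself: a format string kept separate from the arguments, built on C++11 variadic templates. A minimal sketch of that idea, assuming a made-up convention: % marks a slot filled with operator>>, whitespace in the format skips input whitespace, and every other character must equal the next non-whitespace input character. The names scan, detail::scan_impl and detail::match_literals are hypothetical, not from any library.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

namespace detail {

// Consume the literal part of the format up to the next '%' (or the end).
// Whitespace in the format skips input whitespace; any other character
// must equal the next non-whitespace input character.
inline bool match_literals(std::istream& in, const char*& fmt)
{
    for (; *fmt != '\0' && *fmt != '%'; ++fmt) {
        if (std::isspace(static_cast<unsigned char>(*fmt))) {
            in >> std::ws;
            continue;
        }
        char c;
        if (!(in >> c) || c != *fmt)  // operator>> for char skips whitespace
            return false;
    }
    return true;
}

// Base case: no arguments left, so the rest of the format must be literals only.
inline bool scan_impl(std::istream& in, const char* fmt)
{
    return match_literals(in, fmt) && *fmt == '\0';
}

// Recursive case: match literals, then fill the next '%' slot with operator>>.
template <class T, class... Rest>
bool scan_impl(std::istream& in, const char* fmt, T& first, Rest&... rest)
{
    if (!match_literals(in, fmt) || *fmt != '%')
        return false;  // literal mismatch, or more arguments than '%' slots
    ++fmt;             // skip the '%'
    if (!(in >> first))
        return false;
    return scan_impl(in, fmt, rest...);
}

} // namespace detail

// Hypothetical sscanf-like front end: true only if every literal and slot matched.
template <class... Args>
bool scan(const std::string& str, const char* fmt, Args&... args)
{
    std::istringstream in(str);
    return detail::scan_impl(in, fmt, args...);
}
```

Usage would look like scan(str, "(%,%)", val1, val2). Like sscanf, it ignores trailing input. Because each slot uses operator>>, a std::string argument reads up to the next whitespace (and would swallow a following comma), and this convention has no way to match a literal %.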

The best thing I've ever used for string parsing is Boost.Spirit. It's fast, safe and very flexible. The big advantage is that you can write parsing rules in a form close to an EBNF grammar:

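#include <boost/fusion/include/vector.hpp>
#include <boost/spirit/include/qi.hpp>
#include <string>

namespace qi = boost::spirit::qi;

int main()
{
    boost::fusion::vector < double, double > value_;

    std::string string_ = "10.5,10.6 ";
    auto first = string_.begin(); // qi::parse advances this iterator past the match

    bool result_ = qi::parse(
        first,
        string_.end(),
        qi::double_ >> ',' >> qi::double_, // Parsing rule
        value_); // receives the two parsed doubles
    return result_ ? 0 : 1;
}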
  • Thanks for the input. I wouldn't say that's on par with sscanf() for my needs. I'm sure it is very powerful (I'm not familiar with Spirit or what an "EBNF grammar" is), but for my purposes it's not simple enough to make me want to change.
    – Jason R
    Commented Mar 22, 2012 at 17:55
  • @JasonR: Extended Backus-Naur Form (ISO/IEC 14977).
    – MSalters
    Commented Mar 23, 2012 at 8:54

I think this could be done easily with a regex: boost::regex, or std::regex in the new standard (C++11). After that, just convert the captured tokens to floating point using lexical_cast or streams directly.
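
A minimal sketch of the regex approach with std::regex, assuming the input should look like "(x,y)" with optional spaces. The helper name parse_pair_regex and the number pattern are assumptions, and std::stod (C++11) stands in for lexical_cast for the conversion step.

```cpp
#include <regex>
#include <string>

// Match "(x,y)" (optional spaces allowed) with std::regex, then convert the
// two captured tokens. parse_pair_regex is a hypothetical helper name.
bool parse_pair_regex(const std::string& str, double& val1, double& val2)
{
    // A decimal number: optional sign, digits with an optional fraction (or a
    // leading '.'), and an optional exponent.
    static const std::string num =
        "[-+]?(?:[0-9]+\\.?[0-9]*|\\.[0-9]+)(?:[eE][-+]?[0-9]+)?";
    static const std::regex pattern(
        "\\(\\s*(" + num + ")\\s*,\\s*(" + num + ")\\s*\\)");

    std::smatch m;
    if (!std::regex_match(str, m, pattern))  // the whole string must match
        return false;
    val1 = std::stod(m[1].str());  // tokens are already validated by the regex
    val2 = std::stod(m[2].str());
    return true;
}
```

Since the regex validates the token shape first, std::stod only has to convert; it can still throw std::out_of_range for values such as 1e999, so callers that care should catch that.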
