
I am using the following code to split each line of a file into tokens, one token per word. My problem is this: I want the token index to keep counting across the whole file instead of restarting at 0 on every line. The contents of the file are:

Student details:
Highlander 141A Section-A.
Single 450988012 SA

Program:

#include <iostream>
using std::cout;
using std::endl;

#include <fstream>
using std::ifstream;

#include <cstring>

const int MAX_CHARS_PER_LINE = 512;
const int MAX_TOKENS_PER_LINE = 20;
const char* const DELIMITER = " ";

int main()
{
  // create a file-reading object
  ifstream fin;
  fin.open("data.txt"); // open a file
  if (!fin.good()) 
    return 1; // exit if file not found

  // read each line of the file
  while (!fin.eof())
  {
    // read an entire line into memory
    char buf[MAX_CHARS_PER_LINE];
    fin.getline(buf, MAX_CHARS_PER_LINE);

    // parse the line into blank-delimited tokens
    int n = 0; // a for-loop index

    // array to store memory addresses of the tokens in buf
    const char* token[MAX_TOKENS_PER_LINE] = {}; // initialize to 0

    // parse the line
    token[0] = strtok(buf, DELIMITER); // first token
    if (token[0]) // zero if line is blank
    {
      for (n = 1; n < MAX_TOKENS_PER_LINE; n++)
      {
        token[n] = strtok(0, DELIMITER); // subsequent tokens
        if (!token[n]) break; // no more tokens
      }
    }

    // process (print) the tokens
    for (int i = 0; i < n; i++) // n = #of tokens
      cout << "Token[" << i << "] = " << token[i] << endl;
    cout << endl; // blank line after each input line's tokens
  }
}

Output:

Token[0] = Student
Token[1] = details:

Token[0] = Highlander
Token[1] = 141A
Token[2] = Section-A.

Token[0] = Single
Token[1] = 450988012
Token[2] = SA

Expected:

Token[0] = Student
Token[1] = details:

Token[2] = Highlander
Token[3] = 141A
Token[4] = Section-A.

Token[5] = Single
Token[6] = 450988012
Token[7] = SA

So I want the index to keep incrementing from one line to the next, so that I can easily identify each value by its index. Thanks in advance...

  • I'm just curious, but where are people finding this junk? There's no case (even in C) where strtok is an appropriate solution, and there's almost no case in C++ where you should be using the member getline, rather than reading into an std::string. And of course, !fin.eof() as a loop condition is wrong as well.
    Commented Sep 30, 2013 at 14:40
  • strtok(0, DELIMITER); is not valid, and should be generating a warning. Strtok's first parameter is a char*, and you have passed an int.
    – abelenky
    Commented Sep 30, 2013 at 14:41
  • boost.org/doc/libs/1_54_0/libs/tokenizer/tokenizer.htm ?
    – Jaffa
    Commented Sep 30, 2013 at 14:46
  • @NeilKirk The first thing you need to learn when learning C++ is that nothing is obvious. But why are so many tutorials so bad? You'd think that word would get around after a while, people would stop linking to them, and they'd stop showing up in Google.
    Commented Sep 30, 2013 at 14:50
  • @andre If by "more effective", you mean correct, or "that actually work", then I agree. The issue isn't effectiveness here, it is correctness.
    Commented Sep 30, 2013 at 14:51

2 Answers

What's wrong with the standard, idiomatic solution:

std::string line;
while ( std::getline( fin, line ) ) {
    std::istringstream parser( line );
    int i = 0;
    std::string token;
    while ( parser >> token ) {
        std::cout << "Token[" << i << "] = " << token << std::endl;
        ++ i;
    }
}

Obviously, in real life, you'll want to do more than just output each token, and you'll want more complicated parsing. But anytime you're doing line oriented input, the above is the model you should be using (probably keeping track of the line number as well, for error messages).

It's probably worth pointing out that in this case, an even better solution would be to use boost::split in the outer loop, to get a vector of tokens.
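
To get the continuous numbering shown under "Expected" in the question, the counter has to outlive the outer loop. Here is a minimal sketch of that variant. The helper name print_tokens and its signature are assumptions made for illustration and do not come from the question. It takes any std::istream, so the opened ifstream can be passed in directly. It also prints a blank line after each non-empty input line, which I assume is wanted because the expected output shows one.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original code): prints every
// whitespace-separated token read from `in`, numbering the tokens
// continuously across lines. Returns the total number of tokens.
std::size_t print_tokens(std::istream& in, std::ostream& out)
{
    std::size_t i = 0;  // declared once, before the outer loop
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream parser(line);
        std::string token;
        bool any = false;  // did this line contain at least one token?
        while (parser >> token) {
            out << "Token[" << i << "] = " << token << '\n';
            ++i;
            any = true;
        }
        if (any)
            out << '\n';  // blank line between the groups of each input line
    }
    return i;
}
```

With the file from the question, a call such as print_tokens(fin, std::cout) should number the tokens from 0 through 7 and return 8.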

  • You should move int i = 0; before the while loop. Otherwise you won't have the expected output. Commented Sep 30, 2013 at 15:10
  • @OlafDietsche The int i = 0; is before the while loop. (Look at his sample output to see what he wants.) Commented Sep 30, 2013 at 16:33
  • Sorry, I meant to move it before the first while loop. The output labeled "Output:" is what he gets and the output "Expected:" is what he wants. At least, that's what I understand. Commented Sep 30, 2013 at 19:51
  • @OlafDietsche Yes. It was I who misread his question. Yes, the variable (and its initialization) does belong before the first loop. (And in this case, there's no reason to use the nested loops, unless you want to keep track of the line number for error messages. Or use boost::split, which is really more appropriate in this case.) Commented Oct 1, 2013 at 7:57

I would just let iostream do the splitting

std::vector<std::string> token;
std::string s;
while (fin >> s)
    token.push_back(s);

Then you can output the whole array at once with proper indexes.

for (std::size_t i = 0; i < token.size(); ++i)
    std::cout << "Token[" << i << "] = " << token[i] << std::endl;

Update:

You can even omit the vector altogether and output the tokens as you read them from the input stream:

std::string s;
for (int i = 0; fin >> s; ++i)
    std::cout << "Token[" << i << "] = " << s << std::endl;
  • What's with the !fin.eof()? That's never an appropriate loop condition.
    Commented Sep 30, 2013 at 14:40
  • See here: stackoverflow.com/questions/5605125/… for a discussion of what's wrong with !fin.eof().
    – us2012
    Commented Sep 30, 2013 at 14:45
  • @JamesKanze, us2012 You're both right. But if OP insists on doing it that way, he can achieve his objective with a separate output variable. Commented Sep 30, 2013 at 14:53
  • @user2754070 What do you mean with it breaks at line[2]? Commented Sep 30, 2013 at 14:59
  • @OlafDietsche If the OP insists on using fin.eof(), his code will never work. And if he insists on using strtok, it will be excessively fragile, and unmaintainable. Your first solution is fine, at least if he doesn't need to keep the lines separate; there's no point in trying to pretend that the alternatives he seems to favor are acceptable. Commented Sep 30, 2013 at 15:00
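
For anyone wondering what the objection to !fin.eof() looks like in practice, here is a small sketch. The two function names are made up for this illustration. Both count loop iterations over the same text. The first tests eof() before reading, the second uses the read itself as the loop condition. The eof flag is only set once a read has actually run into the end of the input, so the first form goes around once more than there are lines and processes an empty "line" from the failed read.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical demo: loop written as in the question, testing eof()
// before the read and never checking whether the read succeeded.
std::size_t count_with_eof_test(const std::string& text)
{
    std::istringstream in(text);
    std::size_t iterations = 0;
    std::string line;
    while (!in.eof()) {
        std::getline(in, line);  // result is not checked; may have failed
        ++iterations;
    }
    return iterations;
}

// Hypothetical demo: the read is the loop condition, so the body only
// runs when a line was actually extracted.
std::size_t count_with_read_test(const std::string& text)
{
    std::istringstream in(text);
    std::size_t iterations = 0;
    std::string line;
    while (std::getline(in, line))
        ++iterations;
    return iterations;
}
```

For the two-line text "one\ntwo\n", the first function returns 3 and the second returns 2. With the member getline and a fixed buffer, as in the question, there is a further hazard: a line longer than the buffer sets failbit without ever reaching end of file, so a loop on !fin.eof() would not terminate.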
