5

I apologize if this question is a bit vague or just plain stupid; I am still very much a novice.

I need to extract information from a web log file in C++. The string manipulations are relatively easy; accessing the data in a timely fashion isn't. This is what I am doing currently:

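std::string str;
std::ifstream fh("testlog.log", std::ios::in);
while (std::getline(fh, str))   // no ';' here, otherwise the loop body is empty
{
    // extract the useful data from str here
}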

From there I extract the useful data from the string. This works fine for a log file with 100 entries, but takes forever on a log file with a million or more entries. Any help would be greatly appreciated.
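The question does not say what the log lines look like. Purely as a hypothetical illustration, assuming Apache's Common Log Format, the per-line extraction can work on a std::string_view (C++17) instead of copying substrings out of each line. `status_code` is a made-up helper name, not part of any library:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper. ASSUMES each line is in Apache "Common Log Format", e.g.
// 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
// Returns the 3-digit HTTP status code, or -1 if the line does not match that shape.
int status_code(std::string_view line)
{
    // In this format the status code follows the closing quote of the request field.
    std::size_t quote = line.rfind('"');
    if (quote == std::string_view::npos)
        return -1;

    std::size_t pos = quote + 1;
    while (pos < line.size() && line[pos] == ' ')   // skip the separating space(s)
        ++pos;

    int code = 0;
    int digits = 0;
    while (pos < line.size() && line[pos] >= '0' && line[pos] <= '9')
    {
        code = code * 10 + (line[pos] - '0');
        ++pos;
        ++digits;
    }
    return digits == 3 ? code : -1;
}
```

Inside the getline loop this could be used as `if (status_code(str) == 404) ++not_found;`, where `not_found` is a counter of your own. Parsing in place like this avoids creating a temporary std::string for every field.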

  • Just for testing, could you try using fgets? Open the file with fopen and then use something like while(fgets(cstr, 256, fp)). Tell us what your results are (how long it takes).
    – nc3b
    Commented Apr 18, 2011 at 21:55
  • Have you profiled to see where the bottleneck is? If it's disk, nothing in code is going to fix that, per se. (If your CPU time is not dwarfed by IO time, you could multithread.)
    – GManNickG
    Commented Apr 18, 2011 at 22:07
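A minimal sketch of the fgets approach from the first comment, assuming a plain text file; `count_lines_fgets` is a hypothetical name and the 256-byte buffer is taken from the comment. Because fgets stops after 255 characters, a longer line arrives in several pieces, so the sketch only counts a line once it sees that line's newline:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: counts the lines of a file with C stdio.
// Returns -1 if the file cannot be opened.
long count_lines_fgets(const char* path)
{
    std::FILE* fp = std::fopen(path, "r");
    if (fp == nullptr)
        return -1;

    char cstr[256];
    long lines = 0;
    bool last_chunk_ended_line = true;
    // fgets reads up to a newline or 255 chars, so one long line can take
    // several calls; a line is counted only when its '\n' has been read.
    while (std::fgets(cstr, sizeof cstr, fp) != nullptr)
    {
        std::size_t len = std::strlen(cstr);
        last_chunk_ended_line = (len > 0 && cstr[len - 1] == '\n');
        if (last_chunk_ended_line)
            ++lines;
    }
    // A final line without a trailing newline still counts as a line.
    if (!last_chunk_ended_line)
        ++lines;

    std::fclose(fp);
    return lines;
}
```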

4 Answers

2

I really suspect that I/O is hurting you more than ifstream here. Have you checked to see that you're actually CPU-bound? Most likely you're running into disk and cache locality issues.

There may not be a lot you can do in that case.

If it is CPU-bound, have you profiled to see where the CPU time is going?
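One rough way to check whether you are I/O-bound, as a sketch: time a pass that only reads raw bytes against a pass that splits the file into lines with std::getline. `time_raw_read`, `time_getline`, `PassResult` and the 1 MiB block size are hypothetical choices. Because the operating system caches recently read files, whichever pass runs second may look faster simply because of the cache, so run each pass more than once:

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical measurement helpers. If both passes take roughly as long, the
// disk is probably the bottleneck; if the getline pass is much slower, the
// CPU work is what to profile.
struct PassResult
{
    long long count;                        // bytes (raw pass) or lines (getline pass)
    std::chrono::duration<double> elapsed;  // wall-clock seconds
};

PassResult time_raw_read(const std::string& path)
{
    auto start = std::chrono::steady_clock::now();
    std::ifstream in(path, std::ios::binary);
    std::vector<char> block(1 << 20);       // 1 MiB per read() call
    long long bytes = 0;
    while (in)
    {
        in.read(block.data(), static_cast<std::streamsize>(block.size()));
        bytes += in.gcount();               // last read may be partial
    }
    return {bytes, std::chrono::steady_clock::now() - start};
}

PassResult time_getline(const std::string& path)
{
    auto start = std::chrono::steady_clock::now();
    std::ifstream in(path);
    std::string line;
    long long lines = 0;
    while (std::getline(in, line))
        ++lines;
    return {lines, std::chrono::steady_clock::now() - start};
}
```

For example, compare `time_raw_read("testlog.log").elapsed.count()` with `time_getline("testlog.log").elapsed.count()`. This is not a substitute for a real profiler, but it separates raw read time from per-line work.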

  • I know this question is old, but naming "disk" and "cache locality" as the issues to someone who has said they are a novice, without explaining them, and then asking for profiling isn't the most effective approach, IMO.
    – Gary Allen
    Commented Aug 29, 2021 at 15:29
2

After wasting hours and hours of my time, I compiled the same code in Quincy2005 instead of Microsoft Visual Studio. The result was dramatic: the execution time dropped from 40 minutes to 1 minute. The same improvement can be accomplished in Microsoft Visual Studio by passing a pointer to the file handle to the getline function. On a Linux-based system it takes about 40 seconds to execute. I cursed Microsoft for a good 40 minutes for wasting my time.

1

Here is the fastest way I have found to read a whole file:

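#include <fstream>
#include <string>
#include <vector>

// std::ios::ate opens the file positioned at its end, so tellg() gives the size
// (std::ios::end is a seek direction, not an open mode). Binary mode keeps the
// byte count consistent with what read() returns on platforms that translate line endings.
std::ifstream file("test.txt", std::ios::in | std::ios::binary | std::ios::ate);
std::size_t fileSize = static_cast<std::size_t>(file.tellg());
std::vector<char> buffer(fileSize);
file.seekg(0, std::ios::beg);
file.read(buffer.data(), static_cast<std::streamsize>(fileSize));
std::string str(buffer.begin(), buffer.end());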

That said, if your file is really that big, I strongly suggest processing it as a stream.

  • OK, what I have done is obtain the file size, like you did, and divide the value by 2, since the file seems to be too big for one dynamic array. Reading the chars into a string seems to be where all the time goes. So I suppose I could try to read each line from the array without the help of string functions. It just seems a bit messy.
    – Jacques
    Commented Apr 19, 2011 at 9:26
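For the idea in the comment above (working on a raw array instead of building a string per line), here is a sketch that also avoids needing one array as large as the file: it reads fixed-size chunks and passes each line to a callback as a std::string_view (C++17). `for_each_line`, its callback parameter and the 1 MiB default chunk size are hypothetical choices; only a line that crosses a chunk boundary is copied:

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: reads `path` in chunks of `chunk_size` bytes and calls
// on_line once per line (without the '\n'). Returns the number of lines seen.
long long for_each_line(const std::string& path,
                        const std::function<void(std::string_view)>& on_line,
                        std::size_t chunk_size = 1 << 20)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> chunk(chunk_size);
    std::string carry;               // partial line left over from the previous chunk
    long long lines = 0;

    while (in)
    {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        std::string_view data(chunk.data(), static_cast<std::size_t>(in.gcount()));

        std::size_t start = 0;
        for (std::size_t nl = data.find('\n'); nl != std::string_view::npos;
             nl = data.find('\n', start))
        {
            std::string_view piece = data.substr(start, nl - start);
            if (carry.empty())
            {
                on_line(piece);      // whole line lies inside this chunk: no copy
            }
            else
            {
                carry.append(piece.data(), piece.size());
                on_line(carry);      // line straddled two or more chunks
                carry.clear();
            }
            ++lines;
            start = nl + 1;
        }
        carry.append(data.data() + start, data.size() - start);
    }

    if (!carry.empty())              // last line had no trailing '\n'
    {
        on_line(carry);
        ++lines;
    }
    return lines;
}
```

Because the file is opened in binary mode, a file with Windows line endings leaves a trailing '\r' on each line. A caller could count FATAL lines with `for_each_line("testlog.log", [&](std::string_view line) { if (line.find("FATAL") != std::string_view::npos) ++fatal; });`.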
1

@Errata:

Are you sure that your code would be faster than, say:

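#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

std::ifstream in("test.txt");
in.unsetf(std::ios::skipws);   // keep whitespace and newlines in the copy
std::string contents;
std::copy(
        std::istream_iterator<char>(in),
        std::istream_iterator<char>(),
        std::back_inserter(contents));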
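A related technique, different from the istream_iterator<char> version above, is std::istreambuf_iterator: it pulls characters straight from the stream buffer without per-character formatted input and never skips whitespace, so no unsetf call is needed and it is usually the faster of the two. A minimal sketch; `read_whole_file` is a hypothetical name:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: returns the whole file as a string (empty if it cannot be opened).
std::string read_whole_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    // istreambuf_iterator reads raw characters from the filebuf, whitespace included.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```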

Also, the OP wants line-wise access, which could conveniently be done like this:

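// istream_iterator<std::string> reads whitespace-separated words, not lines (and with
// skipws unset it stops after the first word), so wrap std::getline in a small type.
struct Line
{
    std::string text;
    friend std::istream& operator>>(std::istream& is, Line& l) { return std::getline(is, l.text); }
};

std::ifstream in("test.txt");
size_t count = std::count_if(
        std::istream_iterator<Line>(in),
        std::istream_iterator<Line>(),
        [](const Line& l) { return is_interesting(l.text); });
std::cout << "Interesting log lines: " << count << std::endl;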

Of course, you need to define a predicate, e.g.:

static bool is_interesting(const std::string& line)
{ 
    return std::string::npos != line.find("FATAL");
}
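The same count can also be written as a plain std::getline loop, which reads whole lines and reuses a single std::string buffer. A minimal self-contained sketch with the same predicate; `count_interesting` and `count_interesting_in_file` are hypothetical names:

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>

// Same predicate as above: a line is interesting if it contains "FATAL".
static bool is_interesting(const std::string& line)
{
    return std::string::npos != line.find("FATAL");
}

// Counts interesting lines from any input stream.
std::size_t count_interesting(std::istream& in)
{
    std::string line;           // reused, so its capacity is kept between lines
    std::size_t count = 0;
    while (std::getline(in, line))
        if (is_interesting(line))
            ++count;
    return count;
}

// Convenience wrapper for a file on disk (returns 0 if it cannot be opened).
std::size_t count_interesting_in_file(const std::string& path)
{
    std::ifstream in(path);
    return count_interesting(in);
}
```

Taking `std::istream&` keeps the counting logic independent of where the lines come from, so the same function also works with a `std::istringstream` in tests.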
