1
\$\begingroup\$

What I need:

I need a lookup table. Once the keywords are all found, then the value of the std::unordered_map is returned.And I don't care about the sequence of the strings in the std::vector.

I think std::unordered_map<std::vector<std::string>, int> could achieve this goal. Any better method to achieve this goal? Or any improvement for this code snippet?

#include <string>
#include <unordered_map>
#include <vector>
#include <iostream>

class Hash
{
    public:
        std::size_t operator()(const std::vector<std::string>& vec) const
        {
            std::hash<std::string> hasher;

            std::size_t hash_value{};
            for(auto itr = vec.begin(); itr!=vec.end(); itr++)
            {
                hash_value ^= (hasher(*itr));  //Why `hash_value ^= ((hasher(*itr))<<1);` does not influence the output of mp.find(to_search)
            }

            return hash_value;
        }
};

class Equal
{
    public:

        bool operator()(const std::vector<std::string>& vec1, const std::vector<std::string>& vec2) const
        {
            bool ret = true;

            if(vec1.size() != vec2.size())
            {
                ret = false;
            }
            else
            {
                for(auto itr_outer = vec1.begin(); itr_outer!=vec1.end(); itr_outer++)
                {
                    bool found = false;
                    for(auto itr_inner = vec1.begin(); itr_inner!=vec2.end(); itr_inner++)
                    {
                        if(*itr_outer == *itr_inner)
                        {
                            found = true;
                            break;
                        }
                    }

                    if(!found)
                    {
                        ret = false;
                        break;
                    }
                }
            }

            return ret;
        }
};

int main()
{
   const std::unordered_map<std::vector<std::string>, int, Hash, Equal> mp = {
        {{"hello word"}, 1},
        {{"thanks a lot", "I need some help"}, 2},
        {{"how to make it better"}, 3},
  };
  
  std::vector<std::string> to_search = {"I need some help", "thanks a lot"};

  auto itr = mp.find(to_search);
  if(itr!=mp.end())
  {
    std::cout <<"found:" << itr->second << std::endl;
  }
  else
  {
      std::cout << "not found" << std::endl;
  }
}
```
\$\endgroup\$
7
  • 2
    \$\begingroup\$ It might be faster to use a std::unordered_set instead of a vector; lookup is faster, and you say you don't care about sequence. If each string exists, and the size is equal, you have equality, as no duplicates can happen. \$\endgroup\$
    – Aganju
    Commented May 26, 2022 at 15:39
  • \$\begingroup\$ @Aganju Thank you for the rapid reply. A question arises after I carefully read your comment.Why std::unordered_set is faster, could you please explain that in more detail for me? \$\endgroup\$
    – John
    Commented May 27, 2022 at 1:24
  • \$\begingroup\$ std::unordered_set uses a hash to check if a key (here: one of your strings) is inside already. This is faster than looping through the vector and doing a string compare for each element. Note: I meant inside your map - not instead. You would use a std::unordered_map<std::unordered_set<std::string>,int>. \$\endgroup\$
    – Aganju
    Commented May 27, 2022 at 1:35
  • \$\begingroup\$ @Aganju I see. Thanks a lot. Then a question arises, how std::string::operator==() is implemented? If I understand you correctly, the goal is achieved by comparing byte one by one other than by comparing hash. But I think it's not the main point, why std::set is better for this post is because that it could avoid to loop through the vector of strings. Am I right? \$\endgroup\$
    – John
    Commented May 27, 2022 at 1:41
  • \$\begingroup\$ You still have to loop through one of the sets, but yes, not the second one. And the comparison on hash is one instruction, not another loop over all characters. - if you have small amounts, you wouldn't see a performance gain, but for large amounts of data, it should be huge. \$\endgroup\$
    – Aganju
    Commented May 27, 2022 at 1:47

3 Answers 3

2
+100
\$\begingroup\$

It depends what you really need. As @Aganju pointed out that looking up a hash is much faster than looping through strings and comparing.

But for your case, you seems to find out whether a specific sentence contains all the keywords or not, in this case hash does not work indeed. Help that helps.

\$\endgroup\$
1
\$\begingroup\$

To achieve a lookup table in this case, there is no need to use unordered_map at all. std::vector<std::pair<std::unordered_set<std::string>, int>> is much simpler than std::unordered_map<std::unordered_set<std::string>> since the former one doesn't need a custom hash function at all.

Here is the code snippet:

#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include <iostream>

int main()
{
   const std::vector<std::pair<std::unordered_set<std::string>, int>> mp = {
        {{"hello word"}, 1},
        {{"thanks a lot", "I need some help"}, 2},
        {{"how to make it better"}, 3},
  };
  
  std::string str{"Guys, I need some help, thanks a lot!"};

  auto Convert = [&mp](const std::string& str)->int{
        int ret = -1;
        for(auto const& pair:mp)
        {
            bool match = true;
            for(auto const& keyword:pair.first)
            {
                if(str.find(keyword)==std::string::npos)
                {
                    match = false;
                }
            }

            if(match)
            {
                ret = pair.second;
                break;
            }
        }

        return ret;
    };


    std::cout << Convert(str) << std::endl;
    std::cout << Convert("") << std::endl;
    std::cout << Convert("hello") << std::endl;
}

```
\$\endgroup\$
1
  • \$\begingroup\$ While no hashing can be easier, for large amounts there is a price, namely speed: looking up a hash is much faster than looping trough strings and comparing. \$\endgroup\$
    – Aganju
    Commented Jun 5, 2022 at 3:40
0
\$\begingroup\$

Thanks to the guidance of @Aganju. It is faster to use a std::unordered_set instead of a vector.

Here is a better implementation(any suggestion is welcome):

#include <string>
#include <unordered_map>
#include <unordered_set>
#include <iostream>

class Hash
{
    public:
        std::size_t operator()(const std::unordered_set<std::string>& vec) const
        {
            std::hash<std::string> hasher;

            std::size_t hash_value{};
            for(const auto& str:vec)
            {
                hash_value ^= (hasher(str)); 
            }

            return hash_value;
        }
};

class Equal
{
    public:

        bool operator()(const std::unordered_set<std::string>& vec1, const std::unordered_set<std::string>& vec2) const
        {
            bool ret = true;

            if(vec1.size() != vec2.size())
            {
                ret = false;
            }
            else
            {
                bool found = true;
                for(const auto& to_search_str:vec1)
                {
                    if(vec2.find(to_search_str)==vec2.end())
                    {
                        found = false;
                        break;
                    }
                }

                ret = !found? false:true;
            }

            return ret;
        }
};

int main()
{
   const std::unordered_map<std::unordered_set<std::string>, int, Hash, Equal> mp = {
        {{"hello word"}, 1},
        {{"thanks a lot", "I need some help"}, 2},
        {{"how to make it better"}, 3},
  };
  
  std::unordered_set<std::string> to_search = {"I need some help", "thanks a lot"};

  auto itr = mp.find(to_search);
  if(itr!=mp.end())
  {
    std::cout <<"found:" << itr->second << std::endl;
  }
  else
  {
      std::cout << "not found" << std::endl;
  }

  to_search.erase(to_search.begin());
  auto itr1 = mp.find(to_search);
  if(itr1!=mp.end())
  {
    std::cout <<"found:" << itr1->second << std::endl;
  }
  else
  {
      std::cout << "not found" << std::endl;
  }
}

```
\$\endgroup\$
7
  • \$\begingroup\$ You do not need Equal; std::unordered_set already has an operator==() that checks if two sets are equal. \$\endgroup\$
    – G. Sliepen
    Commented May 27, 2022 at 9:36
  • \$\begingroup\$ @G.Sliepen I doubt that the said equal provided by default works for std::unordered_set<std::string>? You see the hash provided by default does not suit for std::unordered_set<std::string>. \$\endgroup\$
    – John
    Commented May 27, 2022 at 10:32
  • \$\begingroup\$ @G.Sliepen Though I really doubt, but after I did a test, I think you are right.Could you please explain that in more detail? \$\endgroup\$
    – John
    Commented May 27, 2022 at 10:41
  • \$\begingroup\$ @G.Sliepen Maybe, there is still some problem indeed. If it's std::vector<std::string> other than std::unordered_set<std::string>, The default equal would not work any more. See this. \$\endgroup\$
    – John
    Commented May 27, 2022 at 10:43
  • \$\begingroup\$ Correct, because order doesn't matter when comparing sets; if two sets contain the same items, they are equal. However, vectors preserve order, so two vectors containing the same items in different orders are not equal. \$\endgroup\$
    – G. Sliepen
    Commented May 27, 2022 at 17:56

Not the answer you're looking for? Browse other questions tagged or ask your own question.