std::lower_bound slower for std::vector than std::map::find

Ask Time：2012-01-09T14:38:38 Author：Mooing Duck

I wrote a class to act as a wrapper around a sequential container (std::vector/std::queue/std::list) to have the interface of a std::map, for performance when using small numbers of small objects. The coding was all remarkably simple given the algorithms that already existed. This code is obviously highly trimmed from my full code, but shows the problem.

template <class key_, 
          class mapped_, 
          class traits_ = std::less<key_>,
          class undertype_ = std::vector<std::pair<key_,mapped_> >
         >
class associative
{
public:
    typedef traits_ key_compare;
    typedef key_ key_type;
    typedef mapped_ mapped_type;
    typedef std::pair<const key_type, mapped_type> value_type;
    typedef typename undertype_::allocator_type allocator_type;
    typedef typename allocator_type::template rebind<value_type>::other value_allocator_type;
    typedef typename undertype_::const_iterator const_iterator;

    class value_compare {
        key_compare pred_;
    public:
        inline value_compare(key_compare pred=key_compare()) : pred_(pred) {}
        inline bool operator()(const value_type& left, const value_type& right) const {return pred_(left.first,right.first);}
        inline bool operator()(const value_type& left, const key_type& right) const {return pred_(left.first,right);}
        inline bool operator()(const key_type& left, const value_type& right) const {return pred_(left,right.first);}
        inline bool operator()(const key_type& left, const key_type& right) const {return pred_(left,right);}
        inline key_compare key_comp( ) const {return pred_;}
    };
    class iterator  {
    public:       
        typedef typename value_allocator_type::difference_type difference_type;
        typedef typename value_allocator_type::value_type value_type;
        typedef typename value_allocator_type::reference reference;
        typedef typename value_allocator_type::pointer pointer;
        typedef std::bidirectional_iterator_tag iterator_category;
        inline iterator(const typename undertype_::iterator& rhs) : data(rhs) {}
    inline reference operator*() const { return reinterpret_cast<reference>(*data);}
        inline pointer operator->() const {return reinterpret_cast<pointer>(structure_dereference_operator(data));}
        operator const_iterator&() const {return data;}
    protected:
        typename undertype_::iterator data;
    };

    template<class input_iterator>
    inline associative(input_iterator first, input_iterator last) : internal_(first, last), comp_() 
    {if (std::is_sorted(internal_.begin(), internal_.end())==false) std::sort(internal_.begin(), internal_.end(), comp_);}

inline iterator find(const key_type& key) {
    iterator i = std::lower_bound(internal_.begin(), internal_.end(), key, comp_);
    return (comp_(key,*i) ? internal_.end() : i);
}

protected:
    undertype_ internal_;
    value_compare comp_;
};

SSCCE at http://ideone.com/Ufn7r, full code at http://ideone.com/MQr0Z (note: resulting times as IdeOne are highly erratic, probably due to server load, and do not clearly show the results in question)

I tested with std::string, and PODs from 4 to 128 bytes, ranging from 8 to 2000 elements with MSVC10.

I expected higher performance for (1) creating from a range for small objects, (2) random insertion/erasure for small numbers of small objects, and (3) lookup for all objects. Surprisingly, the vector was significantly faster for creating from a range for all tests, and faster for random erasure depending on size up to about 2048 bytes (512 4-byte objects, or 128 16-byte objects, etc). However, most shocking of all, was that the std::vector using std::lower_bound was slower than the std::map::find for all PODs. The difference was miniscule for 4 and 8-byte PODs, and but for 128-byte PODs, std::vector was up to 36% slower! However, for std::string, the std::vector was 6% faster on average.

I feel like std::lower_bound on a sorted std::vector should have outperformed std::map due to better cache locality/smaller memory size, and since the map can be imperfectly balanced, or in the worst case it should match std::map, but can't for the life of me think of any reason that std::map should be faster. My only thought is the predicate is somehow slowing it down, but I can't figure out how. So the question is: How could it be that std::lower_bound on a sorted std::vector be outperformed by a std::map (in MSVC10)?

[EDIT] I've confirmed that std::lower_bound on std::vector<std::pair<4BYTEPOD,4BYTEPOD>> uses fewer comparisons on average than std::map<4BYTEPOD,4BYTEPOD>::find (by 0-0.25), but my implementation is still up to 26% slower.

[POST-ANSWER-EDIT] I made a SSCCE at http://ideone.com/41iKt that removes all unneeded fluff, and clearly shows that find on the sorted vector is slower than that of the map, by ~15%.

Author:Mooing Duck，eproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article：https://stackoverflow.com/questions/8784732/stdlower-bound-slower-for-stdvector-than-stdmapfind

Dietmar Kühl :

This is a somewhat more interesting nut to crack! Before discussing my findings so far, let me point out that the associative::find() function behaves differently to std::map::find(): if the key isn't found the former returns the lower bound while the latter returns end(). To fix this, associative::find() needs to be changed to become something like this:\n\nauto rc = std::lower_bound(this->internal_.begin(), this->internal_.end(), key, this->comp_);\nreturn rc != this->internal_.end() && !this->comp_(key, rc->first)? rc: this->internal_.end();\n\n\nNow that we are more likely to compare apples to apples (I haven't verified if the logic is really correct now), let's go on to investigate the performance. I'm not quite convinced that the approach used to test performance really hold water but I'm sticking with it for now and I could definitely improve the performance of the associative container. I don't think I have quite found all performance issues in the code but, at least, made some progress. The biggest is to noticed that the comparison function used in the associative is pretty bad because it keeps making copies. This is putting this container somewhat at a disadvantage. If you are checking the comparator now you probably don't see it because it looks as if this comparator is passing by reference! The issue is actually rather subtle: the underlying container has a value_type of std::pair<key_type, mapped_type> but the comparator is taking std::pair<key_type const, mapped_type> as argument! Fixing this seems to give the associative container quite a bit of a performance boost.\n\nTo implement a comparator class which has not opportunity to fail matching the arguments exactly I'm using a simple helper to detect if a type is a std::pair<L, R>:\n\ntemplate <typename> struct is_pair { enum { value = false }; };\ntemplate <typename F, typename S> struct is_pair<std::pair<F, S>> { enum { value = true }; };\n\n\n... and then I replaced the comparator with this, slightly more complicated, one:\n\nclass value_compare {\n key_compare pred_;\npublic:\n inline value_compare(key_compare pred=key_compare()) : pred_(pred) {}\n template <typename L, typename R>\n inline typename std::enable_if<is_pair<L>::value && is_pair<R>::value, bool>::type\n operator()(L const& left, R const& right) const {\n return pred_(left.first,right.first);\n }\n template <typename L, typename R>\n inline typename std::enable_if<is_pair<L>::value && !is_pair<R>::value, bool>::type\n operator()(L const& left, R const& right) const {\n return pred_(left.first,right);\n }\n template <typename L, typename R>\n inline typename std::enable_if<!is_pair<L>::value && is_pair<R>::value, bool>::type\n operator()(L const& left, R const& right) const {\n return pred_(left,right.first);\n }\n template <typename L, typename R>\n inline typename std::enable_if<!is_pair<L>::value && !is_pair<R>::value, bool>::type\n operator()(L const& left, R const& right) const {\n return pred_(left,right);\n }\n inline key_compare key_comp( ) const {return pred_;}\n};\n\n\nThis generally gets the two approaches quite a bit closer together. Given that I would expect that the std::vector<T> with lower_bound() approach should be a lot better than using std::map<K, T> I feel the investigation isn't over, yet.\n\nAddendum:\n\nRethinking the exercise a bit more I spotted why I felt uncomfortable with the implementation of the predicate class: it is way to complex! This can be done a lot simpler by not using std::enable_if for a change: this nicely reduces the code to something which is much easier to read. The key is to get the key:\n\ntemplate <typename Key>\nKey const& get_key(Key const& value) { return value; }\ntemplate <typename Key, typename Value>\nKey const& get_key(std::pair<Key, Value> const& pair) { return pair.first; }\n\n\nWith this implementation to get hold of a \"key\" from either a value or a pair of values, the predicate object can just define one very simple function call operator:\n\ntemplate <typename L, typename R>\nbool operator()(L const& l, R const& r)\n{\n return this->pred_(get_key<key_type>(l), get_key<key_type>(r));\n}\n\n\nThere is a slight trick in this as well, though: the expected key_type needs to be passed to the get_key() function. Without this the predicate wouldn't work in cases where the key_type is itself a std::pair<F, S> of objects.",

2012-01-09T21:19:53

std::lower_bound slower for std::vector than std::map::find