書名： Mastering the C++17 STL
作者名： Arthur O'Dwyer
本章字數： 495字
更新時間： 2021-07-08 10:20:23

Read-only range algorithms

In the preceding chapters, we built up an algorithm that we called distance and another called count_if. Both of these algorithms appear in the standard library.

std::count_if(a,b,p) returns the number of elements between a and b that satisfy the predicate function p--that is, the number of elements e for which p(e) is true.

Notice that, whenever we say "between a and b", we're talking about the range that includes *a but does not include *b--what mathematicians call a "half-open range" and represented by the asymmetrical notation [a,b). Why should we not include *b? Well, for one thing, if b is the end() of some vector, then it doesn't point to an element of that vector at all! So in general, dereferencing the end point of a range is a dangerous thing to do. For another thing, using half-open ranges conveniently allows us to represent empty ranges; for example, the range "from x to x" is an empty range consisting of zero data elements.

Half-open ranges are quite natural in C++ just as they are in C. For decades, we've been writing for-loops that range from a lower bound (inclusive) to an upper bound (exclusive); this idiom is so common that deviation from the idiom often indicates a bug:

    constexpr int N = 10;
    int a[N];

    // A correct for-loop.
    for (int i=0; i < N; ++i) {
      // ...
    }

    // One variety of "smelly" for-loop.
    for (int i=0; i <= N; ++i) {
      // ... 
    }

    // A correct invocation of a standard algorithm.
    std::count_if(std::begin(a), std::end(a), [](int){ return true; });

    // A "smelly" invocation.
    std::count_if(std::begin(a), std::end(a) - 1, [](int){ return true; });

    // A "trivial" invocation: counting a range of length zero.
    std::count_if(std::begin(a), std::begin(a), [](int){ return true; });

std::distance(a,b) returns the number of elements between a and b--that is, the number of times you'd have to apply ++ to a in order to reach b. You could think of this function as being equivalent in its effects to std::count_if(a,b,[](auto&&){return true;}).

As we saw in Chapter 2, Iterators and Ranges, if the iterators in question are random-access iterators, this number can be quickly computed as (b - a), and so the standard std::distance will do so. Notice that (b - a) might be a negative number, if you gave the arguments in the "wrong" order!

    int a[] {1, 2, 3, 4, 5};
    std::list<int> lst {1, 2, 3, 4, 5};
    std::forward_list<int> flst {1, 2, 3, 4, 5};

    assert(std::distance(std::begin(a), std::end(a)) == 5);
    assert(std::distance(std::begin(lst), std::end(lst)) == 5);
    assert(std::distance(std::begin(lst), std::end(lst)) == 5);

    assert(std::distance(std::end(a), std::begin(a)) == -5);

When the iterators are random-access iterators, std::distance does nothing more than subtract them; so passing in "wrongly ordered" arguments is explicitly supported and blessed by the C++ standard. However, if the iterators in question are merely bidirectional iterators (such as std::list<int>::iterator--see Chapter 4, The Container Zoo), "wrongly ordered" iterators are not supported. You might expect that std::distance(b,a) == -std::distance(a,b) should be true of all iterator types; but consider, how would the std::distance algorithm itself have any idea whether the iterators you gave it were "wrongly ordered" or not? The only thing it can do (in the absence of an operator-) is to keep incrementing a--perhaps past the end of the container and off into space--in the vain hope that it'll eventually reach b:

    // The following line gives an "incorrect" answer!
    // assert(std::distance(std::end(lst), std::begin(lst)) == 1);
    // And this one just segfaults!
    // std::distance(std::end(flst), std::begin(flst));

Consult the diagrams of std::list and std::forward_list i n Chapter 4, The Container Zoo , to understand this code sample's odd behavior.

std::count(a,b,v) returns the number of elements between a and b that are equal to v--that is, the number of elements e for which e == v is true. You can think of this function as being equivalent in its effects to std::count_if(a,b,[&v](auto&& e){return e == v;}), and in fact both versions should give the same assembly code. If C++ had had lambda-expressions in 1998, they probably wouldn't have put the std::count algorithm in the standard library.

Notice that std::count(a,b,v) necessarily loops over all of the elements in the range between a and b. It can't take advantage of special information you might have about the arrangement of the data in the range. For example, suppose I want to count the instances of 42 in a std::set<int>? I could write the code in either of the following ways:

    std::set<int> s { 1, 2, 3, 10, 42, 99 };
    bool present;

    // O(n): compare each element with 42
    present = std::count(s.begin(), s.end(), 42);

    // O(log n): ask the container to look up 42 itself
    present = s.count(42);

The raw algorithm std::count is outperformed by the second approach, which simply asks the set itself for the answer. This turns a O(n) traversal of the whole set into a O(log n) tree lookup. Similarly, std::unordered_set provides a count method that is roughly O(1).

For more about these containers, see Chapter 4, The Container Zoo; the takeaway point here right now is that, Q sometimes there is important structure in your data that can be exploited by choosing the proper tool for the job. Even though I'm pointing to cases where the standard algorithms seem to "magically" do the right thing (as with std::distance delegating to (b - a)), you should not imagine that this "magic" stretches farther than it does. The standard algorithms know only as much as they're told, which is to say, only about the properties of the iterator types you pass them. They'll never change their behavior based on the relationships of the underlying data elements to each other. Arranging your code to exploit relationships in the underlying data (for example, "this data is sorted," "this range spans the entire container") is part of your job as the programmer.

Here are some more algorithms similar to std::count and std::count_if.

std::find(a,b,v) and std::find_if(a,b,p) work just like std::count(a,b,v) and std::count_if(a,b,p) respectively, except that, rather than looping over the entire range and returning the count of matching elements, the find variants loop only until they've found the first match, and then return an iterator to the data element that matched. There is also a variant find_if_not that is just like find_if but with the sense of the predicate negated; this variant also probably wouldn't have needed to exist if we'd gotten lambdas earlier in the history of C++:

    template<class InputIterator, class UnaryPredicate>
    InputIterator find_if(InputIterator first, InputIterator last,
      UnaryPredicate p) 
    {
      for (; first != last; ++first) {
        if (p(*first)) {
          return first;
        }
      }
      return last;
    }

    template<class It, class U>
    It find_if_not(It first, It last, U p) {
      return std::find_if(first, last, [&](auto&& e){ return !p(e); }); 
    }

    template<class It, class T>
    It find(It first, It last, T value) {
      return std::find_if(first, last, [&](auto&& e)
        { return e == value; }); 
    }

Notice that because find returns immediately upon finding the first match, it's faster on average than the count algorithm (which scans the whole range no matter what). This kind of "return immediately" behavior is often referred to as "short-circuiting".

std::all_of(a,b,p), std::any_of(a,b,p), and std::none_of(a,b,p) return either true or false, depending on how often the provided predicate function p is true of the elements in the range. They can all be built on top of the find algorithms, thus picking up the short-circuiting behavior for free:

    template<class It, class UnaryPredicate>
    bool all_of(It first, It last, UnaryPredicate p)
    {
      return std::find_if_not(first, last, p) == last;
    }

    template <class It, class U>
    bool any_of(It first, It last, U p)
    {
      return std::find_if(first, last, p) != last;
    }

    template <class It, class U>
    bool none_of(It first, It last, U p)
    {
      return std::find_if(first, last, p) == last;
    }

There is one more find-related algorithm I should mention: find_first_of. It implements the operation of "looking in a sequence for the first occurrence of any of a fixed set of target elements"--that is, just like strcspn in the C standard library, but for any type, not just char. Abstractly speaking, find_first_of takes two conceptual parameters: the range to search in, and the set of target elements. This being the STL, they're both passed in as ranges, which is to say, pairs of iterators. So a call to this algorithm looks like find_first_of(haystack, haystack, needle, needle): two pairs of iterators side by side. This can get confusing--beware of algorithms taking multiple similar parameters!

    template <class It, class FwdIt>
    It find_first_of(It first, It last, FwdIt targetfirst,
      FwdIt targetlast)
    {
      return std::find_if(first, last, [&](auto&& e) {
        return std::any_of(targetfirst, targetlast, [&](auto&& t) {
          return e == t;
        });
      });
    }

    template <class It, class FwdIt, class BinaryPredicate>
    It find_first_of(It first, It last, FwdIt targetfirst,
      FwdIt targetlast, BinaryPredicate p)
    {
      return std::find_if(first, last, [&](auto&& e) {
        return std::any_of(targetfirst, targetlast, [&](auto&& t) {
          return p(e, t);
        });
      });
    }

Notice that the "haystack" iterators are expected to be of any old InputIterator type, but the "needle" iterators are required to be at least ForwardIterator. Recall from Chapter 2, Iterators and Ranges, that the big thing about ForwardIterator types is that they can be meaningfully copied, letting the same range be traversed multiple times. This is exactly what find_first_of needs! It traverses the "needle" range once per character in the "haystack"; so the "needle" must be re-traversable--and incidentally, must be finite in size! Contrariwise, there's nothing particularly requiring that the "haystack" be finite; it might be pulling its elements from a potentially unbounded input stream:

    std::istream_iterator<char> ii(std::cin);
    std::istream_iterator<char> iend{};
    std::string s = "hello";

    // Chomp characters from std::cin until finding an 'h', 'e', 'l', or 'o'.
    std::find_first_of(ii, iend, s.begin(), s.end());

Speaking of multiple similar parameters, let's finish our look at simple read-only algorithms with these two: std::equal and std::mismatch.

std::equal(a,b,c,d) takes two iterator-pairs: the range [a,b) and the range [c,d). It returns true if the two ranges are element-for-element equal, and false otherwise.

std::mismatch(a,b,c,d) is sort of like find: it'll tell you exactly which pair of elements was the one that torpedoed the match:

    template<class T> constexpr bool is_random_access_iterator_v =
      std::is_base_of_v<std::random_access_iterator_tag, typename 
      std::iterator_traits<T>::iterator_category>;

    template<class It1, class It2, class B>
    auto mismatch(It1 first1, It1 last1, It2 first2, It2 last2, B p)
    {
      while (first1 != last1 && first2 != last2 && p(*first1, *first2)) {
        ++first1;
        ++first2;
      }
      return std::make_pair(first1, first2);
    }

    template<class It1, class It2>
    auto mismatch(It1 first1, It1 last1, It2 first2, It2 last2)
    {
      return std::mismatch(first1, last1, first2, last2, std::equal_to<>{});
    }

    template<class It1, class It2, class B>
    bool equal(It1 first1, It1 last1, It2 first2, It2 last2, B p)
    {
      if constexpr (is_random_access_iterator_v<It1> &&
        is_random_access_iterator_v<It2>) {
        // Ranges of different lengths can never be equal.
        if ((last2 - first2) != (last1 - first1)) {
          return false;
        }
      }
      return std::mismatch(first1, last1, first2, last2, p) ==
        std::make_pair(last1, last2);
    }

    template<class It1, class It2>
    bool equal(It1 first1, It1 last1, It2 first2, It2 last2)
    {
      return std::equal(first1, last1, first2, last2, std::equal_to<>{});
    }

Notice the use of std::equal_to<>{} as a predicate object; we won't cover the built-in predicates in depth in this book, so just take it for granted that std::equal_to<>{} is an object whose behavior is similar to [](auto a, auto b){ return a == b; } but with more perfect forwarding involved.

Finally, watch out again! Many of the two-range algorithms in the C++17 standard library also have variant forms colloquially known as one-and-a-half-range algorithms. For example, in addition to std::mismatch(a,b,c,d) you'll find std::mismatch(a,b,c)--the second range's "end" point is simply assumed to be at c + std::distance(a, b). If c actually points into a container where c + std::distance(a, b) would be "off the end," then, tough luck!

Because "tough luck" is never a really great answer to a technical question, the C++17 standard added safe two-range variants for many of the one-and-a-half-range algorithms that had existed in C++14.

官术网_书友最值得收藏!

Mastering the C++17 STL

Read-only range algorithms