- Mastering the C++17 STL
- Arthur O'Dwyer
- 985字
- 2021-07-08 10:20:25
Deleting from a sorted array with std::remove_if
In all our discussion of standard generic algorithms up to this point, we haven't covered the question of how to remove items from a range. This is because the concept of "a range" is fundamentally read-only: we might change the values of the elements of a given range, but we can never use a standard algorithm to shorten or lengthen the range itself. When, in the Shunting data with std::copy section, we used std::copy to "insert into" a vector named dest, it wasn't the std::copy algorithm that was doing the inserting; it was the std::back_insert_iterator object itself that held a reference to the underlying container and was able to insert into the container. std::copy didn't take dest.begin() and dest.end() as parameters; instead it took the special object std::back_inserter(dest).
So how do we erase items from a range? Well, we can't. All we can do is erase items from a container; and the algorithms of the STL do not deal in containers. So what we ought to be looking for is a way to rearrange the values of a range so that the "removed" items will wind up somewhere predictable, so that we can quickly erase them all from the underlying container (using some means other than an STL algorithm).
We've seen one possible approach already:
std::vector<int> vec = {1, 3, 3, 4, 6, 8};
// Partition our vector so that all the non-3s are at the front
// and all the 3s are at the end.
auto first_3 = std::stable_partition(
vec.begin(), vec.end(), [](int v){ return v != 3; }
);
assert((vec == std::vector{1, 4, 6, 8, 3, 3}));
// Now erase the "tail" of our vector.
vec.erase(first_3, vec.end());
assert((vec == std::vector{1, 4, 6, 8}));
But this is much more wasteful than it needs to be (notice that stable_partition is one of those few STL algorithms that allocates a temporary buffer on the heap!). The algorithm we want is actually much simpler:
template<class FwdIt, class T>
FwdIt remove(FwdIt first, FwdIt last, const T& value)
{
auto out = std::find(first, last, value);
if (out != last) {
auto in = out;
while (++in != last) {
if (*in == value) {
// don't bother with this item
} else {
*out++ = std::move(*in);
}
}
}
return out;
}
void test()
{
std::vector<int> vec = {1, 3, 3, 4, 6, 8};
// Partition our vector so that all the non-3s are at the front.
auto new_end = std::remove(
vec.begin(), vec.end(), 3
);
// std::remove_if doesn't preserve the "removed" elements.
assert((vec == std::vector{1, 4, 6, 8, 6, 8}));
// Now erase the "tail" of our vector.
vec.erase(new_end, vec.end());
assert((vec == std::vector{1, 4, 6, 8}));
// Or, do both steps together in a single line.
// This is the "erase-remove idiom":
vec.erase(
std::remove(vec.begin(), vec.end(), 3),
vec.end()
);
// But if the array is very long, and we know it's sorted,
// then perhaps it would be better to binary-search for
// the elements to erase.
// Here the "shifting-down" is still happening, but it's
// happening inside vector::erase instead of inside std::remove.
auto first = std::lower_bound(vec.begin(), vec.end(), 3);
auto last = std::upper_bound(first, vec.end(), 3);
vec.erase(first, last);
}
std::remove(a,b,v) removes all values equal to v from a range [a,b). Notice that the range does not have to be sorted--but remove will preserve whatever order was there before, by "shifting down" the non-removed elements to fill in the gaps in the range. If remove removes k elements from the range, then when the remove function returns, there will be k elements at the end of the range whose values are in the moved-from state, and return value of remove will be an iterator pointing to the first such moved-from element.
std::remove_if(a,b,p) removes all elements satisfying the given predicate; that is, it removes all elements e such that p(e) is true. Just like remove, remove_if shifts elements down to fill in the range and returns an iterator to the first "moved-from" element.
The common idiom for removing items from a sequence container is what's known as the erase-remove idiom, because it involves passing that return value straight into the container's own .erase() member function.
Another standard library algorithm that works with the erase-remove idiom is std::unique(a,b), which takes a range and, for each set of consecutive equivalent items, removes all but the first of them. Like std::remove, the input range doesn't need to be sorted; the algorithm will preserve whatever ordering was there to begin with:
std::vector<int> vec = {1, 2, 2, 3, 3, 3, 1, 3, 3};
vec.erase(
std::unique(vec.begin(), vec.end()),
vec.end()
);
assert((vec == std::vector{1, 2, 3, 1, 3}));
Finally, notice that we can often do better than std::remove in general, either by using the erase member function of whatever our underlying container is (for example, we'll see in the next chapter how std::list::erase can be much faster than the erase-remove idiom on a std::list)--and even if we're removing from a vector whose order happens not to be significant, we'll still usually be better off with something like the following generic algorithm unstable_remove, which has been proposed for future standardization but (at the time of writing) not yet adopted into the STL:
namespace my {
template<class BidirIt, class T>
BidirIt unstable_remove(BidirIt first, BidirIt last, const T& value)
{
while (true) {
// Find the first instance of "value"...
first = std::find(first, last, value);
// ...and the last instance of "not value"...
do {
if (first == last) {
return last;
}
--last;
} while (*last == value);
// ...and move the latter over top of the former.
*first = std::move(*last);
// Rinse and repeat.
++first;
}
}
} // namespace my
void test()
{
std::vector<int> vec = {4, 1, 3, 6, 3, 8};
vec.erase(
my::unstable_remove(vec.begin(), vec.end(), 3),
vec.end()
);
assert((vec == std::vector{4, 1, 8, 6}));
}
In the next chapter, we'll look at containers--the STL's answer to the question, "Where are all these elements being stored, anyway?"