Saturday, 26 July 2014

Python's in in c++ - Part 1: Expression Templates

A few years ago, my colleague and I, we had started Python project in our work. We were both hardcore c++ programmers, so it was a new experience. After learning Python's syntax and writing some code, we had compared ours c++ and Python insights. One nice feature in Python is consistency and its keyword in. If You want to check if some object o is in collection C, You only have to write:
if o in C: # code...
Similarly, iterating over any collection in Python is straightforward:
for e in C: # code... 
But in c++, it is complicated. Simple iteration has the form:
for(type_of_collection::iterator it=c.begin(); it!=c.end(); ++it) { /* code... */ }
We have been discussing c++'98 back then, so a new for loop was not in standard yet. Today, in c++11, You can write:
for(auto& o : C) { /* code... */ }
But the other use-case for Python's in keyword has not been addressed by c++ standard. Of course, You can use standard functions and methods, but it is not as consistent as in Python:
std::find(begin, end, value) // for any range
std::set<k>::find(value), std::map<k v="">::find(key) // for associative collections
std::binary_search(begin, end, value) // for sorted, random access collections
Lack of consistency is some kind of trade-off: by using different signatures You can test whether some object is in collection and obtain iterator to it in one pass over the collection. Once more, c++ chooses efficiency instead of simplicity. ;)

Back then I have thought that creating something similar to Python's in keyword in c++ would be a nice exercise. Sure, by using such construct ''in a Python's way'' You will sacrifice ability to test if element is in collection and obtain iterator to it in one pass over the collection. I was well aware of that trade-off. One evening spent on coding and prototype was ready. Few days ago, I recalled about my old code and decided to modernize it and show it on this blog. Full source code can be found on GitHub in 1.0.0 tag. Presented code is still a prototype and shouldn't be used in production code.

First, I will show usage of my solution. Although I could have written some unit tests, I decided not to do it. It is prototype and console output will suffice. The only source file looks like this:
#include <iostream>
#include <array>
#include <vector>
/* And includes for other standard collections: list, set, unordered_set, map, unordered_map */

// Has to go after standard headers...
#include "in.hpp"

template<typename C>
void check(const int e, const C& c) {
 std::cout
  << "\t" << e << ((e in c) ? " is" : " is not")
  << " in given collection: " << typeid(c).name() << std::endl;
}

int main() {
 std::cout << "Checking raw array:" << std::endl;
 int array[] = { 3, 2, 7, 10 };
 check(50, array);
 check(2, array);
 check(10, array);
 check(1, array);

 std::cout << "Checking vector:" << std::endl;
 const std::vector vec = { 3, 2, 7, 10 };
 check(50, vec);
 check(2, vec);
 check(10, vec);
 check(1, vec);

 std::cout << "Checking map:" << std::endl;
 const std::map om = { { 3, "three" }, { 2, "two" }, { 7, "seven" }, { 10, "ten" } };
 check(50, om);
 check(2, om);
 check(10, om);
 check(1, om);

 /* In the same way other standard collections are tested. */
}
There is a lot of code compared to one tiny line with in ''keyword'' in function check, which tests whether element is in collection and prints proper message to standard output. But how the c++'s in is working?

First of all, in is defined as a macro in in.hpp file:
#define in << ::mad::cppin::in_ <<
So, the macro only adds calls of operator << for object mad::cppin::in_. This object is instance of empty type in_type, for which two operator << are defined:
struct in_type {};

template<typename Element>
details::in_left<Element> operator<<(const Element& e, const in_type) {
 return details::in_left(e);
}

template<typename Collection>
details::in_right<Collection> operator<<(const in_type, const Collection& c) {
 return details::in_right(c);
}
If the in_ object is on the right hand side of an binary shift expression, then instance of details::in_left is returned in which value of other parameter of binary shift is stored. Similarly, if the in_ object is on the left hand side of binary shift operator, the instance of in_right class is returned. Both classes are defined as follows:
template<typename Element>
class in_left {
public:
 in_left(const Element& e) : e_(e) {}

 template<typename Collection> friend
 in_impl<Element, Collection> operator<<(const in_left<Element>& l, const Collection& c) {
  return in_impl<Element, Collection>(l.e_, c);
 }
private:
 const Element& e_;
};

template<typename Collection>
class in_right {
public:
 in_right(const Collection& c) : c_(c) {}

 template<typename Element> friend
 in_impl<Element, Collection> operator<<(const Element& e, const in_right<Collection>& r) {
  return in_impl<Element, Collection>(e, r.c_);
 }
private:
 const Collection& c_;
};

} // details
The purpose of above presented classes is to store arguments to the binary shift operators and remember side of binary shift operator, on which parameter was. If parameter was on the left side it is considered as an object to be searched for. If it was on the right hand side, then it is treated as a collection to be search in. Both classes define operator << for remaining part of the in expression. If the binary shift operator is called, the instance of in_impl class is returned. It is simple application of the expression templates idiom. The in_impl class is shown below:
template<typename Element, typename Collection>
class in_impl {
public:
 in_impl(const Element& e, const Collection& c)
  : c_(c), e_(e) {}

 explicit operator bool() const {
  return find_impl(e_, c_);
 }
private:
 const Collection& c_;
 const Element& e_;
};
It only stores its arguments and in context of boolean expression (such as if statement) returns result of function find_impl call. In this function the magic happens, which enables efficient working with different types of collections. The magic will be described in part 2.