mad IT rants: August 2014

Tuesday, 5 August 2014

Python's in in c++ - Part 3: Testing and finding in one pass

In two parts: 1 and 2 I have shown simple implementation of Python's in keyword in c++. I have mentioned, that there are two issues with the presented solution:

After testing if some object is in a collection, You have to make a second pass over the collection to find searched object's position.
If collection is random access and it is sorted, searching can be done more efficient by std::binary_search algorithm.

The second problem requires compile-time check if the collection is sorted or prior knowledge. Both solutions are trade-offs, so I will not tackle this problem.

To deal with the first inefficiency I had merged finder and in_impl template classes into one template class with two specializations and common base class. These changes lead to the removal of find_impl function, which was no longer needed. To obtain iterator type for any collection I have created iterator_type metafunction.

The iterator_type metafunction is simple metafunction, which for arrays return pointer type and for other types, it uses decltype keyword to calculate the type of iterator for given collection type. Users can specialize iterator_type metafunction for their own classes. Source code of that utility metafunction is presented below.

template<typename Collection>
struct iterator_type {
 typedef decltype(std::end(Collection())) type;
};

template<typename T, size_t S>
struct iterator_type<T[S]> {
 typedef T* type;
};

Common base class for specializations of the new in_impl template class is called in_impl_base, and its purpose is to gather common code for both specializations. It is: storing iterators to a position of searched element and end of the collection, calculating whether the element is in the collection and returning position of the element. The source code presented below.

template<typename Collection>
class in_impl_base {
public:
 typedef typename iterator_type<Collection>::type iterator;

 in_impl_base(const iterator& position, const iterator& end)
  : position_(position), end_(end) {}

 explicit operator bool() const {
  return end_ != position_;
 }

 iterator position() const {
  return position_;
 }
private:
 iterator position_;
 iterator end_;
};

Specializations of the in_impl template class differ only by the value of HasMemberFind template parameter and means of obtaining a position of the searched element. If HasMemberFind equals to true, then Collection::find member function is used. In the other case, the std::find algorithm is applied. To simplify type name used in in_left and in_right classes, there is also template alias in_implementation defined. It is used by both above-mentioned classes instead of old in_impl class. Described code is presented below.

template<typename Element, typename Collection, bool HasMemberFind = false>
class in_impl : public in_impl_base<Collection> {
public:
 in_impl(const Element& e, const Collection& c)
  : in_impl_base<Collection>(std::find(std::begin(c), std::end(c), e), std::end(c)) {}
};

template<typename Element, typename Collection>
class in_impl<Element, Collection, true> : public in_impl_base<Collection> {
public:
 in_impl(const Element& e, const Collection& c)
  : in_impl_base<Collection>(c.find(e), std::end(c)) {}
};

template<typename Element, typename Collection> using in_implementation = in_impl<Element, Collection, has_member_find<Collection>::value>;

To test new features, I have changed the check function. By calling print it uses proper specialization of printer template class and prints value stored in the collection under obtained position. The body of the main function remained unchanged. Testing utilities are presented below.

template<typename ValueType>
struct printer {
 printer(const ValueType& val) {
  std::cout << "\t\tvalue = " << val << std::endl;
 }
};

template<typename Key, typename Value>
struct printer<std::pair<Key, Value>> {
 printer(const std::pair<Key, Value>& val) {
  std::cout << "\t\tvalue = (" << val.first << ", " << val.second << ")" << std::endl;
 }
};

template<typename ValueType> void print(const ValueType& val) {
 printer<ValueType> p = val;
}

template<typename C>
void check(const int e, const C& c) {
 if(auto pos = e in c) {
  std::cout << "\t" << e << " is in given collection: " << typeid(c).name() << std::endl;
  print(*pos.position());
 }
 else {
  std::cout << "\t" << e << " is not in given collection: " << typeid(c).name() << std::endl;
 }
}

These are all changes made to make a port of Python's in keyword into c++. It turned to be even slightly more powerful than its Python's predecessor as it can test element existence and obtain its position in one pass over the collection. Modified source code can be found on GitHub in 1.5.0 tag.

Saturday, 2 August 2014

Python's in in c++ - Part 2: Template Specializations

In part 1 I have presented expression templates which allowed creation of Python's in construct in c++. In this post I will present some template machinery, which made my in implementation quite efficient.

In part 1, it was shown that finding element in collection is done by find_impl function. Its code is presented below:

template<typename Element, typename Collection>
bool find_impl(const Element& e, const Collection& c) {
 return finder<Element, Collection, has_member_find<Collection>::value>()(e, c);
}

It only creates instance of finder template class and calls its function call operator. The finder class is parametrized by:

type of object searched in collection
type of collection
boolean flag, whether collection has efficient member function find

If given collection supports efficient lock-up, then its member function find is used by proper specialization of finder template. In other case, std::find algorithm is used. Code for both specializations is presented below:

template<typename Element, typename Collection, bool HasMemberFind = false>
class finder {
public:
 bool operator()(const Element& e, const Collection& c) const {
  using namespace std;
  return end(c) != find(begin(c), end(c), e);
 }
};

template<typename Element, typename Collection>
class finder {
public:
 bool operator()(const Element& e, const Collection& c) const {
  using namespace std;
  return end(c) != c.find(e);
 }
};

The tricky part was to create has_member_find template, because it has to deal with concrete types and templates - for instance standard collections. For concrete types has_member_find is a metafunction always returning false. User can define specialization for its own concrete types with efficient find member function. For templates has_member_find inherits is value from template_has_member_find template, which handles templates and for most of them is also a metafunction returning false. The template_has_member_find template is specialized for all standard associative containers to be a metafunction returning true. Code for both templates is presented below:

// Has a template for collection member function find?
template<template<typename...> class TCollection>
struct template_has_member_find : std::false_type {};

template<>
struct template_has_member_find<std::set> : std::true_type {};

template<>
struct template_has_member_find<std::multiset> : std::true_type {};

template<>
struct template_has_member_find<std::map> : std::true_type {};

template<>
struct template_has_member_find<std::multimap> : std::true_type {};

template<>
struct template_has_member_find<std::unordered_set> : std::true_type {};

template<>
struct template_has_member_find<std::unordered_multiset> : std::true_type {};

template<>
struct template_has_member_find<std::unordered_map> : std::true_type {};

template<>
struct template_has_member_find<std::unordered_multimap> : std::true_type {};


// Has a concrete type member function find?
template<typename Collection, typename... Args>
struct has_member_find : public std::false_type {};

template<template<typename...> class Collection, typename... Args>
struct has_member_find<Collection<Args...>> : template_has_member_find<Collection> {};

These are all building blocks, which were used to port Python's in keyword into c++. There are two main problems with presented solution:

After testing if some object is in collection, You have to make second pass over collection to find searched object's position.
If collection is a random access and it is sorted, searching can be done more efficient by std::binary_search algorithm.

The first issue will be addressed in the third part of this miniseries.

Presented code can be found on GitHub in 1.0.0 tag.