I was double-checking that accumulator_set is essentially a zero-cost abstraction, but I was surprised to find that's not quite the case!
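#include <cstddef>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>
using namespace boost::accumulators;
accumulator_set<double, features<tag::mean, tag::lazy_variance>> acc;
struct HandRolled { std::size_t count; double sum; double sum_sq; };
static_assert(sizeof(HandRolled) == 24);
static_assert(sizeof(acc) == 40); // 16 extra bytes, ~67% overhead!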
I think the overhead comes from the fusion cons list, fusion::cons<A, cons<B, cons<C, nil_>>>. Each element is stored as a full data member, so an empty type still occupies at least one byte (which padding can then round up), and the compiler can't optimize it away:
// boost/fusion/container/list/cons.hpp:142-143
car_type car;
cdr_type cdr;
| impl | stores | cost after padding in cons (bytes) |
|---|---|---|
| count_impl | size_t | 8 |
| sum_impl | double | 8 |
| moment<2>_impl | double | 8 |
| mean_impl | (empty) | 8 |
| lazy_variance_impl | (empty) | 8 |
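To make the effect easier to reproduce without Boost, here is a minimal sketch of the same layout. `Nil`, `Cons`, `CountImpl`, `SumImpl`, `MeanImpl`, and `HandRolled2` are hypothetical stand-ins that only mimic the two-member shape quoted from cons.hpp; they are not the real Fusion or Accumulators types.

```cpp
#include <cstddef>

// Hypothetical stand-ins; none of these are Boost types.
struct Nil {};

// Same shape as the two members quoted from cons.hpp: plain data members.
template <class Car, class Cdr>
struct Cons {
    Car car;
    Cdr cdr;
};

struct CountImpl { std::size_t n; };  // stores a counter
struct SumImpl   { double sum; };     // stores a running sum
struct MeanImpl  {};                  // stores nothing, computed on demand

using List = Cons<CountImpl, Cons<SumImpl, Cons<MeanImpl, Nil>>>;

// What one would write by hand for the same state.
struct HandRolled2 { std::size_t n; double sum; };

// An empty class still has sizeof >= 1, and ordinary members cannot overlap.
static_assert(sizeof(MeanImpl) >= 1, "empty classes are not zero-sized");
static_assert(sizeof(Cons<MeanImpl, Nil>) >= 2,
              "two empty members need distinct storage");

// So the cons list is strictly larger than the hand-rolled struct.
static_assert(sizeof(List) > sizeof(HandRolled2),
              "empty members cost space in a cons list");
```

On x86-64 Linux with GCC or Clang I would expect `sizeof(List)` to come out at 24 against 16 for `HandRolled2`, because the two empty tail members get padded up to the alignment of the `double` next to them. The exact numbers are ABI-dependent, which is why the sketch only asserts the inequality.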
I honestly hoped I could attempt a two-line fix in cons.hpp:
- car_type car;
- cdr_type cdr;
+ BOOST_ATTRIBUTE_NO_UNIQUE_ADDRESS car_type car;
+ BOOST_ATTRIBUTE_NO_UNIQUE_ADDRESS cdr_type cdr;
I hotpatched /usr/include/boost/fusion/container/list/cons.hpp, but I couldn't observe any memory savings with this on either Clang or GCC.
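One possible contributing factor, which I have not confirmed inside Fusion itself: `[[no_unique_address]]` only lets an empty member overlap with subobjects of a different type. The sketch below uses hypothetical stand-ins (`PlainCons`, `PackedCons`, `Empty`, `Base`, `A`, `B`) and assumes the macro expands to `[[no_unique_address]]` in the build. If the accumulator impls all share an empty base class (an assumption that should be checked against the Boost.Accumulators headers), the second half of the sketch would apply to them.

```cpp
// Hypothetical stand-ins; none of these are Boost types.
struct Empty {};
struct Base {};      // plays the role of a shared empty base class
struct A : Base {};
struct B : Base {};

// Members declared the way cons.hpp declares them today.
template <class Car, class Cdr>
struct PlainCons {
    Car car;
    Cdr cdr;
};

// Members declared the way the attempted patch declares them.
template <class Car, class Cdr>
struct PackedCons {
    [[no_unique_address]] Car car;
    [[no_unique_address]] Cdr cdr;
};

// Without the attribute, an empty member always adds to the size.
static_assert(sizeof(PlainCons<double, Empty>) > sizeof(double),
              "plain empty member costs space");

// With the attribute, the empty member may overlap the double.
static_assert(sizeof(PackedCons<double, Empty>) <=
                  sizeof(PlainCons<double, Empty>),
              "attribute never makes the layout larger");

// Two subobjects of the same type must keep distinct addresses,
// so these cases cannot collapse to a single byte.
static_assert(sizeof(PackedCons<Empty, Empty>) >= 2,
              "same-type empties cannot overlap");
static_assert(sizeof(PackedCons<A, B>) >= 2,
              "shared empty base blocks full overlap");
```

Compilers that do not implement the attribute simply ignore it, so it is also worth checking which expansion of BOOST_ATTRIBUTE_NO_UNIQUE_ADDRESS the build actually picks up before concluding that the patch has no effect.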
Environment
- Boost 1.89.0
- GCC 15.2 / Clang 19
- x86-64 Linux (Arch)
Is this problem worth solving? If I spent more time on this and tried to filter out empty types (e.g. mean_impl and lazy_variance_impl) at compile time, would such a PR be accepted?
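To make the compile-time filtering idea concrete, here is a rough sketch of what I have in mind. It uses `std::tuple` as a stand-in for the fusion list, and `stateful_tuple_t` plus the four impl structs are hypothetical names for illustration only; a real patch would have to work on Fusion sequences and C++03-era code paths, which this does not attempt.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for accumulator implementations.
struct CountImpl        { std::size_t n; };
struct SumImpl          { double sum; };
struct MeanImpl         {};  // empty: derives its result from count and sum
struct LazyVarianceImpl {};  // empty: derives its result from other features

// Map each type to tuple<T> if it carries state, or tuple<> if it is empty,
// then concatenate the pieces. Only the resulting type is used (C++17).
template <class... Ts>
using stateful_tuple_t = decltype(std::tuple_cat(
    std::declval<std::conditional_t<std::is_empty_v<Ts>,
                                    std::tuple<>,
                                    std::tuple<Ts>>>()...));

using Storage =
    stateful_tuple_t<CountImpl, MeanImpl, SumImpl, LazyVarianceImpl>;

// Only the stateful impls remain, in their original order.
static_assert(std::is_same_v<Storage, std::tuple<CountImpl, SumImpl>>);
```

The open design question with this approach is how the dropped impls get invoked: they would have to be constructed on the fly when their result is extracted, instead of being looked up in the stored list.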