Detecting content-independent transformations from observations is an important problem in both biological and artificial intelligence. For example, decomposing raw image data into objects and transformations, when possible, provides a natural disentangled representation. The task of motion detection can be cast as a transformation learning problem: about two decades ago, Rao and Ruderman formulated the unsupervised learning of a visual motion detector from pairs of consecutive video frames. I will revisit the problem of learning infinitesimal transformation operators (Lie group generators) from observations and discuss a neural network capable of performing this task. This unconstrained approach to transformation learning runs into obstacles as the number of potential generators grows large while the number of base images to which the transformations are applied remains limited. I will discuss when and how one can incorporate prior knowledge about the transformation invariances of a domain, with the related problem of diffeomorphism-aware image clustering in mind.
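
To fix ideas, here is a minimal sketch (not the model discussed in the talk) of how a single Lie group generator can be recovered from observation pairs related by a small transformation, using the first-order approximation x' ≈ (I + tA)x. The toy data, the fixed step size t, and all variable names are illustrative assumptions.

```python
# Minimal sketch: recover an infinitesimal generator A from pairs (x, x')
# with x' = expm(t * A) @ x, via the linearization x' - x ~= t * A @ x.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d = 16  # dimension of a tiny 1-D "image"

# Ground-truth generator: infinitesimal cyclic shift (discrete derivative).
A_true = np.roll(np.eye(d), -1, axis=1) - np.eye(d)

# Observation pairs: random base signals transformed by a small amount t.
n_pairs, t = 500, 0.1
X = rng.normal(size=(n_pairs, d))          # rows are base signals x
Y = (expm(t * A_true) @ X.T).T             # rows are transformed signals x'

# First-order model: Y - X ~= t * X @ A^T.  Solve for A by least squares.
B, *_ = np.linalg.lstsq(t * X, Y - X, rcond=None)
A_hat = B.T

print("relative recovery error:",
      np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true))
```

A neural-network formulation replaces the closed-form least-squares step with gradient descent on the reconstruction error, which is what makes the problem hard once many candidate generators compete for a limited set of base images.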