Nov 28, 2024; Talk
Echtzeit-AG
A Novel Approach to Pipeline Processing
Pipeline-based data processing is a common approach in many domains, from simple command-line tools to database query executors. It can efficiently combine the performance advantages of processing multiple elements at once (SIMD) with the memory savings of not keeping an entire dataset in memory (batching). Pipelines are also an intuitive and compact way of describing data transformation problems.
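As a rough illustration of the batching idea described above, here is a minimal Python sketch. It is not the system presented in the talk. The names `batches`, `run_pipeline`, `keep_even` and `square` are hypothetical, and plain Python lists stand in for whatever batch-optimized data structure a real implementation would use. The input is read in fixed-size chunks, and each stage transforms a whole chunk at once, so only one batch is held in memory at a time.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

# Hypothetical types for this sketch: a batch is a list of ints,
# and a stage maps one batch to another batch.
Batch = List[int]
Stage = Callable[[Batch], Batch]


def batches(source: Iterable[int], size: int) -> Iterator[Batch]:
    """Yield chunks of at most `size` elements from `source`."""
    it = iter(source)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_pipeline(
    source: Iterable[int], stages: List[Stage], batch_size: int = 4
) -> Iterator[int]:
    """Push each batch through every stage before reading the next batch."""
    for batch in batches(source, batch_size):
        for stage in stages:
            batch = stage(batch)
        yield from batch


def keep_even(batch: Batch) -> Batch:
    # Filter stage: operates on the whole batch in one call.
    return [x for x in batch if x % 2 == 0]


def square(batch: Batch) -> Batch:
    # Map stage: operates on the whole batch in one call.
    return [x * x for x in batch]


result = list(run_pipeline(range(10), [keep_even, square]))
print(result)  # [0, 4, 16, 36, 64]
```

In a real engine the per-batch loops inside each stage are where vectorized (SIMD) execution would apply. The list comprehensions here only mark that spot.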
The talk will present a novel system for describing and executing data transformation pipelines, based on a data structure optimized for batch operations. This new approach will then be compared and benchmarked against a range of existing data processing tools (nushell, pandas, sqlite, ...) on a variety of representative use cases.