Pre-processing data often simplifies ML training. This page highlights Rust libraries that help with data pre-processing: cleaning, normalization, and transformation, as well as feature extraction and selection.
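As a concrete picture of the normalization step mentioned above, here is a minimal sketch of min-max scaling using only the Rust standard library. The function name `min_max_normalize` and its handling of edge cases (empty input, non-finite values, constant columns) are assumptions made for illustration; it does not reproduce the API of any library listed on this page.

```rust
/// Rescales values to the range [0, 1] using min-max normalization.
/// Hypothetical helper for illustration only.
///
/// Returns `None` for empty input or when any value is NaN or infinite.
/// A constant column maps to all zeros to avoid dividing by zero.
fn min_max_normalize(values: &[f64]) -> Option<Vec<f64>> {
    if values.is_empty() || values.iter().any(|v| !v.is_finite()) {
        return None;
    }

    // Find the smallest and largest value in a single pass each.
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;

    if range == 0.0 {
        return Some(vec![0.0; values.len()]);
    }

    // Shift by the minimum, then divide by the range.
    Some(values.iter().map(|v| (v - min) / range).collect())
}
```

Calling `min_max_normalize(&[2.0, 4.0, 6.0])` yields `Some(vec![0.0, 0.5, 1.0])`. A real pipeline would more likely compute the minimum and maximum on the training split only and reuse them for validation and test data.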
DataFusion is an in-memory query engine that uses Apache Arrow as its memory model.
Provides implementations of today's most used tokenizers, with a focus on performance and versatility.
A low-latency data-parallel dataflow system in Rust.
Fast CSV parsing with support for serde.
opendifferentialprivacy/whitenoise-core
Differential privacy validator and runtime.
Thread-safe Rust bindings for the HDF5 library.
Data preprocessing library for machine learning.
A multithreaded ETL with inspiration drawn from Keras.
A Rust interface to [OpenML](http://openml.org/).
ARFF file format serializer and deserializer.
A data handling library (designed for machine learning).