Pre-processing data often simplifies ML training. This page highlights Rust libraries that aid in pre-processing data through cleaning, normalization, and transformation, as well as feature extraction and selection.
Fast CSV parsing with support for serde.
Rust implementation of Apache Arrow.
DataFrame library based on Apache Arrow.
DataFusion is an in-memory query engine that uses Apache Arrow as the memory model.
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
Thread-safe Rust bindings for the HDF5 library.
A low-latency data-parallel dataflow system in Rust.
A multithreaded ETL with inspiration drawn from Keras.
ARFF file format serializer and deserializer.
Data preprocessing library for machine learning.
A data handling library (designed for machine learning).
A Rust interface to [OpenML](http://openml.org/).
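
For readers new to the terms used at the top of this page, the following is a minimal sketch of two of the steps it names: cleaning and normalization. It uses only the Rust standard library and none of the crates described above. The function names `drop_non_finite` and `min_max_scale` are hypothetical, chosen for illustration, and the choice of min-max scaling is an assumption; the libraries above may offer different or richer approaches.

```rust
/// Cleaning step: drop entries that are NaN or infinite.
/// (Hypothetical helper, not taken from any crate listed above.)
pub fn drop_non_finite(values: &[f64]) -> Vec<f64> {
    values.iter().copied().filter(|v| v.is_finite()).collect()
}

/// Normalization step: rescale values linearly into the range [0, 1]
/// (min-max scaling). Returns `None` when the input is empty or when
/// all values are equal, since the range would be zero.
/// Callers are expected to clean the data first; NaN inputs are not handled here.
pub fn min_max_scale(values: &[f64]) -> Option<Vec<f64>> {
    if values.is_empty() {
        return None;
    }
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    if range.is_nan() || range <= 0.0 {
        return None;
    }
    Some(values.iter().map(|&v| (v - min) / range).collect())
}
```

Calling `drop_non_finite` on a raw column and then passing the result to `min_max_scale` gives a cleaned feature in [0, 1]. Returning `Option` rather than panicking leaves it to the caller to decide what to do with a constant or empty column.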