Pre-processing data often simplifies ML training. This page highlights Rust libraries that help pre-process data through cleaning, normalization, and transformation, as well as feature extraction and selection.
Fast CSV parsing with support for serde.
Rust implementation of Apache Arrow
DataFrame Library based on Apache Arrow
Thread-safe Rust bindings for the HDF5 library.
A low-latency data-parallel dataflow system in Rust
DataFusion is an in-memory query engine that uses Apache Arrow as the memory model
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
A multithreaded ETL with inspiration drawn from Keras.
Data Preprocessing library for Machine Learning
ARFF file format serializer and deserializer
A Rust interface to [OpenML](http://openml.org/).
A data handling library (designed for machine learning).
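To make the normalization step mentioned at the top of this page concrete, here is a minimal sketch of two common feature-scaling transforms, min-max scaling and z-score standardization. It uses only the Rust standard library and does not reflect the API of any crate listed above; the function names `min_max_scale` and `standardize` are hypothetical, chosen for illustration, and the sketch assumes the input contains no NaN values.

```rust
/// Rescale values linearly into [0, 1] (min-max normalization).
/// Returns None for empty input or when all values are equal (zero range).
/// Hypothetical helper for illustration; assumes no NaN values in the input.
fn min_max_scale(values: &[f64]) -> Option<Vec<f64>> {
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    if values.is_empty() || range == 0.0 {
        return None;
    }
    Some(values.iter().map(|v| (v - min) / range).collect())
}

/// Shift and scale values to zero mean and unit variance (z-score),
/// using the population standard deviation (divide by n).
/// Returns None for empty input or when the standard deviation is zero.
fn standardize(values: &[f64]) -> Option<Vec<f64>> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        return None;
    }
    Some(values.iter().map(|v| (v - mean) / std_dev).collect())
}

fn main() {
    // A single made-up feature column.
    let raw = [2.0, 4.0, 6.0, 10.0];

    if let Some(scaled) = min_max_scale(&raw) {
        println!("min-max: {:?}", scaled); // prints "min-max: [0.0, 0.25, 0.5, 1.0]"
    }
    if let Some(z) = standardize(&raw) {
        println!("z-score: {:?}", z);
    }
}
```

Returning `Option` instead of dividing by zero keeps constant columns from silently turning into NaN; a caller would typically drop such a column or leave it unscaled.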