Pre-processing data often simplifies ML training. This page highlights Rust libraries that help pre-process data through cleaning, normalization, and transformation, as well as feature extraction and selection.
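To make "normalization" concrete, here is a minimal sketch that uses only the standard library. It assumes a single numeric feature stored in a slice of `f64` and shows two common schemes: min-max scaling to [0, 1] and z-score standardization (mean 0, unit variance). The function names `min_max_scale` and `z_score` are hypothetical and do not belong to any crate listed on this page.

```rust
/// Min-max scaling: maps each value into [0, 1] using the observed min and max.
/// If every value is identical (zero range), returns zeros to avoid dividing by zero.
fn min_max_scale(values: &[f64]) -> Vec<f64> {
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    values
        .iter()
        .map(|&v| if range == 0.0 { 0.0 } else { (v - min) / range })
        .collect()
}

/// Z-score standardization: subtracts the mean and divides by the population
/// standard deviation, so the output has mean 0 and unit variance.
/// If the standard deviation is zero, returns zeros.
fn z_score(values: &[f64]) -> Vec<f64> {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    values
        .iter()
        .map(|&v| if std == 0.0 { 0.0 } else { (v - mean) / std })
        .collect()
}

fn main() {
    let feature = vec![2.0, 4.0, 6.0, 8.0];
    println!("min-max: {:?}", min_max_scale(&feature));
    println!("z-score: {:?}", z_score(&feature));
}
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers, while z-scores are the usual choice when a model assumes roughly centered inputs.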
Fast CSV parsing with support for serde.
Rust implementation of Apache Arrow
DataFusion is an in-memory query engine that uses Apache Arrow as its memory model
DataFrame library based on Apache Arrow
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
Thread-safe Rust bindings for the HDF5 library.
A low-latency data-parallel dataflow system in Rust
Rust implementation of the PyTorch DataLoader
A multithreaded ETL library inspired by Keras.
ARFF file format serializer and deserializer
Data preprocessing library for machine learning
A data handling library (designed for machine learning).
A Rust interface to [OpenML](http://openml.org/).
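The libraries above cover parsing, storage, and transformation at scale. To illustrate the cleaning and feature-selection steps mentioned at the top of this page without any external dependency, the following std-only sketch parses a tiny CSV string, drops rows with missing, non-numeric, or non-finite fields, and then keeps only the columns whose variance exceeds a threshold. It is a toy stand-in, not the API of any crate listed here; `clean_rows` and `select_by_variance` are hypothetical names, and the "drop bad rows" rule is an assumption chosen for simplicity.

```rust
/// Parses comma-separated numeric rows, skipping the header line.
/// A row is dropped if it has the wrong number of fields, or any field that is
/// empty, non-numeric, NaN, or infinite (a simple cleaning rule assumed here).
fn clean_rows(csv: &str) -> Vec<Vec<f64>> {
    let mut lines = csv.lines();
    // The header determines the expected number of columns.
    let width = match lines.next() {
        Some(header) => header.split(',').count(),
        None => return Vec::new(),
    };
    lines
        .filter_map(|line| {
            let parsed: Result<Vec<f64>, _> =
                line.split(',').map(|f| f.trim().parse::<f64>()).collect();
            match parsed {
                Ok(row) if row.len() == width && row.iter().all(|v| v.is_finite()) => Some(row),
                _ => None,
            }
        })
        .collect()
}

/// Variance-threshold feature selection: returns the indices of columns whose
/// population variance is strictly greater than `threshold`.
fn select_by_variance(rows: &[Vec<f64>], threshold: f64) -> Vec<usize> {
    if rows.is_empty() {
        return Vec::new();
    }
    let n = rows.len() as f64;
    let width = rows[0].len();
    (0..width)
        .filter(|&col| {
            let mean = rows.iter().map(|r| r[col]).sum::<f64>() / n;
            let var = rows.iter().map(|r| (r[col] - mean).powi(2)).sum::<f64>() / n;
            var > threshold
        })
        .collect()
}

fn main() {
    // Row 2 has an empty field and row 4 has a non-numeric field; both are dropped.
    let data = "a,b,c\n1.0,5.0,3.0\n2.0,5.0,\n3.0,5.0,9.0\nx,5.0,1.0\n4.0,5.0,6.0";
    let rows = clean_rows(data);
    // Column `b` is constant, so a zero threshold removes it.
    let keep = select_by_variance(&rows, 0.0);
    println!("kept {} rows, selected columns {:?}", rows.len(), keep);
}
```

Dropping incomplete rows is the bluntest cleaning strategy; imputing missing values (for example with a column mean) keeps more data at the cost of some bias.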