GitHub - brody-0125/dart_tensor_preprocessing: Tensor preprocessing library for Flutter/Dart. NumPy-like transforms pipeline for ONNX Runtime, TFLite, and other AI inference engines.
Key Points
- `dart_tensor_preprocessing` is a Flutter/Dart library providing a NumPy-like tensor preprocessing pipeline for AI inference engines like ONNX Runtime and TFLite, designed for PyTorch compatibility.
- It offers a comprehensive set of operations including resizing, normalization, layout manipulation, and activations, with features like non-blocking asynchronous execution, type safety, and zero-copy view operations.
- The library emphasizes high performance through SIMD acceleration for core operations, memory efficiency via buffer pooling, and provides preset pipelines for common AI models.
The dart_tensor_preprocessing library is a Dart/Flutter utility designed for efficient and NumPy-like tensor manipulation and preprocessing, primarily intended for preparing input data for AI inference engines such as ONNX Runtime and TFLite. Its core objective is to provide a comprehensive, high-performance, and PyTorch-compatible set of tensor operations.
The library's core methodology revolves around the TensorBuffer class, which represents a multi-dimensional array with metadata including shape, stride, and dtype over a physical memory buffer. DType aligns with ONNX-compatible data types (e.g., DType.float32, DType.uint8), ensuring seamless integration with various inference runtimes.
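As a hedged illustration of this design (the exact constructor and parameter names are assumptions based on the API surface described here), creating a tensor and inspecting its metadata might look like:

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

void main() {
  // Allocate a zero-filled float32 tensor in NCHW layout.
  // TensorBuffer.zeros() is described as matching torch.zeros().
  final t = TensorBuffer.zeros([1, 3, 224, 224], dtype: DType.float32);

  // Shape, stride, and dtype metadata describe the physical buffer.
  print(t.shape);  // [1, 3, 224, 224]
  print(t.stride); // contiguous NCHW strides, e.g. [150528, 50176, 224, 1]
  print(t.dtype);  // DType.float32
}
```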
Operations are categorized and implemented as distinct classes or methods, enabling a declarative pipeline approach. Key operation categories include:
- Resize & Crop: `ResizeOp` (supporting bilinear, bicubic, area, and lanczos modes), `CenterCropOp`, `ClipOp` (element-wise clamping), `PadOp`, and `SliceOp` for region extraction. `ResizeNormalizeFusedOp` combines resizing and normalization for efficiency.
- Normalization: `NormalizeOp` (e.g., ImageNet mean/std), `ScaleOp` (e.g., scaling values to [0, 1]), and deep learning normalization layers such as `BatchNormOp`, `LayerNormOp`, `GroupNormOp`, `InstanceNormOp`, and `RMSNormOp` for preparing data distributions.
- Layout Manipulation: `PermuteOp` (e.g., HWC to CHW), `ToTensorOp` (scales uint8 HWC to float32 CHW and normalizes), and `ToImageOp`. The library explicitly manages memory layouts (e.g., NCHW vs. NHWC) by adjusting strides: a contiguous NCHW tensor of shape [N, C, H, W] has strides [C*H*W, H*W, W, 1], while a contiguous NHWC tensor of shape [N, H, W, C] has strides [H*W*C, W*C, C, 1] (exact values may differ with padding).
- Data Augmentation: `RandomCropOp` and `GaussianBlurOp`.
- Activation Functions: a suite of common activations including `ReLUOp`, `LeakyReLUOp`, `GELUOp`, `SiLUOp`, `HardsigmoidOp`, `HardswishOp`, `MishOp`, `ELUOp`, `SigmoidOp`, `TanhOp`, and `SoftmaxOp`.
- Math and Arithmetic Operations: element-wise operations such as `AbsOp`, `NegOp`, `SqrtOp`, `ExpOp`, `LogOp`, `PowOp`, `AddOp`, `SubOp`, `MulOp`, and `DivOp`.
- Shape Manipulation: `UnsqueezeOp` (adds a size-1 dimension), `SqueezeOp` (removes size-1 dimensions), `ReshapeOp`, and `FlattenOp`.
- Utility: `concat()` and `stack()` for combining tensors.
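The stride arithmetic behind the layout operations can be shown with a small self-contained helper in plain Dart (independent of the library itself):

```dart
/// Computes row-major (contiguous) strides for a given shape:
/// stride[i] is the product of all dimensions after index i.
List<int> contiguousStrides(List<int> shape) {
  final strides = List<int>.filled(shape.length, 1);
  for (var i = shape.length - 2; i >= 0; i--) {
    strides[i] = strides[i + 1] * shape[i + 1];
  }
  return strides;
}

void main() {
  // NCHW: shape [1, 3, 224, 224] -> strides [150528, 50176, 224, 1]
  print(contiguousStrides([1, 3, 224, 224]));
  // NHWC: shape [1, 224, 224, 3] -> strides [150528, 672, 3, 1]
  print(contiguousStrides([1, 224, 224, 3]));
}
```

Converting between NCHW and NHWC is therefore a matter of permuting both the shape and the stride arrays together, which is exactly why the layout operations can work without touching the underlying buffer.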
The library prioritizes performance through several key methodologies:
- Zero-Copy View Operations: Many fundamental operations (`transpose()`, `reshape()`, `squeeze()`, `unsqueeze()`, `sliceFirst()`, `unbind()`, `select()`, `narrow()`, `toChannelsLast()`, `toChannelsFirst()`, `flatten()`) run in O(1) time. They manipulate only the `shape` and `stride` metadata of the `TensorBuffer` rather than copying the underlying data, avoiding reallocation overhead. For example, `tensor.transpose([2, 0, 1])` changes only the strides to reorder the perceived axes without moving any elements.
- SIMD Acceleration: Selected element-wise operations (e.g., `AbsOp`, `ClipOp`, `NormalizeOp`, `ReLUOp`, `ScaleOp`, `AddOp`, `MulOp`) leverage Dart's `Float32x4` and `Float64x2` vector types for Single Instruction, Multiple Data (SIMD) processing, with reported speedups of up to 4x for Float32 operations by processing four elements concurrently.
- Buffer Pooling: The `BufferPool` mechanism reduces garbage collection (GC) pressure by reusing pre-allocated memory buffers for temporary tensor data in hot paths.
- In-Place Operations: Many operations support an `applyInPlace()` method that modifies the tensor directly, eliminating intermediate allocations and reducing the memory footprint of sequential transformations.
- Fused Operations: `ResizeNormalizeFusedOp` fuses resizing and normalization into a single pass, eliminating intermediate tensor allocations and data copies.
- Asynchronous Execution with Isolates: For large tensors or performance-critical applications, the `runAsync()` method offloads processing to a Dart `Isolate`. This prevents UI jank by running computation on a separate thread, at the cost of some isolate-communication overhead; small tensors (below a configurable `isolateThreshold`) are processed synchronously to avoid it.
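A hedged sketch of the in-place and asynchronous execution paths (method names follow the text above, but the exact signatures and the `isolateThreshold` parameter placement are assumptions):

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

Future<TensorBuffer> preprocess(TensorBuffer input) async {
  final normalize = NormalizeOp.imagenet();

  // In-place variant: mutates the tensor's buffer directly,
  // avoiding an intermediate allocation (assumed signature).
  normalize.applyInPlace(input);

  // Asynchronous variant: tensors above the configured threshold
  // are processed in a Dart Isolate so the UI thread stays responsive;
  // smaller tensors run synchronously to skip the isolate overhead.
  final pipeline = TensorPipeline(
    [normalize],
    isolateThreshold: 64 * 64, // hypothetical element-count threshold
  );
  return pipeline.runAsync(input);
}
```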
The library emphasizes PyTorch Compatibility, ensuring that its operations yield identical results to their PyTorch/torchvision equivalents. This is crucial for developers porting models trained in PyTorch and requiring consistent preprocessing. Examples include TensorBuffer.zeros() matching torch.zeros(), tensor.transpose() matching tensor.permute(), and NormalizeOp.imagenet() matching transforms.Normalize(mean, std).
Developers can construct preprocessing pipelines using TensorPipeline by chaining operations (e.g., ResizeOp, ToTensorOp, NormalizeOp, UnsqueezeOp). The library also offers PipelinePresets for common AI tasks like imagenetClassification() or objectDetection(), providing pre-configured pipelines.
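Putting the pieces together, a classification pipeline along the lines described above might be sketched as follows (operation constructors and the preset name come from the text, but their exact signatures are assumptions):

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

void main() {
  // Hand-built pipeline: resize -> uint8 HWC to float32 CHW -> ImageNet
  // normalization -> add a leading batch dimension (NCHW).
  final pipeline = TensorPipeline([
    ResizeOp(224, 224),      // hypothetical (height, width) signature
    ToTensorOp(),
    NormalizeOp.imagenet(),
    UnsqueezeOp(0),
  ]);

  // Or start from a pre-configured preset for a common task:
  final preset = PipelinePresets.imagenetClassification();
}
```

The ordering mirrors the usual torchvision convention (geometric ops on the image layout first, then conversion and normalization in the tensor layout), which is what makes the PyTorch-parity claim practical to rely on.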
Performance benchmarks indicate that zero-copy operations complete in microseconds (e.g., `transpose()` at ~1 µs) and achieve millions of operations per second, while complex pipelines such as ImageNet classification complete in a few milliseconds (e.g., ~3.0 ms end-to-end).