tabibu-walk — parallel traversal & size tree

Purpose

Fast, cancellable filesystem traversal for every Tabibu scanner that needs sizes or file streams: the space map (size_tree), dedupe / old-files (walk_files), and quick totals (dir_size). Symlinks are inspected with symlink_metadata and never followed, so cycles cannot loop and a link counts only as the link object itself.

API

Item Signature Notes
DirNode { path, size_bytes, is_dir, children } serde Serialize; children sorted by size desc
size_tree (root, &CancelToken, max_depth: Option<usize>) -> Result<DirNode, WalkError> root = depth 0; nodes beyond max_depth are pruned but still aggregate into ancestors
walk_files (root, &CancelToken, &(dyn Fn(&Path, &Metadata) + Sync)) -> Result<(), WalkError> callback fires for every regular file, concurrently
dir_size (root, &CancelToken) -> Result<u64, WalkError> size_tree(root, _, Some(0)).size_bytes
WalkError Cancelled | Root { path, source } entry-level IO/permission errors are skipped, never fatal; only an unreadable root errors

Concurrency model

Recursive divide-and-conquer on rayon's work-stealing pool. Each directory is read sequentially (one read_dir per directory), then its entries are mapped with into_par_iter(); subdirectory recursion is stolen by idle workers. No shared mutable state — each subtree returns its DirNode (or unit, for walk_files) up the join tree. The CancelToken (an Arc<AtomicBool>) is checked at every directory boundary; a cancelled walk short-circuits via Result collection / try_for_each.

flowchart TD
    A[size_tree root] --> B{symlink_metadata}
    B -- file/symlink --> C[leaf DirNode: len]
    B -- dir --> D{cancelled?}
    D -- yes --> E[Err Cancelled]
    D -- no --> F[read_dir, skip bad entries]
    F --> G[rayon into_par_iter over entries]
    G -- file/symlink --> C
    G -- subdir --> D
    G --> H[collect children Results]
    H --> I[sum sizes, sort desc]
    I --> J{depth >= max_depth?}
    J -- yes --> K[drop children, keep size]
    J -- no --> L[keep children]
    K & L --> M[DirNode up the join tree]

Benchmarks

benches/walk.rs (criterion, harness = false) builds a ~5,000-file tempdir fixture once and benches size_tree with unlimited depth and with max_depth = 1. Run: cargo bench -p tabibu-walk.