mgreenblog


Processing Semi-Structured Data in the Unix Shell

The Unix shell is incredibly powerful. I use it routinely for simple tasks (moving files around), routine work (grading scripts), and in my development process (building, deploying, etc.). When I'm working with text, the shell and its ecosystem are excellent: patching together cat, find, grep, sed, tr, and cut with shell pipelines and redirections is a convenient, expressive, and fast way to inspect and edit files.

But my shell toolchain is much less helpful when working with semi-structured data, like JSON and YAML. Folks have made wonderful contributions to the shell ecosystem to help: tools like jq and gron. These two tools provide new languages for manipulating JSON. It may be embarrassing for a programming languages researcher to admit, but... I'm kind of maxed out on new languages.

So I built a new tool that lets you use your usual shell tools to work with modern file formats: ffs, the file filesystem.

A GIF showing the following shell interaction, editing JSON in place:

    ~/ffs/demo $ echo '{}' >demo.json
    ~/ffs/demo $ ffs -i demo.json &
    [1] 56827
    ~/ffs/demo $ cd demo
    ~/ffs/demo/demo $ echo 47 >favorite_number
    ~/ffs/demo/demo $ mkdir likes
    ~/ffs/demo/demo $ echo true >likes/dogs
    ~/ffs/demo/demo $ echo false >likes/cats
    ~/ffs/demo/demo $ touch mistakes
    ~/ffs/demo/demo $ echo Michael Greenberg >name
    ~/ffs/demo/demo $ echo https://mgree.github.io >website
    ~/ffs/demo/demo $ cd ..
    ~/ffs/demo $ umount demo
    ~/ffs/demo $
    [1]+  Done                    ffs -i demo.json
    ~/ffs/demo $ cat demo.json
    {"favorite_number":47,"likes":{"cats":false,"dogs":true},"mistakes":null,"name":"Michael Greenberg","website":"https://mgree.github.io"}
    ~/ffs/demo $

Editing JSON in place using ffs.
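In the demo, plain files made with `echo`, `mkdir`, and `touch` come back out as typed JSON once the filesystem is unmounted. The Python sketch below imitates only that last step, on an ordinary directory with no FUSE involved. Its type-inference rules (an empty file becomes `null`, `true`/`false` become booleans, numerals become numbers, anything else stays a string) are an assumption read off the demo's output, not ffs's actual code, and `infer_scalar`, `dir_to_json`, and `rebuild_demo` are hypothetical names.

```python
import json
import os
import re
import tempfile

# JSON's number grammar; a strict match keeps words like "nan" as strings.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")


def infer_scalar(text):
    """Guess a JSON scalar from a file's contents (rules assumed from the demo)."""
    text = text.rstrip("\n")  # `echo` appends a trailing newline
    if text == "":
        return None  # `touch mistakes` produced "mistakes": null
    if text in ("true", "false"):
        return text == "true"
    if JSON_NUMBER.fullmatch(text):
        return json.loads(text)
    return text  # anything else is kept as a string


def dir_to_json(path):
    """Rebuild a JSON value from a directory tree (hypothetical helper).

    Directories become objects keyed by entry name and regular files become
    scalars. Lists are not modeled: this sketch has no way to tell a list
    directory from an object directory.
    """
    if os.path.isdir(path):
        return {
            name: dir_to_json(os.path.join(path, name))
            for name in sorted(os.listdir(path))
        }
    with open(path, encoding="utf-8") as f:
        return infer_scalar(f.read())


def rebuild_demo():
    """Recreate the demo's files in a temporary directory and rebuild the JSON."""
    files = {
        "favorite_number": "47\n",               # echo 47 >favorite_number
        "likes/dogs": "true\n",                  # echo true >likes/dogs
        "likes/cats": "false\n",                 # echo false >likes/cats
        "mistakes": "",                          # touch mistakes
        "name": "Michael Greenberg\n",           # echo Michael Greenberg >name
        "website": "https://mgree.github.io\n",  # echo https://mgree.github.io >website
    }
    with tempfile.TemporaryDirectory() as root:
        for rel, contents in files.items():
            full = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(full), exist_ok=True)
            with open(full, "w", encoding="utf-8") as f:
                f.write(contents)
        # Compact separators give the same single-line layout as the demo.
        return json.dumps(dir_to_json(root), separators=(",", ":"))


print(rebuild_demo())
```

Running it prints the same line of JSON that `cat demo.json` shows in the demo. ffs does this work through FUSE rather than by walking a directory on disk, so treat the sketch as an illustration of the mapping only.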

ffs lets you mount semi-structured data as a filesystem: objects and lists correspond to directories, while other types correspond to regular files. You can mount a file in one format, edit the filesystem, and write it back in another.
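Mounting runs from data to files. This Python sketch writes a parsed JSON value out as an ordinary directory tree following the correspondence described here: objects and lists become directories, and every other value becomes a regular file. Naming list elements by position (`0`, `1`, and so on), writing strings raw, and writing `null` as an empty file are assumptions made for illustration; `json_to_dir`, `list_tree`, and `demo_tree` are hypothetical helpers, not part of ffs.

```python
import json
import os
import tempfile


def json_to_dir(value, path):
    """Materialize a JSON value as files and directories (hypothetical helper)."""
    if isinstance(value, dict):
        os.mkdir(path)
        for key, child in value.items():
            # Assumption: keys are usable file names (no "/", not "." or "..").
            json_to_dir(child, os.path.join(path, key))
    elif isinstance(value, list):
        os.mkdir(path)
        for index, child in enumerate(value):
            # Assumed naming scheme: list elements are named by their index.
            json_to_dir(child, os.path.join(path, str(index)))
    else:
        with open(path, "w", encoding="utf-8") as f:
            if isinstance(value, str):
                f.write(value + "\n")
            elif value is not None:
                # json.dumps renders booleans as true/false and numbers as numerals.
                f.write(json.dumps(value) + "\n")
            # null falls through: the file is created but left empty.


def list_tree(root):
    """Return sorted 'relative/path: contents' lines for every regular file."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, encoding="utf-8") as f:
                lines.append(f"{rel}: {f.read().rstrip(chr(10))}")
    return sorted(lines)


def demo_tree():
    """Write a small example document out as a tree and list its files."""
    doc = json.loads(
        '{"favorite_number": 47, "likes": {"dogs": true},'
        ' "langs": ["json", "yaml"], "mistakes": null}'
    )
    with tempfile.TemporaryDirectory() as tmp:
        mount = os.path.join(tmp, "doc")  # stands in for the mount point
        json_to_dir(doc, mount)
        return list_tree(mount)


for line in demo_tree():
    print(line)
```

Once the data looks like this, `cat`, `grep`, and friends apply directly. Keys containing `/`, or keys such as `.` and `..`, cannot be file names; the sketch ignores that case, but a real tool has to handle it.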

All you need to run ffs is FUSE, a kernel module that supports userspace filesystems. You'll want libfuse on Linux or macFUSE on macOS. Download a binary and play around!

Michael Greenberg (mike@weaselhat.com)

Thanks! I had never heard of 9p, and it seems cool! I had some trouble finding details, but my impression is that FUSE is faster than 9p, and FUSE itself is already substantially slower than in-kernel filesystems. Bento seems like a cool option (and it's Rust-based!). Please feel free to chime in on my tracking issue if you have thoughts on performance!

I'm hopeful that fuser will just get Windows support (https://github.com/cberner/fuser/issues/129), so that I can use it there.