reflections on minnowbrook logic programming seminar 2025
Following a mysterious and generous invitation from Kristopher Micinski, four computer scientists (Annie Liu, Frank McSherry, Scott Stoller, and me) drove into deep upstate New York. The trees closed around them as the road shrank from painted asphalt to a one-lane private road to a gravel driveway. What awaited them in the cedar-paneled log cabins strewn over the hill? Two days of interesting conversation about logic programming, databases, and static analysis.
There were plenty of interesting talks---there's lots of cool work happening in logic programming! A few themes emerged:
- Everyone beats Soufflé.
- Datalog is high friction: it has programs, not queries; data ingest is not cheap.
- Join planning is evergreen.
- Kris and the HARP lab have been on an absolute tear.
Everyone beats Soufflé
Many talks evaluated against Soufflé, and people invariably beat it. You could take away from this "wow, Soufflé is no good," but I don't think that's the right idea. Soufflé is incredibly useful for the community, because it's a reasonably implemented system that everyone can compare against. A standard implementation serves as a lingua franca for the community---and a punching bag for your research.
With Bernhard Scholz spending his time at Fantom, it's not clear how long Soufflé can continue to set the standard. A well-defined set of benchmarks and an implementation that beats Soufflé across the board feels important---without further development, Soufflé will be a less and less convincing punching bag.
Denis Bueno's talk on CTADL presented a pair of analyses that would make a good start. What else? Graph and RDF workloads? Other analyses?
Datalog is high friction
We talk about Datalog programs. Actually using Datalog as part of a real workflow is complicated! Soufflé interpretation is not so fast; the C++ template hell of compilation can be quite slow. Projects like Datafrog, Ascent, and Differential Datalog address this problem by building Datalog into a real language (Rust); Flix and Formulog address it by building a realistic language around Datalog. There's room for plenty more work here.
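To make the workflow friction concrete, here's a minimal Soufflé-style sketch (the program, relation names, and paths are hypothetical; the flags reflect my reading of Soufflé's CLI options): interpretation skips the C++ build but runs slower, while compilation pays the C++ toll up front.

```
// reach.dl: a minimal, hypothetical Soufflé program.
// Two common ways to run it:
//   souffle -F facts/ -D out/ reach.dl      # interpret: no C++ build, slower execution
//   souffle -c -F facts/ -D out/ reach.dl   # synthesize C++, compile, then run
.decl edge(x: number, y: number)
.input edge                                  // reads facts/edge.facts by default
.decl reach(x: number, y: number)
reach(x, y) :- edge(x, y).
reach(x, z) :- edge(x, y), reach(y, z).
.output reach                                // writes out/reach.csv by default
```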
Imagine using Datalog to define an analysis for an LSP, i.e., updating live in an editor: programs change non-monotonically! While most Datalogs would need to fully recompute with every change, Differential Datalog opens the door to adding and removing (and, so, modifying) facts. (Flowlog is another non-monotonic Datalog-like language, aimed at SDN; Dedalus is an even older one, aimed at distributed systems.)
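To see the non-monotonicity concretely, here's a hedged Soufflé-style sketch (relation names are made up) of the kind of analysis an editor would want to keep live:

```
// Unused-definition warnings over facts extracted from the editor buffer
// (hypothetical relation names).
.decl defined_at(v: symbol, line: number)
.decl used_at(v: symbol, line: number)
.input defined_at
.input used_at

.decl live(v: symbol)
live(v) :- used_at(v, _).

// The negation is the crux: deleting a single used_at fact can *add*
// unused tuples, so a keystroke isn't just "more facts"; plain
// semi-naive evaluation starts over rather than growing its relations.
.decl unused(v: symbol, line: number)
unused(v, l) :- defined_at(v, l), !live(v).
.output unused
```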
What's the right interface for adding, removing, and updating facts? Datalog is all DDL (data definition language: `CREATE TABLE`, `CREATE VIEW`)... what's the corresponding DML (data manipulation language: `SELECT`, `INSERT`, `UPDATE`, `DELETE`) interface? Existing Datalog queries are just a very narrow kind of `SELECT`; Datomic has a perspective on queries. What are other perspectives?
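As a rough illustration of the "all DDL" point, here's a Soufflé-style sketch (the relation names and the SQL analogies in the comments are mine): everything is declaration plus bulk load and bulk dump, with nothing playing the role of ad hoc `SELECT`, `INSERT`, `UPDATE`, or `DELETE` against a running instance.

```
.decl employee(name: symbol, dept: symbol)   // ~ CREATE TABLE employee (...)
.input employee                              //   bulk load only: no INSERT / UPDATE / DELETE

.decl works_in_pl(name: symbol)              // ~ CREATE VIEW works_in_pl AS
works_in_pl(n) :- employee(n, "PL").         //     SELECT name FROM employee WHERE dept = 'PL'

.output works_in_pl                          // the only "SELECT": dump the whole view
```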
Relatedly, one of the costliest things in running a Datalog program is actually getting the data in. If Datalog is going to integrate with real programs, it should ideally do so cheaply: reusing existing program data structures, indices, and interning. If you make multiple calls to a Datalog program, indices and interned symbols should be reused. Kris referenced some work in this direction that I haven't yet had the chance to read: BYODS.
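For a sense of what "getting the data in" usually looks like, here's a hedged sketch of the standard Soufflé ingest path (file and relation names are hypothetical; directive parameters follow my reading of the I/O docs):

```
// The host program serializes its data to tab-separated files; every run
// re-parses, re-interns, and re-indexes them, and none of the host's
// in-memory structures, indices, or interned symbols are reused.
.decl instruction(id: number, op: symbol, operand: symbol)
.input instruction(IO=file, filename="instruction.facts", delimiter="\t")

.decl uses_op(op: symbol)
uses_op(op) :- instruction(_, op, _).
.output uses_op(IO=stdout)
```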
Join planning is evergreen
Datalog is the "Oops! All Joins" database. But there is clearly plenty of innovation happening around join plans and indices, whether it's finding index structures that work well on GPUs, using worst-case optimal joins, or simply trying to get Soufflé to generate good plans. (Without further direction, Soufflé will do joins left-to-right. Some attendees were surprised to learn this---like me!) We've already seen there are wins for evaluation order (eager evaluation in Formulog... beats Soufflé 😁), and I think we have many more years of fruitful research on join planning for logic programs.
Kris and the HARP Lab have been on an absolute tear
One of my biggest takeaways is that Kris and the HARP Lab have been doing very cool work---and quite a bit of it. I was particularly impressed with their work on GDLog, a GPU-based Datalog engine. Sowmith Kunapaneni gave an interesting and energetic talk about the internals. Their numbers are impressive---and they have some very simple engineering wins that should offer significant speedups. Something I like particularly about the GDLog work is that it combines quite a bit of computer science that is often isolated in subfields: GPU-based architectural thinking, careful data structure selection, compilation. And of course logic, join planning, and programming languages.
Regional seminars rule
Minnowbrook was loads of fun. It's a beautiful place---Kris was very generous to host us there. (Thanks, Kris!) Getting together with twenty motivated, smart people leads to very interesting conversations---and I didn't have to get on a plane! If there's an NJPLS or NEPLS or SoCal PLS in your area, I recommend attending. And if you can organize or attend a narrower workshop, I recommend that even more strongly!