reflections on minnowbrook logic programming seminar 2025
Following a mysterious and generous invitation from Kristopher Micinski, four computer scientists (Annie Liu, Frank McSherry, Scott Stoller, and me) drove into deep upstate New York. The trees closed around them as the road shrank from painted asphalt to a one-lane private road to a gravel driveway. What awaited them in the cedar-paneled log cabins strewn over the hill? Two days of interesting conversation about logic programming, databases, and static analysis.
There were plenty of interesting talks---there's lots of cool work happening in logic programming! A few themes emerged:
- Everyone beats Soufflé.
- Datalog is high friction: it has programs, not queries; data ingest is not cheap.
- Join planning is evergreen.
- Kris and the HARP lab have been on an absolute tear.
Everyone beats Soufflé
Many talks evaluated against Soufflé, and people invariably beat it. You could take away from this "wow, Soufflé is no good," but I don't think that's the right idea. Soufflé is incredibly useful for the community, because it's a reasonably implemented system that everyone can compare against. A standard implementation serves as a lingua franca for the community---and a punching bag for your research.
With Bernhard Scholz spending his time at Fantom, it's not clear how long Soufflé can continue to set the standard. A well-defined set of benchmarks and an implementation that beats Soufflé across the board feels important---without further development, Soufflé will be a less and less convincing punching bag.
Denis Bueno's talk on CTADL presented a pair of analyses that would make a good start. What else? Graph and RDF workloads? Other analyses?
Datalog is high friction
We talk about Datalog programs. Actually using Datalog as part of a real workflow is complicated! Soufflé interpretation is not so fast; the C++ template hell of compilation can be quite slow. Projects like Datafrog, Ascent, and Differential Datalog address this problem by building Datalog into a real language (Rust); Flix and Formulog address it by building a realistic language around Datalog. There's room for plenty more work here.
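To make the workflow friction concrete, here's a minimal Soufflé-style sketch (the program, relation names, and paths are hypothetical; the flags reflect my reading of Soufflé's CLI options): interpretation skips the C++ build but runs slower, while compilation pays the C++ toll up front.

```
// reach.dl: a minimal, hypothetical Soufflé program.
// Two common ways to run it:
//   souffle -F facts/ -D out/ reach.dl      # interpret: no C++ build, slower execution
//   souffle -c -F facts/ -D out/ reach.dl   # synthesize C++, compile, then run
.decl edge(x: number, y: number)
.input edge                                  // reads facts/edge.facts by default
.decl reach(x: number, y: number)
reach(x, y) :- edge(x, y).
reach(x, z) :- edge(x, y), reach(y, z).
.output reach                                // writes out/reach.csv by default
```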
Imagine using Datalog to define an analysis for an LSP, i.e., updating live in an editor: programs change non-monotonically! While most Datalogs would need to fully recompute with every change, Differential Datalog opens the door to adding and removing (and, so, modifying) facts. (Flowlog is another non-monotonic Datalog-like language, aimed at SDN; Dedalus is an even older one, aimed at distributed systems.)
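To see the non-monotonicity concretely, here's a hedged Soufflé-style sketch (relation names are made up) of the kind of analysis an editor would want to keep live:

```
// Unused-definition warnings over facts extracted from the editor buffer
// (hypothetical relation names).
.decl defined_at(v: symbol, line: number)
.decl used_at(v: symbol, line: number)
.input defined_at
.input used_at

.decl live(v: symbol)
live(v) :- used_at(v, _).

// The negation is the crux: deleting a single used_at fact can *add*
// unused tuples, so a keystroke isn't just "more facts"; plain
// semi-naive evaluation starts over rather than growing its relations.
.decl unused(v: symbol, line: number)
unused(v, l) :- defined_at(v, l), !live(v).
.output unused
```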
What's the right interface for adding, removing, and updating facts? Datalog is all DDL (data definition language: `CREATE TABLE`, `CREATE VIEW`)... what's the corresponding DML (data manipulation language: `SELECT`, `INSERT`, `UPDATE`, `DELETE`) interface? Existing Datalog queries are just a very narrow kind of `SELECT`; Datomic has a perspective on queries. What are other perspectives?
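As a rough illustration of the "all DDL" point, here's a Soufflé-style sketch (the relation names and the SQL analogies in the comments are mine): everything is declaration plus bulk load and bulk dump, with nothing playing the role of ad hoc `SELECT`, `INSERT`, `UPDATE`, or `DELETE` against a running instance.

```
.decl employee(name: symbol, dept: symbol)   // ~ CREATE TABLE employee (...)
.input employee                              //   bulk load only: no INSERT / UPDATE / DELETE

.decl works_in_pl(name: symbol)              // ~ CREATE VIEW works_in_pl AS
works_in_pl(n) :- employee(n, "PL").         //     SELECT name FROM employee WHERE dept = 'PL'

.output works_in_pl                          // the only "SELECT": dump the whole view
```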
Relatedly, one of the costliest things in running a Datalog program is actually getting the data in. If Datalog is going to integrate with real programs, it should ideally do so cheaply: reusing existing program data structures, indices, and interning. If you make multiple calls to a Datalog program, indices and interned symbols should be reused. Kris referenced some work in this direction that I haven't yet had the chance to read: BYODS.
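For a sense of what "getting the data in" usually looks like, here's a hedged sketch of the standard Soufflé ingest path (file and relation names are hypothetical; directive parameters follow my reading of the I/O docs):

```
// The host program serializes its data to tab-separated files; every run
// re-parses, re-interns, and re-indexes them, and none of the host's
// in-memory structures, indices, or interned symbols are reused.
.decl instruction(id: number, op: symbol, operand: symbol)
.input instruction(IO=file, filename="instruction.facts", delimiter="\t")

.decl uses_op(op: symbol)
uses_op(op) :- instruction(_, op, _).
.output uses_op(IO=stdout)
```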
Join planning is evergreen
Datalog is the "Oops! All Joins" database. But there is clearly plenty of innovation happening around join plans and indices, whether it's finding index structures that work well on GPUs, using worst-case optimal joins, or simply trying to get Soufflé to generate good plans. (Without further direction, Soufflé will do joins left-to-right. Some attendees were surprised to learn this---like me!) We've already seen there are wins for evaluation order (eager evaluation in Formulog... beats Soufflé 😁), and I think we have many more years of fruitful research on join planning for logic programs.
Kris and the HARP Lab have been on an absolute tear
One of my biggest takeaways is that Kris and the HARP Lab have been doing very cool work---and quite a bit of it. I was particularly impressed with their work on GDLog, a GPU-based Datalog engine. Sowmith Kunapaneni gave an interesting and energetic talk about the internals. Their numbers are impressive---and they have some very simple engineering wins that should offer significant speedups. Something I like particularly about the GDLog work is that it combines quite a bit of computer science that is often isolated in subfields: GPU-based architectural thinking, careful data structure selection, compilation. And of course logic, join planning, and programming languages.
Regional seminars rule
Minnowbrook was loads of fun. It's a beautiful place---Kris was very generous to host us there. (Thanks, Kris!) Getting together with twenty motivated, smart people leads to very interesting conversations---and I didn't have to get on a plane! If there's an NJPLS or NEPLS or SoCal PLS in your area, I recommend attending. And if you can organize or attend a narrower workshop, I recommend that even more strongly!