New Project: ETL Sheets
Experiments with spreadsheet-inspired UI
I recently started experimenting with building tooling for ETL systems. After many years of wrestling with ETL in industry, I had a few questions on my mind:
- Can we make common data issues quick to resolve?
- Can we make automated data transformations as easy to work with as spreadsheets?
I think the answer to both questions is yes, but don’t take my word for it – you can try the prototype at etlsheets.netlify.app (double-click on an issue to get started), and view the source code.
Motivation
Importing data at scale is painful. Your data providers will screw up the formatting, systems will experience connectivity issues, and your transformation logic will fail on cases you didn’t expect. What if our tools focused on helping with those failures, instead of assuming the happy path?
Speaking of transformations, how should we write them? Some systems take a code-first approach, which is great for coders and impenetrable for everyone else. Others take a GUI-driven approach, which usually becomes the stuff of nightmares. I think we can do better, by drawing inspiration from a tool that’s found in every office:
Transformations
Here’s what building a new transformation might look like, and here’s what it might look like when that transformation fails to run successfully.
This works very much like a spreadsheet, albeit with visually distinct stages/steps. New columns can be created by entering a simple formula, and existing rows can be filtered+altered with simple row-level formulae. Significantly, formula changes are reflected immediately in the data – this might seem like a trivial feature, but I think it’s a key part of usability. Bret Victor’s made an entire career out of his “creators need an immediate connection to what they’re creating” principle, and for good reason.
Under the hood, formulae are written in JavaScript with some syntactic sugar (to make it easier to refer to columns) and macros (for common uses like filtering). I’m not tied to JS in particular, but using a “real” programming language gives us a lot more power and flexibility. Interoperating with existing libraries+code becomes trivial.
Try opening up the failed extraction. Our counterparty has sent us an identifier with a typo (MSFY instead of MSFT), but we can just fix the data inline.
Extractions
Extractions are a bit less interesting than transformations, but we can go a long way with some relatively simple interfaces. We know that connectivity and formatting issues are common, so when they happen let’s show users exactly where the problem is:
Next Steps
To be honest, I’m not 100% sure where to take this next. The prototype makes for a great demo, but it would need a lot of work to make it production-ready. I currently run JS formulae in-browser because that was quick to develop, but I’d want to evaluate other languages and build out server-side execution. Change management needs some thought too; given that these transformation definitions are essentially code, do we serialize them and put them in source control?
These challenges are definitely surmountable, but they do require a lot of thought+work and I’ve mostly moved on to other projects. If you have any ideas or suggestions, give me a shout!