New Project: ETL Sheets

Experiments with spreadsheet-inspired UI

I recently started experimenting with building tooling for ETL systems. After many years of wrestling with ETL in industry, I had a few questions on my mind:

  1. Can we make common data issues quick to resolve?
  2. Can we make automated data transformations as easy to work with as spreadsheets?

I think the answer to both questions is yes, but don’t take my word for it – you can try the prototype at etlsheets.netlify.app (double-click on an issue to get started), and view the source code.

Motivation

Importing data at scale is painful. Your data providers will screw up the formatting, systems will experience connectivity issues, and your transformation logic will fail on cases you didn’t expect. What if our tools focused on helping with those failures, instead of assuming the happy path?

Speaking of transformations, how should we write them? Some systems take a code-first approach, which is great for coders and impenetrable for everyone else. Others take a GUI-driven approach, which usually becomes the stuff of nightmares. I think we can do better, by drawing inspiration from a tool that’s found in every office:

Transformations

Here’s what building a new transformation might look like, and here’s what it might look like when that transformation fails to run successfully.

Managing Postgres with Pulumi and Terraform

Adventures with Infrastructure as Code

I’m a big fan of Pulumi, and I’ve previously used it for cloud infrastructure. That’s their main selling point; Pulumi’s slogan is literally “Program the cloud.” Interestingly, Pulumi can also be used to manage on-premise infrastructure, including PostgreSQL databases. Let’s dive into the details.

Pulumi basics

Pulumi takes a declarative infrastructure definition and handles provisioning said infrastructure, just like Terraform, AWS CloudFormation, and Azure Resource Manager. The main difference is that in Pulumi you’re writing real code (Node, Python, or Go) to build that infrastructure definition instead of a YAML or JSON file, it’s great.

Pulumi/Terraform PostgreSQL Provider

After Pulumi has executed your code to build up an infrastructure definition, it needs to interact with external infrastructure resources (cloud providers, databases, etc.) to turn that definition into a reality. Some of Pulumi’s functionality for working with external resources is derived from Terraform providers. For example, Pulumi has a PostgreSQL provider which is derived from the Terraform PostgreSQL provider.

At a high level, the PostgreSQL provider lets you define databases, roles, schemas, permissions, and PostgreSQL extensions. It stops short of letting you manage DDL objects and data within the database (which is probably for the best given the complexities of schema+data migrations).

Defining a Postgres database with TypeScript

Pulumi can be used to instantiate entire Postgres clusters, but for this tutorial let’s assume you already have an existing cluster managed outside of Pulumi. It can be running anywhere: in the cloud, in a container, on your local machine, whatever.

First, install Pulumi then instantiate a new TypeScript project with pulumi new typescript. This scaffolds a bare-bones Node+TypeScript Pulumi project with the following files:

index.ts                package-lock.json       tsconfig.json
Pulumi.yaml             node_modules            package.json

Install the Pulumi PostgreSQL provider with npm i @pulumi/postgresql (or just edit your package.json directly).

Next, we need to tell Pulumi which Postgres cluster to connect to and how. The PostgreSQL provider’s configuration points are documented here. My Postgres cluster is running on a local VM with the hostname fedora-vm, so I set the postgresql:host variable like so:

pulumi config set postgresql:host fedora-vm

This creates a YAML configuration file for the current Pulumi stack, or modifies one if it already exists. For my dev stack, this creates a file named Pulumi.dev.yaml with the following contents:

config:
  postgresql:host: fedora-vm

You’ll want to do the same for postgresql:username and postgresql:password, to tell Pulumi which credentials to use. Sensitive configuration values like passwords can be encrypted using the --secret flag.

Finally, we’re ready to work with the fun stuff: configuring a database in real TypeScript code. Let’s say we want to instantiate a Pulumi-managed database, create a role with login permissions, and grant the role SELECT permission on tables in the public schema in the new database. This can all be done in the index.ts file like so:

import * as pulumi from "@pulumi/pulumi";
import * as postgresql from "@pulumi/postgresql";

const config = new pulumi.Config();

const managedDatabase = new postgresql.Database("managedDatabase", {
  name: "pulumi"
});

const publicReaderRole = new postgresql.Role("publicReaderRole", {
  login: true,
  name: "public_reader",
  password: config.require("publicReaderPassword")
});

const publicReaderSelectTablesGrant = new postgresql.Grant(
  "publicReaderSelectTablesGrant",
  {
    database: managedDatabase.name,
    objectType: "table",
    privileges: ["SELECT"],
    role: publicReaderRole.name,
    schema: "public"
  }
);

Future Imperfect 2.0

A near-complete website rewrite

I spent a few weeks in August rewriting this website, and as promised here are the deets. The source code is available here under the MIT license.

Background

When I initially put this website together in January 2018, I used Julio Pescador’s Hugo port of the Future Imperfect theme. I’d heard good things about Hugo, I wanted to write blog posts in Markdown, and it was a good-looking theme that required minimal additional setup. It served me well for over a year, but a few things kept bothering me:

  1. The styling was very difficult to modify – it was nearly 3000 lines of CSS in a single file, with many duplicated colours and styles.
  2. The theme relied on a lot of JavaScript libraries: jQuery, highlight.js, Fancybox, Skel
    1. My simple static website was serving up hundreds of kilobytes of largely unnecessary scripts. As someone who grew up using a dialup modem, this offended me.

My goal was to rewrite the Hugo theme for extensibility and performance, and I figured it would take maybe a week. Of course, it took about 3 times that.

UI components: Angular vs React

Please don't reinvent JavaScript in your web framework

I’ve been building web UIs in Angular and React for the last few years, and I’ve started to greatly prefer React. Extracting and using UI components is just easier and, for lack of a better word, more JavaScripty.

Extracting components in React

Say I notice that I’m creating multiple <span> elements with the same class and icon:

function Main() {
  return (
    <div>
      <span class="alert-tag"><i class="fa alert"></i>foo</span>
      <span class="alert-tag"><i class="fa alert"></i>bar</span>
      <span class="alert-tag"><i class="fa alert"></i>baz</span>
    </div>
  );
}

It’s trivial to refactor this repeated element into its own <Alert> component function within the same file:

function Main() {
  return (
    <div>
      <Alert>foo</Alert>
      <Alert>bar</Alert>
      <Alert>baz</Alert>
    </div>
  );
}

function Alert({ children }) {
  return (<span class="alert-tag"><i class="fa alert"></i>{children}</span>);
}

Super easy, uses JavaScript’s native language constructs, and I was able to do it all within the same file (but I can easily move Alert() elsewhere for wider re-use if needed). It’s just like extracting a function in a “regular” programming language, it’s something you do without even thinking about it.

Relational Database History

1983 called. They said SQL needs a makeover

I recently read 2 papers that I’d highly recommend if you ever wonder about the underpinnings of modern relational databases:

What Goes Around Comes Around (Stonebraker and Hellerstein, 2005)

This is an opinionated history of database architectures from roughly 1970 to 2005, and it’s got some serious credentials behind it – Michael Stonebraker was the driving force behind Postgres and many other things. It’s an easy read, and it’s a good explanation of how relational DBs and SQL came to (mostly) rule the world. It was written amidst a lot of hype over XML databases, and it (rightly, in retrospect) critiques XML DBs and suggests that they won’t see much commercial adoption.

One interesting bit: Stonebraker and Hellerstein do not like SQL, and they claim that it became dominant largely because of IBM’s dominance in the marketplace during the 1980’s. Which leads us to the next item…

A Critique of the SQL Database Language (Chris Date, 1983)

This was cited approvingly in the 1st paper as a “scathing critique of the semantics of SQL”, which caught my eye - virtually every database supports SQL, how bad can it be?

I won’t summarize the entire thing, it’s a laundry list of complaints, but this resonated with me:

Notice that it is just table-names that appear in the FROM clause. Completeness suggests that it should be table-expressions (as Gray puts it, “anything in computer science that is not recursive is no good”).

…a simple table-reference (i.e. a table-name) ought to be just a special case of a general table-expression

To give a (trivial) example, say we want to union 2 tables and extract a column. We’d write that like this in SQL:

SELECT EMPLOYEE_ID FROM EMPLOYEES
UNION
SELECT EMPLOYEE_ID FROM CONTRACTORS

But if table names were just special cases of table expressions, we could write it like this instead:

SELECT EMPLOYEE_ID FROM (EMPLOYEES UNION CONTRACTORS)

SQL gets the job done, and it could be a lot worse - but it is a little sad that many of the issues academics have known about for 35+ years still exist today. It’s a reminder of how entrenched something that’s only “good enough” can become.

headshot

Cities & Code

Top Categories

View all categories