Computer Literacy

Everyone has computers but are users. Computers are great at automating computation. Computer literacy has come to mean knowing how certain programs work (Excel, Word, Windows). There is no widespread computational literacy, except in power users of spreadsheets. Most people do not need to have computational literacy, because in their private lives there is little need for computation. Most people don't have enough books to require an automated way to catalogue them. Most people do not have complex enough finances to require automation of their accounting.

When non programmers require automation of computational tasks they will reach for a spreadsheet. The spreadsheet is the single most revolutionary tool available for a personal computer.

As a programmer it is easy to dismiss spreadsheets. If you ask a programmer to design a user friendly programming environment they will not redesign spreadsheets.

What if instead of trying to replace spreadsheets with something completely different, we were to super-charge spreadsheets? What if there was a spreadsheet program that would on launch look exactly like an existing spreadsheet, but as the user required more and more features would morph into something far more sophisticated?

What would the ultimate programmer friendly spreadsheet look like?

First it would separate data from storage. The spreadsheet view of some data would represent what is in memory, not what is on disk or in the database. In that way it would be more like an SQL workbench that displays the result of a query. It would be like if Excel always saved two files: the original data read-only in one file, and a file representing all the transformations that have been appiled to that data to yeild a result.

Excel is cell focused, meaning for example to join two tables you need to copy cell based functions into every cell of the resulting table. This is the reason for the little square in the bottom right of the cell.

The main workflow would be to open one of more data sources and make them available to the transformation logic. Then a powerful set of tools for transforming that data into a new view. As the data sources change the transformations are re-run to produce a new result. Consider a monthly report with graphs: the underlying data will change every month but the process you want to run on it won't change nearly as often. This is a drawback of Excel because the data and the transformations are bound tightly together.

  • Data sources are streams of data, rewindable
  • Buffers are a window on a stream
  • Transformations are pipelines that buffer segments flow through
  • Results are the output of pipelines

Transformations can be done on row level in the form of functions similar to Excel. Relational algebra performs operations on buffers.

A traditional spreadsheet view would consist of a stream source (a CSV or an Excel file) feeding into a buffer (the currently visible part of the spreadsheet). Excel sheets are simply one or more visible buffers. A sheet is a purely graphical feature related to the UI.

In order to support the simplest Excel workflow buffers would need to be quite flexible. The number of columns in a buffer would need to be undefined. The first view port buffer would be unconnected to any input stream because the data comes from the user entering values into columns. Only if the user were to copy-paste would there be a stream. Users also can create several "tables" on the same sheet, so the system would need to flexible enough to identify separate buffers on one visible sheet.