Rethinking computing

We carry a lot of assumptions about how computing is currently structured. A handful of core concepts form the basis of computing on the "big three" operating systems. What if we could rethink these core concepts and design computing for the future?

  • Processes - the concept of multiple processes being executed on a machine at once.
  • File-systems - how we organise data and processes.
  • Immutability of code - how we change computation.

In order to rethink we need to look at what sits at the core of computing - computation on a CPU (or GPU). From the perspective of the CPU there is only one "program" being executed at a time. Operating systems schedule time on the CPU for each running process, switching processes out and in to give each one time on the CPU. This model can be traced back to the early time-sharing mainframes, where multiple users (programmers) needed to run their programs on a single, expensive CPU. Today we can see this model being challenged by the task model of mobile devices. Mobile system designers seem to have realised that computer users are only really focussed on one task at a time. While your browser and your email client are technically "running" at the same time, the mode of interaction means it appears as if you can only interact with one task at a time. On the desktop, by contrast, it is completely normal to have your browser and some other program visible at the same time.

Desktop operating systems tend to ignore the mode of usage where your browser is "running" but you are not really using it. You may even want to pause the program so it is not executing but still technically keep it "open" so that its state is preserved (your open tabs). In recognition of this, many browsers now restore all open tabs by default after a restart.

Terminal-based programs on Linux already have this concept, known as suspending a process (Ctrl-Z). The process is still loaded in memory but it gets no CPU time. So what exactly does it mean to load a program?

Processes

The dominant model for processes on desktop systems is the executable: a file stored on the file-system that contains the code you want to run. When you start the executable it is the responsibility of the operating system to load that code into memory as a process and start giving it time on the CPU. The first implication of this model is that once an executable is written to storage its computation logic can not be changed, and the computation can not be altered once the program is loaded into memory either. The only real mechanism for altering the computation of a running process is to save its state, stop the process, change the executable and then start it again. This all seems very reasonable and safe, but the question is what interesting modes of computing does it prevent, and why is it like this? Our computing metaphor is still rooted in stacks of punch cards being loaded into readers to be prepared for execution.

The operating system and computer hardware then work together to keep these individual processes from "seeing" each other. Memory is virtualised so that one process can not go and mess with another. From the perspective of a program it is the only thing running on the computer. No surprise then that getting processes to see and interact with each other is a major source of complexity and difficulty in our current computing model. We have shared memory, signals, pipes and sockets, and each of these goes against the grain of what the operating system is trying to do: prevent processes from directly interacting with each other.

So we invented remote procedure calling (RPC) protocols for calling code that lives in a different process (or even a different machine), usually over a socket of some form. These protocols are owned by the process, meaning two processes need to agree beforehand what protocol they will use to talk to each other. That decision is made at the time of writing the program code, which means it is set in stone well before we create a new process and load the code to be executed.

However, within a process we also have a form of RPC: calling shared library code. Here the protocol is the ABI (Application Binary Interface), the convention that machine code uses to pass data between two different functions. Between an executable and a shared library this interface specifies how the called code gains access to data prepared by the caller, usually via the process stack or CPU registers, and how the called code returns data to the caller. What we as users lack in this process is control. First, we do not have control over which shared libraries are visible to a process; that is decided for us when the executable is built. We do not have the ability to make a completely new shared library visible to a process once it has been loaded into memory, and even if we did, we do not have a way of modifying that process to call into that shared library.

At a fundamental level we do not treat machine code or instructions as data. In our minds we separate machine instructions from the data those instructions are operating on. We give freedom to pull new data from different sources at run-time but we do not do the same for our code.

This has given rise to a trio of computing entities: the computing device, programmers and users. If we go back to the beginning of computing we only had two entities: the computing device and programmers. Computation was designed and executed by programmers, and the programmers were the users. Over time the act of writing machine instructions became something independent from executing those instructions. Computation could be designed and written once but executed many times. Now we are in a completely different world. There are parts of the Linux kernel that are executed continuously, day and night, on computing devices owned by almost every human being on the planet, where that specific pattern of machine code was created by a single person. Pluck any specific combination of instructions from the Linux kernel compiled for a mobile device and that combination was created by a specific person. And yet it is executed by billions of people every nanosecond of every day, non-stop, and only the tiniest fraction of people have the knowledge required to change that combination of instructions.

To me the closest thing to this situation outside of computing is the licence to practice law. However, it is a very loose analogy that quickly falls down.

I tried suspending my browser to see what it would do:

kill -STOP <browser main pid>

The program continued to show in the taskbar, and when I alt-tab'd to it the window was "frozen". I was able to resume the program:

kill -CONT <browser main pid>

And as expected my browser continued to work as normal. So what if the task switcher in my window manager did this for me? I could configure certain programs to be automatically suspended when not in focus; when I alt-tab'd back, the program would simply be resumed. No more annoying popups from the browser, but also no more listening to YouTube while doing something else.