Computer Systems

Computers automate processes. At first, calculation processes: payroll, missile trajectories. Now booking systems, online commerce, communication.

Many processes

Neural networks are very simple, but a neural net is useless without training data. Any programmer can write a neural net program; the value is in the training data. Companies and organizations that have hoarded data can train neural nets. A trained neural net is an extremely valuable product.
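That simplicity is easy to demonstrate. As a minimal sketch (in Python with NumPy; the XOR task, network shape, seed, and learning rate are arbitrary illustrative choices), a complete two-layer network plus its training loop fits in a few dozen lines. Everything of value ends up in the learned weights, which are shaped entirely by the training data:

```python
import numpy as np

# A tiny two-layer network trained on XOR with plain gradient descent.
# The code is short and generic; all the "knowledge" ends up in the
# weights, which come entirely from the training data.

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # "inputs"
y = np.array([[0], [1], [1], [0]], dtype=float)              # "outputs"

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer: 2 -> 8
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # output layer: 8 -> 1

def forward(X):
    h = np.tanh(X @ W1 + b1)               # hidden activations
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))   # sigmoid probability
    return h, p

def loss(p):
    # Mean binary cross-entropy against the training labels.
    return float(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))

_, p = forward(X)
initial_loss = loss(p)

lr = 0.5
for _ in range(5000):
    h, p = forward(X)
    g_out = (p - y) / len(X)               # d(loss)/d(output logits)
    g_W2, g_b2 = h.T @ g_out, g_out.sum(0)
    g_h = (g_out @ W2.T) * (1 - h ** 2)    # backprop through tanh
    g_W1, g_b1 = X.T @ g_h, g_h.sum(0)
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2

_, p = forward(X)
final_loss = loss(p)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The algorithm itself is a commodity; swap in a different `X` and `y` and the same loop learns a different function, which is the whole point about who holds the data.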

The danger with AI is that a trained neural net, despite being very expensive to build, is still cheaper than a trained person. That person may be many, many times more complex and nuanced than a trained neural net, but if the job requires only a fraction of the full capabilities of a person, the neural net can make that job obsolete. And unfortunately in this world, when someone's career becomes obsolete, so does the person, even if that person is many times more complex and nuanced than an AI.

Jobs where the data "inputs" and "outputs" are already digitized will be the first to go. Google Translate made a lot of "casual translation" extremely cheap, but the next generation of neural nets will put professional translators out of a job within a few years.

Will programming become obsolete? Maybe some parts of it. But a large chunk of software development is actually unrelated to programming: gathering user requirements, understanding the business/data/process/system model in depth. Often the requirements must be discovered, since no one actually knows them up front. It will be a long time before there is a way to write requirements that an AI can understand, and a very long time before an AI is capable of working with a person to build requirements and iterate on a design. AI will find it very difficult to evolve a software product (just as people find it very difficult to do so).

But it's all about training data. Whoever is hoarding digitized "inputs" about the software development process itself may have an advantage. GitHub has hoarded source code and trained a model on it, but Copilot can't take user requirements and build a system. And it certainly can't slot into a startup, iterating on a product and refactoring until there is a market fit. Copilot is a neat trick, but it is not the end of software development as a job by any reasonable definition.

But there is still opportunity. Atlassian has an extremely large hoard of data about the communication between software teams and users/product managers. It may be possible for them to build an AI that can predict the likelihood of success of a project, given historical communication. But that makes a few very large assumptions: a) that companies will give Atlassian permission to train on this data (if Atlassian even has it), b) that companies are willing to record, or are even aware of, what constitutes a failed project and what doesn't (since a model would need this for training), c) that the data is not too "messy" (imagine a company wiki, full of outdated and sometimes plain wrong information). So while imaginable, I think it is still highly unlikely that Atlassian's data hoard can be used to train any model for the foreseeable future.

Is there any data out there about the software development process that satisfies these criteria:

  • The data contains fairly accurate information about what the model is trying to predict. If the model is trying to predict the potential failure of a software project, is that outcome actually recorded accurately somewhere digitally?
  • The "goal" of the model is to predict something concrete and measurable. If the model is trying to measure the "happiness" of customers, there are simply too many external factors. Think of Facebook optimizing for "engagement" as a proxy for "interest" and the backlash against that. You might be able to build a model that is good at serving up clickbait, but that might not result in a product that people want to use.
  • There is a large enough corpus of data to effectively train a model.

Let's look at image-to-text software:

  • For a given image of text you have a digitized version of the same text, and the accuracy of a model can be easily tested.
  • The "goal" is also very clear: the model's output must match the digitized text as accurately as possible. Everyone unambiguously wants it to be as accurate as possible and would benefit greatly from it being correct.
  • There are a hell of a lot of images of text that have already been converted into digitized text, so there is a very large corpus of training data.
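The first two bullets can be made concrete. As a sketch (the sample strings and the character-level metric here are illustrative assumptions, not any particular OCR benchmark), scoring a model's output against the known digitized text is just a few lines of edit-distance arithmetic:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def char_accuracy(predicted: str, ground_truth: str) -> float:
    """1.0 means a perfect match; each wrong character costs a fraction."""
    if not ground_truth:
        return 1.0 if not predicted else 0.0
    return 1 - edit_distance(predicted, ground_truth) / max(len(predicted), len(ground_truth))

# Hypothetical model output scored against the known digitized text:
print(char_accuracy("Neural netw0rks", "Neural networks"))  # one wrong character
```

The comparison is mechanical and unambiguous, which is exactly what the project-failure and "happiness" goals above lack.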

Now "feed optimization" on social media:

  • For a given user you do not have much existing data on what was "good" for that person and it is very difficult to find that out.
  • The goal is very fuzzy, so you reach for something like "engagement" or "clickbaity-ness" that is concrete but unrelated to what is actually valuable to anyone.
  • There is unfortunately a lot of data which tricks you into thinking you have viable training data.

Unless you're TikTok, and you just make straight for "clickbaity-ness" as your primary objective and don't pretend you're trying to make the world a better place.