Java from the ground up

Java is just a regular imperative programming language like any other. However, it is wrapped in an intensely thick layer of tooling and convention that can seem very opaque to programmers coming from other languages. Almost every Java tutorial out there starts by installing an IDE (which effectively doubles as a compiler) and a package manager. I would say a large amount of time is spent learning the IDE and package manager, as opposed to the language itself. These tutorials have the advantage that any Java programming you do in a work environment requires learning these two things. However, I have a personal belief that good software is as simple as it can be (and no simpler). By treating every Java project (large or small) as a gigantic enterprise application, we lose the ability to pick and choose complexity as we need it. We end up creating an opaque layer of abstraction that has a large up-front cognitive cost. If there were a movement like "small Java", there would be no reason not to consider Java as an agile development language. Committing to Java as a language does not automatically mean your development efforts must slow down to "enterprise" speeds.

Should you use an IDE and package manager? Yes, most likely. But you do not have to - Java has a very good command line compiler, and managing dependencies yourself gives you a lot of understanding and power. Furthermore, by understanding what is in the core domain of Java as a language you will be able to identify what is outside the core. The Java ecosystem is packed with names that can be confusing to classify in your brain - Maven, Spring Boot, JavaBeans - what are these things, what problems do they solve, how do they overlap with Java as a programming language?

This document is the result of my personal experiments in exploring Java using only the built-in programs: javac the compiler, jar the bundler and java the run-time. For the purposes of this document there is an important convention that I am going to follow:

  • All .java source files contain one and only one class definition.
  • The class name in the source file matches exactly the name of the .java source file.
  • The .java source file is located in a directory hierarchy matching the package namespace.

While you can bend or break these conventions, it's best not to. These conventions are assumed by the javac compiler and also at runtime by the java command.

At a high level

The java program starts the Java Virtual Machine (JVM) and loads .class bytecode files to execute them. It will search all the "class paths" to find .class files, and/or it will look into .jar bundles to find them. You compile .java files into .class files using the javac program. You create bundles of .class files into a jar using the jar program.

The Java compiler: javac

The Java compiler takes .java source files as input, and produces .class files as output. In the same way that a C compiler takes source code as input and produces machine code as output, the output of javac is binary bytecode for the JVM.

The javac compiler is "recursive", meaning it will examine the classes a source file refers to (for example through "import" statements), check if corresponding .java files exist, and recursively compile these. There is a one-to-one relationship between the file path of the .java source file and the package namespace + class name.

File b/B.java:

package b;
public class B {
    public static void somefunc() {
        System.out.println("hello from b.B");
    }
}

File a/A.java:

package a;
import b.B;
public class A {
    public static void main(String[] args) {
        System.out.println("hello from a.A");
        B.somefunc();
    }
}

Compiling using javac a/A.java will produce these files:

a/A.class
b/B.class

It is technically possible to compile a .java source file outside of the directory of its package. However, this quickly becomes infeasible and undesirable, because the recursive compilation done by javac expects things to be in certain places. Imagine the compiler is doing something like this pseudocode:

compile(sourceFile):
    ...
    importStatements = getImportStatementsFromSource(sourceFile)
    for import in importStatements:
        importSourceFile = replaceDotsWithDirectorySlashes(import) + ".java"
        if fileExists(importSourceFile):
            compile(importSourceFile)
    ...

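When no source file exists for an import, the compiler falls back to looking for an already compiled .class file on the class path (the core classes included in the distribution are always available). If it finds neither, compilation fails with an error. A dependency that was only available as a .class file at compile time still has to be found again at run time: when the program is executed, the java command will attempt to resolve these classes from various places (explained later). In a project with no external dependencies (outside the core classes included in the distribution) it should be possible to compile an entire application into .class files by simply compiling the root class that recursively refers to all classes in the program. Keep in mind that this pseudocode is only a mental model and is highly likely to be incorrect in the details. For one thing, as written it would recurse forever on a circular dependency, whereas javac compiles classes that refer to each other without complaint.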
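To make the mental model above a bit more concrete, here is a minimal runnable sketch of the same idea in Java. It is not how javac works internally. ImportWalker, toSourcePath and reachableSources are names I made up, the "source files" live in an in-memory map instead of on disk, and only plain single-class imports are recognised. A visited set stops the walk from looping on circular imports.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the pseudocode above; not javac's real algorithm.
class ImportWalker {
    // Matches simple imports such as "import b.B;" (static and wildcard imports are ignored).
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);

    // "b.B" -> "b/B.java": the package namespace maps directly to a directory path.
    static String toSourcePath(String qualifiedName) {
        return qualifiedName.replace('.', '/') + ".java";
    }

    // Returns every source file reachable from rootPath, in the order it was first visited.
    static List<String> reachableSources(String rootPath, Map<String, String> sources) {
        Set<String> visited = new LinkedHashSet<>();
        visit(rootPath, sources, visited);
        return new ArrayList<>(visited);
    }

    private static void visit(String path, Map<String, String> sources, Set<String> visited) {
        String source = sources.get(path);
        // No source for this import (e.g. java.util.List), or already seen: nothing to do.
        if (source == null || !visited.add(path)) {
            return;
        }
        Matcher m = IMPORT.matcher(source);
        while (m.find()) {
            visit(toSourcePath(m.group(1)), sources, visited);
        }
    }

    public static void main(String[] args) {
        Map<String, String> sources = Map.of(
                "a/A.java", "package a;\nimport b.B;\npublic class A {}",
                "b/B.java", "package b;\npublic class B {}");
        reachableSources("a/A.java", sources).forEach(System.out::println);
    }
}
```

Starting from a/A.java the walk reaches b/B.java through the import, which mirrors how compiling only the root class pulled in b/B.class earlier.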

Executing compiled programs

Once compiled, the root class (assuming A contains the static "main" function) can be executed:

$ java a.A
hello from a.A
hello from b.B

Trying to execute a class without a static main function defined will produce an error:

$ java b.B
Error: Main method not found in class b.B, please define the main method as:
   public static void main(String[] args)
or a JavaFX application class must extend javafx.application.Application
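The check behind that error can be approximated with reflection. The following is only a sketch of the idea, assuming the traditional public static void main(String[]) signature; MainMethodCheck and hasMainMethod are hypothetical names, and the real launcher does more work than this.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical sketch: does a class have the traditional entry point?
class MainMethodCheck {
    // Stand-in for a.A: has an entry point.
    static class WithMain {
        public static void main(String[] args) { }
    }

    // Stand-in for b.B: only has an ordinary static method.
    static class WithoutMain {
        public static void somefunc() { }
    }

    static boolean hasMainMethod(Class<?> type) {
        try {
            // getMethod only finds public methods with exactly this name and parameter list.
            Method main = type.getMethod("main", String[].class);
            return Modifier.isStatic(main.getModifiers())
                    && main.getReturnType() == void.class;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasMainMethod(WithMain.class));
        System.out.println(hasMainMethod(WithoutMain.class));
    }
}
```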

Class Path

So far we have assumed that the place we compile .class files into is the same relative location as where we execute the program from. By that I mean that if we compile a/A.java into a/A.class, we can execute it using java a.A as long as we execute it in the directory containing the a subdirectory (and the dependent b directory containing B.class). However, the java program uses something called the "class path" as a collection of search locations for .class files. This means we can place our b/B.class file in a completely separate location, and tell java where to find it using the -classpath option. But first, let's see what happens without this class path:

$ mkdir hiddenClasses
$ mv b/ hiddenClasses/
$ java a.A 
hello from a.A
Exception in thread "main" java.lang.NoClassDefFoundError: b/B
        at a.A.main(A.java:6)
Caused by: java.lang.ClassNotFoundException: b.B
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        ... 1 more

Notice that we see "hello from a.A" before the exception is thrown? This means class loading is a runtime activity. Now let's try to resolve that using a -classpath argument:

$ java -classpath ".:hiddenClasses/" a.A 
hello from a.A
hello from b.B

Notice how the class path value is ".:hiddenClasses/"? Because we are overriding the class path value completely, we must include the default "." there. So the class path is both "." and "hiddenClasses/". The entries are separated by ":" on Linux and macOS; on Windows the separator is ";".
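A running program can look at the class path it was started with through the java.class.path system property. Here is a small sketch that splits it into its entries; ClassPathEntries and split are names I picked for the example.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that breaks a class path string into its entries.
class ClassPathEntries {
    static List<String> split(String classPath, String separator) {
        // Quote the separator so it is treated literally, not as a regular expression.
        return Arrays.asList(classPath.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        // File.pathSeparator is ":" on Linux and macOS, ";" on Windows.
        String classPath = System.getProperty("java.class.path");
        split(classPath, File.pathSeparator).forEach(System.out::println);
    }
}
```

Compiled and run with -classpath ".:hiddenClasses/", this should list the two entries on separate lines.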

Bundling in a jar

You can see that a Java application is made up of a collection of compiled .class files that are loaded into the JVM by searching the class path. The purpose of a .jar file is to bundle a collection of .class files (actually, any type of file) up into a single distributable file. You can think of a jar as a virtual directory that is merged with the class paths. A jar file is actually a regular zip archive, so you can unzip a jar and look at its contents. Let's create a jar file of our little program:

$ jar cvf app.jar a/A.class b/B.class
added manifest
adding: a/A.class(in = 720) (out= 514)(deflated 28%)
adding: b/B.class(in = 391) (out= 282)(deflated 27%)

Now let's try to execute it:

$ java -jar app.jar 
no main manifest attribute, in app.jar

The problem here is that java doesn't know in which class to find our main static function. This is stored in the "manifest" that is bundled in the jar file, and we haven't told the jar program what to put for that value. We can do that using the "e" option:

$ jar cvfe app.jar a.A a/A.class b/B.class
added manifest
adding: a/A.class(in = 449) (out= 312)(deflated 30%)
adding: b/B.class(in = 391) (out= 282)(deflated 27%)
$ java -jar app.jar 
hello from a.A
hello from b.B
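The manifest entry can also be written and read from Java using the java.util.jar classes. This is a rough sketch rather than a replacement for the jar tool: ManifestSketch, writeJar, readMainClass and demo are hypothetical names, and the jar it writes contains nothing but the manifest.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical sketch of what the "e" option stores and what java -jar reads back.
class ManifestSketch {
    // Writes an otherwise empty jar whose manifest names mainClass as the entry point.
    static void writeJar(Path jarPath, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version is required: without it the other main attributes are not written.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        new JarOutputStream(Files.newOutputStream(jarPath), manifest).close();
    }

    // Returns the Main-Class value, or null when the jar has no such attribute.
    static String readMainClass(Path jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Round trip through a temporary file: write the manifest, then read it back.
    static String demo() {
        try {
            Path jar = Files.createTempFile("app", ".jar");
            writeJar(jar, "a.A");
            return readMainClass(jar);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Pointing readMainClass at the first app.jar we built (the one without the "e" option) should return null, which corresponds to the "no main manifest attribute" error.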

A bit more complexity

Let's say we bundle our classes as separate jars - how do we execute them? First, we can leave out the "e" option, as it turns out we don't need it anymore:

$ jar cvf a.jar a/A.class 
added manifest
adding: a/A.class(in = 449) (out= 312)(deflated 30%)
$ jar cvf b.jar b/B.class 
added manifest
adding: b/B.class(in = 391) (out= 282)(deflated 27%)

And we can execute the application using the -classpath to specify our jar files:

$ java -classpath "a.jar:b.jar" a.A
hello from a.A
hello from b.B

See how we need to add "a.A" again to tell java where to find our "main" static function? We can actually build the combined jar file in the same way, excluding the "e" option:

$ jar cvf app.jar a/A.class b/B.class
added manifest
adding: a/A.class(in = 449) (out= 312)(deflated 30%)
adding: b/B.class(in = 391) (out= 282)(deflated 27%)
$ java -classpath app.jar a.A
hello from a.A
hello from b.B

One might ponder, can we do this:

$ jar cfe a.jar a.A a/A.class
$ jar cf b.jar b/B.class
$ java -jar a.jar -classpath b.jar

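The answer is NO. The -jar and -classpath options do not work together as you might expect them to. Specifically, when you use -jar, the jar file is the only place java looks for your classes and the -classpath option is ignored. On top of that, everything that follows the jar file name on the command line is handed to the program as arguments, not to java itself.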

Fat jars

It might be convenient to distribute our application classes along with all dependencies as a single jar. Otherwise we would need to instruct our users to download and install our dependencies and make sure they are in the correct location, set class paths etc. The simple way to do this is to download our dependencies as a jar file, extract that file and then add all the .class files when we build our jar. Let's say we download "superawesome.jar":

$ unzip superawesome.jar
$ jar cvfe fat.jar a.A a/A.class b/B.class superawesome/*

Even if "superawesome" contains subdirectories, the jar command is smart enough to just bundle everything into a single jar.
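The same result can be had without unpacking anything to disk. The sketch below is a different technique from the unzip-and-rejar steps above: it streams the entries of several jars straight into one new jar. FatJarSketch, merge, tinyJar and demo are hypothetical names, superawesome/Lib.class is a made-up placeholder entry, and the handling is deliberately naive: everything under META-INF/ in the inputs is dropped and the first copy of a duplicate entry wins.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical sketch of building a fat jar by merging existing jars.
class FatJarSketch {
    // Copies every non-META-INF entry of the input jars into one output jar.
    static void merge(List<Path> inputJars, Path outputJar, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        Set<String> seen = new HashSet<>();
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(outputJar), manifest)) {
            for (Path input : inputJars) {
                try (JarInputStream in = new JarInputStream(Files.newInputStream(input))) {
                    for (JarEntry e = in.getNextJarEntry(); e != null; e = in.getNextJarEntry()) {
                        // Skip old manifests and anything an earlier jar already provided.
                        if (e.getName().startsWith("META-INF/") || !seen.add(e.getName())) {
                            continue;
                        }
                        out.putNextEntry(new JarEntry(e.getName()));
                        in.transferTo(out); // copies the bytes of the current entry only
                        out.closeEntry();
                    }
                }
            }
        }
    }

    // Writes a jar holding one placeholder entry; only here to keep the demo self-contained.
    static Path tinyJar(Path dir, String jarName, String entryName) throws IOException {
        Path jar = dir.resolve(jarName);
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(new byte[] {0});
            out.closeEntry();
        }
        return jar;
    }

    // Builds two tiny jars, merges them and returns the entry names of the result.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("fatjar");
            Path app = tinyJar(dir, "app.jar", "a/A.class");
            Path dep = tinyJar(dir, "superawesome.jar", "superawesome/Lib.class");
            Path fat = dir.resolve("fat.jar");
            merge(List.of(app, dep), fat, "a.A");
            List<String> names = new ArrayList<>();
            try (JarFile jar = new JarFile(fat.toFile())) {
                jar.stream().forEach(entry -> names.add(entry.getName()));
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Dropping META-INF/ wholesale is a simplification: some libraries keep files there that they need at run time, so a real build would have to be more selective.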

Resolving classes

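Something along the lines of: merge all class paths and jars into a single "virtual" directory, and search it by package namespace for .class files. The class path entries are searched in the order they are listed, so when the same class appears more than once, the first one found wins.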
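As a rough sketch of that search, assuming the class path only contains directories (no jars): ClassResolver and resolve are names I made up, and the exists check is passed in so the example can run against a fake set of files instead of the real file system.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of looking up a class along a directory-only class path.
class ClassResolver {
    // Returns the first candidate .class file that exists, trying the entries in order.
    static Optional<String> resolve(String className, List<String> classPath,
                                    Predicate<String> exists) {
        // "b.B" -> "b/B.class": the package namespace becomes a relative path.
        String relative = className.replace('.', '/') + ".class";
        for (String entry : classPath) {
            String candidate = entry.endsWith("/") ? entry + relative : entry + "/" + relative;
            if (exists.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Pretend file system matching the hiddenClasses example.
        Set<String> files = Set.of("./a/A.class", "hiddenClasses/b/B.class");
        List<String> classPath = List.of(".", "hiddenClasses/");
        System.out.println(resolve("a.A", classPath, files::contains));
        System.out.println(resolve("b.B", classPath, files::contains));
        System.out.println(resolve("c.C", classPath, files::contains));
    }
}
```

To try it against real directories, the predicate could be something like path -> Files.exists(Path.of(path)).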

Core classes do not require an import

Nothing in the java.lang package needs to be explicitly imported.
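A tiny example of the difference, with NoImportNeeded and describe being made-up names: String, Math and Integer come from java.lang and can be used directly, while List lives in java.util and has to be imported (or written out as java.util.List).

```java
import java.util.List; // not in java.lang, so it needs an import

class NoImportNeeded {
    static String describe() {
        // Math, Integer and String all live in java.lang: no import required.
        int larger = Math.max(3, 7);
        List<String> parts = List.of("max", Integer.toString(larger));
        return String.join("=", parts);
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "max=7"
    }
}
```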

Java Core Libraries

Aside from the auto-imported java.lang classes, there are a large number of core classes available. These define a "standard distribution" of a Java Runtime and you can depend on them being there, so you don't need to bundle them into your fat jar.

Misc about frameworks

My philosophy on software simplicity can very loosely be defined as: prefer tools over frameworks. I define a framework as something that you write your code for (you bend your code to fit the framework), and a tool as something you use within your code. When you write code that fits into a framework, you must understand the framework abstraction that wraps your code in order to understand what your program will do:

+-----------------+
| Framework       |
|  +-----------+  |
|  | Your code |  |
|  +-----------+  |
+-----------------+

When you use a tool you are in control of the program flow - there is no inversion of control:

+-----------------+
| Your code       |
|  +-----------+  |
|  | A tool    |  |
|  +-----------+  |
+-----------------+

The problem with frameworks is that they either 1) don't give you enough control where you need it (e.g. do not provide the right hook points for your code) or 2) provide hook points and additional complexity when you don't need them. Because a framework is by definition there to cater to many needs, it suffers from point 1 until it becomes a huge complex monster that does too much of point 2. Furthermore, a framework encourages you to write code that only works in the context of the framework - you become tightly coupled to the framework.

Java (and a lot of modern software) is framework heavy. In my experience tools live a lot longer and give programmers fewer headaches.