Java is just a regular imperative programming language like any other. However, it is wrapped in an intensely thick layer of tooling and convention that can seem very opaque to programmers coming from other languages. Almost every Java tutorial out there starts by installing an IDE (which effectively doubles as a compiler) and a package manager. I would say a large amount of time is then spent learning the IDE and package manager, as opposed to the language itself. These tutorials have the advantage that any Java programming you do in a work environment requires learning these two things.
However, I have a personal belief that good software is as simple as it can be (and no simpler). By treating every Java project (large or small) as a gigantic enterprise application, we lose the ability to pick and choose complexity as we need it. We end up creating an opaque layer of abstraction that has a large up-front cognitive cost. If there were a movement like "small Java" there would be no reason not to consider Java as an agile development language. Committing to Java as a language does not automatically mean your development efforts must slow down to "enterprise" speeds.
Should you use an IDE and package manager? Yes, most likely. But you do not have to - Java has a very good command line compiler, and managing dependencies yourself gives you a lot of understanding and power. Furthermore, by understanding what is in the core domain of Java as a language you will be able to identify what is outside the core. The Java ecosystem is packed with names that can be confusing to classify in your brain - Maven, Spring Boot, JavaBeans - what are these things, what problems do they solve, how do they overlap with Java as a programming language?
This document is the result of my personal experiments in exploring Java using only the built-in programs: javac the compiler, jar the bundler and java the run-time. For the purposes of this document there is an important convention that I am going to follow:
- .java source files contain one and only one class definition.
- The class name in the source file matches exactly the name of the .java source file.
- The .java source file is located in a directory hierarchy matching the package namespace.
While you can bend or break these conventions it's best not to. These conventions are assumed in the javac compiler and also at runtime by the java command.
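As a rough illustration of these conventions, the mapping from package and class name to source path can be written down in a few lines. The class and method names here (SourceLayout, expectedSourcePath, followsConvention) are made up for this sketch and are not part of any Java tooling:

```java
// Hypothetical helper illustrating the file layout conventions described above.
class SourceLayout {
    // Package "a" and class "A" map to "a/A.java"; the default (empty) package maps to "A.java".
    static String expectedSourcePath(String packageName, String className) {
        String dir = packageName.isEmpty() ? "" : packageName.replace('.', '/') + "/";
        return dir + className + ".java";
    }

    // True when a relative path is where the conventions say the class should live.
    static boolean followsConvention(String relativePath, String packageName, String className) {
        return relativePath.replace('\\', '/').equals(expectedSourcePath(packageName, className));
    }

    public static void main(String[] args) {
        System.out.println(expectedSourcePath("a", "A"));              // a/A.java
        System.out.println(followsConvention("b/B.java", "b", "B"));    // true
        System.out.println(followsConvention("misc/B.java", "b", "B")); // false
    }
}
```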
The java program starts the Java Virtual Machine (JVM) and loads .class bytecode files to execute them. It will search all the "class paths" to find .class files, and/or it will look into .jar bundles to find them. You compile .java files into .class files using the javac program. You bundle .class files into a jar using the jar program.
javac
The Java compiler takes .java source files as input, and produces .class files as output. In the same way that a C compiler takes source code as input and produces machine code as output, the output of javac is binary bytecode for the JVM.
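One way to convince yourself that a .class file really is a binary format is to peek at its first bytes. Every class file starts with the four-byte magic number 0xCAFEBABE, followed by a two-byte minor and a two-byte major version (for example 52 for Java 8 and 55 for Java 11). The sketch below only reads that header; the class name ClassFileHeader is an invention for this example:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper that inspects the fixed 8-byte header of a .class file.
class ClassFileHeader {
    // ByteBuffer reads big-endian by default, which matches the class file format.
    static boolean looksLikeClassFile(byte[] bytes) {
        return bytes.length >= 8 && ByteBuffer.wrap(bytes).getInt() == 0xCAFEBABE;
    }

    // Bytes 4-5 hold the minor version, bytes 6-7 the major version.
    static int majorVersion(byte[] bytes) {
        if (!looksLikeClassFile(bytes)) {
            throw new IllegalArgumentException("not a class file");
        }
        return ByteBuffer.wrap(bytes).getShort(6) & 0xFFFF;
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of a compiled class as its only argument, e.g. a/A.class
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("class file: " + looksLikeClassFile(bytes));
        if (looksLikeClassFile(bytes)) {
            System.out.println("major version: " + majorVersion(bytes));
        }
    }
}
```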
The javac compiler is "recursive", meaning it will examine the classes a source file refers to (for example through "import" statements), check if corresponding .java files exist, and compile these as well. There is a one-to-one relationship between the file path of the .java source file and the package namespace + class name.
File b/B.java:
package b;
public class B {
    public static void somefunc() {
        System.out.println("hello from b.B");
    }
}
File a/A.java:
package a;
import b.B;
public class A {
    public static void main(String[] args) {
        System.out.println("hello from a.A");
        B.somefunc();
    }
}
Compiling using javac a/A.java will produce these files:
a/A.class
b/B.class
It is technically possible to compile a .java source file outside of the directory of its package. However, this quickly becomes infeasible and undesirable, because the recursive compilation done by javac expects things to be in certain places. Imagine the compiler is doing something like this pseudocode:
compile(sourceFile):
    ...
    importStatements = getImportStatementsFromSource(sourceFile)
    for import in importStatements:
        importSourceFile = replaceDotsWithDirectorySlashes(import) + ".java"
        if fileExists(importSourceFile):
            compile(importSourceFile)
    ...
When no source file exists for an import, the compiler looks for an already compiled .class file on the class path instead, and reports an error only if it finds neither. The class must then be found again when the program is executed: it is a "run time dependency", and the java command will attempt to resolve it from various places (explained later). In a project with no external dependencies (outside the core classes included in the distribution) it should be possible to compile an entire application into .class files by simply compiling the root class that recursively refers to all classes in the program. This pseudocode is certainly a simplification: as written it would recurse forever on circular imports, which the real compiler handles without trouble.
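The pseudocode can be turned into a small runnable sketch. This is not how javac is implemented; it only mimics the import-following idea. To keep it self-contained, a Map of path to file contents stands in for the file system, only simple single-type imports are recognised, and a visited set makes circular imports terminate. All names (ImportWalker, importedSourcePaths, filesToCompile) are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "follow the imports"; not the real javac algorithm.
class ImportWalker {
    // Matches lines such as "import b.B;". Wildcard and static imports are ignored here.
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);

    // Turns each import into the source path the conventions imply, e.g. "b.B" -> "b/B.java".
    static List<String> importedSourcePaths(String source) {
        List<String> paths = new ArrayList<>();
        Matcher m = IMPORT.matcher(source);
        while (m.find()) {
            paths.add(m.group(1).replace('.', '/') + ".java");
        }
        return paths;
    }

    // Returns every source file reachable from rootPath, in the order first visited.
    static Set<String> filesToCompile(String rootPath, Map<String, String> sources) {
        Set<String> visited = new LinkedHashSet<>();
        visit(rootPath, sources, visited);
        return visited;
    }

    private static void visit(String path, Map<String, String> sources, Set<String> visited) {
        // Skip imports with no source file (e.g. java.util.List) and files already seen.
        if (!sources.containsKey(path) || !visited.add(path)) {
            return;
        }
        for (String imported : importedSourcePaths(sources.get(path))) {
            visit(imported, sources, visited);
        }
    }

    public static void main(String[] args) {
        Map<String, String> sources = new HashMap<>();
        sources.put("a/A.java", "package a;\nimport b.B;\npublic class A {}\n");
        sources.put("b/B.java", "package b;\nimport a.A;\npublic class B {}\n"); // circular on purpose
        System.out.println(filesToCompile("a/A.java", sources)); // [a/A.java, b/B.java]
    }
}
```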
Once compiled, the root class (assuming A contains the static "main" function) can be executed:
$ java a.A
hello from a.A
hello from b.B
Trying to execute a class without a static main function defined will produce an error:
$ java b.B
Error: Main method not found in class b.B, please define the main method as:
public static void main(String[] args)
or a JavaFX application class must extend javafx.application.Application
So far we have assumed that the place we compile .class files into is the same relative location as where we execute the program from. By that I mean that if we compile a/A.java into a/A.class we can execute it using java a.A as long as we execute it in the directory containing the a subdirectory (and the dependent b directory containing B.class). However, the java program uses something called the "class path" as a collection of search locations for .class files. This means we can place our b/B.class file in a completely separate location, and tell java where to find it using the -classpath option. But first, let's see what happens without this class path:
$ mkdir hiddenClasses
$ mv b/ hiddenClasses/
$ java a.A
hello from a.A
Exception in thread "main" java.lang.NoClassDefFoundError: b/B
at a.A.main(A.java:6)
Caused by: java.lang.ClassNotFoundException: b.B
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 1 more
Notice that we see "hello from a.A" before the exception is thrown? This means class loading is a runtime activity: b.B is only looked up at the moment A first uses it. Now let's try to resolve that using a -classpath argument:
$ java -classpath ".:hiddenClasses/" a.A
hello from a.A
hello from b.B
Notice how the class path value is ".:hiddenClasses/"? Because we are overriding the class path value completely, we must include the default "." there. So the class path is both "." and "hiddenClasses/".
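The search itself can be pictured with a few lines of Java. This is only a sketch of the idea for directory entries (jar entries on the class path are ignored), and it is not the JVM's actual class loader; ClassPathSearch and findClassFile are hypothetical names:

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical sketch: look for a class file in each directory of a class path, in order.
class ClassPathSearch {
    static Optional<Path> findClassFile(String classPath, String qualifiedClassName) {
        // "b.B" is expected at "b/B.class" below one of the class path entries.
        String relative = qualifiedClassName.replace('.', '/') + ".class";
        // File.pathSeparator is ":" on Linux and macOS and ";" on Windows.
        for (String entry : classPath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(entry).resolve(relative);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate); // first match wins
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String classPath = "." + File.pathSeparator + "hiddenClasses";
        System.out.println(findClassFile(classPath, "b.B"));
    }
}
```

Run from the example directory after moving b/ into hiddenClasses/, this should report the file under hiddenClasses; with a class path of just "." it should come back empty, which mirrors the ClassNotFoundException above.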
You can see that a Java application is made up of a collection of compiled .class files that are loaded into the JVM by searching the class path. The purpose of a .jar file is to bundle a collection of .class files (actually, any type of file) up into a single distributable file. You can think of a jar as a virtual directory that is merged with the class paths. A jar file is actually a regular zip archive, so you can unzip a jar and look at its contents. Let's create a jar file of our little program:
$ jar cvf app.jar a/A.class b/B.class
added manifest
adding: a/A.class(in = 720) (out= 514)(deflated 28%)
adding: b/B.class(in = 391) (out= 282)(deflated 27%)
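Because a jar is a regular zip archive, the plain zip classes in the standard library can read it. The following sketch lists the entry names of any jar; JarLister and entryNames are names invented for this example:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: treat a jar as the zip archive it is and list what is inside.
class JarLister {
    static List<String> entryNames(Path jar) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of a jar as its only argument, e.g. app.jar
        for (String name : entryNames(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

For the app.jar built above, the listing should include a/A.class and b/B.class alongside the generated manifest under META-INF/.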
Now let's try to execute it:
$ java -jar app.jar
no main manifest attribute, in app.jar
The problem here is that java doesn't know in which class to find our main static function. This is stored in the "manifest" that is bundled in the jar file, and we haven't told the jar program what to put for that value. We can do that using the "e" option:
$ jar cvfe app.jar a.A a/A.class b/B.class
added manifest
adding: a/A.class(in = 449) (out= 312)(deflated 30%)
adding: b/B.class(in = 391) (out= 282)(deflated 27%)
$ java -jar app.jar
hello from a.A
hello from b.B
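The value written by the "e" option ends up as the Main-Class attribute of META-INF/MANIFEST.MF inside the jar. As a sketch, it can be read back with the java.util.jar classes from the standard library; MainClassReader and mainClassOf are hypothetical names for this example:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: report which class "java -jar" would start for a given jar.
class MainClassReader {
    // Returns the Main-Class value, or null when the jar has no manifest or no such attribute.
    static String mainClassOf(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Expects the path of a jar as its only argument, e.g. app.jar
        System.out.println(mainClassOf(Paths.get(args[0])));
    }
}
```

For a jar built with the "e" option as above this should print a.A, and null for a jar built without it.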
Let's say we bundle our classes as separate jars - how do we execute them? First, we can leave out the "e" option, as it turns out we don't need it anymore:
$ jar cvf a.jar a/A.class
added manifest
adding: a/A.class(in = 449) (out= 312)(deflated 30%)
$ jar cvf b.jar b/B.class
added manifest
adding: b/B.class(in = 391) (out= 282)(deflated 27%)
And we can execute the application using the -classpath option to specify our jar files:
$ java -classpath "a.jar:b.jar" a.A
hello from a.A
hello from b.B
See how we need to add "a.A" again to tell java where to find our "main" static function? We can actually build the combined jar file in the same way, excluding the "e" option:
$ jar cvf app.jar a/A.class b/B.class
added manifest
adding: a/A.class(in = 449) (out= 312)(deflated 30%)
adding: b/B.class(in = 391) (out= 282)(deflated 27%)
$ java -classpath app.jar a.A
hello from a.A
hello from b.B
One might ponder, can we do this:
$ jar cfe a.jar a.A a/A.class
$ jar cf b.jar b/B.class
$ java -jar a.jar -classpath b.jar
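The answer is NO. The -jar and -classpath options do not work together as you might expect them to. When you use -jar, the jar file is treated as the source of all user classes and any -classpath setting is ignored. (In the command above there is a second problem: everything after a.jar is passed to the program as arguments, so "-classpath b.jar" never even reaches java as an option.)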
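Something not covered by the experiments above: the jar file format also defines a Class-Path manifest attribute, which lets a jar name other jars (relative to its own location) that java -jar should search. The sketch below only shows what such a manifest looks like, built with the standard java.util.jar.Manifest class; ManifestSketch and manifestText are hypothetical names, and getting this manifest into a.jar is left out:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: render a manifest that names both a main class and a class path.
class ManifestSketch {
    static String manifestText(String mainClass, String classPath) throws IOException {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Without a Manifest-Version attribute, Manifest.write() emits no main attributes.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.MAIN_CLASS, mainClass);
        main.put(Attributes.Name.CLASS_PATH, classPath);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        manifest.write(out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Should contain the lines "Main-Class: a.A" and "Class-Path: b.jar".
        System.out.print(manifestText("a.A", "b.jar"));
    }
}
```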
It might be convenient to distribute our application classes along with all dependencies as a single jar. Otherwise we would need to instruct our users to download and install our dependencies and make sure they are in the correct location, set class paths etc. The simple way to do this is to download our dependencies as a jar file, extract that file and then add all the .class files when we build our jar. Let's say we download "superawesome.jar":
$ unzip superawesome.jar
$ jar cvfe fat.jar a.A a/A.class b/B.class superawesome/*
Even if "superawesome" contains subdirectories the jar command is smart enough to just bundle everything into a single jar.
Something along the lines of: merge all class paths and jars into a single "virtual" directory, and search it based on package namespace for .class files.
Nothing in the java.lang package needs to be explicitly imported.
Aside from the auto-imported java.lang classes, there are a large number of core classes available. These define a "standard distribution" of a Java Runtime and you can depend on them being there, so you don't need to bundle them into your fat jar.
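A small example of the difference: classes from java.lang such as String, Integer and Math can be used directly, while core classes from other packages, such as java.util, are always available at run time but still need an import (or a fully qualified name). CoreClassesDemo and describe are made-up names for this sketch:

```java
import java.util.ArrayList; // core class, shipped with the runtime, but must be imported
import java.util.List;

class CoreClassesDemo {
    static List<String> describe(int n) {
        List<String> lines = new ArrayList<>();
        // String, Integer and Math live in java.lang, so no import is needed for them.
        lines.add("binary: " + Integer.toBinaryString(n));
        lines.add("sqrt: " + Math.sqrt(n));
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(describe(16)); // [binary: 10000, sqrt: 4.0]
    }
}
```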
My philosophy on software simplicity can very loosely be defined as: prefer tools over frameworks. I define a framework as something that you write your code for (you bend your code to fit the framework), and a tool as something you use within your code. When you write code that fits into a framework, you must understand the framework abstraction that wraps your code in order to understand what your program will do:
+-----------------+
| Framework |
| +-----------+ |
| | Your code | |
| +-----------+ |
+-----------------+
When you use a tool you are in control of the program flow - there is no inversion of control:
+-----------------+
| Your code |
| +-----------+ |
| | A tool | |
| +-----------+ |
+-----------------+
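The two diagrams can be mirrored in code. In this toy sketch everything is hypothetical: shout stands in for a tool (your code calls it when it wants to) and frameworkRun stands in for a framework (it owns the loop and decides when your handler runs):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

class ControlFlowDemo {
    // "Tool": a plain function. The caller decides if and when to use it.
    static String shout(String s) {
        return s.toUpperCase(Locale.ROOT) + "!";
    }

    // "Framework": owns the control flow and calls back into the code you hand it.
    static List<String> frameworkRun(List<String> inputs, Function<String, String> yourHandler) {
        List<String> out = new ArrayList<>();
        for (String input : inputs) {
            out.add("[framework] " + yourHandler.apply(input));
        }
        return out;
    }

    public static void main(String[] args) {
        // Tool style: your code drives.
        System.out.println(shout("hello")); // HELLO!
        // Framework style: you only supply the handler; the framework drives.
        System.out.println(frameworkRun(Arrays.asList("a", "b"), s -> s + s)); // [[framework] aa, [framework] bb]
    }
}
```

To predict the second output you have to read frameworkRun as well as your own lambda, which is the cognitive cost described above.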
The problem with frameworks is that they either 1) don't give you enough control where you need it (eg: do not provide the right hook points for your code) or 2) provide hook points and additional complexity when you don't need them. Because by definition a framework is there to cater to many needs, frameworks suffer from point 1 until they become huge complex monsters that do too much of point 2. Furthermore, a framework encourages you to write code that only works in the context of the framework - you become tightly coupled to the framework.
Java (and a lot of modern software) is framework heavy. In my experience tools live a lot longer and give programmers fewer headaches.