Java Optimization

My first and most important piece of advice regarding optimization in Java is this: don't do it. That is, unless you are sure that it is really called for. In nearly any large application, 90% of the memory and CPU usage is tied up by just 1% of the code. Spending your time optimizing the other 99% is wasted effort.

So real optimization should concentrate on optimizing the heck out of the limiting step. Before changing any code, you need to find what is taking the most time and/or memory.
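
As a very rough way to find out where the time goes, you can wrap a suspect piece of code in a timer. The sketch below is only an illustration: the class name StopWatch and the method timeMillis are made up for this example, and a real profiler will tell you far more than wall-clock timing will.

```java
// Hypothetical helper; StopWatch and timeMillis are names invented for this sketch.
class StopWatch {
    // Runs the task the given number of times and returns the elapsed
    // wall-clock time in milliseconds.
    static long timeMillis(Runnable task, int repetitions) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < repetitions; i++) {
            task.run();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(new Runnable() {
            public void run() {
                // Stand-in for the code you suspect is the limiting step.
                new StringBuffer().append("foo").append("bar").toString();
            }
        }, 100000);
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

Run the task many times, since a single call is usually far too short to measure with a millisecond clock.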

Below I cover optimizations grouped by the class or category to which they apply. Unless otherwise noted, all the steps outlined below apply to JDK 1.1.

java.lang.String

The + operator, which is overloaded in Java to concatenate String objects, ends up creating a StringBuffer, appending each argument, and returning the concatenated string, and it does this all over again for every concatenation expression. If you are building up a string in several steps, just create your own StringBuffer and use append(blah) yourself.
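
Here is a minimal sketch of building a string in a loop with one StringBuffer instead of repeated + expressions. The class JoinExample and its join method are hypothetical names used only for illustration.

```java
// Hypothetical helper class; the names are illustrative only.
class JoinExample {
    // Joins the parts with the separator using a single StringBuffer,
    // rather than creating a new one for every + expression in the loop.
    static String join(String[] parts, char separator) {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                buf.append(separator);
            }
            buf.append(parts[i]);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // prints "foo,bar,baz"
        System.out.println(join(new String[] {"foo", "bar", "baz"}, ','));
    }
}
```
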
If you know that you are going to have many occurrences of the same string inside your application when it is running, do one of two things:
1. Use a static variable and make references to it all over the place.
2. Use String.intern() so that the VM will share copies of that String with other occurrences of it.
I've run into problems, however, intern()ing too many strings. Depending on how long each string is, your VM may throw an OutOfMemoryError after a few thousand calls. If this is a problem, just make a utility class that has a static hashtable and use that.
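
The utility class with a static hashtable could look something like the sketch below. The name StringCache and the method canonical are assumptions made for this example, and the generic type parameters assume a JDK newer than 1.1 (drop them and add casts on an older one).

```java
import java.util.Hashtable;

// Hypothetical utility class; StringCache and canonical are made-up names.
class StringCache {
    // Maps each distinct string value to the one shared instance of it.
    private static final Hashtable<String, String> cache =
        new Hashtable<String, String>();

    // Returns the shared instance equal to s, registering s if it is new.
    // Synchronized so the lookup and the insert happen as one step.
    static synchronized String canonical(String s) {
        String shared = cache.get(s);
        if (shared == null) {
            cache.put(s, s);
            shared = s;
        }
        return shared;
    }

    public static void main(String[] args) {
        String a = new String("hello");
        String b = new String("hello");
        // Two distinct objects go in, one shared object comes out.
        System.out.println(canonical(a) == canonical(b)); // prints "true"
    }
}
```

Note that this table never lets go of a string once it has been added, so it only makes sense for a bounded set of values.
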
If you use String objects as keys into Hashtable objects (and use the get(blah) and containsKey(blah) methods), then you will want to (ahem) modify java.lang.String so that its hashCode() method caches its return value, so that every time you look up a String it doesn't have to re-compute the hash value. JavaSoft knows about this, and decided not to include this enhancement in JDK 1.2 because it would add an extra 4 bytes to the String class (Editorial comment: they are already using Unicode, what's an extra 4 bytes?). If you make this modification, be sure that you declare the cached value transient or you will have problems serializing and de-serializing String objects.
If you are writing out ASCII data somewhere in big blocks and use the getBytes() method on a String, you may be frightened to know that it calls a method on each character it's going to convert. This is nice if you are writing out Turkish text or something, but sucks if you are just doing ASCII. If this is the case, use the getBytes(int, int, byte[], int) method instead. It's deprecated and doesn't handle Unicode (it just keeps the low 8 bits of each character), but it's much, much faster than getBytes(). If, on the other hand, you need to do lots of text I/O using byte[]s, then for god's sake cache the byte[].
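
A small sketch of both ideas: converting with the deprecated getBytes(int, int, byte[], int) and caching a byte[] that gets written over and over. The class AsciiBytes, the method toAscii, and the CRLF constant are hypothetical names, and the @SuppressWarnings annotation assumes a JDK newer than 1.1 (just leave it off otherwise).

```java
// Hypothetical helper; AsciiBytes, toAscii and CRLF are made-up names.
class AsciiBytes {
    // Copies the low 8 bits of each char into a fresh byte[].
    // Only correct when every character in s is plain ASCII.
    @SuppressWarnings("deprecation")
    static byte[] toAscii(String s) {
        byte[] out = new byte[s.length()];
        s.getBytes(0, s.length(), out, 0);
        return out;
    }

    // Converted once and reused, instead of re-converting on every write.
    static final byte[] CRLF = toAscii("\r\n");

    public static void main(String[] args) {
        byte[] request = toAscii("GET");
        System.out.println(request.length + " bytes, then " + CRLF.length + " for the line end");
    }
}
```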

java.lang.StringBuffer

If you are using StringBuffer in a non-multithreaded environment (or you at least know that nobody will be modifying the StringBuffer at the same time you are, which is essentially a single-threaded environment), you will want to write your own StringBuffer (I use one called SuperStringBuffer) that does not synchronize its methods. Calling a synchronized method is about 4x slower than calling one that's not synchronized.
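
To give an idea of what such a class involves, here is a bare-bones sketch of an unsynchronized buffer. This is not the SuperStringBuffer mentioned above; the name UnsyncCharBuffer and everything in it are assumptions for illustration, and it only supports appending strings.

```java
// Hypothetical class: a minimal append-only buffer with no synchronized methods.
// Safe only when a single thread uses a given instance at a time.
class UnsyncCharBuffer {
    private char[] chars = new char[16];
    private int length = 0;

    // Appends s, growing the backing array (at least doubling) when needed.
    UnsyncCharBuffer append(String s) {
        if (s == null) {
            s = "null";
        }
        int needed = length + s.length();
        if (needed > chars.length) {
            char[] bigger = new char[Math.max(needed, chars.length * 2)];
            System.arraycopy(chars, 0, bigger, 0, length);
            chars = bigger;
        }
        s.getChars(0, s.length(), chars, length);
        length = needed;
        return this;
    }

    int length() {
        return length;
    }

    public String toString() {
        return new String(chars, 0, length);
    }

    public static void main(String[] args) {
        // prints "foobarbaz"
        System.out.println(new UnsyncCharBuffer().append("foo").append("bar").append("baz"));
    }
}
```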

If you're doing something like this:

StringBuffer b = new StringBuffer();
b.append(foo);
b.append(bar);
b.append(baz);

it's considerably faster to do it like this:

StringBuffer b = new StringBuffer();
synchronized (b)
{
  b.append(foo);
  b.append(bar);
  b.append(baz);
}

because the former has to obtain three object locks, whereas the latter only has to obtain one.

java.util.Vector

If you are using Vector in a non-multithreaded environment (or you at least know that nobody will be modifying the Vector at the same time you are, which is essentially a single-threaded environment), you will want to write your own Vector (I use one called SuperVector) that does not synchronize its methods. See the StringBuffer discussion above.
If you know something about the behavior of your application while it's running, you can tell a Vector how much to grow by each time it needs to grow. By default, it will double its size each time.
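
The growth step is the second argument to the Vector(int initialCapacity, int capacityIncrement) constructor. The sketch below compares it with the default doubling; the class VectorGrowth and its method are hypothetical names, and the generic type parameter assumes a JDK newer than 1.1.

```java
import java.util.Vector;

// Hypothetical demo class; the names are illustrative only.
class VectorGrowth {
    // Adds count elements to a Vector built with the given settings and
    // reports the capacity it ended up with. An increment of 0 means
    // "double the capacity each time".
    static int capacityAfter(int initialCapacity, int capacityIncrement, int count) {
        Vector<Integer> v = new Vector<Integer>(initialCapacity, capacityIncrement);
        for (int i = 0; i < count; i++) {
            v.addElement(Integer.valueOf(i));
        }
        return v.capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(100, 50, 101)); // prints 150
        System.out.println(capacityAfter(100, 0, 101));  // prints 200
    }
}
```
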

java.util.Hashtable

You may see a trend here... If you are using Hashtable in a non-multithreaded environment (or you at least know that nobody will be modifying the Hashtable at the same time you are, which is essentially a single-threaded environment), you will want to write your own Hashtable (I use one called SuperHashtable) that does not synchronize its methods. See the StringBuffer and Vector discussions above.
If you know about how a particular Hashtable will be used at runtime, you can give the Hashtable hints about how to grow and when to rehash itself. When properly used, this can dramatically improve both speed and memory usage for a Hashtable.
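
The hints in question are the arguments to the Hashtable(int initialCapacity, float loadFactor) constructor. The sketch below counts rehashes by overriding the protected rehash() method, first with the defaults and then with a table sized up front. CountingHashtable and HashtableSizing are hypothetical names invented for this example, and the generic type parameters assume a JDK newer than 1.1.

```java
import java.util.Hashtable;

// Hypothetical subclass that counts how often the table rehashes itself.
class CountingHashtable extends Hashtable<String, Integer> {
    int rehashCount = 0;

    CountingHashtable() {
        super();
    }

    CountingHashtable(int initialCapacity, float loadFactor) {
        super(initialCapacity, loadFactor);
    }

    protected void rehash() {
        rehashCount++;
        super.rehash();
    }
}

// Hypothetical demo class; the names are illustrative only.
class HashtableSizing {
    private static int fill(CountingHashtable table, int entries) {
        for (int i = 0; i < entries; i++) {
            table.put("key" + i, Integer.valueOf(i));
        }
        return table.rehashCount;
    }

    // Default capacity and load factor: the table rehashes as it grows.
    static int rehashesWithDefaults(int entries) {
        return fill(new CountingHashtable(), entries);
    }

    // Capacity chosen so that the expected number of entries stays
    // below capacity * loadFactor, which avoids rehashing entirely.
    static int rehashesWhenPresized(int entries) {
        float loadFactor = 0.75f;
        int capacity = (int) (entries / loadFactor) + 1;
        return fill(new CountingHashtable(capacity, loadFactor), entries);
    }

    public static void main(String[] args) {
        System.out.println("defaults: " + rehashesWithDefaults(1000));
        System.out.println("presized: " + rehashesWhenPresized(1000));
    }
}
```

The trade-off is memory: a presized table holds its full bucket array from the start, so this only pays off when the size estimate is reasonably good.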

Object Instantiation

Object creation is expensive (1,850ns on a 200MHz UltraSPARC). If it's at all possible to pool your objects (like buffers, etc), then do so.
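
A pool for fixed-size buffers can be as simple as the following sketch. BufferPool and its methods are hypothetical names, not a standard API, and the pool makes no attempt to clear a buffer's contents before handing it out again.

```java
// Hypothetical pool of equally sized byte[] buffers.
class BufferPool {
    private final byte[][] free;   // buffers waiting to be reused
    private int count = 0;         // how many slots of 'free' are in use
    private final int bufferSize;

    BufferPool(int maxPooled, int bufferSize) {
        this.free = new byte[maxPooled][];
        this.bufferSize = bufferSize;
    }

    // Hands out a pooled buffer if one is available, else allocates one.
    synchronized byte[] acquire() {
        if (count > 0) {
            byte[] buf = free[--count];
            free[count] = null;
            return buf;
        }
        return new byte[bufferSize];
    }

    // Returns a buffer to the pool; extras beyond the limit are dropped
    // and left for the garbage collector.
    synchronized void release(byte[] buf) {
        if (buf != null && buf.length == bufferSize && count < free.length) {
            free[count++] = buf;
        }
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(4, 8192);
        byte[] buf = pool.acquire();
        // ... fill and use buf ...
        pool.release(buf);
        System.out.println(pool.acquire() == buf); // prints "true"
    }
}
```
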
Inner class object instantiation is about 2x as expensive (time-wise) as normal object creation. If you are going to be creating a lot of inner class objects, you may be better off making a non-public support class in the same package instead.

Methods

Unless you live in a cave, you probably call a lot of methods in your Java code. Here are the relevant benchmarks:

static (class) methods: These are the fastest to call, taking around 220ns.
final methods: These are somewhere between static and instance methods, taking around 300ns.
instance methods: These are a little slower, taking around 550ns.
interface methods: These are surprisingly slow, taking on the order of 750ns to call.
synchronized methods: These are by far the slowest, since an object lock has to be obtained, and take around 1,500ns.

The moral of the story is that if you can get away with it, use static final methods, and don't use interfaces. This is too bad, since most good OO design involves interfaces and, for the most part, static methods are only useful in "library" classes that are just a collection of useful methods.

Loops

Most people write their for loops like this:

for (i=0; i<n; i++)
{
  // do some stuff
}

But in almost every language, comparing an int to something else is almost always faster if that something else is 0. So we can re-write the loop like this:

for (i=n-1; i>=0; i--)
{
  // do some stuff
}

It looks backwards, but it's faster. This can be a problem if you need to actually count up from 0 to n, but if not, it's nice. It's even better if you roll up the decrement into something like this:

for (i=n; --i>=0;)
{
  // do some stuff
}

This way, you don't have to do the subtraction at the beginning of the loop.
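
To check that the three loop forms really visit the same elements, here they are side by side in one small class. LoopForms and its method names are made up for this sketch, and it makes no claim about how much time each form takes on your VM.

```java
// Hypothetical demo class: the same sum computed with each loop form.
class LoopForms {
    // The usual form: counts up and compares against n each time.
    static long sumCountingUp(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Counts down so that the comparison is against 0.
    static long sumCountingDown(int[] a) {
        long sum = 0;
        for (int i = a.length - 1; i >= 0; i--) {
            sum += a[i];
        }
        return sum;
    }

    // Decrement folded into the test, so no initial subtraction is needed.
    static long sumFoldedDecrement(int[] a) {
        long sum = 0;
        for (int i = a.length; --i >= 0; ) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5};
        // prints "14 14 14"
        System.out.println(sumCountingUp(data) + " " + sumCountingDown(data)
            + " " + sumFoldedDecrement(data));
    }
}
```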

Array bounds checking

As any good (or bad) C and C++ programmer knows, going off the end of an array is a Bad Thing(TM). As you may have noticed, Java does not let you get away with this kind of thing: it performs an implicit bounds check and throws an ArrayIndexOutOfBoundsException when you read or write off the end of an array. Some people have suggested that when you are setting all the values of an array iteratively, like this:

for (i=array.length; --i>=0;)
{
  array[i] = stuff;
}

that this code could be better done like this:

try
{
  for (i=array.length-1; ; --i)
  {
    array[i] = stuff;
  }
}
catch (ArrayIndexOutOfBoundsException x) { ; }

thereby skipping the >= check done on each loop iteration. This can be considerably faster for long arrays, but for short arrays it is slower, since throwing and catching a new Exception takes 9,500ns while a >= int compare takes about 250ns (on a 200MHz UltraSPARC), so the break-even point is about 40 elements (9,500 / 250 = 38). Obviously this trick only works if the loop runs from some point inside the array (including either end) all the way off one end or the other.

Also, if all you're doing is copying values from one array to another, use System.arraycopy(Object, int, Object, int, int) since it's much, much faster than a for loop.
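
For reference, the arguments to System.arraycopy are the source array, the source offset, the destination array, the destination offset, and the number of elements. A minimal sketch, with CopyExample and copyRange as hypothetical names:

```java
// Hypothetical demo class; the names are illustrative only.
class CopyExample {
    // Returns a new array holding count elements of src, starting at from.
    static int[] copyRange(int[] src, int from, int count) {
        int[] dst = new int[count];
        System.arraycopy(src, from, dst, 0, count);
        return dst;
    }

    public static void main(String[] args) {
        int[] middle = copyRange(new int[] {1, 2, 3, 4, 5}, 1, 3);
        // prints "2 3 4"
        System.out.println(middle[0] + " " + middle[1] + " " + middle[2]);
    }
}
```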

Local variables

If you're going to do lots of operations with an instance variable inside a method, like this:

public class Foo
{
  private int[] array;
  ...
  public void foo()
  {
    try
    {
      for (int i=array.length-1; ; --i)
      {
        array[i] = i*2; // or something
      }
    }
    catch (ArrayIndexOutOfBoundsException x) { ; }
  }
}

this will run considerably faster if it's re-written like this:

public class Foo
{
  private int[] array;
  ...
  public void foo()
  {
    int myarray[] = array;
    try
    {
      for (int i=myarray.length-1; ; --i)
      {
        myarray[i] = i*2; // or something
      }
    }
    catch (ArrayIndexOutOfBoundsException x) { ; }
  }
}

The same goes for loop bounds and any other values the loop reads from instance variables. This is faster since the VM only has to fetch the instance variable once, before the loop starts, rather than on each iteration.
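
The same idea works with an ordinary bounded loop, without the exception trick. In the sketch below both the array and a multiplier are copied from instance variables into locals before the loop; the class Scaler and its members are hypothetical names, and whether this helps on any particular VM is something to measure rather than assume.

```java
// Hypothetical class showing instance variables copied into locals.
class Scaler {
    private int[] data;
    private int factor;

    Scaler(int[] data, int factor) {
        this.data = data;
        this.factor = factor;
    }

    // Multiplies every element by factor, reading the fields only once.
    void scaleAll() {
        int[] local = data;   // local copy of the array reference
        int f = factor;       // local copy of the multiplier
        for (int i = local.length; --i >= 0; ) {
            local[i] = local[i] * f;
        }
    }

    int get(int index) {
        return data[index];
    }

    public static void main(String[] args) {
        Scaler s = new Scaler(new int[] {1, 2, 3}, 2);
        s.scaleAll();
        // prints "2 4 6"
        System.out.println(s.get(0) + " " + s.get(1) + " " + s.get(2));
    }
}
```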