ThreadTracker -- Mapping Java threads to Linux threads

4 August 2008   Charles Roth

Resolved Nov 2011:  There's a better resolution to this problem -- see blogs.manageengine.com/appmanager/2011/02/09/identify-java-code-consuming-high-cpu-in-linux-linking-jvm-thread-and-linux-pid.  But I'm leaving the main article here for historical interest.

Introduction: the problem

Discussion
There is some interesting discussion about this in various places, although much of it focuses on getting the process id, which is really the id of the "parent" process for all of the threads.  What I really want to know is the Linux (lightweight) process id of a Java thread.

Q:   "Why would you want to know?" some folks are bound to ask.
 
A: Well, for starters, there are all kinds of cool monitoring tools that tell me which Linux threads are using what resources (e.g. CPU).  I'd really like to know which threads inside my Java app are using those resources.  Quod erat Diagnostum.

An Answer -- sort of
After much research and hair-pulling (and I don't have much hair left), it became clear that Sun just doesn't support any such mapping... perhaps because the notion of a "thread" is so slippery and operating-system-dependent.  (Personally, I disagree; I'm sure a high-enough level abstraction could be created, rather like File, that could support everything but do nothing where appropriate.)

But it did occur to me, partly thanks to Igor's blog, that a Java class could ask Linux about any newly-created threads... and if the timing is right and the gods are kind, one might get an answer that would be useful (although not 100% guaranteed to be correct). 

So I wrote a short class that reads data about the current process (and its threads) from /proc/self and /proc/self/task.  It's called ThreadTracker, it is published under the BSD open-source license, and it can be used like this:

   ThreadTracker tt = new ThreadTracker();
   Thread t = new Thread(...);
   t.start();
   System.out.println ("New thread id is probably " + tt.guessNewLinuxThreadId());

Practical Usage Notes:

  1. Most of the time, ThreadTracker will correctly distinguish the one new thread, and return that as the value of guessNewLinuxThreadId().
  2. Occasionally the new thread will have terminated before the caller even gets to guessNewLinuxThreadId(), in which case the returned value is "unknown".
  3. Occasionally more than one thread may get created in the parent process (say, if a different thread is in turn creating yet another new thread).  In that case, guessNewLinuxThreadId() returns all of the thread ids, separated by spaces.
  4. It appears that a new thread does not show up in /proc/self/task until its start() method has been called.
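
For the curious, the general idea behind these notes can be sketched roughly as follows: snapshot the entries in /proc/self/task, start the thread, snapshot again, and report whatever is new.  This is only an illustration, not the ThreadTracker source.  The class name ProcTaskSnapshot and its two methods are hypothetical, and the sketch assumes that each numeric entry under /proc/self/task is the id of one Linux thread in the current process.

```java
import java.io.File;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; NOT the actual ThreadTracker class.
class ProcTaskSnapshot {

    // Returns the numeric directory names under /proc/self/task.
    // Returns an empty set if that directory cannot be listed (e.g. not on Linux).
    public static Set<Integer> currentLinuxThreadIds() {
        Set<Integer> ids = new TreeSet<>();
        String[] names = new File("/proc/self/task").list();
        if (names == null) {
            return ids;
        }
        for (String name : names) {
            try {
                ids.add(Integer.parseInt(name));
            } catch (NumberFormatException ignored) {
                // Skip anything that is not a plain number.
            }
        }
        return ids;
    }

    // Describes the ids present in 'after' but not in 'before':
    // "unknown" if there are none, otherwise the ids separated by spaces.
    public static String describeNew(Set<Integer> before, Set<Integer> after) {
        Set<Integer> added = new TreeSet<>(after);
        added.removeAll(before);
        if (added.isEmpty()) {
            return "unknown";
        }
        StringBuilder sb = new StringBuilder();
        for (Integer id : added) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(id);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Integer> before = currentLinuxThreadIds();

        // Keep the new thread alive long enough to be seen in /proc.
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();

        Set<Integer> after = currentLinuxThreadIds();
        System.out.println("New thread id is probably " + describeNew(before, after));
        t.join();
    }
}
```

The result is still a guess, for the same reasons given in the notes above.  A thread that exits between the two snapshots yields "unknown", and any other thread created in that window (including ones the JVM may start for its own purposes) shows up in the list too.  On a system without /proc/self/task, both snapshots are empty and the answer is always "unknown".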

Enjoy!  Drop me a line at roth@thedance.net if this is useful, or if you have any feedback.