Parwy's Blog: September 2009

30 September 2009

Tech Tip: TCP/IP Access Using bash | Linux Journal

Testing Exceptions in JUnit 4.7 | Java.net

26 September 2009

Two Kinds of Management by Bruce Eckel

Summary: Guidance & inspiration vs. directing & controlling.

Virtually all the management I've had to do involves people working independently, mostly because I can't bring myself to micromanage and control another person. The downside of this is that if someone isn't in touch with themselves, they can represent things the way they want them to be, rather than how they are. And when they are left to their own devices, they behave differently than what they represented.

I am then put in a position where I have to either (1) take over and start micromanaging/controlling or (2) give up whatever investment I have with that person and try to find someone else. Because of my nature, option 1 eventually fails and I end up with option 2. Recently I've been trying to understand this well enough that I don't try to fix things; I just accept that, for whatever reason, it hasn't worked out, and I go to option 2 as soon as possible (as soon as I become aware of it). This also follows the maxim "never consider sunk costs when making decisions."

I think this approach is also good for the person in question. As long as you are pretending that things are working OK, you are supporting that person in their illusions. What they actually need is help in moving on, so that they can discover whatever they are really good at.

If you have a company structure where managers are responsible for directing and controlling employee activities, those managers cannot do creative work; they are effectively lost to the needs of the corporate hierarchy. This seems like an unacceptable loss to me, both for the company and for the managers. (In Here Comes Everybody, Clay Shirky makes the case that electronic communication is disrupting the need for traditional corporate hierarchy).

In Outliers, Malcolm Gladwell shows how human-caused disasters never come from big, obvious events or mistakes, but are always an accumulation of small, apparently manageable errors and problems (usually about seven of these before everything goes pear-shaped). What I've been noticing lately is that relationship problems -- typically business relationships but I am starting to think also personal ones -- follow this pattern. People rarely do one big thing that makes it easy to see there's something wrong. In those cases, you don't usually have to wait for the big thing to happen, you see it coming long before (and it's often obvious enough that you don't hire the person in the first place). It's the small things that slip beneath your radar, where you say, "oh, that's just a little thing, I can let it slide" or "we can compensate for that" to the point where you have a bunch of small things causing problems, each of which you've kind of passed through and decided it wasn't a problem. The real failures, of course, are the ones that get past the Maginot Line of our attention.

I don't like command-and-control management, and I don't want to do it. To me it indicates a failure of team-building, and it wastes my time. The kind of management I want to do is the fun kind, where I can draw on my experience to help people do a better job, to inspire and guide. I've been fortunate that most of my consulting engagements have been like this. (Also note the similarity to Martin Fowler's discussion of software development attitudes).

What's your opinion of 'the Cloud'? The emperor's new clothes

Poll Result: 'the Cloud' Is Not an Earth-Shattering Development | Java.net

What's your opinion of 'the Cloud'?

* 13% (45 votes) - It's the future: gradually all apps will move to the Cloud.
* 6% (21 votes) - It's great. I'm using it already.
* 26% (91 votes) - It's an interesting development. I'll wait and see what comes of it.
* 19% (66 votes) - It's a passing phase, like so many other things we've seen.
* 30% (105 votes) - It's the emperor's new clothes: the Cloud is just a server, what's so new about it?
* 5% (19 votes) - I don't know; other

Continuous Deployment

Wow, here's an idea I had never heard of or even contemplated. A bit scary:

The most controversial practice that Marty promotes is Continuous Deployment. This is the automated deployment of code to production. It includes automated testing and continuous integration, simple deployment/rollback scripts, deployment triggered by a successful CI build, and real-time alerts in production. When shit goes wrong, you should use the "five whys" to perform root cause analysis. Marty admits that this is only a good idea when there's a high level of trust in your development team and lots of tests to prove nothing is broken.

The benefits of continuous deployment are a lower story cycle time, less waste in deploying code, faster delivery of features and bug fixes, and finding integration issues sooner and in isolation. It's also a great way to discourage checking in shitty code.

The skeptics think this is a bad idea because 1) it's scary, 2) they believe it causes lower quality and 3) it causes more issues in production. The good news is you can still control production deployments with your source control system (e.g. branches and such). More than anything, it forces you to have a high quality continuous integration system that acts as the gatekeeper for what goes to production.
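To picture the "CI build as gatekeeper" idea, here's a minimal sketch in Java. Everything in it is hypothetical: the Deployer interface stands in for whatever deploy/rollback scripts you actually have, the tests are plain BooleanSuppliers rather than a real CI server, and the health check is a stand-in for real-time production alerts. It only shows the control flow described above: red build blocks the release, green build deploys, unhealthy production rolls back.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical continuous-deployment gate; not any real CI tool's API. */
public class DeploymentGate {

    // Hypothetical hooks for your own deploy/rollback scripts.
    interface Deployer {
        void deploy(String build);
        void rollback(String build);
    }

    enum Outcome { REJECTED, DEPLOYED, ROLLED_BACK }

    static Outcome run(String build, List<BooleanSupplier> tests,
                       BooleanSupplier productionHealthy, Deployer deployer) {
        // 1. The CI build is the gatekeeper: any failing test blocks the release.
        for (BooleanSupplier test : tests) {
            if (!test.getAsBoolean()) {
                return Outcome.REJECTED;
            }
        }
        // 2. A green build triggers deployment automatically.
        deployer.deploy(build);
        // 3. A failing production health check (the "alert") triggers rollback.
        if (!productionHealthy.getAsBoolean()) {
            deployer.rollback(build);
            return Outcome.ROLLED_BACK;
        }
        return Outcome.DEPLOYED;
    }

    public static void main(String[] args) {
        Deployer printer = new Deployer() {
            public void deploy(String b) { System.out.println("deploy " + b); }
            public void rollback(String b) { System.out.println("rollback " + b); }
        };
        List<BooleanSupplier> green = List.of(() -> true, () -> true);
        List<BooleanSupplier> red = List.of(() -> true, () -> false);
        System.out.println(run("build-42", green, () -> true, printer));
        System.out.println(run("build-43", red, () -> true, printer));
        System.out.println(run("build-44", green, () -> false, printer));
    }
}
```

In a real setup the source control branching mentioned above would decide which builds reach this gate at all.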

TCP tuning

ESnet Network Performance Knowledge Base - Host Tuning

25 September 2009

Rambus has devised a way of making DDR3 memory multi-threaded, so that it can better support multi-core servers.

24 September 2009

Project Shoal announces the final release of Shoal 1.1 FCS

Shoal 1.1, the latest release of the Java-based dynamic clustering framework, contains a number of features and bug fixes that significantly improve clustering reliability and scalability for your applications.

Kirk Pepperdine on Performance Tuning and Cloud Computing | Java.net

Early in the interview, Janice asked Kirk: "What misconceptions do you encounter among developers about performance tuning?"

Kirk's response wasn't what I would have expected. He answered:

Here's a major one: All developers believe they are good at performance tuning. They might be very good at writing performance code, and they might be very good at coding, but generally I find that most developers are not very good at performance tuning.

How can this be?

When developers are put in situations where they are asked to performance tune, they typically look at code and take some action. But they invariably forget the dynamics of the system. If you don't include the dynamics of the system when you performance tune or if you think you understand the dynamics of the system and guess wildly wrong - which is quite often the case - you end up doing the wrong thing, which frequently happens.

Kirk notes that software testers tend to be more successful when it comes to performance tuning than most developers, because they are more accustomed to working within a defined process.

IntelliJ IDEA and JRebel: Better Together | JetBrains IntelliJ IDEA Blog

When long is not long enough | Java.net

Some Java Concurrency Tips | Java.net

Benchmarking - thrift-protobuf-compare

20 September 2009

Tervela bags $18m for go faster messaging appliances • The Register

In June this year Tervela launched the third generation of its messaging appliance, the TMX-500 Message Switch. This uses custom ASIC and FPGA chips to speed up the processing of Java Message Service (JMS) messages as they bounce around n-tier Java applications, loosely gluing them together so they can do transactions like the ones at the heart of financial systems on display at HPC on Wall Street.

The TMX-500 comes in a 2U form factor (and is basically three motherboards and some fans) and can have sixteen Gigabit Ethernet links or four 10 Gigabit Ethernet links, allowing anywhere from 4 to 64 million JMS messages per second to be processed, all while keeping message latency under 10 microseconds.

12 September 2009

Developing with real-time Java, Part 2: Improve service quality

Eliminate Architecture | Javalobby

Therefore, I’ll conclude that the goal of software architecture must be to eliminate the impact and cost of change, thereby eliminating architectural significance. And if we can do that, we have eliminated the architecture.

Creating Objects Without Calling Constructors

[JavaSpecialists 175] - Creating Objects Without Calling Constructors

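import sun.reflect.ReflectionFactory;
import java.lang.reflect.Constructor;

// Note: sun.reflect.ReflectionFactory is an internal, unsupported JDK API.
public class SilentObjectCreator {
  // Creates an instance of clazz without running any of its constructors;
  // only Object's no-arg constructor is invoked.
  public static <T> T create(Class<T> clazz) {
    return create(clazz, Object.class);
  }

  // Creates an instance of clazz, running only the no-arg constructor of the
  // given superclass (parent) and skipping every constructor below it.
  public static <T> T create(Class<T> clazz, Class<? super T> parent) {
    try {
      ReflectionFactory rf = ReflectionFactory.getReflectionFactory();
      Constructor<?> objDef = parent.getDeclaredConstructor();
      // The trick serialization uses: a constructor that allocates clazz
      // but executes only parent's constructor body.
      Constructor<?> intConstr = rf.newConstructorForSerialization(clazz, objDef);
      return clazz.cast(intConstr.newInstance());
    } catch (RuntimeException e) {
      throw e;
    } catch (Exception e) {
      throw new IllegalStateException("Cannot create object", e);
    }
  }
}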
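A quick self-contained demo of the same technique, to convince yourself the constructor really is skipped. It inlines the create(Class) logic from above so it runs on its own; the Guarded class is a made-up example, and it assumes a JDK where sun.reflect.ReflectionFactory is still reachable (javac will warn that it is an internal proprietary API).

```java
import java.lang.reflect.Constructor;
import sun.reflect.ReflectionFactory;

public class SilentCreateDemo {

    // Hypothetical example class whose only constructor refuses to run.
    static class Guarded {
        int value = 42; // field initializers are compiled into the constructor

        Guarded() {
            throw new IllegalStateException("constructor called");
        }
    }

    // Same idea as SilentObjectCreator.create(Class): run only Object's
    // no-arg constructor and skip Guarded's.
    static <T> T create(Class<T> clazz) {
        try {
            ReflectionFactory rf = ReflectionFactory.getReflectionFactory();
            Constructor<?> objectCtor = Object.class.getDeclaredConstructor();
            Constructor<?> silent = rf.newConstructorForSerialization(clazz, objectCtor);
            return clazz.cast(silent.newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create object", e);
        }
    }

    public static void main(String[] args) {
        Guarded g = create(Guarded.class);
        // Neither the constructor nor the field initializer ran, so value
        // still holds the JVM default.
        System.out.println(g.value); // prints 0
    }
}
```

Note the side effect: because field initializers live in the constructor, they are skipped too, which is exactly what serialization relies on.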

09 September 2009

It is Not About Writing Tests, It's About Writing Stories

It is Not About Writing Tests, It's About Writing Stories | Javalobby

Cliff Click on Java vs C performance... again...

Cliff Click on Java vs C performance... again... and a little mention of C#

07 September 2009

Lambdaj

CMSMaxAbortablePrecleanTime

Jon Masamitsu's Weblog


Our low-pause collector (UseConcMarkSweepGC), which we are usually careful to call our mostly concurrent collector, has several phases, two of which are stop-the-world (STW) phases:

1. STW initial mark
2. Concurrent marking
3. Concurrent precleaning
4. STW remark
5. Concurrent sweeping
6. Concurrent reset

The first STW pause is used to find all the references to objects in the application (i.e., object references on thread stacks and in registers). After this first STW pause comes the concurrent marking phase, during which the application threads run while GC is doing additional marking to determine the liveness of objects. After the concurrent marking phase there is a concurrent preclean phase (described more below) and then the second STW pause, which is called the remark phase. The remark phase is a catch-up phase in which the GC figures out all the changes that the application threads have made during the previous concurrent phases. The remark phase is the longer of these two pauses. It is also typically the longest of any of the STW pauses (including the minor collection pauses). Because it is typically the longest pause, we like to use parallelism wherever we can in the remark phase.

Part of the work in the remark phase involves rescanning objects that have been changed by an application thread (i.e., looking at the object A to see if A has been changed by the application thread so that A now references another object B and B was not previously marked as live). This includes objects in the young generation and here we come to the point of these ramblings. Rescanning the young generation in parallel requires that we divide the young generation into chunks so that we can give chunks out to the parallel GC threads doing the rescanning. A chunk needs to begin on the start of an object and in general we don't have a fast way to find the starts of objects in the young generation.

Given an arbitrary location in the young generation we are likely in the middle of an object, don't know what kind of object it is, and don't know how far we are from the start of the object. We know that the first object in the young generation starts at the beginning of the young generation and so we could start at the beginning and walk from object to object to do the chunking but that would be expensive. Instead we piggy-back the chunking of the young generation on another concurrent phase, the precleaning phase.

During the concurrent marking phase the application threads are running and changing objects, so we don't have an exact picture of what's alive and what's not. We ultimately fix this up in the remark phase as described above (the object-A-gets-changed-to-point-to-object-B example). But we would like to do as much of the collection as we can concurrently, so we have the concurrent precleaning phase. The precleaning phase does work similar to parts of the remark phase but does it concurrently. The details are not needed for this story, so let me just say that there is a concurrent precleaning phase. During the latter part of the concurrent precleaning phase the young generation "top" (the next location to be allocated in the young generation, and so at an object start) is sampled at likely intervals and is saved as the start of a chunk. "Likely intervals" just means that we want to create chunks that are not too small and not too large, so as to get good load balancing during the parallel remark.

Ok, so here's the punch line for all this. When we're doing the precleaning we do the sampling of the young generation top for a fixed amount of time before starting the remark. That fixed amount of time is CMSMaxAbortablePrecleanTime and its default value is 5 seconds. The best situation is to have a minor collection happen during the sampling. When that happens the sampling is done over the entire region in the young generation from its start to its final top. If a minor collection is not done during that 5 seconds then the region below the first sample is 1 chunk and it might be the majority of the young generation. Such a chunking doesn't spread the work out evenly to the GC threads and so reduces the effective parallelism.
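To make the chunking argument concrete, here's a toy model in Java. It is not HotSpot code: the young generation is reduced to a range of made-up MB offsets, and the sampled "top" values are invented for illustration. It just shows why one huge first chunk hurts load balancing compared with samples that span the whole region.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not HotSpot code) of sampled "top" values becoming remark chunks. */
public class RemarkChunking {

    /** Chunk sizes for the region [0, finalTop), split at each sampled top. */
    static List<Long> chunks(List<Long> sampledTops, long finalTop) {
        List<Long> sizes = new ArrayList<>();
        long start = 0; // the first object always starts at the bottom of the young gen
        for (long top : sampledTops) {
            sizes.add(top - start);
            start = top;
        }
        sizes.add(finalTop - start); // whatever was allocated after the last sample
        return sizes;
    }

    /** Largest chunk as a fraction of the region; 1 / number-of-chunks is ideal. */
    static double largestShare(List<Long> sizes) {
        long total = 0;
        long max = 0;
        for (long s : sizes) {
            total += s;
            max = Math.max(max, s);
        }
        return (double) max / total;
    }

    public static void main(String[] args) {
        // No minor GC during sampling: 80 MB of eden was already allocated when
        // sampling began, so everything below the first sample is one chunk.
        List<Long> noGc = chunks(List.of(80L, 85L, 90L, 95L), 100L);
        // Minor GC during sampling: eden was emptied, so the samples taken
        // afterwards cover the region from its start to its final top.
        List<Long> withGc = chunks(List.of(20L, 40L, 60L, 80L), 100L);
        System.out.println(noGc + ", largest share " + largestShare(noGc));
        // prints [80, 5, 5, 5, 5], largest share 0.8
        System.out.println(withGc + ", largest share " + largestShare(withGc));
        // prints [20, 20, 20, 20, 20], largest share 0.2
    }
}
```

With five GC threads, the first case leaves one thread doing 80% of the young-generation rescan while the others finish almost immediately, which is the "reduced effective parallelism" described above.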

If the time between your minor collections is greater than 5 seconds and you're using parallel remark with the low-pause collector (which you are by default), you might not be getting parallel remarking after all. A symptom of this problem is significant variation in your remark pauses. This is not the only cause of variation in remark pauses, but take a look at the times between your minor collections and if they are, say, greater than 3-4 seconds, you might need to up CMSMaxAbortablePrecleanTime so that you get a minor collection during the sampling.
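Checking "the times between your minor collections" can be as simple as the hypothetical helper below. It is not a JDK tool: you copy the minor-GC timestamps out of your GC log by hand, and the 1000 ms safety margin is my own guess, not a HotSpot recommendation. It relies only on CMSMaxAbortablePrecleanTime being specified in milliseconds with a default of 5000 (the 5 seconds mentioned above).

```java
import java.util.List;

/** Hypothetical helper: compare minor-GC spacing with the abortable preclean window. */
public class PrecleanCheck {

    // HotSpot's default CMSMaxAbortablePrecleanTime: 5000 ms (5 seconds).
    static final long DEFAULT_PRECLEAN_MS = 5000;
    // Assumed safety margin on top of the observed gap (a guess; tune to taste).
    static final long MARGIN_MS = 1000;

    /** Largest gap, in seconds, between consecutive minor-collection timestamps. */
    static double maxGapSeconds(List<Double> minorGcTimes) {
        double max = 0;
        for (int i = 1; i < minorGcTimes.size(); i++) {
            max = Math.max(max, minorGcTimes.get(i) - minorGcTimes.get(i - 1));
        }
        return max;
    }

    /** Never suggests less than the default; grows once gap + margin exceeds it. */
    static long suggestedPrecleanMs(double maxGapSeconds) {
        return Math.max(DEFAULT_PRECLEAN_MS, Math.round(maxGapSeconds * 1000) + MARGIN_MS);
    }

    public static void main(String[] args) {
        // Timestamps in seconds since JVM start, copied by hand from a GC log.
        List<Double> times = List.of(1.0, 3.5, 9.0, 12.0);
        double gap = maxGapSeconds(times);
        System.out.println("largest gap between minor GCs: " + gap + " s");
        System.out.println("-XX:CMSMaxAbortablePrecleanTime=" + suggestedPrecleanMs(gap));
    }
}
```

With the 1 second margin, the default is kept until the gaps exceed about 4 seconds, which roughly matches the "say, greater than 3-4 seconds" rule of thumb.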

And finally, why not just have the remark phase wait for a minor collection so that we get effective chunking? Waiting is often a bad thing to do. While waiting the application is running and changing objects and allocating new objects. The former makes more work for the remark phase when it happens and the latter could cause an out-of-memory before the GC can finish the collection. There is an option CMSScavengeBeforeRemark which is off by default. If turned on, it will cause a minor collection to occur just before the remark. That's good because it will reduce the remark pause. That's bad because there is a minor collection pause followed immediately by the remark pause which looks like 1 big fat pause.

Understanding Concurrent Mark Sweep Garbage Collector Logs

03 September 2009

Nuba - Free conference calling

Nuba, free conference calling:

* Up to 100 participants
* Audio Recording
* MP3 audio Download & Podcasts
* WEB & VIDEO Conferencing
* Available 24/7, RESERVATION-LESS
* International access
* No Invoicing, no billing, no hidden cost
* New! Outlook plugin

Passing parameters to JUnit tests

JUnit: A Little Beyond @Test, @Before, @After

Looks like JUnit 4 has a feature that I find incredibly useful in TestNG: the ability to invoke a single test method many times with different arguments. This is handy for running the same test method with different input parameters and expected results.
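A minimal sketch of what this looks like with JUnit 4's Parameterized runner, assuming JUnit 4.x is on the classpath. The StringLengthTest class and its data rows are made up for illustration: each row returned by the @Parameters method is passed to the constructor, and every @Test method runs once per row.

```java
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

// Runs every @Test method once per row returned by data().
@RunWith(Parameterized.class)
public class StringLengthTest {

    @Parameters
    public static Collection<Object[]> data() {
        // Each row: { input, expected length }
        return Arrays.asList(new Object[][] {
            { "", 0 },
            { "a", 1 },
            { "JUnit", 5 },
        });
    }

    private final String input;
    private final int expected;

    // JUnit calls this constructor with the values of one row.
    public StringLengthTest(String input, int expected) {
        this.input = input;
        this.expected = expected;
    }

    @Test
    public void lengthMatches() {
        assertEquals(expected, input.length());
    }
}
```

Three rows and one test method give three test runs; adding a case is just adding a row.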

multiple ssh private keys

In quite a few situations it's preferable to have ssh keys dedicated to a service or a specific role, e.g. one key for home/fun stuff, another for work things, and another for version control access. Creating the keys is simple; just use

ssh-keygen -t rsa -f ~/.ssh/id_rsa.work

... ssh config lets you get down to a much finer level of control on keys and other per-connection setups ... ~/.ssh/config looks like this:

Host *.home.lan
  IdentityFile ~/.ssh/id_dsa.home
  User kbsingh

Host *.vpn
  IdentityFile ~/.ssh/id_rsa.work
  User karanbir
  Port 44787

Host *.d0.karan.org
  IdentityFile ~/.ssh/id_rsa.d0
  User admin
  Port 21871

Of course, if I am connecting to a remote host that does not match any of these selections, ssh will fall back to checking for and using the 'usual' key, ~/.ssh/id_dsa or ~/.ssh/id_rsa.

Staying Current: A Software Developer's Responsibility | Javalobby

Converting log files into csv

Here is how I converted a log file containing lines like

type:Trade, clientOrderId:20634240922, marketOrderId:bsqj0, marketEventId:ccpfv, clientEventId:ccpfv, status:PARTIAL_FILL, orderParameters:[market:RPR, party:MYCO, instrument:USDCAD, side:BUY, quantity:1000000.0, minimumQuantity:10000.0, maximumShowQuantity:1000000.0, price:1.0975700000000002, type:LIMIT, quoteType:INDICATIVE, timeInForce:GTC, properties:null], transactDateTime:2009-08-27T00:00:15.712Z, receivedDateTime:2009-08-27T00:00:15.712Z, openQuantity:980000.0, executedQuantity:20000.0, averageExecutedPrice:1.0975700000000002, marketTradeId:c4v, clientTradeId:null, tradeQuantity:20000.0, tradePrice:1.0975700000000002, counterparty:SOMECO, settlementDateTime:2009-08-28T00:00:00.000Z, taker:false, comments:null

into a csv file that looked more like:

c4v, 20634240922, 2009-08-27T00:00:15.712Z, MYCO, USDCAD, BUY, PARTIAL_FILL, 20000.0, 1.0975700000000002,

tail -F mylogfile.log | sed 's/.*, clientOrderId:\(\S*\),.*, status:\(\S*\),.*, party:\(\S*\), instrument:\(\S*\), side:\(\S*\),.*, transactDateTime:\(\S*\),.*, marketTradeId:\(\S*\),.*, tradeQuantity:\(\S*\), tradePrice:\(\S*\).*/\7, \1, \6, \3, \4, \5, \2, \8, \9/'
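If the nine back-references get hard to maintain, the same extraction can be done in Java. This is a different approach, not a port of the sed line: instead of one positional regex, it collects every key:value pair into a map and then prints the wanted keys in the same order as the sed replacement. The class name TradeLogToCsv is hypothetical, and it assumes values never contain commas, spaces or square brackets, which holds for the sample line above. Unlike the sed output (where the greedy \S* also swallows the comma after tradePrice), it produces no trailing comma.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TradeLogToCsv {
    // key:value, where the value stops at a comma, whitespace or square bracket.
    private static final Pattern PAIR = Pattern.compile("(\\w+):([^,\\s\\[\\]]*)");

    // Output column order, matching the sed replacement \7, \1, \6, \3, \4, \5, \2, \8, \9.
    private static final List<String> COLUMNS = List.of(
        "marketTradeId", "clientOrderId", "transactDateTime", "party",
        "instrument", "side", "status", "tradeQuantity", "tradePrice");

    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            // Keep the first occurrence, so the outer "type:Trade" wins over
            // the nested "type:LIMIT" inside orderParameters:[...].
            fields.putIfAbsent(m.group(1), m.group(2));
        }
        return fields;
    }

    static String toCsv(String line) {
        Map<String, String> fields = parse(line);
        StringJoiner csv = new StringJoiner(", ");
        for (String column : COLUMNS) {
            csv.add(fields.getOrDefault(column, "")); // empty cell if the key is missing
        }
        return csv.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(toCsv(line));
        }
    }
}
```

Usage would be something like tail -F mylogfile.log | java TradeLogToCsv.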