26 October 2008

Introduction to nonblocking algorithms

Milton - Java WebDAV API

Milton is an open-source, server-side WebDAV API written in Java. It could be useful; I have certainly found WebDAV useful for exposing file-based services through firewalls.

20 October 2008

Access Java code in Excel

Obba: A Java Object Handler for Excel | Javalobby
With Obba, you can easily build Excel GUIs to Java code.
Its main features are:

* Loading of arbitrary jar or class files at runtime through an Excel worksheet function.
* Instantiation of Java objects, storing the object reference under a given object label.
* Invocation of methods on objects referenced by their object handle, storing the handle to the result under a given object label.
* Asynchronous method invocation and tools for synchronization, turning your spreadsheet into a multi-threaded calculation tool.
* Serialization and de-serialization (save Serializable objects to a file, restore them any time later).
* All this through spreadsheet functions, without any additional line of code (no VBA needed, no additional Java code needed).

Is Code Coverage Important?

Setup svnsync

02 October 2008

SVNKit - Java API to Subversion

Controlling Subversion from Java via an API like SVNKit could be quite useful. I had never really thought about using it from a typical application, but I have written applications that keep versioned files, so why reinvent the wheel? Just use Subversion via SVNKit. An example might be managing changes to configuration files at runtime.

SVNKit is a pure Java toolkit - it implements all Subversion features and provides APIs to work with Subversion working copies, access and manipulate Subversion repositories - everything within your Java application.

SVNKit is written in Java and does not require any additional binaries or native applications. It is portable and there is no need for OS specific code. SVNKit is compatible with the latest version of Subversion.

01 October 2008

Bash parameter expansion

Bash Parameter Expansion

Ensure a database doesn't undo all your hard work in increasing concurrency

Caching, Parallelism and Scalability | Javalobby

So you make your application super scalable... don't just stick a database in the way....

databases that touch disks that inherently involve sequential, serial processing. And there is no way you can change your database vendor’s code. Systems that involve all but the simplest, most infrequent database use would be facing a massive bottleneck thanks to this serialization. Databases are just one common example of process serialization, but there could be others as well. Serialization is the real enemy here, as it undoes any of the throughput gains parallelism has to offer. Serialization would only allow a single core in a multi-core system to work at a time, and limit usable computing power. And this is made even worse in a cluster or grid. Using a bigger and beefier database server reduces this somewhat, but still does not overcome the problem that as you add more servers to your grid, you still have process serialization going on in your database server.

Distributed caches are a solution to this problem. They are simple to implement where data is immutable (read-only); things become a bit more complex when the data being distributed is mutable. Ideally the cache dynamically distributes the data across nodes based on runtime usage.
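As a rough illustration of why the immutable case is the easy one, here is a minimal single-JVM read-through cache. It is not a distributed cache, and the class name ReadThroughCache and the loader function are hypothetical stand-ins for whatever sits in front of the database. Because cached values never change, there is nothing to invalidate: each key reaches the slow, serialized backing store at most once.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Sketch only: a local read-through cache for immutable values.
 * A real distributed cache would also partition or replicate
 * entries across nodes, which this does not attempt.
 */
final class ReadThroughCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();

    // Stands in for the expensive database read (assumption).
    // It must not return null, or nothing is cached for that key.
    private final Function<K, V> loader;

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // ConcurrentHashMap.computeIfAbsent runs the loader at most
        // once per key, even when several threads ask concurrently.
        return entries.computeIfAbsent(key, loader);
    }
}
```

Once the data is mutable, this sketch stops being enough: every node holding a copy has to learn about each update, which is where the extra complexity comes from.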

Replacing System.currentTimeMillis() with a Clock interface

I often avoid calling System.currentTimeMillis() directly in code; instead I inject an interface into objects that need to know the current time. This interface, let's call it Clock, basically provides a single function, getCurrentTime(). Reading Alex Miller - Controlling time, I see I'm not alone in this.
This is awesome for unit testing as you can create your own Clock, fix the time, control the rate at which it advances, test rolling over daylight savings time changes, time zones, etc. But all of your code just relies on Clock, which is the standard facade.

... it's also very useful in event processing systems: you can factor out time-based behaviour, so that the same code can be used for processing historical and realtime events.
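A minimal sketch of the idea, assuming getCurrentTime() returns milliseconds since the epoch. Only Clock and getCurrentTime() come from the description above; SystemClock, ManualClock and Session are hypothetical names made up for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Facade over "what time is it now?"; the unit (epoch millis) is an assumption. */
interface Clock {
    long getCurrentTime();
}

/** Production implementation: delegates to the system clock. */
final class SystemClock implements Clock {
    @Override
    public long getCurrentTime() {
        return System.currentTimeMillis();
    }
}

/** Test or replay implementation: time moves only when told to. */
final class ManualClock implements Clock {
    private final AtomicLong now;

    ManualClock(long startMillis) {
        this.now = new AtomicLong(startMillis);
    }

    @Override
    public long getCurrentTime() {
        return now.get();
    }

    void advance(long millis) {
        now.addAndGet(millis);
    }
}

/** Hypothetical consumer: a session that expires after a fixed timeout. */
final class Session {
    private final Clock clock;
    private final long createdAt;
    private final long timeoutMillis;

    Session(Clock clock, long timeoutMillis) {
        this.clock = clock;
        this.createdAt = clock.getCurrentTime();
        this.timeoutMillis = timeoutMillis;
    }

    boolean isExpired() {
        return clock.getCurrentTime() - createdAt >= timeoutMillis;
    }
}
```

In a unit test, a Session built on a ManualClock can be expired by calling advance() instead of sleeping. For replaying historical events, the same kind of clock could be driven from each event's timestamp while the processing code stays unchanged.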

Feedback and project management during the software development lifecycle

Alex Miller - Software Rhythm: Mid-Game

Some good comments on feedback during software development

What I have seen is the value of visible public feedback during development. One way to do that is with iterations that provide working features on a regular basis. Another way is to do periodic milestone demos or usability tests or performance testing or whatever makes sense for your project. The most important demo is the first one as you’re probably building something completely wrong and when the user or product manager sees it, they’re going to freak out. That’s ok. In fact, it’s great because you will get plenty of feedback to build the next version.

I think you’ll find that many of our best practices happen to correlate to cycle feedback. Releases get feedback through acceptance testing and actual users. Iterations (or milestones) get feedback by showing work earlier to users and re-planning. Daily builds give feedback by running a suite of tests and reporting on quality. Daily standup meetings give feedback on who’s doing what. Commit sets give feedback by exposing your changes to others and possibly by running commit-time hooks for test suites. Code/test/refactor cycles give you feedback by green bar unit test runs. Post-mortems (or the trendier term “retrospectives”) can give the team feedback about itself at whatever frequency you want.

I’ve come to believe that most process improvements can be tied to a feedback cycle

And on project management:

From a project management point of view, one of the most important things you need to figure out is how you will track the project while you’re in the middle of it.

The most commonly used tool when planning and tracking is the dreaded Gantt chart. I’ve loathed the Gantt chart for a long time but I’ve finally figured out that I hate it because my early exposure to it was using it as a tracking tool. And for tracking the progress of software development, there are few tools worse suited than the Gantt chart. The Gantt displays all tasks as independent and of fixed length. But tasks in the development world tend to be far more malleable, interconnected, and incremental than it’s possible to represent in a Gantt.

Faced with this mismatch, you have two choices: 1) ignore what’s actually happening in the project and approximate it in your rigid schedule or 2) modify the Gantt on a daily basis to reflect the actual state of the project down to the minute. The first will cause developers to ignore the schedule because the schedule seems completely disconnected from reality. The second is madness because the amount of daily churn means the project manager will do nothing else and he will be pestering developers constantly as he does it. These both suck.

For every feature, you should:

* have a list of tasks to be completed (this evolves constantly)
* know the order they need to be done in (but I find this to be intuitive and not necessary to spell out)
* know whether tasks are completed, in progress, or not started
* know who is doing them
* know estimates for each remaining task

There are a bunch of ways to do this kind of tracking during the life of a project. I’ve most commonly used a spreadsheet or plain text file, but have had some success with agile-oriented PM tools like Rally or VersionOne. The point is that for many, many projects you can track this stuff with little work by relaxing your death grip on the initial schedule.

You do need to know how much work remains so that you can either adjust end dates or de-scope by dropping or shaving features. You can determine this with: (tasks remaining * estimates) - people time available. If that’s > 0, you need to take corrective action. I find doing this adjustment on a weekly basis works pretty well and can take less than an hour if you stay on top of it. A burn-down chart is a really nice way to represent this info.
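The arithmetic is small enough to sketch. This version reads the quantity as remaining estimated work minus available people time, so a positive result is a shortfall that calls for moving the date or cutting scope. The Task and WorkTracker names and the choice of hours as the unit are assumptions made for illustration.

```java
import java.util.List;

/** One entry on a feature's task list; the fields are illustrative. */
final class Task {
    final String name;
    final double estimateHours;
    final boolean completed;

    Task(String name, double estimateHours, boolean completed) {
        this.name = name;
        this.estimateHours = estimateHours;
        this.completed = completed;
    }
}

final class WorkTracker {
    private WorkTracker() {}

    /** Sum of estimates over tasks that are not yet completed. */
    static double remainingHours(List<Task> tasks) {
        double total = 0.0;
        for (Task task : tasks) {
            if (!task.completed) {
                total += task.estimateHours;
            }
        }
        return total;
    }

    /** Remaining work minus available time; positive means over capacity. */
    static double shortfallHours(List<Task> tasks, double availableHours) {
        return remainingHours(tasks) - availableHours;
    }
}
```

Recomputing the shortfall once a week from the current task list gives the points for a burn-down chart.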

Can't see offshore software development working for any complex project

What does the offshoring backlash tell us? • The Register

Quite funny

After 2 years of excuses, laziness, constant turnover (complete waste of training time when the guy/girl buggers off and leaves you with a new muppet), terrible or copied-from-Google code, never-ending bugs, headaches, baffling phone calls where no-one understood each other, emails that promised to "do the needful" but went ignored, applications that just didn't work, MILLIONS of dollars, and much, much more....... we had enough, and told the Indian coding behemoth we'd had enough and brought our dev team back in house. Saying that things go more smoothly is a massive understatement. Don't know why we bothered. Oh yes, some spreadsheet said it would be cheaper.
This informal communication is completely lost when parts of a project are outsourced. Sending the same spec to another country to be evaluated by a developer who has never met the author and who must route all queries through an account manager just does not work.

Lifehacker's best-of awards for productivity apps

Best of the Best: The Hive Five Winners

Nice to see "pen and paper" winning the best GTD application. I'm still not happy with the choice of web-based contact management apps; I wish Google would pull its finger out and make its contacts app as good as its email and calendar apps.

Some interesting new Java annotations in JSR 305

java.net: The Open Road: javax.annotation

Covering annotations such as:






Universal uploader for Flickr, Google Docs and other web services

Smooks looks like a useful Java library for processing data files

Milyn - Smooks
Smooks is a Java Framework/Engine for processing XML and non XML data (CSV, EDI, Java etc).

Smooks can be used to:

* Perform a wide range of Data Transforms - XML to XML, CSV to XML, EDI to XML, XML to EDI, XML to CSV, Java to XML, Java to EDI, Java to CSV, Java to Java, XML to Java, EDI to Java etc.
* Populate a Java Object Model from a data source (CSV, EDI, XML, Java etc). Populated object models can be used as a transformation result itself, or can be used by (e.g.) Templating resources for generating XML or other character based results. Also supports Virtual Object Models (Maps and Lists of typed data), which can be used by EL and Templating functionality.
* Process huge messages (GBs) - Split, Transform and Route message fragments to JMS, File, Database etc destinations.
* Enrich a message with data from a Database, or other Datasources.
* Perform Extract Transform Load (ETL) operations by leveraging Smooks' Transformation, Routing and Persistence functionality.

Smooks supports both DOM and SAX processing models, but adds a more "code friendly" layer on top of them. It allows you to plug in your own "ContentHandler" implementations (written in Java or Groovy), or reuse the many existing handlers.