24 April 2010

Scribe log centralisation

SourceForge.net: scribeserver
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups. If the central scribe server isn't available, the local scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central scribe server(s) can write the messages to the files that are their final destination, typically on an NFS filer or a distributed filesystem, or send them to another layer of scribe servers.
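
The store-and-forward behaviour described above can be sketched as a toy model. This is not Scribe's code: the class names (BufferingForwarder, FlakyCentral) are made up for illustration, an in-memory list stands in for the file on local disk, and a plain callable stands in for the network hop to the central server.

```python
class FlakyCentral:
    """Hypothetical stand-in for a central server that can be switched off."""

    def __init__(self):
        self.up = True
        self.received = []

    def receive(self, batch):
        if not self.up:
            raise ConnectionError("central server unavailable")
        self.received.extend(batch)


class BufferingForwarder:
    """Toy local forwarder: buffer while the central server is down, replay later."""

    def __init__(self, send):
        self._send = send    # callable that delivers a batch, raises ConnectionError on failure
        self._backlog = []   # assumption: stands in for the buffer file on local disk

    def log(self, category, message):
        self._backlog.append((category, message))
        self.flush()

    def flush(self):
        """Try to deliver everything buffered so far, oldest first."""
        if not self._backlog:
            return True
        try:
            self._send(list(self._backlog))
        except ConnectionError:
            return False     # keep the backlog and retry on a later flush
        self._backlog.clear()
        return True

    @property
    def pending(self):
        return len(self._backlog)


if __name__ == "__main__":
    central = FlakyCentral()
    forwarder = BufferingForwarder(central.receive)
    central.up = False
    forwarder.log("web", "request served")
    print("pending while down:", forwarder.pending)
    central.up = True
    forwarder.flush()
    print("pending after recovery:", forwarder.pending)
```

Messages are only dropped from the backlog after a successful send, so an outage delays delivery but keeps the original order.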

Scribe is unique in that clients log entries consisting of two strings, a category and a message. The category is a high level description of the intended destination of the message and can have a specific configuration in the scribe server, which allows data stores to be moved by changing the scribe configuration instead of client code. The server also allows for configurations based on category prefix, and a default configuration that can insert the category name in the file path. Flexibility and extensibility are provided through the "store" abstraction. Stores are loaded dynamically based on a configuration file, and can be changed at runtime without stopping the server. Stores are implemented as a class hierarchy, and stores can contain other stores. This allows a user to chain features together in different orders and combinations by changing only the configuration.
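
A minimal sketch of the category lookup described above: exact category first, then category prefix, then a default that puts the category name in the file path. The dictionary layout, the trailing "*" convention for prefixes, the function name resolve_store and the paths are all assumptions made for this example, not Scribe's actual configuration syntax.

```python
def resolve_store(category, stores, default_root="/var/log/scribe"):
    """Return the store config for a category.

    Lookup order: exact category, longest matching prefix, default.
    Keys ending in "*" are treated as prefixes (illustrative convention only).
    """
    if category in stores:
        return stores[category]

    matching = [
        key[:-1]
        for key in stores
        if key.endswith("*") and category.startswith(key[:-1])
    ]
    if matching:
        best = max(matching, key=len)   # prefer the most specific prefix
        return stores[best + "*"]

    # Default configuration: the category name becomes part of the file path.
    return {"type": "file", "path": f"{default_root}/{category}"}


if __name__ == "__main__":
    stores = {
        "ads": {"type": "network", "host": "ads-sink"},
        "web*": {"type": "file", "path": "/data/web"},
    }
    for name in ("ads", "web_access", "misc"):
        print(name, "->", resolve_store(name, stores))
```

Because clients only name a category, moving "ads" from a network store to a file store would mean editing the table and leaving client code alone.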

Scribe is implemented as a Thrift service using the non-blocking C++ server. The installation at Facebook runs on thousands of machines and reliably delivers tens of billions of messages a day.

10 April 2010


OpenFAST - About
OpenFAST is a 100% Java implementation of the FAST Protocol (FIX Adapted for STreaming). The FAST protocol is used to optimize communications in the electronic exchange of financial data. OpenFAST is flexible and extensible, and is aimed at high-volume, low-latency transmissions. The FAST protocol uses a data compression algorithm to decrease the size of data by two processes.
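
The excerpt does not name the two processes. One encoding the FAST protocol is known for is stop-bit encoding of integers: each byte carries seven data bits, and the high bit is set only on the last byte of a value. The sketch below handles unsigned integers only. It is written in Python for illustration, the function names are made up for this example, and it does not use the OpenFAST API.

```python
def encode_stop_bit(value):
    """Encode a non-negative int as 7-bit groups, stop bit on the final byte."""
    if value < 0:
        raise ValueError("this sketch only handles unsigned integers")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()        # most significant group goes first
    groups[-1] |= 0x80      # mark the last byte with the stop bit
    return bytes(groups)


def decode_stop_bit(data):
    """Decode one value; return (value, number_of_bytes_consumed)."""
    value = 0
    for index, byte in enumerate(data):
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:     # stop bit found: the value is complete
            return value, index + 1
    raise ValueError("no stop bit found")


if __name__ == "__main__":
    for number in (0, 127, 128, 942755):
        encoded = encode_stop_bit(number)
        print(number, encoded.hex(), decode_stop_bit(encoded))
```

Small values take a single byte, which is where the size reduction for typical market data fields comes from.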

Java Tester - What Version of Java Are You Running?

Maven repositories

List of maven repositories currently proxied through Nexus

Maven 2
Maven Central: http://repo1.maven.org/maven2/
Apache Snapshots: http://repository.apache.org/snapshots
Codehaus Snapshots: http://snapshots.repository.codehaus.org/
Java.net: http://download.java.net/maven/2

Maven 1
Java.net.1: https://maven-repository.dev.java.net/repository

Kx Kdb Wiki

Kx Kdb Downloads

The W3C Markup (HTML, XHTML) Validation Service

The W3C Markup Validation Service

ReplacementDocs - online manuals for games

replacementdocs: The original web archive of game manuals
- Have you ever rented a game that came with no instructions?
- Have you ever bought a used game and found out later that the package you received didn't come with an essential map or answers to copy protection questions required to play the game?

If so, replacementdocs is here to help! We're here to provide you with those manuals for situations when you really should've had them to begin with.

Cheat Sheets

Our Favorite Cheat Sheets

Java.net project list

Sonatype Maven Online Books

Maven: The Definitive Guide | Sonatype

Kx Kdb c.java

Enterprise Integration Patterns - Table of Contents


07 April 2010

Always code as... | PHP Zone

Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live

Always code as if you were paying your lines' weight in gold.  The less code you write to solve a problem, the less code you'll have to maintain: code is widely considered a liability more than an asset.  You should favor verbosity only to improve readability and encapsulation: the trade-off is difficult to find here.

Always code as if you had to deploy and use your application at the end of the day.  Which may be the case if it's a web application.  Portability is not a feature you can add as a single user story: the best way to make an application portable, configurable, deployable and most of all working is to build it as simply as possible with these characteristics (a walking skeleton), and keep them while you expand the codebase with new features.

Evolutionary architecture and emergent design: Leveraging reusable code, Part 1

Ease of manufacturing explains why we don't have much mathematical rigor in software development. Traditional engineers developed mathematical models and other sophisticated techniques for predictability so that they weren't forced to build things to determine their characteristics. Software developers don't need that level of analysis. It's easier to build our designs and test them than to build formal proofs of how they will behave. Testing is the engineering rigor of software development. Which leads to the most interesting conclusion from Reeves' essay:

Given that software designs are relatively easy to turn out, and essentially free to build, an unsurprising revelation is that software designs tend to be incredibly large and complex.

Another conclusion from Reeves' essay is that design in software (that is, writing the entire source code) is by far the most expensive activity. That means that time wasted when designing is a waste of the most expensive resource. Which brings me back around to emergent design. If you spend a great deal of time trying to anticipate all the things you'll need before you've started writing code, you will always waste some time because you don't yet know what you don't know. In other words, you always run into unexpected time sinks when writing software because some requirements are more complex than you thought, or you didn't fully understand the problem at the beginning. The longer you can defer decisions, the greater your ability to make better decisions — because the context and knowledge you acquire increase with time.

Yet another conclusion from Reeves' essay revolves around the importance of readable design, which translates to more readable code. Finding idiomatic patterns in code is hard enough, but if your language adds extra cruft, it becomes even harder. Finding an idiomatic pattern in an assembly language code base, for example, is very difficult because the language imposes so many opaque elements that you must be able to see around to "see" the design.

I think that the complete source code is the design artifact in software. Once you understand that, it explains a lot about past failures (such as model-driven architecture, which tries to go directly from UML artifacts to code and fails because the diagramming language isn't expressive enough to capture the required nuances). This understanding has several side effects, including the realization that design (which is coding) is the most expensive activity you can perform. This doesn't mean that you shouldn't use preliminary tools (such as UML or something similar) to help you understand the design before you start coding, but the code becomes the real design once you move to that phase.

Readable design matters. The more expressive your design, the easier it is to modify it and eventually harvest idiomatic patterns from it via emergent design.

02 April 2010


zeromq: Fastest. Messaging. Ever.
What is ØMQ?

Imagine pipes that connect your app to many other apps. That lets you talk using a simple socket API. From any language and on any OS. Really fast, that gets out of your way. It's like TCP on steroids!

* ØMQ is a lightweight messaging implementation with a socket-style API.
* Sends and receives messages asynchronously (a.k.a. "message queueing").
* Supports different messaging patterns such as point-to-point, publish-subscribe, request-reply, parallelized pipeline and more.
* Is fast. 13.4 usec end-to-end latencies and over 8M messages a second today (Infiniband).
* Is thin. The core requires just a couple of pages in resident memory.
* Is open source, LGPL-licensed software written in C++.
* Has bindings for many different languages (see the "Languages" section on left).
* Supports different transport protocols: TCP, PGM, IPC, and more.
* Runs on HP-UX, Linux, Mac OS X, NetBSD, OpenVMS, Solaris, Windows, and more.
* Supports microarchitectures such as x86, AMD64, SPARC, IA-64, ARM and more.
* Is fully distributed: no central servers to crash, millions of WAN and LAN nodes.

ØMQ aims to make messaging patterns first-class citizens of the Internet.

Compare to:

* TCP: message-based, with messaging patterns rather than a stream of bytes.
* Jabber: do not confuse instant messaging with real messaging.
* AMQP: 100x faster to do the same work and with no brokers (and 278 pages less spec).
* IPC: we abstract across boxes, not just a single machine.
* CORBA: we do not enforce horribly complex message formats on you.
* RPC: ØMQ is totally asynchronous, and lets you add/remove participants at any time.
* RFC 1149: a lot faster!
* 29west LBM: we're free software!
* IBM Low-latency: we're free software!
* Tibco: we're still free software!
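
The TCP comparison above (whole messages rather than a stream of bytes) can be illustrated with plain sockets. This is only a sketch of the general idea using a 4-byte length prefix. It is not ØMQ's wire format or API, and the helper names send_msg and recv_msg are made up for this example.

```python
import socket
import struct


def send_msg(sock, payload):
    """Send one message: 4-byte big-endian length, then the payload bytes."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, count):
    """Read exactly `count` bytes, since recv() may return fewer."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one whole message, however the bytes were split in transit."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)


if __name__ == "__main__":
    left, right = socket.socketpair()
    send_msg(left, b"first")
    send_msg(left, b"second")
    print(recv_msg(right), recv_msg(right))
    left.close()
    right.close()
```

With a raw byte stream the receiver could see the two sends glued together or split at arbitrary points. The length prefix restores message boundaries, which is the kind of bookkeeping a messaging layer takes off the application.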