11 November 2012

kryo - Fast, efficient Java serialization and cloning

kryo - Fast, efficient Java serialization and cloning - Google Project Hosting
Kryo is a fast and efficient object graph serialization framework for Java. The goals of the project are speed, efficiency, and an easy to use API. The project is useful any time objects need to be persisted, whether to a file, database, or over the network. Kryo can also perform automatic deep and shallow copying/cloning. This is direct copying from object to object, not object->bytes->object.

Finagle

Finagle, from Twitter
Finagle is a network stack for the JVM that you can use to build asynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, or any JVM-hosted language. Finagle provides a rich set of protocol-independent tools.

Snappy - A fast compressor/decompressor

snappy - A fast compressor/decompressor - Google Project Hosting
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

30 June 2012

5' on IT-Architecture: Four Laws of Robust Software Systems | Javalobby

Interesting point here.  Given there is nothing permanent except change, its important that you build systems that are easy to change later?  Unit/integration tests facilitate change.

5' on IT-Architecture: Four Laws of Robust Software Systems | Javalobby
There is nothing permanent except change. If a system isn't designed in accordance to this superior important reality, then the probability of failure may increase above average. A widely used technique to facilitate change is the development of a sufficient set of unit tests. Unit testing enables to uncover regressions in existing functionality after changes have been made to a system. It also encourages to really think about the desired functionality and required design of the component under development.

28 June 2012

FolderSync • Reg Hardware

FolderSync • Reg Hardware
lets you synchronise your Android folders with SkyDrive and just about every other cloud storage supplier.

FolderSync supports most of the popular commercial cloud wallahs, including Dropbox, SkyDrive, SugarSync, Ubuntu One, Box, Amazon S3 and Google Drive. It also works with FTP, Samba/Windows share and WebDAV remote storage for the DIY enthusiast. It also supports multiple accounts of the same type.

Envers - Hibernate audit tables

Envers - JBoss Community
The Envers project aims to enable easy auditing/versioning of persistent classes. All that you have to do is annotate your persistent class or some of its properties, that you want to audit, with @Audited. For each audited entity, a table will be created, which will hold the history of changes made to the entity. You can then retrieve and query historical data without much effort.

The Netflix Tech Blog: The Netflix Simian Army

The Netflix Tech Blog: The Netflix Simian Army
Imagine getting a flat tire. Even if you have a spare tire in your trunk, do you know if it is inflated? Do you have the tools to change it? And, most importantly, do you remember how to do it right? One way to make sure you can deal with a flat tire on the freeway, in the rain, in the middle of the night is to poke a hole in your tire once a week in your driveway on a Sunday afternoon and go through the drill of replacing it. This is expensive and time-consuming in the real world, but can be (almost) free and automated in the cloud.

This was our philosophy when we built Chaos Monkey, a tool that randomly disables our production instances to make sure we can survive this common type of failure without any customer impact. The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables -- all the while we continue serving our customers without interruption. By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won't even notice.

Inspired by the success of the Chaos Monkey, we’ve started creating new simians that induce various kinds of failures, or detect abnormal conditions, and test our ability to survive them; a virtual Simian Army to keep our cloud safe, secure, and highly available.

Latency Monkey induces artificial delays in our RESTful client-server communication layer to simulate service degradation and measures if upstream services respond appropriately. In addition, by making very large delays, we can simulate a node or even an entire service downtime (and test our ability to survive it) without physically bringing these instances down. This can be particularly useful when testing the fault-tolerance of a new service by simulating the failure of its dependencies, without making these dependencies unavailable to the rest of the system.

Conformity Monkey finds instances that don’t adhere to best-practices and shuts them down. For example, we know that if we find instances that don’t belong to an auto-scaling group, that’s trouble waiting to happen. We shut them down to give the service owner the opportunity to re-launch them properly.

Doctor Monkey taps into health checks that run on each instance as well as monitors other external signs of health (e.g. CPU load) to detect unhealthy instances. Once unhealthy instances are detected, they are removed from service and after giving the service owners time to root-cause the problem, are eventually terminated.

Janitor Monkey ensures that our cloud environment is running free of clutter and waste. It searches for unused resources and disposes of them.

Security Monkey is an extension of Conformity Monkey. It finds security violations or vulnerabilities, such as improperly configured AWS security groups, and terminates the offending instances. It also ensures that all our SSL and DRM certificates are valid and are not coming up for renewal.

10-18 Monkey (short for Localization-Internationalization, or l10n-i18n) detects configuration and run time problems in instances serving customers in multiple geographic regions, using different languages and character sets.

Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone. We want to verify that our services automatically re-balance to the functional availability zones without user-visible impact or manual intervention.
With the ever-growing Netflix Simian Army by our side, constantly testing our resilience to all sorts of failures, we feel much more confident about our ability to deal with the inevitable failures that we'll encounter in production and to minimize or eliminate their impact to our subscribers. The cloud model is quite new for us (and the rest of the industry); fault-tolerance is a work in progress and we have ways to go to fully realize its benefits. Parts of the Simian Army have already been built, but much remains an aspiration -- waiting for talented engineers to join the effort and make it a reality.

24 June 2012

Spring 3.1 Constructor Namespace

Spring 3.1 Constructor Namespace makes it much neater to use constructor injection but you do need to use debug-compiled versions of classes

23 June 2012

Tracking Excessive Garbage Collection in Hotspot JVM

Tracking Excessive Garbage Collection in Hotspot JVM

Interesting looks like setting "-XX:GCHeapFreeLimit=20" and "-XX:GCTimeLimit=90" will help to ensure that OutOfMemory errors are generated earlier so that an application doesn't remain in degraded state involving too frequent garbage collections.

08 June 2012

GuavaExplained - guava-libraries - Every Java developer should read this wiki in detail

GuavaExplained - guava-libraries

Every Java developer should read this wiki in detail.
The Guava project contains several of Google's core libraries that we rely on in our Java-based projects: collections, caching, primitives support, concurrency libraries, common annotations, string processing, I/O, and so forth. Each of these tools really do get used every day by Googlers.

But trawling through Javadoc isn't always the most effective way to learn how to make best use of a library. Here, we provide readable and pleasant explanations of some of the most popular and most powerful features of Guava.

This wiki is a work in progress, and parts of it may still be under construction.

Basic utilities: Make using the Java language more pleasant.
Preconditions: Test preconditions for your methods more easily.
Common object methods: Simplify implementing Object methods, like hashCode() and toString().
Ordering: Guava's powerful "fluent Comparator" class.
Throwables: Simplify propagating and examining exceptions and errors.
Collections: Guava's extensions to the JDK collections ecosystem. These are some of the most mature and popular parts of Guava.
Immutable collections, for defensive programming, constant collections, and improved efficiency.
New collection types, for use cases that the JDK collections don't address as well as they could: multisets, multimaps, tables, bidirectional maps, and more.
Powerful collection utilities, for common operations not provided in java.util.Collections.
Extension utilities: writing a Collection decorator? Implementing Iterator? We can make that easier.
Caches: Local caching, done right, and supporting a wide variety of expiration behaviors.
Functional idioms: Used sparingly, Guava's functional idioms can significantly simplify code.
Concurrency: Powerful, simple abstractions to make it easier to write correct concurrent code.
ListenableFuture: Futures, with callbacks when they are finished.
Service: Things that start up and shut down, taking care of the difficult state logic for you.
Strings: A few extremely useful string utilities: splitting, joining, padding, and more.
Primitives: operations on primitive types, like int and char, not provided by the JDK, including unsigned variants for some types.
Ranges: Guava's powerful API for dealing with ranges on Comparable types, both continuous and discrete.
I/O: Simplified I/O operations, especially on whole I/O streams and files, for Java 5 and 6.
Hashing: Tools for more sophisticated hashes than what's provided by Object.hashCode(), including Bloom filters.
EventBus: Publish-subscribe-style communication between components without requiring the components to explicitly register with one another.
Math: Optimized, thoroughly tested math utilities not provided by the JDK.
Reflection: Guava utilities for Java's reflective capabilities.

27 May 2012

Continuous delivery/deployment

Have been reading the http://continuousdelivery.com/ book.  Really like the ideas:


Infrastructure as code

The desired state of your infrastructure should be specified through version-controlled configuration.

You should always know the desired state of your infrastructure through your instrumentation and monitoring.

Automated deployment of infrastructure includes
  • operating system configuration
  • middleware stack and its configuration: app servers, messaging systems,  databases
  • network infrastructure: routers, firewalls, switches, DNS, DHCP, DMZs
Acceptance tests

Keeping the complex acceptance tests running will take time from your development team. However, this cost is in the form of an investment which, in our experience, is repaid many times over in reduced maintenance costs, the protection that allows you to make wide-ranging changes to your application, and significantly higher quality. This follows our general principle of bringing the pain forward in the process. We know from experience that without excellent automated acceptance test coverage, one of three things happens:
  • Either a lot of time is spent trying to find and fix bugs at the end of the process when you thought you were done, or 
  • You spend a great deal of time and money on manual acceptance and regression testing, or 
  • You end up releasing poor-quality software.
It is difficult to enumerate the reasons that make a project warrant such an investment in automated acceptance testing. For the type of projects that we usually get involved in, our default is that automated acceptance testing and an implementation of the deployment pipeline are usually a sensible starting point. For projects of extremely short duration with a small team, maybe four or fewer developers, it may be overkill—you might instead run a few end-to-end tests as part of a single-stage CI process. But for anything larger than that, the focus on business value that automated acceptance testing gives to developers is so valuable that it is worth the costs. It bears repeating that large projects start out as small projects, and by the time a project gets large, it is invariably too late to retro-fit a comprehensive set of automated acceptance tests without a Herculean level of effort. We recommend that the use of automated acceptance tests created, owned, and maintained by the delivery team should be the default position for all of your projects.


Capacity Testing System

The capacity test system is usually the closest analog to your expected production system. As such, it is a very valuable resource. Further, if you follow our advice and design your capacity tests as a series of composable, scenario-based tests, what you really have is a sophisticated simulation of your production system. This is an invaluable resource for a whole variety of reasons. We have discussed already why scenario-based capacity testing is of importance, but given the much more common approach of benchmarking specific, technically focused interactions, it is worth reiterating. Scenario-based testing provides a simulation of real interactions with the system. By organizing collections of these scenarios into complex composites, you can effectively carry out experiments with as much diagnostic instrumentation as you wish in a production-like system. We have used this facility to help us perform a wide variety of activities:
  • Reproducing complex production defects
  • Detecting and debugging memory leaks
  • Longevity testing
  • Evaluating the impact of garbage collection
  • Tuning garbage collection
  • Tuning application configuration parameters
  • Tuning third-party application configuration, such as operating system, application server, and database configuration
  • Simulating pathological, worst-day scenarios
  • Evaluating different solutions to complex problems
  • Simulating integration failures
  • Measuring the scalability of the application over a series of runs with different hardware configurations
  • Load-testing communications with external systems, even though our capacity tests were originally intended to run against stubbed interfaces
  • Rehearsing rollback from complex deployments.
  • Selectively failing parts or the application to evaluate graceful degradation of service
  • Performing real-world capacity benchmarks in temporarily available production hardware so that we could calculate more accurate scaling factors for a longer-term, lower-specification capacity test environment
This is not a complete list, but each of these scenarios comes from a real project.

Release Strategy

The most important part of creating a release strategy is for the application’s stakeholders to meet up during the project planning process. The point of their discussions should be working out a common understanding concerning the deployment and maintenance of the application throughout its lifecycle. This shared understanding is then captured as the release strategy. This document will be updated and maintained by the stakeholders throughout the application’s life. When creating the first version of your release strategy at the beginning of the project, you should consider including the following:
  • Parties in charge of deployments to each environment, as well as in charge of the release.
  • An asset and configuration management strategy.
  • A description of the technology used for deployment. This should be agreed upon by both the operations and development teams.
  • A plan for implementing the deployment pipeline.
  • An enumeration of the environments available for acceptance, capacity, integration, and user acceptance testing, and the process by which builds will be moved through these environments.
  • A description of the processes to be followed for deployment into testing and production environments, such as change requests to be opened and approvals that need to be granted.
  • Requirements for monitoring the application, including any APIs or services the application should use to notify the operations team of its state.
  • A discussion of the method by which the application’s deploy-time and runtime configuration will be managed, and how this relates to the automated deployment process.
  • Description of the integration with any external systems. At what stage and how are they tested as part of a release? How do the operations personnel communicate with the provider in the event of a problem?
  • Details of logging so that operations personnel can determine the application’s state and identify any error conditions.
  • A disaster recovery plan so that the application’s state can be recovered following a disaster.
  • The service-level agreements for the software, which will determine whether the application will require techniques like failover and other high-availability strategies.
  • Production sizing and capacity planning: How much data will your live application create? How many log files or databases will you need? How much bandwidth and disk space will you need? What latency are clients expecting?
  • An archiving strategy so that production data that is no longer needed can be kept for auditing or support purposes.
  • How the initial deployment to production works.
  • How fixing defects and applying patches to the production environment will be handled.
  • How upgrades to the production environment will be handled, including data migration.
  • How application support will be managed.
Release Plan
  • The steps required to deploy the application for the first time
  • How to smoke-test the application and any services it uses as part of the deployment process
  • The steps required to back out the deployment should it go wrong
  • The steps required to back up and restore the application’s state
  • The steps required to upgrade the application without destroying the application’s state
  • The steps to restart or redeploy the application should it fail
  • The location of the logs and a description of the information they contain
  • The methods of monitoring the application
  • The steps to perform any data migrations that are necessary as part of the release
  • An issue log of problems from previous deployments, and their solutions
...

There are several methods of performing a rollback that we will discuss here.

The more advanced techniques—blue-green deployments and canary releasing—can also be used to perform zero-downtime releases and rollbacks. Before we start, there are two important constraints. The first is your data. If your release process makes changes to your data, it can be hard to roll back. Another constraint is the other systems you integrate with. With releases involving more than one system (known as orchestrated releases), the rollback process becomes more complex too.

There are two general principles you should follow when creating a plan for rolling back a release. The first is to ensure that the state of your production system, including databases and state held on the filesystem, is backed up before doing a release. The second is to practice your rollback plan, including restoring from the backup or migrating the database back before every release to make sure it works.

...

One way to approach this problem is to put the application into read-only mode shortly before switchover. You can then take a copy of the green database, restore it into the blue database, perform the migration, and then switch over to the blue system. If everything checks out, you can put the application back into read-write mode. If something goes wrong, you can simply switch back to the green database. If this happens before the application goes back into read-write mode, nothing more needs to be done. If your application has written data you want to keep to the new database, you will need to find a way to take the new records and migrate them back to the green database before you try the release again. Alternatively, you could find a way to feed transactions to both the new and old databases from the new version of the application.

Continuous Deployment

Continuous deployment isn’t for everyone. Sometimes, you don’t want to release new features into production immediately. In companies with constraints on compliance, approvals are required for deployments to production. Product companies usually have to support every release they put out. However, it certainly has the potential to work in a great many places.

The intuitive objection to continuous deployment is that it is too risky. But, as we have said before, more frequent releases lead to lower risk in putting out any particular release. This is true because the amount of change between releases goes down. So, if you release every change, the amount of risk is limited just to the risk inherent in that one change. Continuous deployment is a great way to reduce the risk of any particular release.

Perhaps most importantly, continuous deployment forces you to do the right thing (as Fitz points out in his blog post). You can’t do it without automating your entire build, deploy, test, and release process. You can’t do it without a comprehensive, reliable set of automated tests. You can’t do it without writing system tests that run against a production-like environment. That’s why, even if you can’t actually release every set of changes that passes all your tests, you should aim to create a process that would let you do so if you choose to.

Your authors were really delighted to see the continuous deployment article cause such a stir in the software development community. It reinforces what we’ve been saying about the release process for years. Deployment pipelines are all about creating a repeatable, reliable, automated system for getting changes into production as fast as possible. It is about creating the highest quality software using the highest quality process, massively reducing the risks of the release process along the way. Continuous deployment takes this approach to its logical conclusion. It should be taken seriously, because it represents a paradigm shift in the way software is delivered. Even if you have good reasons for not releasing every change you make—and there are less such reasons than you might think—you should behave as if you were going to do so.

17 May 2012

Growing a Team By Investing in Tools

Growing a Team By Investing in Tools
Where many teams struggle however is in finding the right balance of building out the product versus “sharpening the saw” by improving their tools and automating processes. With the same people attempting to do both jobs that contention is unavoidable.

The solution then is to build a second team who’s job is entirely focussed on building, maintaining, configuring and improving the tools that the product team uses. Since the majority of tools used by a project are fairly orthogonal to the project itself – and often shared with other product teams – the tool team can be much more independent, reducing the communication overhead. Not only do the product team now have more time to devote to improving the product, they are continuously made more productive because of the work done by the tools team.