28 December 2011
27 November 2011
23 November 2011
12 November 2011
10 Things That Good Bosses Do - CBS News
10 Things That Good Bosses Do - CBS News
Pay people what they're worth, not what you can get away with. What you lose in expense you gain back several-fold in performance. Take the time to share your experiences and insights. Labels like mentor and coach are overused. Let's be specific here. Employees learn from those generous enough to share their experiences and insights. They don't need a best friend or a shoulder to cry on. Tell it to employees straight, even when it's bad news. To me, the single most important thing any boss can do is to man up and tell it to people straight. No BS, no sugarcoating, especially when it's bad news or corrective feedback. Manage up ... effectively. Good bosses keep management off employee's backs. Most people don't get this, but the most important aspect of that is giving management what they need to do their jobs. That's what keeps management away. Take the heat and share the praise. It takes courage to take the heat and humility to share the praise. That comes naturally to great bosses; the rest of us have to pick it up as we go. Delegate responsibility, not tasks. Every boss delegates, but the crappy ones think that means dumping tasks they hate on workers, i.e. s**t rolls downhill. Good bosses delegate responsibility and hold people accountable. That's fulfilling and fosters professional growth. Encourage employees to hone their natural abilities and challenge them to overcome their issues. That's called getting people to perform at their best. Build team spirit. As we learned before, great groups outperform great individuals. And great leaders build great teams. Treat employees the way they deserve to be treated. You always hear people say they deserve respect and to be treated as equals. Well, some may not want to hear this, but a) respect must be earned, and b) most workers are not their boss's equals. Inspire your people. All the above motivate people, but few bosses have the ability to truly inspire their employees. How? By sharing their passion for the business. By knowing just what to say and do at just the right time to take the edge off or turn a tough situation around. Genuine anecdotes help a lot. So does a good sense of humor.
05 November 2011
22 October 2011
How to Determine and Set Up the Fastest DNS Server for Your Connection
Speed Up Your Web Browsing in a Few Clicks: A Brief Introduction to DNS
On Windows: the free DNS Jumper makes it a lot easier:
1. Download DNS Jumper, and extract it to any location on your hard drive. It's a portable application, so there's no need to install it—just start it up.
2. If you know what DNS server you want to use, pick it from the drop-down menu or type it in the boxes at the bottom. If not, hit the "Fastest DNS" button on the left. It'll check a number of different servers to find out which one is the fastest for you.
3. When it's done, click the "Apply DNS Servers" button to use the fastest server.
Sometimes, your ISP's default DNS server really is the fastest, but other times, it could be something else, so even if it ends up being the ones you already use, it was still worth running the test to find out. When you're done, you can delete the app or file it away for future use.
On Windows: the free DNS Jumper makes it a lot easier:
1. Download DNS Jumper, and extract it to any location on your hard drive. It's a portable application, so there's no need to install it—just start it up.
2. If you know what DNS server you want to use, pick it from the drop-down menu or type it in the boxes at the bottom. If not, hit the "Fastest DNS" button on the left. It'll check a number of different servers to find out which one is the fastest for you.
3. When it's done, click the "Apply DNS Servers" button to use the fastest server.
Sometimes, your ISP's default DNS server really is the fastest, but other times, it could be something else, so even if it ends up being the ones you already use, it was still worth running the test to find out. When you're done, you can delete the app or file it away for future use.
16 October 2011
Rip Rowan - Google+ - Stevey's Google Platforms Rant I was at Amazon for about…
Rip Rowan - Google+ - Stevey's Google Platforms Rant I was at Amazon for about…
So one day Jeff Bezos issued a mandate. He's doing that all the time, of course, and people scramble like ants being pounded with a rubber mallet whenever it happens. But on one occasion -- back around 2002 I think, plus or minus a year -- he issued a mandate that was so out there, so huge and eye-bulgingly ponderous, that it made all of his other mandates look like unsolicited peer bonuses.
His Big Mandate went something along these lines:
1) All teams will henceforth expose their data and functionality through service interfaces.
2) Teams must communicate with each other through these interfaces.
3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team's data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
4) It doesn't matter what technology they use. HTTP, Corba, Pubsub, custom protocols -- doesn't matter. Bezos doesn't care.
5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
6) Anyone who doesn't do this will be fired.
7) Thank you; have a nice day!
Ha, ha! You 150-odd ex-Amazon folks here will of course realize immediately that #7 was a little joke I threw in, because Bezos most definitely does not give a shit about your day.
#6, however, was quite real, so people went to work. Bezos assigned a couple of Chief Bulldogs to oversee the effort and ensure forward progress, headed up by Uber-Chief Bear Bulldog Rick Dalzell. Rick is an ex-Armgy Ranger, West Point Academy graduate, ex-boxer, ex-Chief Torturer slash CIO at Wal*Mart, and is a big genial scary man who used the word "hardened interface" a lot. Rick was a walking, talking hardened interface himself, so needless to say, everyone made LOTS of forward progress and made sure Rick knew about it.
Over the next couple of years, Amazon transformed internally into a service-oriented architecture. They learned a tremendous amount while effecting this transformation. There was lots of existing documentation and lore about SOAs, but at Amazon's vast scale it was about as useful as telling Indiana Jones to look both ways before crossing the street. Amazon's dev staff made a lot of discoveries along the way. A teeny tiny sampling of these discoveries included:
- pager escalation gets way harder, because a ticket might bounce through 20 service calls before the real owner is identified. If each bounce goes through a team with a 15-minute response time, it can be hours before the right team finally finds out, unless you build a lot of scaffolding and metrics and reporting.
- every single one of your peer teams suddenly becomes a potential DOS attacker. Nobody can make any real forward progress until very serious quotas and throttling are put in place in every single service.
- monitoring and QA are the same thing. You'd never think so until you try doing a big SOA. But when your service says "oh yes, I'm fine", it may well be the case that the only thing still functioning in the server is the little component that knows how to say "I'm fine, roger roger, over and out" in a cheery droid voice. In order to tell whether the service is actually responding, you have to make individual calls. The problem continues recursively until your monitoring is doing comprehensive semantics checking of your entire range of services and data, at which point it's indistinguishable from automated QA. So they're a continuum.
- if you have hundreds of services, and your code MUST communicate with other groups' code via these services, then you won't be able to find any of them without a service-discovery mechanism. And you can't have that without a service registration mechanism, which itself is another service. So Amazon has a universal service registry where you can find out reflectively (programmatically) about every service, what its APIs are, and also whether it is currently up, and where.
- debugging problems with someone else's code gets a LOT harder, and is basically impossible unless there is a universal standard way to run every service in a debuggable sandbox.
That's just a very small sample. There are dozens, maybe hundreds of individual learnings like these that Amazon had to discover organically. There were a lot of wacky ones around externalizing services, but not as many as you might think. Organizing into services taught teams not to trust each other in most of the same ways they're not supposed to trust external developers.
10 October 2011
09 October 2011
05 October 2011
29 September 2011
Elixir 2 • reghardware
Elixir 2 • reghardware
For me, the option to put widgets into the status bar is the killer feature. It’s a great way of saving screen real estate and means you have instant access to a vast selection of important stuff just by pulling down on the bar.
18 September 2011
11 September 2011
10 September 2011
Google Guava Libraries Essentials
Google Guava Libraries Essentials - Java Code Geeks
r to preconditions in a way that they can restrict what values are added to a collection. T
31 August 2011
30 August 2011
28 August 2011
20 August 2011
Pit mutation testing
PIT is an interesting tool. It modifies your code and then checks if the test passes. This is repeated many times to get a score, like you might use a code coverage measure.
16 August 2011
Greg Martin's blog - InfoSecurity 2.0: Why you don't steal from a hacker
Greg Martin's blog - InfoSecurity 2.0: Why you don't steal from a hacker
So during the London riots I return home the next morning to find my flat ransacked and my Macbook Pro laptop stolen!
Police showed up, took a report and dusted for prints, performed typical forensics... One thing they did not expect was that I had installed the amazing open source tracking software from http://preyproject.com
15 August 2011
Lessons in Software Reliability | Agile Zone
Lessons in Software Reliability | Agile Zone
Hire good developers and give them enough time to do a good job, including time to review and refactor.
Make sure the development team has training on the basics, that they understand the language and frameworks.
Regular code reviews (or pair programming, if you’re into it) for correctness and safety.
Use static analysis tools to find common coding mistakes and bug patterns.
Design for failure
Failures will happen: make sure that your design anticipates and handles failures. Identify failures, contain, retry, recover, restart. Contain failures, ensure that failures don’t cascade. Fail safe. Look for the simplest HA design alternative: do you need enterprise-wide clustering or virtual synchrony-based messaging, or can you rely on simpler active/standby shadowing with fast failover?
Keep it Simple
Attack complexity: where possible, apply Occam’s Razor, and choose the simplest path in design or construction or implementation. Simplify your technology stack, collapse the stack, minimize the number of layers and servers.
Test… test… test….
Testing for reliability goes beyond unit testing, functional and regression testing, integration, usability and UAT. You need to test everything you can every way you can think of or can afford to.
One of the best investments that we made was building a reference test environment, as big as, and as close to the production deployment configuration, as we could afford. This allowed us to do representative system testing with production or production-like workloads, as well as variable load and stress testing, operations simulations and trials.
Stress testing is especially important: identifying the real performance limits of the system, pushing the system to, and beyond, design limits, looking for bottlenecks and saturation points, concurrency problems – race conditions and deadlocks – and observing failure of the system under load. Watching the system melt down under extreme load can give you insight into architecture, design and implementation weaknesses.
Failure handing and failover testing – creating controlled failure conditions and checking that failure detection and failure handling mechanisms work correctly.
Get the development team, especially your senior technical leaders, working closely with operations staff: understanding operations' challenges, the risks that they face, the steps that they have to go through to get their jobs done. What information do they need to troubleshoot, to investigate problems? Are the error messages clear, are you logging enough useful information? How easy is it to startup, shutdown, recover and restart – the more steps, the more problems. Make it hard for operations to make mistakes: add checks and balances. Run through deployment, configuration and upgrades together: what seems straightforward in development may have problems in the real world.
Build in health checks – simple ways to determine that the system is in a healthy, consistent state, to be used before startup, after recovery / restart, after an upgrade. Make sure operations has visibility into system state, instrumentation, logs, alerts – make sure ops know what is going on and why.
When you encounter a failure in production, work together with the operations team to complete a Root Cause Analysis, a structured investigation where the team searches for direct and contributing factors to the failure, defines corrective and preventative actions. Dig deep, look past immediate causes, keep asking why. Ask: how did this get past your checks and reviews and testing? What needs to be changed in the product? In the way that it is developed? In the way that is implemented? Operated?
12 August 2011
DeskSMS Is the Best Phone-to-Desktop SMS Solution We've Seen Yet
DeskSMS Is the Best Phone-to-Desktop SMS Solution We've Seen Yet. Looks good, but don't use the "remove DeskSMS contacts" option, it will delete a lot of your contacts as opposed to removing DeskSMS email entries.
11 August 2011
05 August 2011
31 July 2011
22 July 2011
17 July 2011
16 July 2011
The LMAX Architecture
LMAX is a new retail financial trading platform. As a result it has to process many trades with low latency. The system is built on the JVM platform and centers on a Business Logic Processor that can handle 6 million orders per second on a single thread. The Business Logic Processor runs entirely in-memory using event sourcing. The Business Logic Processor is surrounded by Disruptors - a concurrency component that implements a network of queues that operate without needing locks. During the design process the team concluded that recent directions in high-performance concurrency models using queues are fundamentally at odds with modern CPU design.
04 June 2011
Command line tools for windows - coreutils
If you are used to using cygwin on windows you may also find it useful to install CoreUtils for Windows and place it on your path. That way you can use many *nix commands from a normal dos shell.
28 May 2011
19 May 2011
Design Meeting Patterns/Antipatterns | Javalobby
Design Meeting Patterns/Antipatterns | Javalobby
- Understand the Problem vs. Jump to the Solution
- Assume the Worst vs. Assume the Best
- Basing Decisions on the Current Situation vs. Basing Decisions on History
- Shooting for the “Best” Solution vs. the “Easiest” Solution
- Present Possible Solutions Objectively vs. My Solution is the Best
- Validating from Code vs. Validating from Memory
07 May 2011
Rambo Architecture
Amazon’s EC2 & EBS outage | Cloud Zone
That's a great quote. You got to build systems that expect dependencies to fail and test those scenarios.
I just read a post in Coding Horror which refers to a year old post in Netflix’s blog called “5 lessons we’ve learned using AWS [Amazon WebSevices]“. Netflix, in case you’re wondering survived Amazon’s outage and indeed, in lesson #3 they explain that if you want to survive failures you have to plan and constantly test for it:
3. The best way to avoid failure is to fail constantly. We’ve sometimes referred to the Netflix software architecture in AWS as our Rambo Architecture. Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends. If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine.
That's a great quote. You got to build systems that expect dependencies to fail and test those scenarios.
02 May 2011
Rooting Nexus S problems
Anyone who has rooted their Nexus S using 1.0-XXJK8-nexuss-superboot and found that their camera and wifi stops working. Its because you need to use the correct version of superboot. Instructions can be found here http://forum.xda-developers.com/archive/index.php/t-882333.html. You can correct the problem by rerooting with the correct version. Also anyone who finds that on windows, the cmd line stays in "waiting for device" during fastboot mode, installing PDANet for Android fixes the problem as per http://forum.xda-developers.com/showthread.php?t=875580. Hope this saves you some time and heartache :)
30 April 2011
29 April 2011
[JavaSpecialists 191] - Delaying Garbage Collection Costs
[JavaSpecialists 191] - Delaying Garbage Collection Costs
In modern JVMs, the autoboxing cache size is configurable. It used to be that all values from -128 to 127 were cached, but nowadays we can specify a different upper bound. When we run the code with -XX:+AggressiveOpts, it simply increases this value.
25 April 2011
The Business and Technology of Low-Latency Trading « A-Team Group
The Business and Technology of Low-Latency Trading « A-Team Group
Interesting talking at the A-Team conference, this week. Lots of focus on equities but managed to squeeze a little FX in there ;). Enjoyed a presentation by Corvil discussing the art of measuring latency.
Interesting talking at the A-Team conference, this week. Lots of focus on equities but managed to squeeze a little FX in there ;). Enjoyed a presentation by Corvil discussing the art of measuring latency.
23 April 2011
The Five Traits That Get You Promoted to CEO
The Five Traits That Get You Promoted to CEO
# Passionate curiosity: Relentless questioning and being infectiously fascinated with everything around you, human nature in particular
# Battle-hardened confidence: Overcoming—and even relishing—adversity. CEOs most often ask job candidates how they've dealt with failure in the past.
# Team smarts: More than just being a team player, understanding how teams work and getting the most out of the team (in sports terms, being a playmaker)
# A simple mindset: Being concise, simple, and clear in your communications
# Fearlessness: Comfort with the unknown and taking calculated, informed risks; also, seeing opportunities and being proactive about positive change
22 April 2011
Discussion on code coverage
100% Code Coverage! | Javalobby. The fundamental point is you need to look at more than just code coverage. If you haven't though hard enough about your test cases you could have a system that is well covered (in the code coverage sense) but is not well tested.
If you have 100% coverage you don’t know if your system works, but you _do_ know that every line you wrote does what you thought it should.
Thus, code coverage and test are completely different things. Taken to the extreme, we could code-cover the entire application, that is achieve 100%, and not test a single line because we have no assertion!
Not all of our code is neither critical nor complex. In fact, some can even seen as downright trivial. Any doubt? Think about getters and setter then. Should they be tested, even though any IDE worth his salt can generate them faithfully for us? If your answer is yes, you should also consider testing the frameworks you use, because they are a bigger source of bugs than generated getters and setters.
13 April 2011
10 April 2011
ScaleBase Database Load Balancer
ScaleBase makes some bold claims about transparent sharding, no single point of failure and complete transparency to the application layer. I wonder what happens if you need a transaction between data on different shards?
09 April 2011
Java Floating-Point Number Intricacies Summary
Java Floating-Point Number Intricacies, nice short synopsis.
Centos 5.6
The CentOS team is pleased to announce the availability of CentOS 5.6. Major changes in CentOS 5.6 compared to CentOS 5.5 include:Update process from 5.5:
ext4 is now a fully supported file system
libvirt was updated to 0.8.2
bind was updated to 9.7 and supports NSEC3 now.
ebtables was added
php53 is available as a php replacement.
System Security Services Daemon (SSSD) has been added.
Other upgrades include newer version of several wireless drivers, Samba3x, ghostscript, LVM, mod_nss, subversion and gcc, plus others.
- yum clean all
- yum update glibc\*
- yum update yum\* rpm\* pyth\*
- yum clean all
- yum update mkinitrd nash
- yum update selinux\*
- yum update
- shutdown -r now
02 April 2011
01 April 2011
30 March 2011
101: What is Latency? « A-Team Group
101: What is Latency? « A-Team Group
Latency, n. The delay between the receipt of a stimulus and the response to it.
Network Latency
Whether local area, wide area, or metropolitan, owned or managed, lit or dark, your network is the piece that physically joins your components together, transporting bits from A to B. Networks and their associated components (switches, routers, firewalls and so on) tend to introduce three types of delay between stimuli and responses: serialisation delay, propagation delay, and queueing delay.
Serialisation delay is the time it takes to put a set of bits onto the physical medium (typically fibre optic cable). This is determined entirely by the number of bits and the data rate of the link. For example, to serialise a 1500 byte packet (12,000 bits) on a 1Gbit/second link takes 12 us (microseconds, or millionths of a second). Bump the link speed up to 10Gb/s and your serialisation delay drops by an order of magnitude, to 1.2 us.
But serialisation only covers putting the bits into the pipe. Before they can come out, they have to reach the other end, and that’s where propagation delay comes into play. The speed of propagation of light in a fibre is about two-thirds that in a vacuum, so roughly 200 million kilometres per second (which is still pretty fast, to be fair!). To put it another way, it takes light in a fibre about half a microsecond to travel 100 meters.
Pause and think about the relative sizes of serialisation and propagation delay for a moment. Over short distances (e.g. in a LAN environment) the former is much larger than the latter, even at 10 Gb/s link speeds. That’s one of the reasons why moving from 1 Gb/s to 10 Gb/s can have a big impact on network latency in LANs. This advantage diminishes significantly with distance, however – on a 100km fiber, propagation delay is going to be on the order of 500 us, or half a millisecond. At 1Gb/s link speeds serialisation plus propagation delay for a 1500 byte packet is about 512 us; moving up to 10 Gb/s reduces this to about 501.2 us – hardly a massive improvement.
There’s not a lot you can do about propagation latency – it’s a result of physical processes that you can’t change. The only real option is to move your processing closer to the source of the data – basically what’s happening with the move towards collocation. Even this has challenges though – if your trading strategy depends on data from multiple exchanges, where should you collocate? An interesting recent study from MIT suggests the optimal approach may be a new location somewhere between the two!
The final contribution to network latency is queuing delay. Consider two data sources sending data to a single consumer which has a single network connection. If both send a packet of data at the same time, one of those packets has to be queued at the switch which connects them to the consumer. The length of time for which any packet is queued is dependent on two factors: the number of packets which are in the queue ahead of it, and the data rate of the output link – this is the other reason why increasing data rates helps reduce network latency, because it reduces queuing delays. Note that there’s a crucial difference between serialisation delay, propagation delay and queuing delay – the first two are deterministic, the latter is variable, being dependent on how busy the network is.
Protocol Latency
Network links just provide dumb pipes for getting bits from A to B. In order to bring some order to these bits, various layers of protocols are used to deal with things like where the bits should go, how to get them in the right order, deal with losses, and so on. In the vast majority of cases these protocols were designed with the goal of ensuring the smooth and reliable flow of data between endpoints, and not with minimising the latency of that flow, so they can and do introduce delays through a variety of mechanisms.
The first such mechanism is protocol data overhead. Each protocol layer adds a number of bytes to the packet to carry management information between the two endpoints. In a TCP/IP connection (the most common type of connection) this overhead adds 40 bytes to each packet. This is additional data that is subject to the serialisation delay discussed above. The relevance of this overhead is very much dependent on the size of data packets – for example, if data is being sent in 40-byte chunks, the TCP/IP overhead will double the serialisation delay for each chunk, whereas if it’s being sent in 1500-byte chunks, TCP/IP will only increase serialisation delay by around 3%.
It is possible to enable header compression on most modern network devices. This can have a significant impact on packet sizes, reducing the 40-byte TCP/IP header down to 2-4 bytes – for small data payloads this might halve the packet size. However, in latency terms this is unlikely to have much impact as the reduction in serialisation delay would be offset by the time taken to compress and decompress the header at each end of the link.
Clearly, then, if you have a lot of data to send it’s preferable to send it in one large packet rather than lots of small ones. However, it obviously doesn’t make sense from a latency perspective, to delay sending one piece of data until you have more available, just so you can fill a larger packet. Unfortunately, that’s exactly what TCP does in some configuration - using a process called Nagle’s algorithm, a TCP sender will delay sending data until it has enough to fill the largest packet size as long as it still has some data waiting to be acknowledged by the receiver. Thankfully for those with latency-sensitive applications this option can be turned off in most implementations.
One of the uses of data that contributes to protocol overhead is to implement a mechanism called congestion control. In order to prevent networks becoming congested with data that can’t be delivered to a destination, each TCP connection has a data ‘window’ which is the number of bytes that the sender is allowed to transmit before it must wait for permission from the receiver to send more. This permission flows back from the receiver to the sender in the form of acknowledgement or ACKs, which are typically sent when a packet is correctly received. In a well-functioning network, with constant data flow in both directions, this mechanism works extremely well.
There are circumstances, however, where it can introduce problems. Imagine a window size set to 15,000 bytes, and a data producer sending 1,500 byte packets. If the producer is able to send ten packets before the consumer receives the first packet (that is, if the ‘pipe’ can accommodate more than ten packets at a time), then it will have to stop after the tenth and wait until it gets permission to go again. This can add significant latency to some packets. In order to avoid this ‘window exhaustion’ it is necessary to configure the parameters of the TCP stack to align with the network connections – specifically, the window size has to be greater than the delay-bandwidth product, the number you get when you multiply the one-way propagation delay by the data rate.
As an example, a 100 Mb/s link from New York to Chicago with a one-way latency of 12 ms (milliseconds) requires a TCP window size of 1.2 Mbits, or 150 KB. If you had a connection like this, and were using a more standard window size of 64 KB, then your data transfer could stall due to window exhaustion every 5 ms (time taken to send 64 KB at 100Mb/s), with each stall introducing 12 ms of additional latency.
The final way in which protocols can introduce latency is through packet loss. As mentioned earlier, there are occasions when packets need to be queued at switches or routers before they can be forwarded. Physically, this means holding a copy of the packet in memory somewhere. Since network devices have finite memory, there is a limit to the number of packets which can be queued. If a packet arrives and there is no more space available, it will be discarded. In this situation, TCP will eventually detect the missing packet at the receiver, and a re-transmission will occur; however, the delay between original transmission and re-transmission is likely to be on the order of a least three round trip times (RTTs), or six times the one-way propagation delay. As a result, packet loss can be one of the biggest contributors to network latency, albeit on a sporadic basis.
In a network which you own and control the likelihood of packet loss can be minimised by ensuring appropriate capacity is provisioned on all links and switches. If your network includes managed components, especially in a WAN, this is much more difficult to achieve, although your service provider will likely provide some SLAs on packet loss.
It’s worth noting that all of the preceding discussion is focussed on TCP, the Transmission Control Protocol. This protocol was designed to ensure guaranteed delivery of data between applications, rather than timely delivery. Many trading applications use UDP (the User Datagram Protocol) as an alternative to TCP. UDP does not guarantee delivery, so it doesn’t use any of the windowing or retransmission discussed above. As a result, UDP is less latency sensitive, but it is subject to unrecoverable packet loss – if data is discarded due to queuing, there is no way for it to be re-transmitted, of the for the sender to be aware that this happened.
Operating System (OS) Latency
When you’re deploying a trading application, the code has to run on something. That something is typically a set of servers, and those servers have operating systems that sit between your code and the hardware. These OSes are typically designed to provide functionality which makes it easy to run multiple applications on almost any type of hardware; in other words, like TCP, they’re optimised for flexibility and resilience rather than speed. As a result, they can introduce latency through a number of mechanisms.
All modern operating systems are multi-tasking, meaning they can do multiple things ‘simultaneously’. Since servers generally have more things to do than CPUs to do them on, this generally means that any running code can be suspended by the OS to allow another piece of code can be run. This pre-emptive scheduling can introduce variable delays into code. Note that this can be a problem even if your server is only running one application. The reasons is that the application still depends on various inputs and outputs (to users via a keyboard/monitor, to databases, to other servers via the network), and the OS has to make sure the drivers that manage all that I/O get some CPU time. In addition, the OS occasionally has to do some housekeeping work and in some cases this work is non-preemptable, which locks your application out of running until the OS has finished. All told, these types of OS operations can add tens of milliseconds latency to transactions being processed by your application. And, to make matters worse, this latency can be highly intermittent, making it very difficult to address
There are some things you can do to alleviate the problem of OS latency, but most of them are OS-specific – for example, in Windows processes can be set to Realtime priority, or in Linux the kernel can be compiled to allow pre-emption of OS tasks. Neither of these approaches will completely remove OS-latency, but they can reduce it to the millisecond or sub-millisecond region, which may be acceptable for your application. To get beyond this really requires the use of a real-time operating system (RTOS) such as QNX or RTLinux – these are more typically found in embedded systems and are beyond the scope of this article.
Many trading applications are written in programming languages that are executed in a runtime environment that creates yet another layer of complexity between the code and OS. These runtime environments include, for example, Java Virtual Machines (JVMs) for code written in Java, and Windows .Net Common Language Runtime (CLR) for code written in C#. These environments are designed to make code more stable and secure by, among other things, eliminating common programming problems. One of the most common functions of these runtime environments is automatic memory management or garbage collection. This refers to a mechanism whereby the runtime environment monitors the memory being used by an application and periodically tidies it up, reclaiming any memory which the application no longer needs. While this process improves program stability (memory management being one of the most common categories of coding defects), it has a latency cost because the application must be temporarily suspended which garbage collection takes place.
Some work has been done in these environments to minimise the impact of garbage collection, and the Java community has gone as far as creating a separate Real-Time Specification for Java (RTSJ). However, all other things being equal, code that is running in a managed environment will tend to incur more latency problems than code that is directly under OS control. In this case you need to make a trade-off between the improved stability (and possibly faster development cycles) provided by Java/C# and the improved latency of something like C/C++.
Application Latency
Phew – all that latency already in the system and we haven’t even talked about your application yet! Thankfully, application latency is one piece of the equation that’s mostly within your own control, and it also tends to be introduced through a small number of mechanisms.
One common theme in IT systems is that things slow down by at least an order of magnitude when you have to access disks rather than memory – database access is a prime example of this. The reason is simply down to the mechanical nature of disks, as opposed to electronic memory. Designing applications to minimise disk access is therefore a common pattern in low-latency systems – in fact, most high-frequency or flow-based applications will have no databases in the main flow, deferring all data persistence to post-execution. This also tends to mean that applications are very memory hungry – data has to be stored somewhere while it’s being processed, and you don’t want it hitting the disk. Where database access is required and latency has to be minimised many developers are now turning to in-memory databases which, as the name suggests, store all of their data in memory rather than on disk. The increasing penetration of solid-state disks (SSDs – devices which appear to the computer to be a standard magnetic disk, but use non-volatile solid-state memory for storage) provides a possible compromise between in-memory DBs and standard DBs using magnetic disks.
Inter-process communication (IPC) is another area that can have a substantial impact on application performance. Typically, trading applications have multiple components (market data acquisition, pricing engines, risk, order routers, market gateways and many more) and data has to be passed between them. When the processes concerned are on the same server this can be a relatively efficient exchange; when (as is often the case) they are on different servers, then the communication can incur significant latency penalties, as it hits all the OS and protocol overheads discussed previously. Remote Direct Memory Access (RDMA) is a combined hardware/software approach that bypasses all of the OS-related latency penalties by allowing the sending process to write data directly into the memory space of the destination process.
Latency, n. The delay between the receipt of a stimulus and the response to it.
Network Latency
Whether local area, wide area, or metropolitan, owned or managed, lit or dark, your network is the piece that physically joins your components together, transporting bits from A to B. Networks and their associated components (switches, routers, firewalls and so on) tend to introduce three types of delay between stimuli and responses: serialisation delay, propagation delay, and queueing delay.
Serialisation delay is the time it takes to put a set of bits onto the physical medium (typically fibre optic cable). This is determined entirely by the number of bits and the data rate of the link. For example, to serialise a 1500 byte packet (12,000 bits) on a 1Gbit/second link takes 12 us (microseconds, or millionths of a second). Bump the link speed up to 10Gb/s and your serialisation delay drops by an order of magnitude, to 1.2 us.
But serialisation only covers putting the bits into the pipe. Before they can come out, they have to reach the other end, and that’s where propagation delay comes into play. The speed of propagation of light in a fibre is about two-thirds that in a vacuum, so roughly 200 million kilometres per second (which is still pretty fast, to be fair!). To put it another way, it takes light in a fibre about half a microsecond to travel 100 meters.
Pause and think about the relative sizes of serialisation and propagation delay for a moment. Over short distances (e.g. in a LAN environment) the former is much larger than the latter, even at 10 Gb/s link speeds. That’s one of the reasons why moving from 1 Gb/s to 10 Gb/s can have a big impact on network latency in LANs. This advantage diminishes significantly with distance, however – on a 100km fiber, propagation delay is going to be on the order of 500 us, or half a millisecond. At 1Gb/s link speeds serialisation plus propagation delay for a 1500 byte packet is about 512 us; moving up to 10 Gb/s reduces this to about 501.2 us – hardly a massive improvement.
There’s not a lot you can do about propagation latency – it’s a result of physical processes that you can’t change. The only real option is to move your processing closer to the source of the data – basically what’s happening with the move towards collocation. Even this has challenges though – if your trading strategy depends on data from multiple exchanges, where should you collocate? An interesting recent study from MIT suggests the optimal approach may be a new location somewhere between the two!
The final contribution to network latency is queuing delay. Consider two data sources sending data to a single consumer which has a single network connection. If both send a packet of data at the same time, one of those packets has to be queued at the switch which connects them to the consumer. The length of time for which any packet is queued is dependent on two factors: the number of packets which are in the queue ahead of it, and the data rate of the output link – this is the other reason why increasing data rates helps reduce network latency, because it reduces queuing delays. Note that there’s a crucial difference between serialisation delay, propagation delay and queuing delay – the first two are deterministic, the latter is variable, being dependent on how busy the network is.
Protocol Latency
Network links just provide dumb pipes for getting bits from A to B. In order to bring some order to these bits, various layers of protocols are used to deal with things like where the bits should go, how to get them in the right order, deal with losses, and so on. In the vast majority of cases these protocols were designed with the goal of ensuring the smooth and reliable flow of data between endpoints, and not with minimising the latency of that flow, so they can and do introduce delays through a variety of mechanisms.
The first such mechanism is protocol data overhead. Each protocol layer adds a number of bytes to the packet to carry management information between the two endpoints. In a TCP/IP connection (the most common type of connection) this overhead adds 40 bytes to each packet. This is additional data that is subject to the serialisation delay discussed above. The relevance of this overhead is very much dependent on the size of data packets – for example, if data is being sent in 40-byte chunks, the TCP/IP overhead will double the serialisation delay for each chunk, whereas if it’s being sent in 1500-byte chunks, TCP/IP will only increase serialisation delay by around 3%.
It is possible to enable header compression on most modern network devices. This can have a significant impact on packet sizes, reducing the 40-byte TCP/IP header down to 2-4 bytes – for small data payloads this might halve the packet size. However, in latency terms this is unlikely to have much impact as the reduction in serialisation delay would be offset by the time taken to compress and decompress the header at each end of the link.
Clearly, then, if you have a lot of data to send it’s preferable to send it in one large packet rather than lots of small ones. However, it obviously doesn’t make sense from a latency perspective, to delay sending one piece of data until you have more available, just so you can fill a larger packet. Unfortunately, that’s exactly what TCP does in some configuration - using a process called Nagle’s algorithm, a TCP sender will delay sending data until it has enough to fill the largest packet size as long as it still has some data waiting to be acknowledged by the receiver. Thankfully for those with latency-sensitive applications this option can be turned off in most implementations.
One of the uses of data that contributes to protocol overhead is to implement a mechanism called congestion control. In order to prevent networks becoming congested with data that can’t be delivered to a destination, each TCP connection has a data ‘window’ which is the number of bytes that the sender is allowed to transmit before it must wait for permission from the receiver to send more. This permission flows back from the receiver to the sender in the form of acknowledgement or ACKs, which are typically sent when a packet is correctly received. In a well-functioning network, with constant data flow in both directions, this mechanism works extremely well.
There are circumstances, however, where it can introduce problems. Imagine a window size set to 15,000 bytes, and a data producer sending 1,500 byte packets. If the producer is able to send ten packets before the consumer receives the first packet (that is, if the ‘pipe’ can accommodate more than ten packets at a time), then it will have to stop after the tenth and wait until it gets permission to go again. This can add significant latency to some packets. In order to avoid this ‘window exhaustion’ it is necessary to configure the parameters of the TCP stack to align with the network connections – specifically, the window size has to be greater than the delay-bandwidth product, the number you get when you multiply the one-way propagation delay by the data rate.
As an example, a 100 Mb/s link from New York to Chicago with a one-way latency of 12 ms (milliseconds) requires a TCP window size of 1.2 Mbits, or 150 KB. If you had a connection like this, and were using a more standard window size of 64 KB, then your data transfer could stall due to window exhaustion every 5 ms (time taken to send 64 KB at 100Mb/s), with each stall introducing 12 ms of additional latency.
The final way in which protocols can introduce latency is through packet loss. As mentioned earlier, there are occasions when packets need to be queued at switches or routers before they can be forwarded. Physically, this means holding a copy of the packet in memory somewhere. Since network devices have finite memory, there is a limit to the number of packets which can be queued. If a packet arrives and there is no more space available, it will be discarded. In this situation, TCP will eventually detect the missing packet at the receiver, and a re-transmission will occur; however, the delay between original transmission and re-transmission is likely to be on the order of a least three round trip times (RTTs), or six times the one-way propagation delay. As a result, packet loss can be one of the biggest contributors to network latency, albeit on a sporadic basis.
In a network which you own and control the likelihood of packet loss can be minimised by ensuring appropriate capacity is provisioned on all links and switches. If your network includes managed components, especially in a WAN, this is much more difficult to achieve, although your service provider will likely provide some SLAs on packet loss.
It’s worth noting that all of the preceding discussion is focussed on TCP, the Transmission Control Protocol. This protocol was designed to ensure guaranteed delivery of data between applications, rather than timely delivery. Many trading applications use UDP (the User Datagram Protocol) as an alternative to TCP. UDP does not guarantee delivery, so it doesn’t use any of the windowing or retransmission discussed above. As a result, UDP is less latency sensitive, but it is subject to unrecoverable packet loss – if data is discarded due to queuing, there is no way for it to be re-transmitted, of the for the sender to be aware that this happened.
Operating System (OS) Latency
When you’re deploying a trading application, the code has to run on something. That something is typically a set of servers, and those servers have operating systems that sit between your code and the hardware. These OSes are typically designed to provide functionality which makes it easy to run multiple applications on almost any type of hardware; in other words, like TCP, they’re optimised for flexibility and resilience rather than speed. As a result, they can introduce latency through a number of mechanisms.
All modern operating systems are multi-tasking, meaning they can do multiple things ‘simultaneously’. Since servers generally have more things to do than CPUs to do them on, this generally means that any running code can be suspended by the OS to allow another piece of code can be run. This pre-emptive scheduling can introduce variable delays into code. Note that this can be a problem even if your server is only running one application. The reasons is that the application still depends on various inputs and outputs (to users via a keyboard/monitor, to databases, to other servers via the network), and the OS has to make sure the drivers that manage all that I/O get some CPU time. In addition, the OS occasionally has to do some housekeeping work and in some cases this work is non-preemptable, which locks your application out of running until the OS has finished. All told, these types of OS operations can add tens of milliseconds latency to transactions being processed by your application. And, to make matters worse, this latency can be highly intermittent, making it very difficult to address
There are some things you can do to alleviate the problem of OS latency, but most of them are OS-specific – for example, in Windows processes can be set to Realtime priority, or in Linux the kernel can be compiled to allow pre-emption of OS tasks. Neither of these approaches will completely remove OS-latency, but they can reduce it to the millisecond or sub-millisecond region, which may be acceptable for your application. To get beyond this really requires the use of a real-time operating system (RTOS) such as QNX or RTLinux – these are more typically found in embedded systems and are beyond the scope of this article.
Many trading applications are written in programming languages that are executed in a runtime environment that creates yet another layer of complexity between the code and OS. These runtime environments include, for example, Java Virtual Machines (JVMs) for code written in Java, and Windows .Net Common Language Runtime (CLR) for code written in C#. These environments are designed to make code more stable and secure by, among other things, eliminating common programming problems. One of the most common functions of these runtime environments is automatic memory management or garbage collection. This refers to a mechanism whereby the runtime environment monitors the memory being used by an application and periodically tidies it up, reclaiming any memory which the application no longer needs. While this process improves program stability (memory management being one of the most common categories of coding defects), it has a latency cost because the application must be temporarily suspended which garbage collection takes place.
Some work has been done in these environments to minimise the impact of garbage collection, and the Java community has gone as far as creating a separate Real-Time Specification for Java (RTSJ). However, all other things being equal, code that is running in a managed environment will tend to incur more latency problems than code that is directly under OS control. In this case you need to make a trade-off between the improved stability (and possibly faster development cycles) provided by Java/C# and the improved latency of something like C/C++.
Application Latency
Phew – all that latency already in the system and we haven’t even talked about your application yet! Thankfully, application latency is one piece of the equation that’s mostly within your own control, and it also tends to be introduced through a small number of mechanisms.
One common theme in IT systems is that things slow down by at least an order of magnitude when you have to access disks rather than memory – database access is a prime example of this. The reason is simply down to the mechanical nature of disks, as opposed to electronic memory. Designing applications to minimise disk access is therefore a common pattern in low-latency systems – in fact, most high-frequency or flow-based applications will have no databases in the main flow, deferring all data persistence to post-execution. This also tends to mean that applications are very memory hungry – data has to be stored somewhere while it’s being processed, and you don’t want it hitting the disk. Where database access is required and latency has to be minimised many developers are now turning to in-memory databases which, as the name suggests, store all of their data in memory rather than on disk. The increasing penetration of solid-state disks (SSDs – devices which appear to the computer to be a standard magnetic disk, but use non-volatile solid-state memory for storage) provides a possible compromise between in-memory DBs and standard DBs using magnetic disks.
Inter-process communication (IPC) is another area that can have a substantial impact on application performance. Typically, trading applications have multiple components (market data acquisition, pricing engines, risk, order routers, market gateways and many more) and data has to be passed between them. When the processes concerned are on the same server this can be a relatively efficient exchange; when (as is often the case) they are on different servers, then the communication can incur significant latency penalties, as it hits all the OS and protocol overheads discussed previously. Remote Direct Memory Access (RDMA) is a combined hardware/software approach that bypasses all of the OS-related latency penalties by allowing the sending process to write data directly into the memory space of the destination process.
13 March 2011
Overflow detection during java arithmetic
Joda's FieldUtils class has code that attempts to detect overflow
07 March 2011
Chronon... A 'flight data recorder' for Java programs
Introducing Chronon: The Time Travelling Debugger for Java | Javalobby
Chronon is a revolutionary new technology that consists of:
* A 'flight data recorder' for Java programs which can record every line of code executed inside a program and save it to a file on the disk. The file can be shared among developers and played back in our special time travelling debugger to instantly find the root cause of an issue. This also means that no bugs ever need to be reproduced!
* A Time Travelling Debugger, with a novel UI that plugs seamlessly into Eclipse, which allows you to playback the recordings. It not only step back and forward but to any point in the execution of your program.
Chronon marks the begining of the end of 'Non-Reproducible bugs'. A Chronon recording can be shared among all the members of your team and they can debug an issue in parallel. We see Chronon being used all the way from Development, QA to ultimately running full time in Production.
Custom AST transformations with Project Lombok
Custom AST transformations with Project Lombok describes an interesting hybrid between practices such as build-time code generation and runtime bytecode enhancement. Probably wouldn't use it in production myself but interesting nonetheless.
...Lombok doesn't just generate Java sources or bytecode: it transforms the Abstract Syntax Tree (AST), by modifying its structure at compile-time...
...By modifying (or transforming) the AST, Lombok keeps your source code trim and free of bloat, unlike plain-text code-generation. Lombok's generated code is also visible to classes within the same compilation unit, unlike direct bytecode manipulation with libraries like CGLib or ASM...
...Lombok doesn't just generate Java sources or bytecode: it transforms the Abstract Syntax Tree (AST), by modifying its structure at compile-time...
...By modifying (or transforming) the AST, Lombok keeps your source code trim and free of bloat, unlike plain-text code-generation. Lombok's generated code is also visible to classes within the same compilation unit, unlike direct bytecode manipulation with libraries like CGLib or ASM...
Test Readability
When over half your codebase is composed of unit-tests its important to apply good coding practices to your unit-tests as much to non-test code.
Making Test Driven Development Work: Test Readability | Javalobby
Making Test Driven Development Work: Test Readability | Javalobby
A key characteristic of TDD that works is test readability. By focusing on test readability, a developer is forced to think about the object under test in a way that will promote good design and provide valuable documentation for the components of the system.
19 February 2011
How type erasure can be avoided
How type erasure can be avoided: Using TypeTokens to retrieve generic parameters - JQuantLib
13 February 2011
12 February 2011
Strategies Against Architecture & Interactions Over Processes and Tools | Architects Zone
Strategies Against Architecture & Interactions Over Processes and Tools | Architects Zone
Although it is a simple value, the idea that individuals and interactions are more significant than processes and tools is overlooked perhaps more often than it is valued. Of course, processes and tools make a difference –– sometimes a very big difference –– but what determines whether a process or tool is effective is related to the individuals and interactions. To best achieve agility you need to start with the current context and understand how people actually behave in response to their environment, their beliefs and one another.
07 February 2011
09 January 2011
Subscribe to:
Posts (Atom)