estimation – Joe Blogs

Lego Flow Game

We run regular Delivery Methodology sessions for a mixture of Delivery Managers and other folk involved in running Delivery Teams. It’s the beginning of a Community of Practice around how we deliver.

One of the items that someone added to our list for discussion recently was about how we forecast effort, in order to predict delivery dates. Straight away I was thinking about how we shouldn’t necessarily be forecasting effort, as this doesn’t account for all of the time that things spend blocked, or just not being worked on.

Instead we should be trying to forecast the flow of work.

We’d been through a lot of this before, but we have a bunch of new people in the teams now, and it seemed like a good idea for a refresher. My colleague Chris Cheadle had spotted the Lego Flow Game, and we were both keen to put our Lego advent calendars to good use, so we decided to run this as an introduction to the different ways in which work can be batched and managed, and the effect that might then have on how the work flows.

Lego Advent Calendar

The Lego Flow Game was created by Karl Scotland and Sallyann Freudenberg, and you can read all of the details of how to run it on Karl’s page. It makes sense to look at how the game works before reading about how we got on.

We ran the game as described here, but Chris adapted Karl’s slides very slightly to reflect the roles and stages involved in our delivery stream, and he tweaked the analyst role slightly so they were working from a prioritised ‘programme plan’.

Round 1 – Waterfall

Maybe we’re just really bad at building Lego, but we had to extend the time slightly to deliver anything at all in this first round! Extending the deadline, to meet a fixed scope, anyone?

The reason we only got two items into test and beyond was that the wrong kits were selected during the ‘Analysis’ phase for three items. The time we spent planning and analysing these items was essentially wasted effort, as we didn’t deliver them.

The pressure of dealing with a whole batch of work at that early stage took its toll. This is probably a fairly accurate reflection of trying to do a big up-front analysis under lots of pressure, and then paying the price later for not getting everything right.

It was also noticeable that because of the nature of the ‘waterfall rules’, people working on the later stages of delivery were sat idle for the majority of the round – what a waste!

Our Cumulative Flow Diagram (CFD) for the Waterfall Round looked like this –

You can see how we only delivered two items, and these weren’t delivered until 7:00 – no early feedback from the market in this round!

CFDs are a really useful tool for monitoring workflow and showing progress. I tend to use a full CFD to examine the flow of work through a team and for spotting bottlenecks, and a trimmed down CFD without the intermediate stages (essentially a burn-up chart) for demonstrating and forecasting progress with the team and stakeholders.

Round 2 – Time-boxed

We did three three-minute time-boxes during this round. Before we started the first time-box we estimated we’d complete three items. We only completed one – our estimation sucked!

In the second time-box we estimated we’d deliver two items and managed to deliver two, just!

Before the third time-box we discussed some improvements and estimated that we’d deliver three again. We delivered two items – almost three!

Team members were busier in this round, as items were passed through as they were ready to be worked on.

The CFD looks a bit funny, as I think we still rejected items that were incorrectly analysed (although Karl’s rules say we could pass rejected work back for improvement).

The first items were delivered after 3:00 and you can see the regular delivery intervals at 6:00 and 9:00, typical of a time-boxed approach.

Round 3 – Flow

During the flow round, people retained their specialisms, but each team member was very quick to help out at other stages, in order to keep the work flowing as quickly as possible.

Initially, those working in the earlier stages took a little while to get used to the idea of not building up queues, but we soon got the hang of it.

The limiting of WIP to a single item in each stage forced us to swarm onto the tricky items. Everyone was busier – it ‘felt faster’.

We’ve had some success with this in our actual delivery teams – the idea of Developers helping out with testing, in order to keep queue sizes down – but I must admit it’s sometimes tricky to get an entire team into the mindset of working outside their specialisms, ‘for the good of the flow’.

Here’s the CFD –

The total items delivered was 7, which blows away the other rounds.

You can see we were delivering items into production as early as 2:00 into the round. So not only did we deliver more in total, but we got products to market much earlier. This is so useful in real life as we can be getting early feedback, which helps us to build even better products and services.

The fastest cycle time for an individual item was 2:00

A caveat

Delivering faster in the final round could be partly down to learning and practice – I know I was getting more familiar with building some of the Lego kits.

With this in mind, it would be interesting to run the session with a group who haven’t done it before, but doing the rounds in reverse order. Or maybe have multiple groups doing the rounds in different orders.

What else did we learn

* Limiting WIP really does work. The challenge is to take that into a real setting where specialists are delivering real products.

* I’ve used other kanban simulation tools like the coin-flip game and GetKanban. This Lego Flow Game seemed to have enough complexity to make it realistic, but kept it simple enough to be able to focus on what we’re learning from the exercise.

* Identifying Lego pieces inside plastic tubs is harder than you’d think.

Overall a neat and fun exercise, to get the whole team thinking about how work flows, and how their work fits into the bigger picture of delivering a product.

Sizing, Estimation and Forecasting

The story so far

Over the last few years we’ve tried a variety of estimation and planning techniques. We’ve suffered from our fair share of Estimation Anti-patterns and tried various approaches to avoid these.

I thought it’d be useful to outline some of the approaches we’ve tried, the problems we’ve encountered, and how we’ve reacted to those in order to get to where we are now.

2010

Back in 2010 estimates were forced to fit a previously agreed plan:

“What’s the estimate”

“60 days”

“It needs to be 30, go away and re-estimate it”

This is a cross between the Target Estimation and Comedy-driven Estimation anti-patterns, and obviously it’s just a big farce – what’s the point in estimating in the first place if you’re just going to have a fixed time, scope and resource all imposed on you?

This approach led to teams and individuals being put under a great deal of pressure, and generated bad feeling between the people who imposed the ‘estimates’ and those who had to stick to them.

Of course corners were cut in order to meet the fixed estimates, which led to further technical debt, which just exacerbated the whole problem for future projects – the Done-driven estimation anti-pattern.

2011

During 2011 we gradually moved away from ‘fixed’ estimates. We introduced a few fairly standard ideas –

Estimating in ideal days

We started estimating in ideal days, to take account of the fact that a Developer doesn’t get to spend their entire day dedicated to the estimated item that they’re currently working on.

This worked okay, once we finally hammered out the exact definition of an ideal day…

“Does an ideal day include meetings?”

“But what if the meeting relates to the story they’re working on?”

Having the people who are going to do the work doing the estimation

We tried to throw out the idea that a single individual could estimate a project more accurately and precisely than the developers who were familiar with the codebase, and who were about to do the work.

Estimates would still be questioned by people who weren’t going to do the work. We’d get Architects or Managers questioning why Developers thought something would take, for example, 3 days –

“That story’s just a few lines of code isn’t it”

This was frustrating, and we probably did waste more than a few hours justifying estimates to people outside the team.

Planning poker to derive estimates from the group, not individuals

The introduction of planning poker was quite good fun to start with. It brought the team together and helped to alleviate some of the discussions and justification that we had to go through.

However, it sometimes did feel a bit like a negotiation – with some people deliberately going in low to try to bring an estimate down.

Velocity – planning based on past performance

We introduced the standard idea of velocity from Scrum –

Take the number of ideal days you complete in an iteration, and then plan your next iterations based on that.

This was sound, but unfortunately it was described by whoever sold the concept to senior management as being a percentage measure. So if a team got 30 ideal days of stories completed in an iteration of 40 elapsed developer-days, the team had achieved a ‘75% velocity’ – this was really ugly, and came to hurt us, as you’ll read below.

We struggled a bit with the idea of the team committing to a sprint goal. There were a lot of dependencies on other teams that we just didn’t account for, so we could never really meet what felt like reasonable goals.

Relative Estimates

We started to estimate work based on its relative size, compared to work we’d done previously. After all, this seemed like the quickest and generally most reliable way to estimate. If you ask a decorator to quote for painting a room, they can usually give you a rough quote without measuring up, because they’ve already painted lots of rooms of roughly the same size before.

This approach for us was pretty successful – if we’d tackled a similar size project in the same product area, we could look up the actual effort we expended on the previous project and use that to guide our estimate for the new project. When the newer project was complete, looking at the actuals showed us that this was a fairly accurate method.

It helped us to resolve the Fractal Estimation anti-pattern that we’d suffered from in the past, because we were now looking at sizing the project as a whole to start with, as opposed to trying to break it up and estimate each constituent part.

The problem was when we had to estimate something that wasn’t really similar to anything we’d built before.

Overall things improved during 2011 – the people doing the work had more control, and we had a method by which to size things, and plan work. But then things started to unravel…

2012

It gradually became clear that some of the things that we thought were working, weren’t really…

Story Points

The business didn’t understand the concept of Ideal days, so we re-branded them as Story Points, where a story point equates to an ideal day. This didn’t really help though as we never built a shared understanding that Story Points are a relative measure of size, as opposed to an exact measure of time taken to do something.

“How big is the project?”

“30 points”

“You have five developers, so it’ll be done in six days?”

“Erm…maybe…”

What’s velocity?

The concept of Velocity was never well understood by the business either. It became seen as a measure of efficiency, or utilisation. To paraphrase:

Managing Director: “What’s velocity?”

Programme Manager: “It’s the time that developers aren’t working – like when they go for lunch or a p*ss”

and so the percentage thing came back to bite us – velocity was used as a stick to beat the teams with –

“The Developers are only working at 60%, we need to get them to work at 70%”

Targets

We moved away from planning based on past performance and trying to improve on that, to planning based on fixed targets per developer. The Target Estimation anti-pattern again.

To increase speed, targets were set for developers to develop a certain amount of work each week.

The planning was based on one big resource pool of developers (only), with individual targets aggregating up into one giant target.

The focus was on individual developer productivity rather than actual throughput of developed and tested stories. This led to a bad working environment, much frustration, and undesirable behaviours.

Some teams adopted the Velocity-driven estimation anti-pattern in order to get around the targets they were set. But it didn’t mean they were delivering any more work – it just meant that Story Points became even more meaningless…

Budgets

A positive thing we introduced in 2012 was the idea of budgets for pieces of work. This was the starting point for turning the question around and establishing what each piece of work is worth to the business –

“How long will this project take you?”

“We’re not sure yet. How long would you like us to spend working on it?”

Developers-only

As you’ll have picked up from the story so far – the vast majority of the focus was on Developers, and only Developers. They were widely regarded as the limiting ‘golden’ resource, and as such theirs was the only work that needed estimating – everything else that needed to be done like story-writing, deployment and testing would just fall into place.

This is partly the Done-done-driven Estimation anti-pattern. The problem with focussing on just Developers is that they cannot deliver work in isolation. There are many inter-dependencies on other roles such as BAs, EAs, Testers, Infrastructure, DBAs and so on.

It is the team that delivers work, not individuals. You can try to estimate the effort that a Developer alone will have to put in to deliver a story, but that really is only a part of the work needed to deliver end-to-end.

2013

As part of the more focussed agile transformation process, we decided to have a complete re-think about how we estimate and plan at the team-level.

Principles

We came up with some principles on which we wanted to base our estimation and planning. These are based on the experience of the team, and tied in with the feedback that we received from some external consultants we were working with.

* Plan based on past performance
* Track the whole cycle, not just development
* Estimates are not exact quotes
* Plan at a team level and scale up, not the other way around
* Limit work in progress
* Separate the methodologies used for planning from those used for performance management

What matters

We considered having another crack at using story point estimation and velocity as it was intended, but decided that there were already too many misconceptions around this for it to be a success.

Instead we opted to try some of the more empirical techniques associated with Kanban, which tied in nicely with our move away from iterations to more of a flow-based delivery model.

The beauty of these techniques is that they focus on what matters – the question that our colleagues and management want an answer to is generally

“When will we get this product?”

not

“How much effort will it take?”

We started focussing on the elapsed time that it took to deliver things, as opposed to how much effort a particular role puts in to get it there.

Efficiency

An eye-opening aspect of this is to look at Business Process Efficiency (BPE) – which is the ratio of the time that a piece of work is actively being worked on (by anyone), to the total time that it takes to deliver that piece of work.

Many organisations are working with a typical BPE of just 15%. So for the vast majority of the time it takes to deliver something, that thing is just sat waiting to be worked on – perhaps at a handover between roles or teams. So all the work we put in to estimate effort was really only focussed on a very small portion of the time it takes to deliver – and focussing on developers only magnified this even more!
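To make the ratio concrete, here is a minimal Python sketch of the BPE calculation. The function name and the example figures (3 days of hands-on work within 20 elapsed days) are made up for illustration; they are not numbers from our teams.

```python
from datetime import timedelta


def process_efficiency(active: timedelta, elapsed: timedelta) -> float:
    """Ratio of time spent actively working on an item to its total elapsed time."""
    if elapsed <= timedelta(0):
        raise ValueError("elapsed time must be positive")
    # Dividing one timedelta by another gives a plain float.
    return active / elapsed


# Hypothetical story: 3 days of hands-on work, delivered 20 days after it was started.
bpe = process_efficiency(timedelta(days=3), timedelta(days=20))
print(f"BPE: {bpe:.0%}")
```

With those made-up figures the story spends 85% of its life waiting, which is the portion that effort estimates never see.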

The here and now

Flow and Forecasting

Where we are now is that the teams aim to split stories up nice and small. They then count the number of stories in each state of their kanban system each day. We use this to track each team’s flow.
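As a rough illustration of that daily count, the sketch below turns a snapshot of a board into per-state counts and then into the stacked bands that a CFD plots. The state names and story IDs are assumptions for the example, not our actual workflow.

```python
from collections import Counter

# Assumed workflow states, in order. A real board will have its own.
STATES = ["backlog", "analysis", "development", "test", "accepted"]


def daily_snapshot(board: dict[str, str]) -> dict[str, int]:
    """Count how many stories sit in each state today."""
    counts = Counter(board.values())
    return {state: counts.get(state, 0) for state in STATES}


def cfd_bands(snapshot: dict[str, int]) -> dict[str, int]:
    """Cumulative counts: stories that have reached at least each state."""
    bands = {}
    running = 0
    for state in reversed(STATES):
        running += snapshot[state]
        bands[state] = running
    return bands


# Hypothetical board: story id -> current state.
board = {
    "S1": "accepted",
    "S2": "test",
    "S3": "development",
    "S4": "development",
    "S5": "backlog",
}
snapshot = daily_snapshot(board)
bands = cfd_bands(snapshot)
print(snapshot)
print(bands)
```

Recording one set of bands per day and plotting them against the date gives the diagram. The vertical gap between two bands is the work in progress at that stage.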

We generate Cumulative Flow Diagrams (CFD) and record team throughput. Both of these can be used to forecast future delivery. The great part is that this is not based on anyone’s judgement of the size of a piece of work – it is based on the actual empirical figures for how long it takes to deliver.
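One simple way to turn recorded throughput into a forecast is sketched below. It assumes throughput is recorded as stories accepted per week, and it gives a range rather than a single date. The function name and the history figures are hypothetical, and this is only one of several ways to do it.

```python
import math
from statistics import mean


def forecast_weeks(remaining: int, weekly_throughput: list[int]) -> dict[str, int]:
    """Estimate weeks needed to finish `remaining` stories from past weekly throughput."""
    if not weekly_throughput or min(weekly_throughput) <= 0:
        # Simplification: a week with zero throughput would make the
        # pessimistic case undefined, so this sketch rejects it.
        raise ValueError("need at least one week, all with positive throughput")
    return {
        "optimistic": math.ceil(remaining / max(weekly_throughput)),
        "typical": math.ceil(remaining / mean(weekly_throughput)),
        "pessimistic": math.ceil(remaining / min(weekly_throughput)),
    }


# Hypothetical history: stories accepted in each of the last five weeks.
history = [4, 6, 5, 3, 7]
forecast = forecast_weeks(remaining=30, weekly_throughput=history)
print(forecast)
```

Nobody has to judge the size of anything here: the only inputs are the number of stories left and what the team has actually delivered.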

Cycle Time

We track the Cycle Time for stories – this is the time it takes to deliver a story end-to-end. It is currently surprisingly high, and we’re challenging teams to see what they can do to reduce their cycle times – the quickest win for this is to reduce the time that stories sit in a particular state waiting for someone to pull them into the next. We can improve on this by limiting the number of things that we work on at any one time.
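A minimal sketch of the cycle time calculation follows. It assumes that 'end-to-end' runs from the date a story enters the kanban system to the date it is accepted. The dates are invented for the example.

```python
from datetime import date
from statistics import median


def cycle_time_days(started: date, delivered: date) -> int:
    """Elapsed calendar days between a story starting and being delivered."""
    return (delivered - started).days


# Hypothetical stories: (date started, date delivered).
stories = [
    (date(2013, 3, 4), date(2013, 3, 15)),
    (date(2013, 3, 6), date(2013, 3, 29)),
    (date(2013, 3, 11), date(2013, 3, 19)),
]
cycle_times = [cycle_time_days(start, end) for start, end in stories]
print(cycle_times)
print("median:", median(cycle_times))
```

The median is used here rather than the mean so that one story stuck in a queue doesn’t skew the picture, but that is a choice for the sketch rather than a rule.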

Sizing

When we set out with this method of using empirical data to forecast, instead of estimating, we were concerned about the disparity in the size of stories. If we’re just counting stories, what would happen if we delivered all of the smaller stories first, and were left with all the bigger ones? It’d look like we were way further ahead than we really were.

To counter this we sized stories small, medium or large. We had one person per team doing this to generate some consistency, and it was a quick process that was done as part of the story’s refinement.

We then tracked CFDs for both story count, and a kind of ‘weighted count’ that took the relative size into account e.g. a medium is twice a small, and a large is twice a medium.

So this took differences in story size into account, but what we found was that over time the slope of the CFD’s accepted state was roughly the same for the weighted and non-weighted count. A forecast based on story count alone should be as accurate as the forecast that takes story size into account.
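The comparison can be sketched as follows, using the weights described above (a medium is twice a small, a large is twice a medium). Comparing the two counts as a fraction of total scope is my interpretation for the example, and the scope and weekly deliveries are made up.

```python
from itertools import accumulate

# Weights as described: medium = 2 x small, large = 2 x medium.
WEIGHTS = {"small": 1, "medium": 2, "large": 4}


def fraction_done(scope: list[str], delivered_per_week: list[list[str]],
                  weighted: bool) -> list[float]:
    """Cumulative fraction of scope accepted by the end of each week."""
    def size(story: str) -> int:
        return WEIGHTS[story] if weighted else 1

    total = sum(size(s) for s in scope)
    weekly = [sum(size(s) for s in week) for week in delivered_per_week]
    return [done / total for done in accumulate(weekly)]


# Hypothetical project: eight stories delivered over four weeks.
scope = ["small", "small", "medium", "medium", "large", "small", "medium", "large"]
delivered = [
    ["small", "medium"],
    ["large", "small"],
    ["medium", "small"],
    ["medium", "large"],
]
by_count = fraction_done(scope, delivered, weighted=False)
by_weight = fraction_done(scope, delivered, weighted=True)
for week, (c, w) in enumerate(zip(by_count, by_weight), start=1):
    print(f"week {week}: count {c:.0%}, weighted {w:.0%}")
```

If the two series stay close over a reasonable period, as ours did, the sizing step isn’t adding much to the forecast.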

For this reason, we’ve stopped sizing altogether and now just count stories. What’s key is that we aim to get a reasonable consistency of small stories.

Time-boxes

Back when we introduced budgets we started to turn around the question of how long something would take, to how long did the business want us to spend on it – what is it worth?

We’ve extended this to reinforce the idea of fixing time and flexing scope, by planning time-boxes. A project has an assigned delivery time-box during which the team pull stories from the backlog for that project. Once that time-box is over the team finish any unfinished stories off, but start pulling new stories from the next project time-box to which they’re assigned. Essentially the time-box controls which project the team pull new stories from – or in Kanban terms – where they replenish their system from.
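The replenishment rule can be sketched in a few lines. This assumes each time-box is a project name with a start and end date, and that time-boxes for a team don’t overlap. The project names and dates are hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional


class TimeBox(NamedTuple):
    project: str
    start: date
    end: date  # inclusive


def replenish_from(time_boxes: list[TimeBox], today: date) -> Optional[str]:
    """Return the project whose backlog the team should pull new stories from today."""
    for box in time_boxes:
        if box.start <= today <= box.end:
            return box.project
    return None  # no time-box planned for this date


# Hypothetical plan for one team.
plan = [
    TimeBox("Project A", date(2013, 4, 1), date(2013, 4, 26)),
    TimeBox("Project B", date(2013, 4, 29), date(2013, 5, 24)),
]
print(replenish_from(plan, date(2013, 4, 30)))
```

The rule only decides where new stories come from. Stories already in progress from the previous time-box are still finished off.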

Project cycle-time

The question that remains is what is a reasonable length of time-box to plan in for a project.

At a higher-level what we need to do next is start tracking the cycle time of overall projects. We can then use this to plan sensible time-boxes for delivery of future projects of a similar nature.

The Future

It’s been a long and sometimes frustrating journey – but it feels like we are now in a better place. We now spend a lot less time sizing and estimating things – practically none in fact.

In future we aim to widen the gathering of metrics and look for further patterns to see what impacts on delivery. There are still challenges ahead as we embark on newer, bigger pieces of work, but I think we are better equipped to give honest, accurate forecasts of what can be delivered, and by when.

PS. If you’d asked me to estimate how long it’d take me to write this blog post I’d have said a couple of days. It took a bit longer…

photo credit: lemonad via photopin cc

photo credit: eatmorechips cc

photo credit: bensutherland cc

Delivery, Delivery, Delivery

When I started this job it was towards the end of a big release. I witnessed a long and painful bug-fixing period, and got to thinking about what improvements could be put in place to make the next release smoother. It soon became apparent though, that the releases all year had been late, and as such a backlog of work had built up. What also became apparent was that all of this work was contractually required to be delivered by the end of 2010. My first full release was certainly going to be interesting, if not smooth…

According to the PMs all of the required work would fit into the time we had, but unfortunately the estimates that this assertion was based on had all been provided by individuals who would not actually deliver the work, or by developers who had been forced to ‘estimate’ to a specific figure. In my mind these so-called estimates are pretty worthless, as the whole point of estimating is to be able to plan well (in many environments it’s also to cost things up, but in our case the costs are already fixed by the overall contract), but more on estimation in a future post…

So we ended up in a situation where the resource, time and scope were effectively fixed – not ideal.

We mitigated this to a degree by ensuring we worked on the right things first. Although the overall scope was fixed, there are usually ‘nice-to-have’ features that the business can truly live without. The business owners weren’t used to having to prioritise in this way – we had to gain their trust, and explain that we weren’t planning to drop their features, rather we needed to avoid a situation where if the sh*t really did hit the fan, we wouldn’t be left with critical features not implemented, based on their advice. This seemed to work okay, and we had more confidence that we were working on the right things in the right order. We also tightened the testing feedback loop by getting the testers to test everything in an earlier environment. This reduced the total cycle time to deliver bug-fixed requirements.

Even after those minor improvements, it was a tough release. The team worked a lot of overtime, something I hope to avoid in future. We worked late nights and we worked from home some weekends. When we worked in the office at the weekend we had to get portable heaters in as it was so cold that our fingers were seizing up, and when it really started snowing we booked people into hotels so they could carry on working instead of leaving early.

And we delivered. We got the release out on time, and we partied when it was all over. Would I want to do another release like that again – no way… However there was something positive about the team pulling together to beat the odds. It was a time when we worked hard and played hard together, and it’s still one of the releases that some of those involved talk about with a wry smile.