May 21st, 2007
Estimation: Time or Size?
It surprises me, but I have recently come across a few people in the Agile field who prefer estimating in “real time” over estimating in size (e.g. story points); I have even heard statements such as: the most advanced implementations of Agile use real-time estimates, because they offer the most powerful benefits. My gut feeling has always told me otherwise, but I found that I didn’t have a well-thought-out argument beyond, erm… read Mike Cohn’s book, which in the event of a real-time discussion is not very helpful, and even so does not address all of the arguments I have been hearing against size-based estimation. This blog post is my attempt to clarify my own thoughts on this matter. I focus on four specific arguments in favour of time measurement, and attempt to counter those arguments.
Argument 1: Estimating in hours allows a developer to measure his estimate against his actual, and use that data to improve his estimates in future.
This is true, and it works very well if a project has only a single developer working on it, or has more than one developer but they are all essentially working alone, on different parts of a system — a common model in a “build-by-layers” approach to system development.
Problems arise, however, when we introduce the concept of cross-functional teams working on “slices” of a system. What is a team-hour? Is it the whole team working for one hour together, or is it the average time it would take one team member to do the work? Already we are beginning to talk in abstract time, not real time. One team member may be twice as fast as most others, through familiarity with the system and/or greater experience; does one hour to him mean the same as it means to others? Clearly not. “One hour” becomes a confusing unit of measurement in such cases. It is also the case that different team members have different responsibilities. How do we find a unit of time appropriate across documentation, UI design, coding and testing? How does each team member measure the time other team members will take for their part of the work on the story? You’ll see that this quickly becomes completely unmanageable.
Some Scrum/Agile teams use hour estimates at the task level, with the assumption that a single developer will work on each task; this seems reasonable as the tasks are divided across skill boundaries, such as server-side, database, testing, etc. In this case Argument 1 seems to hold good: developers should estimate in hours at a task level and then improve their estimates by measuring estimates against actuals. Unfortunately this argument breaks down again as soon as we recognize that the best code creation and the best testing are done in pairs. Again, what does one hour mean? A pair hour or an hour for one of the pair? What if we don’t know in advance (as we probably should not) who will pair with whom and on what? Who gives the hour estimates? Everybody? Just one person? Which person?
In general, I find the practice of estimating task time to be massive, wasteful overhead. Tasks should be small; they should move across the task board in a single working day. End of story. The focus of estimation should be at the story level.
Argument 2: Our customers/product owners don’t understand story points; they need to know how many hours developers are working so they know how much the work will cost.
I hope anyone reading this can see the immediate flaw in this argument: micro-management. Product Owners have no business knowing how many hours each developer is working. It breaks encapsulation and it breaks self-organization. Actually, it breaks pretty much everything. Customers are not buying developer hours, they are buying software. Their focus needs to be on how much software they can expect in an iteration; their measurement needs to be on comparing actual software delivered against expected (promised?) delivery.
Story points help set expectations here. Very soon after beginning iterative development using a size measure to estimate, customers will be able to see the capacity of the team: this team can deliver 40 units of software per iteration, therefore let me prioritize 40 units’ worth of features for the next iteration, with a few extra units in reserve. It is beautifully simple. A customer is welcome to map story points to cost (it will be approximate). He should not care about how many hours it took to make the software.
Argument 3: Story Points will map to time anyway, very soon we’ll see that (e.g.) one story point is worth 2.5 hours, so it is better to skip the intermediate step and just measure in hours.
This is a flawed argument, and assumes a team never gets better. Truly self-organized teams always get better. When a new team starts, it may be safe to say that (e.g.) one story point is worth 2.5 hours, but as the team improve their development and teamwork skills, as the code base is salvaged from the big mud hole it has been in for the past months or years and rises majestically to a state of elegance, and as the team use regular reflection to improve their process, the value of a single story point will change. It may come to be valued at only 2 hours, because time is being used more efficiently. Now instead of 40 story points in an iteration the team can take on 50 story points.
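The arithmetic behind that last step can be sketched in a few lines. This is only an illustration using the example figures above; the function name and the assumption that an iteration contains a fixed number of working hours (100 here, derived from 40 points at 2.5 hours each) are mine.

```python
# Illustrative figures only, taken from the example in the text:
# a story point that costs about 2.5 hours at first and about 2 hours later.

def points_per_iteration(iteration_hours: float, hours_per_point: float) -> float:
    """Story points a team can take on within a fixed number of working hours."""
    return iteration_hours / hours_per_point

# Assumption: 40 points at 2.5 hours each implies 100 working hours per iteration.
ITERATION_HOURS = 40 * 2.5

before = points_per_iteration(ITERATION_HOURS, 2.5)  # the new team
after = points_per_iteration(ITERATION_HOURS, 2.0)   # the improved team

print(f"before: {before:.0f} points, after: {after:.0f} points")
```

The hours in the iteration stay the same; only the amount of software that fits into them changes.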
More story points per sprint means that the customer will be getting more software. The cost of this software will not go down; the team building it will be making greater profits for the company. That’s business. The customer is happy — happier, in fact — he is getting the software he needs at the price he expects to pay, but getting it faster. Everyone is happy.
Imagine if the customer were measuring time. He would expect his software to get cheaper as the team got better; what used to take twenty hours to build now only takes ten hours, because the developers have put time into creating a good infrastructure and developing good practices. This does not seem right. The benefits, at the very least, should be equally shared between provider and customer. A time-based estimation model will not allow for that without (apparent) steep price increases.
A measure of size allows a team to show they are creating more value for money. We can draw big visible charts to show this. A measure of time will never show this, because there are always a fixed number of working hours in an iteration.
Argument 4: Story Points don’t allow you to improve your estimation techniques.
Actually, they do. Sometimes when clients ask me how this works, or why it works, I tell them it is magic. This is not so far from the truth. I have never really figured out quite why or how it works. But it does. Teams do get better at estimating. A simple velocity chart that maps two points for each iteration, i.e. estimated points and actual points, will begin to show the points converging over time, and not that much time either. Of course the shorter the iterations, the quicker this improvement will be apparent. I always recommend iteration lengths of one week or two weeks. Never longer.
And to be clear, the two points don’t just get closer together because teams are committing to less and less each time. The average value of points delivered goes up over time. This latter, it may be argued, is because teams begin making bigger and bigger estimates. I can see that would be a temptation, but because the team starts collecting historical data (sometimes even just at a gut-level) they self-correct for this, using their data as a baseline for future measurement. Remember, good software developers want to build good software, and they want to do it fast and efficiently. Suspicion doesn’t have much of a home in an Agile environment.
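For anyone who wants to check this on their own team rather than take it on faith, here is a minimal sketch of the two things to look for in a velocity chart: the gap between estimated and actual points shrinking, and the points delivered going up. The numbers are invented for illustration and the function names are my own.

```python
# Hypothetical history: (estimated points, actual points) for each iteration.
iterations = [(40, 25), (38, 30), (40, 35), (42, 40), (45, 44)]

def gaps(history):
    """Absolute difference between estimated and actual points per iteration."""
    return [abs(estimated - actual) for estimated, actual in history]

def is_converging(history):
    """True if the gap never grows from one iteration to the next."""
    g = gaps(history)
    return all(later <= earlier for earlier, later in zip(g, g[1:]))

def delivery_is_rising(history):
    """True if actual points delivered never drop between iterations."""
    delivered = [actual for _, actual in history]
    return all(later >= earlier for earlier, later in zip(delivered, delivered[1:]))

print(gaps(iterations))
print(is_converging(iterations), delivery_is_rising(iterations))
```

Real data will be noisier than this, so a strict "never grows" test is too harsh in practice; a trend over several iterations is what matters.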
I am sure there are other (and perhaps better) arguments to favour measures of size over measures of time, and I’d love to hear them here. I’d also be interested to hear opposition to my arguments; healthy debate on this topic is welcomed.
May 21st, 2007 at 5:24 pm
Hi, Tobias.
Interesting topic indeed. I’m on the other side of the street. I have almost always estimated in real time and never really got to do well with points.
My first answer to why would be: because “it is the simplest thing that works”, and if you don’t fall in the usual traps of time estimation, it really does.
Let me review your counter-arguments and offer my counter-counter-view on them:
Argument 1: Estimating in hours allows a developer to measure his estimate against his actual, and use that data to improve his estimates in future.
Part of the learning experience for me is actually learning to estimate as a team, not as individuals. I play roughly the same “rock-paper-scissors” game for hours that you do for points. Of course one hour means something different to one pair than it does to another (if you have solo devs, even worse), but this also happens with points, after all. When gut estimates are still very divergent after debate, I normally suggest that the two people at the extremes pair on the task.
Argument 2: Our customers/product owners don’t understand story points; they need to know how many hours developers are working so they know how much the work will cost.
Completely agree with you on this one. All they should care about is how many stories are committed for the iteration/sprint. No chickens allowed in our mud.
Argument 3: Story Points will map to time anyway, very soon we’ll see that (e.g.) one story point is worth 2.5 hours, so it is better to skip the intermediate step and just measure in hours.
I think this is true, and of course teams get better, and then estimation gets better too. Velocity can be calculated anyway (I do use story points first, and hour-estimation for tasks later). Also, I saw some overhead in trying to stick to points for tasks, because people’s brains think in clock terms, so I constantly see people guessing hours and then converting to points, even unconsciously.
Regards,
/ms
May 21st, 2007 at 5:34 pm
What I highly recommend is a balance of both techniques:
Story point estimation for the product backlog.
Real time estimation for each sprint.
Product backlog estimation is long term, so we need to be able to calibrate it as the team gets more effective or shrinks or expands.
All the benefits of real time estimation can be obtained just by doing it on a sprint by sprint basis. Velocity is still in story points to support release planning, but the time estimate each sprint provides a sanity check on whether the story points are in line with velocity for any particular sprint. When they aren’t, there is no need to panic and re-estimate the backlog; just use the real time estimates for the sprint planning and move on.
Steven Gordon
May 21st, 2007 at 6:56 pm
I find myself leaning towards Steven… I’ve played with Story Points in the Sprint and the team members tend to get confused on relating Story Points to actual time.
For my latest project I am pushing the idea of “ideal” hours for Sprint estimation and we still use Story or Feature points for putting LOE towards the backlog entry.
There is still the pitfall of people questioning why there are far fewer than 40 ideal hours per week of a Sprint, but education in the purpose and principles behind the measure seems to win the day. The team’s estimation has proven fairly consistent after a couple of Sprints on the project, as demonstrated by our Velocity.
My Two Cents.
May 22nd, 2007 at 1:28 am
The first three commenters (above) all support the concept of measuring sprint tasks using hours. In response, I’ll repeat a paragraph from my original post:
“In general, I find the practice of estimating task time to be massive, wasteful overhead. Tasks should be small; they should move across the task board in a single working day. End of story. The focus of estimation should be at the story level.”
I see little value in estimating tasks, in hours or anything else, and I certainly do not advocate measuring tasks in Story Points; tasks are not stories, after all, so it would simply be confusing.
May 22nd, 2007 at 5:02 am
I’d have to say that I favor Tobias’s view here.
I think estimating in points is actually simpler than hours, although it is harder to get started with.
It’s simpler because talking about hours forces you to deal with all sorts of factors. What’s the difference between an ideal hour and a real one? How much time do we spend in meetings? How do we count it when we pair on a task? How do we count training, mentoring, code reviews, fixing the build server, and all of the other things that need to get done? What if somebody else does the story? How do we deal with a boss who asks us to work 70 hours a week?
Points are simple. You estimate in an arbitrary but consistent unit, you measure what you get done, and then you repeat.
Personally, I’ve seen a number of teams have great success with points, but I have never seen a team that can do reliable short- and long-term estimates in clock time.
May 29th, 2007 at 11:00 pm
Greetings,
I once led a team that used a variation of the XP idea of ‘ideal hours’ to estimate stories, though they preferred the term ‘feature.’ We limited our granularity to Fibonacci Numbers (…3, 5, 8, 13, 21, 34…). In fact, each developer had a small deck of note cards with a single Fibonacci number on each. At release planning time, we would discuss each feature until the group seemed satisfied that they understood what was involved. Then, at the same time, we would each hold up the card with our estimate. Usually, we quickly had consensus. When our estimates didn’t agree, we would dig deeper to find out why. Release planning didn’t take long, and we walked out with a plan that each person understood and believed in.
When we planned an iteration, we would have one developer ‘own’ each feature. That developer would do their own estimate in terms of ideal hours for them. Letting the person who was going to be responsible for delivering the feature own the estimate allowed us to better plan the iterations by taking into account things like experience level, new information that had come to light, and so forth.
The Good
The team intuitively understood the idea of ideal hours and they liked it. Both release planning and iteration planning went smoothly and produced accurate estimates. The team maintained high morale and delivered a series of releases on time, feature-complete, with verifiably high quality.
The Bad
The team understood that ideal hours were not real hours, but others around the team did not understand this. I had to field the inevitable question: ‘How come Bob is only doing 12 hours of work a week?’ many times.
Cheers,
Chris Sims
May 30th, 2007 at 9:06 am
Tobias,
Thanks for another good post.
I agree with your thesis that story points are (far) better than hours (real or ‘ideal’). For the nay-sayers, I’d advocate they try it – and not just for 1 sprint – before junking it.
We use planning poker with story points for estimating the product backlog. We use tasks (just the number of tasks) to communicate daily work and monitor progress within the sprint. Thanks to Boris and, perhaps you, for this idea. Both of these techniques save hugely on time and increase trust (no Big Brother counting hours).
Measuring velocity (actual versus planned) provides more learning to the team than any analysis we ever did on hours in our pre-Scrum days.
Slightly off-topic: I’m not sure I share your view on iterations never being longer than 2 weeks. We do find 2 weeks useful for new Scrum teams to ‘get’ the flow, but then usually move to 3- or 4-week sprints. We find they give more ‘flow’ time and allow teams to tackle ‘meaty’ stories without the need to dis-aggregate them to artificial levels. We have also tried 1-week sprints and found them disruptive.
Peter
June 29th, 2007 at 9:02 am
I became a believer in story points + velocity after being a programmer on a team that switched from hours to story points. Apart from the initial paradigm shift it’s simpler, easier, and gives the Product Owner better feedback about how to adjust scope to hit a delivery date.
We could argue about theory all day, but my experience has convinced me story points are better for most situations.
–mj
June 29th, 2007 at 2:58 pm
In the Scrum Master training, my team was exposed to the planning poker technique, but we could not take the leap of faith, and got stuck with estimating in hours.
I agree with the benefits of story points as a better way to measure the ‘value’ being generated, but this point seems (to me) the crux of the whole argument:
“A measure of size allows a team to show they are creating more value for money. … A measure of time will never show this, because there are always a fixed number of working hours in an iteration.”
July 26th, 2007 at 6:06 am
I believe estimation, whether based on time / effort or size or anything else, remains an estimation. The only way to plan ahead is to use velocity (historical, if available, or assumed, as a part of bootstrapping the project). Velocity, the actual number of estimation units achieved in the past iteration(s), irrespective of the nature of the units, will enable you to plan the amount of work for the next iteration. The key is to understand that estimates are estimates, not actuals, and to ensure that the estimation is consistent across stories (this can be ensured by triangulation or other methods).
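As a rough illustration of planning from velocity regardless of the unit, the sketch below averages the units completed in recent iterations and then takes stories from the top of the backlog while they fit. The history, the backlog, the three-iteration window and the function names are all hypothetical, chosen only to show the idea.

```python
from statistics import mean

def velocity(completed_per_iteration, window=3):
    """Average estimation units completed over the most recent iterations."""
    return mean(completed_per_iteration[-window:])

def plan_iteration(backlog, capacity):
    """Take stories in priority order, stopping at the first that does not fit."""
    planned, used = [], 0
    for name, estimate in backlog:
        if used + estimate > capacity:
            break
        planned.append(name)
        used += estimate
    return planned, used

# Hypothetical data: units may be points, ideal days or ideal hours,
# as long as the same unit is used consistently across stories.
history = [30, 35, 40]
backlog = [("login", 13), ("search", 8), ("export", 13), ("audit log", 5)]

capacity = velocity(history)
planned, used = plan_iteration(backlog, capacity)
print(capacity, planned, used)
```

Stopping at the first story that does not fit keeps the priority order intact; a team could equally choose to skip it and pull in a smaller story further down.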
September 17th, 2007 at 1:36 pm
I personally like story points – because estimates in hours can set the wrong expectation. However the argument that I have heard in favor of estimating in hours is that it would make it easier to roll up status when the organization is running multiple agile projects. There tends to be more variation about what a story point is compared to what 16 hours is.
January 12th, 2008 at 8:34 pm
I have always worked with days, i.e. ideal days. Not hours. I have not yet tried story points because I feel it’s difficult to describe and understand. Why do you work with hours instead of days? I mean it is better to be roughly right than exactly wrong in time estimations.
March 3rd, 2008 at 12:59 am
For my first 2 sprints, I used ideal hours and never thought it was a problem doing so.
However, at the end of Sprint 2, I wanted to compare the work delivered in Sprint 1 vis-a-vis Sprint 2, and then realised that I could not baseline them on hours estimated, because the complexity of the work delivered was different and hours don’t reflect complexity: they include the capability of the resource (or the pair) who worked on the story.
Estimation in hours should be done as a pair, but you don’t know what mix of resources you will have. One resource in the pair may be better than the other.
I have started believing that Story points will be a better approach as you move to a level where your capacity to deliver is estimated at a team level and not on the type of resource you will deploy, which is the case with estimation in hours.
It is more like estimating project work in Function Points/Use Case points.
The question, however, is how we define story points; since the technique is not well known, most of us go with time estimation. Any dope on defining Story Points?
April 15th, 2008 at 9:09 am
I’ve become a believer in the story points school of thought as well. The point Tobias makes that there are a finite number of hours in a sprint is a compelling argument.
Point-based story estimation is relatively stable. Sure, as teams get better at estimating, a “medium” story may be a “small” one if it were estimated after 5-6 sprints. It’s still a pretty stable measuring stick. Contrast to hour-based estimation, which is constantly a moving target. As team skill and productivity grows, estimates for comparable stories will shrink. There’s no way you can measure project velocity that way. It’s like trying to figure out how well your diet is going by weighing yourself on a different scale every week.
April 30th, 2008 at 9:54 am
I claim that part of the inspiration for this post was a series of conversations I had with Tobias in the first half of last year. I was, and remain, a big believer that only real things are real. Story Points are an illusion.
At the time he directed me to this post, but in spite of his urging that I write my thoughts down and post them here, I didn’t have time to organize that.
I’m now on a gig at Aggiorno, and I was involved in a similar discussion, which led to the conclusion that the time had come to be more precise about why I believe that the process of estimating is simultaneously completely broken and useless, and of foremost importance.
The article I wrote attempts to describe the two things that I find most egregious about story points: firstly, the huge amount of reality that they ignore (complexity is just one dimension), and secondly, the fact that they have no meaning to the most important people on the project: the customers and users.
June 21st, 2008 at 5:33 pm
It is truly amazing to see how we have become so religious about the estimation techniques we use. The estimation debate is nearing the ridiculous pitch of the language and platform feuds that grew so wearisome a decade ago. There’s even a small band of rebellious bloggers now who claim that their agile teams are so “mature” that they need not waste time estimating or planning sprints. I guess that’s one way to quiet the discussion.
The objective of a sprint is to deliver a potentially-shippable increment of functionality that (a) consists of as much of what was most valuable to the business at the time of sprint planning as possible, and that (b) the team can commit to. The fundamental idea then, is that the team will deliver as much business value as it can every sprint, and by extension, every release. It is assumed that teams are self-motivated, and that they will strive to conscientiously commit user stories into the sprint backlog and then to deliver them (in other words, no sandbagging and no “self-management” cop outs).
If a team is consistently delivering what is the “next most important” functionality to the business, does it really matter what units are used for estimation? If something works for the team within the context of its particular organizational environment, then it is the right thing to do.
I would, however, insist that your team doesn’t use a “toy” language like VB, and make sure it uses a “real” Oracle database…
February 6th, 2009 at 4:08 am
Hi, can someone give me the steps for using story point size estimation, from Product Backlog to sprint backlog to burndowns?
Would they all need to be in story points?
What about the availability of the team in terms of man hours? Where would that be accounted for?
Finally when we say velocity is points per iteration, should it also factor in the team availability per sprint?