Requirements: Accuracy and Precision.


For reference:

http://en.wikipedia.org/wiki/Accuracy_and_precision

Requirements play an interesting role in software development. They serve as the contract the developer has to fulfill to succeed; if something is missing, the requirements are called incomplete, missing, or wrong. This is surprising, because we expect the business analysts, or whichever group writes the requirements, to envision in great detail how something that doesn’t exist yet would work and interact. In effect, developers are asking for precision: we want to know exactly how things work so that we can fulfill the contract. Accuracy does not have much value to developers until they find out that what they built was not what was needed and they have to rework it. The business, on the other hand, wants accuracy.

Precision has an interesting effect on the development life cycle. First of all, the developer will have fewer questions, because they have the details they need. This reduces communication between the developer and the business, since the developer has no reason to initiate it. In the ideal scenario the developer would implement the entire feature without asking a question, deliver the feature, succeed, hand it over and… get bugs.

These “bugs” are then often referred to as bad requirements, missing requirements, and so on. The developers are annoyed because they succeeded at fulfilling the contract and then had a fast one pulled on them: the contract changed, and they are held accountable to the new terms after meeting the original ones.

Good times.

If there is no accuracy, the developer will be handed a piece of paper that functions as little more than a conversation topic to take to the business. A requirement stating that the application needs to “add X’s to customers” is useless for building the feature: there is no information about where the feature should live, or any details of the relationship, such as one-to-many or many-to-many. The only thing it will do is create a bunch of questions that the developer needs to ask before writing a line of code.

This is because software can’t be imprecise. Software is the act of locking actions down into a concrete system that will be absolutely precise. To be imprecise is to have randomness, and that is one of the hardest problems in software to solve.

For a successful project it is therefore important that the direction is accurate, because the precision will be forced by the translation to software.

This direction comes from the requirements. Requirements should be accurate: if you don’t know exactly how something should work, don’t write it down. Let the process find the precision during implementation.

Accuracy is more valuable than precision; not only that, precision becomes a cost when accuracy is absent. The later in the process a change is found, the higher the cost, and the more precise the requirements, the later in the project the communication will start.

Posted in Architecture, Software Engineering

Reframing the bug backlog


A lot of places where I end up have a bug backlog; usually around 200-800 defects, features, and other assorted stuff in there. But mostly bugs.

This creates an interesting dynamic. First of all, trying to attack the backlog leads to analysis paralysis: going through 200+ items to find the most important one is not feasible, and without some order the developers will spend more time browsing for their next bug than fixing it. Setting a goal of resolving the backlog is usually meaningless because it can’t be translated into concrete, useful steps. Developers ask for a code freeze for a month to just fix bugs, and the business is usually not very interested in that. In the end the backlog steadily grows, and people come to accept it.

I think that a lot of this has to do with how the backlog is framed. First, it is accepted that you have one; it is even expected that you have software that helps you track them, and that there are reports, meetings, and even departments dedicated to managing this backlog. It is also framed as a point in the now: a point-in-time measurement that often has no future or past reported with it.

A bug backlog to me is like carrying a balance on your credit card. It is the result of past actions and gives you a point-in-time measurement of where you are at financially, or in developer terms, stability. So the steps for getting out of bugs are similar to getting out of debt.

The most important thing is change.

How did the total change over time? Put it into Excel and run some regressions on it to predict the future. How long until you would double your current troubles? How long until you hit the next nice, big, round number? Try to create a prediction of the future and use that prediction as your baseline to improve against.
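The Excel exercise can be sketched in a few lines of code as well. This is a hypothetical helper with made-up names and numbers, not a real tool: it fits a straight line to weekly backlog totals and projects how long until a target (say, double today’s count) is hit.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: least-squares line through weekly backlog totals,
// projected forward. All names and figures here are illustrative.
static class BugTrend
{
    // Simple least-squares fit where totals[i] is the backlog total at week i.
    public static (double Slope, double Intercept) Fit(double[] totals)
    {
        int n = totals.Length;
        double meanX = (n - 1) / 2.0;      // mean of week indices 0..n-1
        double meanY = totals.Average();
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++)
        {
            sxy += (i - meanX) * (totals[i] - meanY);
            sxx += (i - meanX) * (i - meanX);
        }
        double slope = sxy / sxx;
        return (slope, meanY - slope * meanX);
    }

    // Weeks from the last data point until the projection reaches target;
    // null when the trend is flat or shrinking.
    public static double? WeeksUntil(double[] totals, double target)
    {
        var (slope, intercept) = Fit(totals);
        if (slope <= 0) return null;
        double projectedNow = slope * (totals.Length - 1) + intercept;
        return (target - projectedNow) / slope;
    }
}
```

With totals of 100, 120, 140, 160 the fitted slope is 20 bugs a week, and doubling from 160 to 320 projects 8 weeks out. That projection, not zero, is the baseline to improve against.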

This means that if your group generates 20 bugs a week and you get down to 15 bugs a week, there is an accomplishment that can be celebrated. It isn’t good, but it’s better.

There is an important reason to attack the rate rather than the total. It is like a car traveling in the wrong direction at 65 mph when the goal is to move 100 ft backwards: putting the car in reverse isn’t really an option. You first need to brake, shift into reverse, and then speed up again. The same needs to happen with the bug backlog. You first need to generate fewer bugs, then you need to make sure that bugs that are fixed stay fixed, and only then can you comb through the backlog to get the rest resolved.

Focusing on the rate focuses on behaviors. It focuses on evolution, sustained change rather than revolution.

Revolution takes a lot of energy, and quite often you need that energy to keep getting the new requests done.

Celebrate that a team prevented 500 bugs in the last 6 months as compared to their previous track record.

As for how to change the behaviors, chances are that the team already knows. Find the members that have good output and low bug rates and find out how they do it. Every team is different; every company is different. There is no new technology that will fix this, and there is no magic pill that someone can sell you. Mind you, that won’t stop people from selling them to you, and on occasion they might even work. But in general they face “adoption” problems and fall short of expectations.

Find what already works and embrace it.

Posted in Software Engineering

Adam Smith on software


Adam Smith famously wrote of ‘a man of humanity in Europe’ who would not ‘sleep tonight’ if ‘he was to lose his little finger tomorrow’ but would ‘snore with the most profound security’ if a hundred million of his Chinese brethren were ‘suddenly swallowed up by an earthquake’, because he had never seen them.

We might argue with this statement these days, and the earthquake in Japan in 2011 provides compelling evidence: it took some 20,000 lives by the latest estimates. But back when the statement was originally written, it would likely have been very true. If an earthquake had hit China or any other faraway land, the news would have arrived in Europe about a month later, and it would have been closer to a myth like Atlantis sinking into the sea than what we have these days: video footage of a wall of water swallowing a town, and images of what it looked like after the water retreated.

That is because things that can’t be seen, heard or felt are not real to the emotional self.

To the emotional self, the closer something is to the self, the more real it is. There is the self, the things the self sees and experiences directly, and the further removed something is, and the fewer senses are engaged, the less real it becomes. Think about watching a sad movie, and compare it to hearing a sad account about a friend of a friend. Chances are you may have cried during the movie, but crying over a secondhand account is much harder. That is because it is less real, and harder to bond with emotionally.

Enter software in the organization.

How far is software from the emotional self? For a developer you might argue that it is extremely close. A developer pours a little of themselves into every line they write. It is something they create, their baby if you will. They care a lot.

But if you look across the whole company, people likely don’t care. Even among the people that know of the project, or the people that are going to be end users of the product, the level of caring will not be substantial. They will care about the problem it is trying to solve, but as for the actual thing that has to do the work, likely not so much. And this makes sense.

A side effect of this is that people who think the software is real will communicate at a very different level than people who have not had that experience yet. Think of it as a teacher handing you the class statistics on a sheet vs. the statistics on your child: they will put much more effort into the delivery of the second set of statistics because they know you care a lot about your child. This can lead to unfortunate side effects, like adversarial behavior between departments, because of the large discrepancy in caring between the people involved.

Software is not real until people get an experience with it. That could mean a demo, a video, a screenshot, or a forum where interest can feed on itself; all of these things make it more real, and make people care more. In effect, a software project should make a dedicated effort to do public-relations work inside the company on behalf of the project. Make it real, make people care, and bridge the gap between the developers and the rest of the company. This will help a lot with communication, and the caring can be used to motivate people across the organization.

There is one sub-industry of software that does do this: the video game industry. Most video games in support or under development have tidbits of information coming out just about every day of the week; look at Star Wars: The Old Republic, Rift, Guild Wars 2, Mass Effect 3, Diablo 3, and so on. None of these games has been released (as of this writing), but you will find rabid fans (granted, the IP helps a lot). These games are real to a lot of people who have never played them.

There is a risk in taking this approach: you raise the stakes. If you make people care, make it real, they will also care when things go wrong. And in software that is more the norm than the exception. But I think part of the reason things go badly so often is because it is expected. The bar is set low, and to change that you often run into motivational issues, which happen because people have a hard time caring about software they can’t experience as real.

If software wants to change, we have to change our consumers, who will then give us the drive to change. Because once we make it real to our consumers, failure is a far less attractive option, and success will have to become the norm.

Posted in Software Engineering

Some syntactic sugar around locking & threading


One of the things that makes debugging threading easier is reducing the amount of code you are debugging. This makes syntactic sugar quite important in this problem space.

The lock keyword and its critical sections are very valuable here, but they fall short for more complex threading problems. The drawback is that a lock is always exclusive; there is no option for separate read and write locks. This makes reads a bottleneck, though often one worth accepting for the simplicity.

The area where they start to fall short is in scenarios with many readers and only a few writers. In that scenario we want to allow concurrent reads while keeping writes exclusive. This becomes even more important as people embrace LINQ and foreach statements: both will throw if the collection changes during iteration.

The following code tries to get the best of both worlds: the syntax of a lock and the power of a more complex construct.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;

namespace ReaderWriterLock
{
    /// <summary>
    /// Wrapper for the ReaderWriterLockSlim
    /// </summary>
    public class ReaderWriterLockManager
    {
        private readonly ReaderWriterLockSlim lockHost = new ReaderWriterLockSlim();

        /// <summary>
        /// Use with a using statement; while inside the using you hold a read lock
        /// </summary>
        /// <returns>A disposable read lock</returns>
        public ReaderLock ReadLock() { return new ReaderLock(lockHost); }

        /// <summary>
        /// Use with a using statement; while inside the using you hold a write lock
        /// </summary>
        /// <returns>A disposable write lock</returns>
        public WriterLock WriteLock() { return new WriterLock(lockHost); }

        /// <summary>
        /// Use with a using statement; while inside the using you hold an upgradable lock
        /// </summary>
        /// <returns>A disposable upgradable lock</returns>
        public UpgradableLock UpgradeLock() { return new UpgradableLock(lockHost); }

        /// <summary>
        /// Syntax helper for the ReaderWriterLockSlim
        /// </summary>
        public class ReaderLock : IDisposable
        {
            /// <summary>
            /// Enters the read lock on construction
            /// </summary>
            /// <param name="host">The ReaderWriterLockSlim to wrap</param>
            public ReaderLock(ReaderWriterLockSlim host)
            {
                lockHost = host;
                lockHost.EnterReadLock();
            }

            private readonly ReaderWriterLockSlim lockHost;

            /// <summary>
            /// IDisposable implementation
            /// </summary>
            public void Dispose() { lockHost.ExitReadLock(); }
        }

        /// <summary>
        /// Syntax helper for the ReaderWriterLockSlim
        /// </summary>
        public class WriterLock : IDisposable
        {
            /// <summary>
            /// Enters the write lock on construction
            /// </summary>
            /// <param name="host">The ReaderWriterLockSlim to wrap</param>
            public WriterLock(ReaderWriterLockSlim host)
            {
                lockHost = host;
                lockHost.EnterWriteLock();
            }

            private readonly ReaderWriterLockSlim lockHost;

            /// <summary>
            /// IDisposable implementation
            /// </summary>
            public void Dispose() { lockHost.ExitWriteLock(); }
        }

        /// <summary>
        /// Syntax helper for the ReaderWriterLockSlim
        /// </summary>
        public class UpgradableLock : IDisposable
        {
            /// <summary>
            /// Enters the upgradeable read lock on construction
            /// </summary>
            /// <param name="host">The ReaderWriterLockSlim to wrap</param>
            public UpgradableLock(ReaderWriterLockSlim host)
            {
                lockHost = host;
                lockHost.EnterUpgradeableReadLock();
            }

            private readonly ReaderWriterLockSlim lockHost;
            private bool isUpgraded = false;

            /// <summary>
            /// Upgrade the lock
            /// </summary>
            public void Upgrade()
            {
                if (isUpgraded) return;

                lockHost.EnterWriteLock();
                isUpgraded = true;
            }

            /// <summary>
            /// IDisposable implementation
            /// </summary>
            public void Dispose()
            {
                if (isUpgraded) lockHost.ExitWriteLock();

                lockHost.ExitUpgradeableReadLock();
            }
        }
    }
}

A sample for a read / write scenario

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ReaderWriterLock
{
    class SafeReporting
    {
        ReaderWriterLockManager dataLock = new ReaderWriterLockManager();
        private readonly Queue<object> _data = new Queue<object>();

        void AddRecord(object record)
        {
            using (dataLock.WriteLock())
            {
                _data.Enqueue(record);
            }
        }

        object GetRecord()
        {
            using (dataLock.WriteLock())
            {
                return _data.Dequeue();
            }
        }

        string RunQueueReport()
        {
            using (dataLock.ReadLock())
            {
                // Cast + Join also handles an empty queue, where Aggregate would throw
                return string.Join(", ", _data.Cast<object>());
            }
        }
    }
}

A sample for an upgrade scenario

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ReaderWriterLock
{
    public class SimpleCache
    {
        ReaderWriterLockManager dataLock = new ReaderWriterLockManager();
        private object _data;

        public object Data
        {
            get
            {
                using (var l = dataLock.UpgradeLock())
                {
                    if (_data == null)
                    {
                        l.Upgrade();
                        if (_data == null)
                        {
                            _data = new object();
                        }
                    }
                    return _data;
                }
            }
        }
    }
}

There were a few design decisions made in here. First of all, the calls to enter the locks are methods. My original idea was to use properties to make it even cleaner, but this created an interesting bug. The bug will not manifest when the application runs normally, but it does show up in debug mode: when the IDE evaluates the property as you hover over it, the property creates a new lock, and that lock never gets disposed. The end result can cause some impressive head scratching.
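For illustration, here is a hypothetical sketch of the abandoned property-based design (the names are made up, only the shape matters). Because the getter has a side effect, a debugger hover that evaluates the property silently enters a read lock that nothing ever disposes:

```csharp
using System;
using System.Threading;

// Hypothetical sketch of the rejected design -- do not use. Any evaluation
// of the property, including the IDE's hover evaluation in debug mode,
// acquires a read lock as a side effect of the getter.
public class PropertyBasedManager
{
    private readonly ReaderWriterLockSlim lockHost = new ReaderWriterLockSlim();

    public IDisposable ReadLock
    {
        get
        {
            lockHost.EnterReadLock();      // side effect inside a getter
            return new Releaser(lockHost); // only released when used in a using
        }
    }

    private sealed class Releaser : IDisposable
    {
        private readonly ReaderWriterLockSlim host;
        public Releaser(ReaderWriterLockSlim host) { this.host = host; }
        public void Dispose() { host.ExitReadLock(); }
    }
}
```

In normal code the using statement guarantees the Dispose call; the debugger’s evaluation has no using, so the lock’s reader count only ever goes up.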

The end result works well and creates some nice clean code.

Posted in C#, Syntax

Gaming the IDE


So people have a lot of ideas about how software should be written, how code should look, what should be tested, and just about everything else.

We also know we can measure most of these things: cyclomatic complexity, code coverage, method length, methods per interface, and just about anything else. Since we are talking about code, we know we can measure most of this; FxCop and many source code visualization tools are out there.

These tools are sometimes also wired into automated build tools.

But they don’t yet seem to make it into the IDE at the point of check-in.

What would be the effect if a developer had a small message pop up upon check-in that gave a score for the changes they were about to commit? What would happen if there was a leaderboard? How would a code base change, and how fast would it happen?
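As a thought experiment, the score could be as simple as a weighted sum of the measured deltas. Everything below, weights, inputs, and the method itself, is invented for illustration, not a real metric:

```csharp
using System;

// Hypothetical check-in score: coverage gains are rewarded, added
// cyclomatic complexity is penalized, and big commits cost a little.
// The weights are arbitrary placeholders.
static class CheckInGame
{
    public static double Score(double coverageDelta, int complexityDelta, int linesAdded)
    {
        return 10.0 * coverageDelta     // e.g. +0.05 coverage => +0.5 points
             -  2.0 * complexityDelta   // each added branch costs 2 points
             -  0.01 * linesAdded;      // mild pressure toward small commits
    }
}
```

A commit that raises coverage while removing branches scores positively; a large commit that only adds complexity does not. That is exactly the behavior a leaderboard would amplify, and exactly the formula people would learn to game.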

There are some important questions to answer before you turn this on in a company. First and foremost, make sure that you reward the things you actually want to see. And never underestimate the ability of people to game a system; they will find the loophole, they will crawl through it, and they will likely get upset if you tell them that what they did was against the spirit of the game.

So maybe this isn’t a good idea, but it is still interesting to muse on the concept.

Posted in Software Engineering

The Builder analogy is right


So there has been some critique on the builder analogy.

And a lot of it is deserved.

But one place where it is right; and not often referenced is how the builder comparison applies to current builders.

Current builders are not really what software likes to compare itself to; the software profession prefers a more romanticized view of builders. The comparison tends to be made with builders from the Middle Ages, when there were craftsmen, guilds, and architects, and cathedrals were being built. Not with current builders, where apartment buildings are put up in under six months by the cheapest labor around, with a result that looks very similar to a building three blocks over.

So let’s look at current builders:

  • Chances are that everything is subcontracted, from the HVAC to the architecture to the cabinets & floors
  • Chances are that the builder, and almost everyone involved in the project, is not an employee of the final occupant
  • Chances are that the building is constructed from many prefabricated components
  • Chances are that the building is created from a relatively standard architecture

So let’s compare that to current software construction:

  • Chances are that everything is done in house
  • Chances are that the developers are employees of the builder
  • Chances are that the project is constructed from many custom-made components
  • Chances are that the project is created from a custom, home-made architecture

Why did builders switch from this craftsman mentality to a subcontracting mentality? Not only that, why did builders make this change when it is much harder for construction to make the switch than for software? Buying sub-components still requires raw resources for each unit bought: no matter how many cabinet doors you buy, the cost can never fall below the cost of the wood it takes to make the door. Whereas for software, the cost of replicating sub-components approaches zero when volume is high enough, and the cost of replicating an architecture is already near zero, as most can be found on the internet at no cost.

An interesting side note is that one part of the industry does seem to follow the model of current builders, and surprisingly it is the game industry. Just read up on a current project and see all the different studios doing small pieces of a big project on behalf of a publisher: some studios do level design, another renders hair, and yet another provides a physics engine. And the funny thing is, we think it’s normal.

Posted in Software Engineering

On Craftsmen


People have always had the most interesting problem of deciding what developers are.

One of the first attempts I know of was The Mythical Man-Month, which communicated the idea that some developers had ten times the output of other developers. Ten times the output is completely unheard of in manufacturing, and really in most engineering fields. This spawned the idea that some developers must be artists, since that is an area where the ten-to-one ratio makes great sense. There are, after all, many artists, and we only hear of a few; there are the greats and the rest.

The artist concept is interesting but completely unmanageable for an industry.

  • How can you price a developer who is worth ten times the average developer?
  • What would you do when the developers catch on?
  • How can you replace these types of developers?

Industry is there to make things, so the search was on for a way to make developers.

So we move on to The Pragmatic Programmer, and the introduction of the craftsman. The craftsman seems much more manageable: craftsmen can be trained. The concept is that there is a core body of knowledge, and any person who masters it becomes a developer. Sure, talent is in there somewhere, but the important thing is the technologies. Software engineering was born, organizations sprang up, acronyms were created, levels were defined, and a bright new path was set that allowed organizations to tame their software problems.

That seems to be where we are these days: developers are craftsmen who care about honing their trade to perfection. If you want better results, adopt these technologies and it will happen; and if it fails, you probably didn’t implement enough of them or did it wrong. Go to a presentation on Agile, Scrum, or some other methodology, and they will all tell you that the only people they knew who abandoned the methodology were those that didn’t implement it right. The good news is that the presenter is probably a consultant who can help you implement it right.

And this idea is convenient for the developers as well; it gives them the nostalgic concept of being creators, part of a profession in the making. Organizations are springing up to certify developers. The rules are just being created, and developers are eager for certifications and titles, for all who aspire want to one day be called the architect.

It seems that the craftsman metaphor is working, and the development industry is rapidly shaping itself to fit its new mold.

Although there are some gaps. For starters, engineering disciplines are very rarely in a position where measurement is a massive problem. How do you measure the output of a developer? How do you predict the output of an architecture? How do you quantify the difference between two architectures? Sure, we can call this part of being a new discipline, but it goes a little deeper. And that is weird, because everything we create is already in a format that is friendly to measurement; it is even discrete.

But measuring fails. We can measure lines of code, but all else being equal we prefer less to more. Less is easier to comprehend, easier to maintain, likely faster to write, likely less buggy, and in the end cheaper. That is why we use higher-level languages, abstractions, code generation, and most other popular techniques. The same goes for features: more buttons do not make an application more valuable.

Another issue comes from an inherent problem with the craftsman concept. It goes as follows: envision a craftsman. What is he or she doing?

They are probably making something.

And that is the problem. If you ask a craftsman for something, their response is to make something. The question rarely matters much, as craftsmen focus on it as an opportunity to make something. And quite often this creates the problem: no one is complaining that developers are not creating things; they are complaining that developers are creating the wrong thing.

It is the beautiful monster that was created when developers became convinced they were craftsmen and started to act accordingly. They are creators, and the creation is done because that is what defines them. They learn how to create more, better, faster, and sometimes closer to the right thing. After all, if it was wrong, it was probably bad requirements.

Requirements are funny; they are like an insurance policy against being wrong. We have some poor bastards write it all out beforehand, and then the developers have a shield to hide behind. Sure, some methodologies are trying to get the requirements out of the picture, or at least make them more flexible. But this is still done under the guise of being craftsmen and creating a more perfect creation. And that is why it doesn’t work all that well.

As long as developers are considered craftsmen the most important element of software development will stay in the background.

The customer.

That is because craftsmen and customers don’t traditionally meet. The sales people are the bridge: the craftsmen create, and the sales people find those whose needs the product meets.

In software this is not as easy. For consumer software there is a lot more competition, and for industry software there is often only one customer. So for success to be possible the product has to suit the customers quite closely, and then the sales people can do the rest.

To bridge this gap we get the introduction of the BA: the poor bastards who get to play the telephone game between the developers and the customers in a desperate attempt to make the product suit the customer. They get to be the glue between customers and development. And as long as they are there, developers won’t have to take ownership in front of a customer.

But let’s imagine what would happen if tomorrow all BAs got fired (no, I don’t hate all BAs).

The only way companies would be successful is if the developers started talking to the customers and became responsive to the customers’ needs.

The only way organizations will survive is for developers to become service-sector workers.

Oh, and when we assume developers are service-sector workers, measuring becomes pretty easy. I’m pretty sure everyone has filled out some sort of customer satisfaction survey before, and that seems to work pretty well for the industry.

Posted in Possum Labs, Software Engineering

On Data


I’m not a DBA, and I’m not a data architect either. But I spend a lot of my time in databases, so this is written from a developer’s perspective.

The three stages of data

It seems that data lives in three stages: first the data is raw, then it gets normalized, and finally it gets aggregated.

Raw data tends to be de-normalized, segregated by source, and frequently updated. The source of this data could be a data feed, a service, user activity on the site, or sensors. This data tends to make up the majority of the records in the databases you’ll find behind applications. And in many cases one of the data sources will be the system you are implementing.

The second stage, the normalized data, is where the different data sources come together and the data gets normalized. This is where records from different sources get linked together by business rules, and where mappings occur between the enumerations of different sources.

The third stage, the summarized data, is where data gets summarized and molded for consumption. An example would be combining title, first name, and last name into a display name. This is where the data for an account is summarized or content ratings get computed.
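The three stages can be sketched in a few lines of code. The types and field names below are hypothetical, just to show the shape of the flow from raw record to display value:

```csharp
using System;

// Hypothetical sketch of the three stages for a single record.
// Stage 1: raw, as delivered by one source (padded, per-source fields).
public record RawPerson(string Source, string Title, string First, string Last);

// Stage 2: normalized, cleaned up and source-independent.
public record Person(string Title, string First, string Last);

// Stage 3: summarized, molded for consumption (the display name).
public record PersonSummary(string DisplayName);

public static class Stages
{
    public static Person Normalize(RawPerson raw) =>
        new Person(raw.Title.Trim(), raw.First.Trim(), raw.Last.Trim());

    public static PersonSummary Summarize(Person p) =>
        new PersonSummary($"{p.Title} {p.First} {p.Last}");
}
```

Because each stage is a pure function of the one before it, the normalized and summarized values can always be deleted and recomputed from the raw data, which is the property the segregated approach in the conclusion relies on.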

There once was a database in a company far, far away ..

Most systems will have examples of all of these concepts, but the concepts are commingled in the database. A common pattern is that people take their first data feed and make that the database. The application retrieves the data through stored procedures that aggregate it.

Then a second data source comes along.

The second data source will offer some data that the first one doesn’t, so the data model gets augmented to support the second source. A few columns get added to the existing model, and the new data gets merged in with the original data. The procedures get updated to account for the changes in the model, and things get up and running.

So far so good, or so it seems. Something was lost: the system is now in a transient state. Depending on the order in which the data sources arrive, the behavior will differ. The system also takes an all-or-nothing approach: when a data source is added, the entire system needs to be aware of it. The data source impacts the procedures that retrieve data for the UI as well as the data load for the initial source.

So let’s look a bit further into the life cycle of the application. More sources are added, more data is added, there are more users, and they are complaining that the system is getting “slow”. Load drives the next generation of changes: the summarization of data. Some complex common queries get precomputed, sometimes in their own tables, sometimes as columns added to existing tables. The high-usage areas of the application get updated to use the new precomputed results.

And this brings us to the point where most applications end up: a database in a transient state, and production support DBAs getting midnight calls as the ordering of data sources manages to get the system into new and interesting states.

Conclusion

Most systems fall somewhere between a segregated approach and an integrated approach. Of the systems I’ve worked on, the more segregated ones were the easiest to work on. Some of the benefits that resulted from this segregation were as follows.

  • All normalized and aggregated data could be deleted and recomputed. When the business logic for aggregation or normalization changed, it only had to change in one place, and there was no “legacy” data.
  • There was less code, although much more data. If any information had to be computed, it was precomputed and stored in a table. The procedures to retrieve data were simple joins, and no data manipulation was needed. In the transition from a classic system to a segregated system we reduced the lines of SQL by more than 60%.
  • The system was fast: everything was a simple select with a few possible joins.
  • The system was easy to learn and fairly self-documenting. Each summary table has procedure(s) that populate it, and the data retrieval procedures build upon the summary tables.
  • It was easy to verify: the system can be inspected at each of the stages, so you only need to verify small pieces of logic at a time.

As for the drawbacks

  • This approach increases data size between 2x and 5x; table lengths are not impacted. All data is copied once to the normalization tables, and possibly to the summary tables.
  • The approach does not accommodate systems with frequent updates, since each update cascades to the normalized and summary tables. It is ideal for systems that have a concept such as batch or nightly imports.

Considering the falling cost of redundant storage, and the advent of big-table solutions like those seen in many cloud offerings, the major deficit of the approach (storage) is largely negated. The secondary deficit still applies.

Posted in Architecture