What is the Structure of an Empirical Estimation Model?

An empirical estimation model is typically a formula, calibrated on historical project data, that takes an estimated size variable (such as Lines of Code or Function Points) and predicts effort (e.g., in person-months).

Breakdown of an Empirical Estimation Model

Empirical estimation models in software development leverage past project data to predict future project effort. They are fundamentally based on observation and experience, rather than theoretical first principles. The basic structure can be understood as follows:

Core Components

Effort (E): The dependent variable, usually expressed in person-months or hours. This is what the model predicts.
Size (ev): An independent variable that represents the estimated size of the software project. Common metrics include:
- Lines of Code (LOC): The number of lines of code expected in the finished project.
- Function Points (FP): A measure of the functionality delivered by the software.
Empirically Derived Constants (A, B, C, ...): These are parameters derived from analyzing historical project data. They represent the relationships between size and effort in past projects and are the core of the "empirical" nature of the model. Different models use different numbers of constants.

General Formula Structure

The general form of an empirical estimation model often looks like this:

E = A × (ev)^B + C

Where:

E = Effort (in person-months)
A, B, and C = Empirically derived constants
ev = Estimated variable (LOC or FP), representing size

Explanation of the Formula Components:

A (Multiplier): A constant that scales the overall effort based on the project's size.
ev^B (Size Term): The estimated size (LOC or FP) raised to the power B. The exponent B captures the non-linear relationship between size and effort. When B is greater than 1, effort grows faster than size, which reflects the added complexity and communication overhead of larger projects. B is often only slightly above 1; Basic COCOMO, for example, uses exponents from 1.05 to 1.20.
C (Additive Factor): An adjustment factor that adds a fixed amount of effort, regardless of the project's size. This might represent fixed costs like project initiation activities, training, or infrastructure setup. It can sometimes be zero.
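
To make the roles of A, B, and C concrete, here is a minimal Python sketch of the general formula. The function name estimate_effort and the constants 3.0, 1.1, and 2.0 are placeholders invented for illustration; they are not taken from any published model.

```python
def estimate_effort(size, a, b, c=0.0):
    """Return effort E = a * size**b + c.

    The units of the result depend on how a, b, and c were calibrated.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return a * size ** b + c


# Placeholder constants, chosen only to show the shape of the curve.
A, B, C = 3.0, 1.1, 2.0

small = estimate_effort(10, A, B, C)
large = estimate_effort(20, A, B, C)

# With B > 1, doubling the size more than doubles the size-dependent part.
# The fixed part C is subtracted first because it does not scale with size.
growth = (large - C) / (small - C)
print(f"size 10: {small:.1f}, size 20: {large:.1f}, growth factor: {growth:.3f}")
```

The growth factor equals 2^B, so any exponent above 1 means a project twice as large costs more than twice the size-dependent effort.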

Examples of Empirical Estimation Models

Different models exist, each with its own set of constants A, B, and C, derived from different datasets:

COCOMO (Constructive Cost Model): A well-known family of models. Its basic form fits the general structure described above, with C = 0 and size measured in thousands of lines of code (KLOC). COCOMO distinguishes between different development modes (organic, semi-detached, embedded), each with its own set of constants.
Function Point Analysis-based Models: Use function points as the 'ev' variable and have empirically derived constants adjusted to fit Function Point data.
SEER-SEM: A commercial estimation tool that uses a complex empirical model.
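
As one concrete instance, the sketch below applies the Basic COCOMO effort equation using the constants published by Boehm (1981), with size in KLOC and effort in person-months. The function name basic_cocomo_effort and the 32 KLOC project size are illustrative choices, and the sketch covers only the basic effort equation, not the intermediate or detailed COCOMO variants.

```python
# Basic COCOMO constants (a, b) per development mode.
# Effort is in person-months; size is in thousands of lines of code (KLOC).
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc, mode):
    """Return Basic COCOMO effort a * kloc**b for the given mode."""
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b


# Same hypothetical 32 KLOC project estimated under each mode.
for mode in BASIC_COCOMO:
    effort = basic_cocomo_effort(32, mode)
    print(f"{mode:>13}: {effort:.1f} person-months")
```

The same size yields a higher estimate in the semi-detached and embedded modes because both the multiplier and the exponent increase.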

How Empirical Models are Developed

Collect Historical Project Data: Gather data from past projects, including actual effort, size (LOC or FP), and other relevant factors.
Analyze the Data: Use statistical techniques (e.g., regression analysis) to determine the values of the constants (A, B, C) that best fit the historical data.
Validate the Model: Test the model's accuracy by comparing its predictions to the actual effort of projects not used in the model's development.
Refine the Model: Continuously update the model with new project data to improve its accuracy over time.
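
The fitting and validation steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it sets C = 0 so that the model becomes linear after taking logarithms, it uses ordinary least squares on the log-transformed data, and it scores the hold-out projects with the mean magnitude of relative error (MMRE), which is only one of several possible accuracy measures. The project data is made up, and the names fit_power_law and mmre are hypothetical.

```python
import math


def fit_power_law(sizes, efforts):
    """Least-squares fit of log(E) = log(A) + B * log(size); returns (A, B)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space is the exponent B
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space is log(A)
    return a, b


def mmre(sizes, efforts, a, b):
    """Mean magnitude of relative error of the predictions a * size**b."""
    errors = [abs(e - a * s ** b) / e for s, e in zip(sizes, efforts)]
    return sum(errors) / len(errors)


# Made-up (size in KLOC, effort in person-months) pairs, for illustration only.
train = [(5, 14.0), (12, 35.0), (20, 62.0), (45, 150.0), (80, 290.0)]
holdout = [(30, 95.0), (60, 210.0)]

train_sizes, train_efforts = zip(*train)
hold_sizes, hold_efforts = zip(*holdout)

a, b = fit_power_law(train_sizes, train_efforts)
print(f"fitted A = {a:.3f}, B = {b:.3f}")
print(f"hold-out MMRE = {mmre(hold_sizes, hold_efforts, a, b):.3f}")
```

Refining the model then amounts to appending new completed projects to the training data and refitting. Estimating a non-zero C would require a non-linear fitting method instead of this log-log regression.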

Advantages of Empirical Estimation Models

Data-Driven: Based on actual project experience, which grounds estimates in observed outcomes rather than in theory alone.
Objective: Reduce subjective biases in effort estimation.
Repeatable: Provide a consistent and repeatable estimation process.

Limitations of Empirical Estimation Models

Dependence on Historical Data: Accuracy depends on the quality and relevance of the historical data used to develop the model.
Domain-Specific: Models developed for one type of project or organization may not be accurate for others.
Calibration Required: Need to be calibrated to reflect the specific development environment and practices.
Not a Replacement for Expertise: Still require expert judgment to estimate the size (LOC or FP) and to interpret the model's results.
