Everything You Need to Know about ITIL Problem Management

By Andy Marker | January 25, 2019 (updated October 25, 2021)

IT departments deal with an enormous number of complaints and problems daily. Jammed printers, internet outages, email errors, and requests for new laptops are just a few of the issues that can impact staff productivity. In order to keep technology working and operations running, businesses can implement frameworks such as the IT Infrastructure Library (ITIL) to support and streamline IT processes.

In this article, we will discuss the ITIL framework and its associated processes. We’ll also dive into the workflow, benefits, key performance indicators, and best practices associated with problem management.

The Origins of ITIL

ITIL emerged in the late 1980s when the British government became dissatisfied with the quality and focus of IT services. The government tasked the former Central Computer and Telecommunications Agency (CCTA) with the development of a framework that would ensure high-quality and cost-effective IT services that focused on client needs. ITIL’s original name was Government Information Technology Infrastructure Management (GITIM).

European government agencies and large nongovernment organizations began adopting the framework in the 1990s. In 2000, the CCTA and the Office of Government Commerce (OGC) merged. As one, they reworked and released ITIL version 2 in 2001, ITIL v3 in 2007, and an update, ITIL 2011, in 2011. ITIL became the most widely adopted IT service management (ITSM) framework around the world. Axelos, set up by the U.K. government in 2014, is now charged with managing methodologies, including ITIL, that were formerly owned by the OGC. The next version, ITIL v4, was released in Q1 2019.

Five core publications detail the entire ITIL service lifecycle:

  • ITIL Service Strategy

  • ITIL Service Design

  • ITIL Service Transition

  • ITIL Service Operation

  • ITIL Continual Service Improvement

In this article, we will focus on the ITIL service operation publication, specifically the problem management process.

Service operation focuses on ensuring IT services are delivered successfully and efficiently. The processes within service operation include the following:

  • Access management

  • Application management

  • Event management

  • Facilities management

  • Incident management

  • IT operations control

  • Problem management

  • Request fulfilment

  • Technical management

What Is Problem Management?

Each ITIL publication and its associated processes focus on supporting the ultimate goal of ITIL: to improve the way IT delivers and supports essential IT services. The problem management process identifies problems quickly, provides end-to-end management, and diagnoses the underlying root cause. The plan is to prevent problems from occurring, thus eliminating recurring incidents. If an incident does occur, problem management helps minimize the impact on the business.

That said, it’s virtually impossible to avoid all IT problems. Some of the most common IT complaints include recurring network outages, hardware failures, nonfunctional database queries due to integration issues, software bugs, and data backup errors.

Below are important terms associated with ITIL problem management:

  • Problem: The cause of one or more incidents, such as a recurring internet outage.
  • Error: The failure of an IT service due to a design flaw or system failure.
  • Known Error: A problem that is documented with a workaround.
  • Root Cause Analysis: The analysis or systematic investigation performed to identify the fundamental cause of a problem. Organizations use various techniques to perform root cause analyses. Depending on the problem, they may be used alone or in conjunction with one another. Some of these techniques include the following:
    • Brainstorming
    • Flowcharting
    • Affinity Diagram
    • Chronology of events
    • Fault detection and isolation
    • Rapid Problem Resolution (RPR)

The following techniques also include a free, downloadable template to help you get started.

Pareto Analysis Template

The Pareto analysis can be used to determine the frequency of problems occurring in a process. This template includes a Pareto diagram, bar chart, and line graph for analysis.
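
For teams that keep incident data in a script-friendly format, the same frequency count can be sketched in a few lines of Python. This is a minimal illustration, not part of the template: the `pareto_table` function and the sample incident categories are hypothetical.

```python
from collections import Counter


def pareto_table(categories):
    """Return (category, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


# Hypothetical incident categories pulled from a service desk log.
incidents = ["email", "printer", "email", "network", "email",
             "network", "email", "printer", "email", "vpn"]

for category, count, cumulative in pareto_table(incidents):
    print(f"{category:<8} {count:>2} {cumulative:>6}%")
```

The cumulative percentage column is what a Pareto chart plots as its line graph: the categories at the top of the table account for most of the incidents and are usually the first candidates for root cause analysis.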

‌ Download Pareto Analysis Template

The 5 Whys

The 5 Whys root cause analysis template guides you through a series of questions until the root cause is uncovered. Use this template as a framework for asking “why” questions and noting corrective actions to prevent recurrence.

Download The 5 Whys Template

Excel | Word | PDF  | Smartsheet

Kepner Tregoe Rational Model

Created by Charles Kepner and Benjamin Tregoe, this model provides a method for gathering, evaluating, and prioritizing information to identify the root cause of a problem and prevent it in the future. There are four major steps: appraise the situation, analyze the problem, analyze decisions, and analyze potential problems. When analyzing decisions, it is important to identify alternatives and perform a risk analysis for each by using a weighted decision matrix.
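
The arithmetic behind a weighted decision matrix is simple enough to sketch in code. The example below is an illustration only: the `score_alternatives` function, the criteria, the weights, and the ratings are all hypothetical, and it assumes a higher rating is always better (for example, a risk rating of 5 means the lowest risk).

```python
def score_alternatives(weights, ratings):
    """Return (alternative, weighted_score) pairs, highest score first.

    weights: {criterion: weight}
    ratings: {alternative: {criterion: rating}}
    """
    totals = {
        name: sum(weights[criterion] * rating[criterion] for criterion in weights)
        for name, rating in ratings.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical criteria and weights (higher weight = more important).
weights = {"cost": 3, "risk": 5, "time_to_fix": 2}

# Hypothetical ratings on a 1-5 scale (higher rating = better).
ratings = {
    "patch_server": {"cost": 4, "risk": 3, "time_to_fix": 5},
    "replace_server": {"cost": 2, "risk": 5, "time_to_fix": 2},
}

for name, score in score_alternatives(weights, ratings):
    print(name, score)
```

Each alternative's score is the sum of weight times rating across all criteria, so changing a weight shows quickly how sensitive the decision is to that criterion.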

‌ Download Kepner Tregoe Decision Matrix Template

Ishikawa Fishbone Diagram

Also known as a cause-and-effect diagram, the fishbone diagram is a visual compilation of information that helps teams brainstorm to find the cause of an issue.

‌ Download Ishikawa Fishbone Diagram Template

Six Sigma DMAIC

DMAIC focuses on incrementally improving existing processes. It stands for the five phases of a Six Sigma improvement cycle: Define, Measure, Analyze, Improve, and Control. This template can be used as a DMAIC roadmap.

Download Six Sigma DMAIC Template

Excel | Word

 

Problem Management Process Workflow

Many times, an organization detects a problem when users report the same or similar incidents to the service desk in a short time frame. For example, if one user reports that their email is not working, it’s likely an isolated incident or user error that can be quickly resolved. However, if the service desk receives five reports of email errors within 30 minutes, it’s likely a more impactful problem that requires analysis to resolve.
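
A rule like this can be automated. The sketch below is one possible approach, not a prescribed ITIL method: the `looks_like_problem` function is hypothetical, and its defaults of five reports and a 30-minute window simply mirror the example above.

```python
from datetime import datetime, timedelta


def looks_like_problem(timestamps, threshold=5, window=timedelta(minutes=30)):
    """Return True if `threshold` or more reports fall within any `window`."""
    times = sorted(timestamps)
    start = 0
    for end in range(len(times)):
        # Drop the oldest reports until the span fits inside the window.
        while times[end] - times[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


# Hypothetical report times for similar email incidents.
base = datetime(2019, 1, 25, 9, 0)
burst = [base + timedelta(minutes=m) for m in (0, 5, 10, 20, 29)]
spread = [base + timedelta(minutes=m) for m in (0, 20, 40, 60, 80)]

print(looks_like_problem(burst))   # five reports inside 30 minutes
print(looks_like_problem(spread))  # same count, spread over 80 minutes
```

In practice, the reports would first be grouped by category or affected service so that unrelated incidents are not counted together.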

The problem management process includes the following workflow stages:

  • Problem Detection: The organization identifies problems from a user’s incident report, analysis of existing incidents, or an automated event monitoring solution.
  • Problem Logging: The organization logs problems with all relevant details, including the reporting user’s information, date and time, category, priority, severity, description, configuration item impacted, linked incidents, and resolution.
  • Problem Investigation: The service desk team examines the root cause of the problem. The service desk typically investigates problems based on their priority (high-priority issues have the greatest impact on IT services).
  • Problem Diagnosis: The business identifies the cause of the problem based on the results of the investigation.
  • Workaround: The team takes temporary measures to restore services until the problem is resolved.
  • Known Error Creation: The team logs the problem as a known error (in the known error database, or KEDB) so future related incidents can be linked to it and addressed quickly.
  • Problem Resolution: When a business addresses the underlying cause of the problem and restores normal service operation, it prevents recurring incidents and the problem is considered resolved.
  • Problem Closure: Once the problem is confirmed effectively resolved, the business can close the problem and associated incidents.
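
The stages above can be modeled as a simple ordered state machine. The following is a minimal sketch under stated assumptions: the `Stage` and `ProblemRecord` names are hypothetical, only a few of the logged fields are shown, and real tools typically allow stages to be skipped or revisited.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    DETECTION = 1
    LOGGING = 2
    INVESTIGATION = 3
    DIAGNOSIS = 4
    WORKAROUND = 5
    KNOWN_ERROR_CREATION = 6
    RESOLUTION = 7
    CLOSURE = 8


@dataclass
class ProblemRecord:
    # A subset of the details captured during problem logging.
    description: str
    category: str
    priority: str
    linked_incidents: list = field(default_factory=list)
    stage: Stage = Stage.DETECTION

    def advance(self):
        """Move to the next stage; a closed problem cannot advance further."""
        if self.stage is Stage.CLOSURE:
            raise ValueError("problem is already closed")
        self.stage = Stage(self.stage.value + 1)
        return self.stage


# Hypothetical record for a recurring email issue.
problem = ProblemRecord("Recurring email errors", "email", "high", ["INC-101"])
problem.advance()
print(problem.stage.name)
```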

Reactive vs. Proactive Problem Management

Problem management takes on different forms depending on the organization’s culture, technology resources, and the skill set of the IT team. Most ITIL-focused teams take both a reactive and proactive approach to problem management.

Reactive problem management takes place after the incident has been reported. It is a reaction to a problem that already exists and follows the workflow stages described in the previous section. 

Proactive problem management is a preventative approach that aims to stop incidents from occurring in the first place by identifying IT infrastructure weaknesses. Proactive problem management can be difficult for many organizations because it requires both the resources and the skill set to perform extensive trend analysis and identify an incident before it occurs. Preventative activities include ongoing maintenance (especially for hardware and software reaching the end of their lifecycle), regulatory audits, automated performance monitoring, capacity planning, disaster recovery and service continuity planning, release management planning and testing, a change management process, and a documented security management policy.

One activity that may aid in proactive problem management is major problem review. Organizations classify major problems based on their impact on the business. By reviewing major problems, organizations can identify what they did correctly and incorrectly, as well as areas of improvement that, when fixed, can improve overall problem management and prevent the recurrence of problems.

Problem Management Team Members

Who is involved in the problem management process? The answer varies, but most large organizations that follow the ITIL framework employ the following team members:

  • Problem manager/process owner

  • Analysts

  • Network operations staff

  • Engineers

  • Change management team

  • Configuration management team

  • Service desk technicians (first-, second-, and third-level support)

  • Call center staff

The problem manager owns the problem management process, but also relies on other IT staff. For example, an engineer may analyze a problem in order to identify the root cause, and a change management team member will work to implement the fix.

Benefits of Problem Management

When an organization succeeds at problem management, the entire business benefits from fewer technology issues. Unfortunately, the constantly changing technology landscape means 100 percent protection from downtime is impossible, but problem management minimizes the disruptions that occur. Additional benefits of problem management include the following:

  • Reduced downtime and disruption

  • Improved service availability

  • Decreased workload and stress on service desk staff

  • Improved customer satisfaction (CSAT)

  • Decreased costs associated with downtime

  • Decreased incident impact

  • Improved service quality

  • Faster resolution time and increased first call resolution rate (FCR)

  • Fewer high-priority incidents

  • Decreased business disruption

  • Increased staff productivity

  • Improved training and learning documentation for new and existing staff

  • Prevention of incident recurrence

  • Supports ISO/IEC 20000 certification requirements

The Challenges of Problem Management

As with any new or existing process, day-to-day problem management can cause a headache for the process manager, staff, and even ancillary business associates. Below are some of the challenges that may arise:

  • Lack of Knowledge and Training: ITIL can be extremely helpful to an organization, but that comes with the challenge of understanding the framework and its terminology. This is where ITIL training and certification can help, especially for team members who will be directly involved in the problem management process.

  • Leadership Buy-In/Commitment and Resistance to Change: Change is difficult, and without the buy-in of upper management, team members may be reluctant to take on a new process workflow.

  • Competing Priorities: Different IT managers may not have the same priorities when dealing with a service disruption.

  • Missing Information: Problem management depends on thorough documentation. If the organization doesn’t collect essential information during the initial incident report, problem managers may have difficulty resolving the problem quickly. Thus, the service desk is responsible for receiving and documenting all incident reports in order to ensure technicians have full access to critical information. For example, assigning the proper priority and severity helps determine the true impact to the organization.

  • Reliance on Other Teams for Accountability and Review: The problem management process must rely on team members across IT during the analysis and resolution phase. It is difficult for problem managers to hold staff who do not report to them accountable. In addition, KPIs must be applied and managed across all teams involved in the process.

How to Choose a Problem Management Software Solution

ITIL-verified software is a great way to guide and manage your problem management process. Consider the following features as you evaluate software solutions:

  • Customer self-service portal

  • Service-level agreements

  • Reporting and key performance indicators (KPIs)

  • Workflow automation

  • ITIL-verified

  • Internal and/or third-party knowledge management with the ability to search

  • Customer satisfaction surveys

  • Assignments and escalations to individuals and/or teams

  • Historical audit log

  • Templates for recurring issues

  • Unique identifier assigned to each record

  • Time tracking

The ideal problem management software solution should also have the following abilities:

  • Adhere to the appropriate problem management process (create, categorize, edit, resolve, and close)

  • Classify priority and severity to determine impact

  • Be configurable to meet the organization’s unique requirements (processes, forms, categories, fields, user permissions, etc.)

  • Integrate with other ITIL processes (incident, change, knowledge, and configuration management)

  • Link incidents to problems and resolve all incidents when the problem is resolved

  • Differentiate between incidents, problems, known errors, knowledge articles, and changes, while allowing links among them

  • Document root cause

  • Integrate with configuration management for easy visibility into impacted configuration items (CI)
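
To make the linking requirement concrete, here is a minimal sketch of how a tool might tie incidents to a problem and resolve them together. The `Problem` class, its fields, and the identifiers are hypothetical and do not describe any particular product.

```python
class Problem:
    def __init__(self, problem_id):
        self.problem_id = problem_id
        self.status = "open"
        self.root_cause = None
        self.incidents = {}  # incident_id -> status

    def link_incident(self, incident_id):
        """Attach an incident to this problem."""
        self.incidents[incident_id] = "open"

    def resolve(self, root_cause):
        """Document the root cause and resolve every linked incident."""
        self.root_cause = root_cause
        self.status = "resolved"
        for incident_id in self.incidents:
            self.incidents[incident_id] = "resolved"


# Hypothetical identifiers and root cause.
problem = Problem("PRB-7")
problem.link_incident("INC-101")
problem.link_incident("INC-102")
problem.resolve("Expired certificate on the mail gateway")
print(problem.status, problem.incidents)
```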

Key Performance Indicators and Critical Success Factors in Problem Management

In addition to the above-mentioned features, the ability to measure performance is critical to problem management success. KPIs and critical success factors (CSF) are specific to each organization, but when applied, they help identify areas for improvement. Some common KPIs and CSFs include the following:

  • Number of problems per time frame/category/department/user

  • Number of incidents per problem

  • Reduced time to problem resolution

  • Met and breached SLAs

  • Average time from incident report to problem root cause

  • Reduction/growth of problem backlog

  • Reduced costs associated with problem management

  • Root cause analysis trends

  • Service quality improvements

  • Increase in proactive change submission

  • Decrease in number of incidents over time

  • Reduced problem impact

  • Increased first call resolution rate
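
Several of these KPIs are straightforward to compute once problem records are exported from a tool. The sketch below assumes a hypothetical list of records with `opened`, `resolved`, and `incidents` fields; the field names and sample data are illustrative only.

```python
from datetime import datetime
from statistics import mean

# Hypothetical problem records; "resolved" is None while a problem is open.
problems = [
    {"id": "PRB-1", "opened": datetime(2019, 1, 7, 9, 0),
     "resolved": datetime(2019, 1, 9, 9, 0), "incidents": 6},
    {"id": "PRB-2", "opened": datetime(2019, 1, 8, 9, 0),
     "resolved": datetime(2019, 1, 9, 21, 0), "incidents": 2},
    {"id": "PRB-3", "opened": datetime(2019, 1, 10, 9, 0),
     "resolved": None, "incidents": 4},
]


def problem_kpis(problems):
    """Compute a few common problem management KPIs."""
    resolved = [p for p in problems if p["resolved"] is not None]
    hours = [(p["resolved"] - p["opened"]).total_seconds() / 3600
             for p in resolved]
    return {
        "incidents_per_problem":
            mean(p["incidents"] for p in problems) if problems else None,
        "avg_hours_to_resolution": mean(hours) if hours else None,
        "backlog": len(problems) - len(resolved),
    }


print(problem_kpis(problems))
```

Tracking the same figures over successive periods shows whether resolution time and backlog are trending in the right direction.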

Tips for Implementing Effective ITIL Problem Management

Implementing ITIL processes is not simple, but it can make a huge difference in IT success. To implement any ITIL process, start by gaining a thorough understanding of the ITIL methodology. This will help team members understand the ITIL processes and the business value. The goal of ITIL is not to offer a prescriptive, step-by-step implementation process, but a flexible framework to guide IT departments in improving their processes. There is no requirement to implement all processes. Rather, you want to choose the processes that best fit the needs of the organization.

 

Erika Flora, certified ITIL Expert and founder of Beyond20, works with global organizations as an ITSM and Agile/Scrum consultant, instructor, and coach. When asked about ITIL problem management implementation tips, she shared a customer story. “We worked with one of our customers to form something called a Problem Management Committee that meets for an hour each week to discuss open problems, assign ownership to each high-priority problem, drive the highest-priority problems to conclusion, and report successes to leadership and colleagues. This format turned out to be especially effective because it instilled discipline in the organization to regularly and proactively discuss and resolve problems.”

 

Another important tip comes from Joshua Green, Vice President, Customer Success, Strategy, and Operations at Vision Critical: “Problem management is critical. Be sure and staff it properly. An understaffed problem team will more than likely fail to ask enough questions, deep enough questions, and may skimp on documenting the answers properly during root cause analysis. A successful problem team will prevent recurring incidents, but they need to be able to focus and dive deep.”

Although ITIL requires a time and resource commitment, it does not have to bring business to a halt. Third-party organizations offer training and certification classes around the world. Send a dedicated representative to a training course to begin the implementation process. As you progress, you have the option to train the entire team.

Teams implementing ITIL problem management can also follow these tips:

  • Gain IT and senior leadership support.

  • Create a clear vision and purpose with clearly defined processes and goals.

  • Define the relationship between various processes, specifically incident, problem, and change management.

  • Take a process-by-process approach to minimize disruption and gain quick wins.

  • Dedicate resources to both reactive and proactive problem management.

  • Take a preventative approach to problem management.

  • Communicate successes with the entire organization.

  • Implement incident management prior to implementing problem management.

  • Implement a software solution that supports your specific problem management requirements.

  • Dedicate time to problem management efforts.

  • Train and educate service desk staff.

IT success is not simply a result of keeping computers and printers running within your organization, but also of aligning IT with the goals of the business as a whole. IT is the heart and soul of business and a key contributor to revenue and competitive differentiation, and ITIL provides the framework to reach those goals.

ITIL is constantly evolving as technology becomes more complex and business needs change. More than 150 industry experts were involved in ITIL version 4, which was released in Q1 2019 and is designed to integrate with DevOps, Agile, and Lean.

Improve ITIL Problem Management with Smartsheet for IT & Ops

Empower your people to go above and beyond with a flexible platform designed to match the needs of your team — and adapt as those needs change. 

The Smartsheet platform makes it easy to plan, capture, manage, and report on work from anywhere, helping your team be more effective and get more done. Report on key metrics and get real-time visibility into work as it happens with roll-up reports, dashboards, and automated workflows built to keep your team connected and informed. 

When teams have clarity into the work getting done, there’s no telling how much more they can accomplish in the same amount of time. Try Smartsheet for free, today.

 

 
