The Origins of ITIL
ITIL emerged in the late 1980s when the British government became dissatisfied with the quality and focus of IT services. The government tasked the former Central Computer and Telecommunications Agency (CCTA) with developing a framework that would ensure high-quality, cost-effective IT services focused on client needs. ITIL’s original name was Government Information Technology Infrastructure Management (GITIM).
European government agencies and large nongovernment organizations began adopting the framework in the 1990s. In 2000, the CCTA and the Office of Government Commerce (OGC) merged. As one, they reworked and released ITIL version 2 in 2001, ITIL v3 in 2007, and an update, ITIL 2011, in 2011. ITIL became the most widely adopted IT service management (ITSM) framework around the world. Axelos, set up by the U.K. government in 2014, is now charged with managing methodologies, including ITIL, that were formerly owned by the OGC. The next version, ITIL v4, will be released in Q1 2019.
Five core publications detail the entire ITIL service lifecycle:
- ITIL Service Strategy
- ITIL Service Design
- ITIL Service Transition
- ITIL Service Operation
- ITIL Continual Service Improvement
In this article, we will focus on the ITIL service operation publication, specifically the problem management process.
Service operation focuses on ensuring IT services are delivered successfully and efficiently. The processes within service operation include the following:
- Access management
- Application management
- Event management
- Facilities management
- Incident management
- IT operations control
- Problem management
- Request fulfilment
- Technical management
What Is Problem Management?
Each ITIL publication and its associated processes focus on supporting the ultimate goal of ITIL: to improve the way IT delivers and supports essential IT services. The problem management process identifies problems quickly, provides end-to-end management, and diagnoses the underlying root cause. The plan is to prevent problems from occurring, thus eliminating recurring incidents. If an incident does occur, problem management helps minimize the impact on the business.
That said, it’s virtually impossible to avoid all IT problems. Some of the most common IT complaints include recurring network outages, hardware failures, nonfunctional database queries due to integration issues, software bugs, and data backup errors.
Below are important terms associated with ITIL problem management:
- Problem: The cause of one or more incidents, such as a recurring internet outage.
- Error: The failure of an IT service due to a design flaw or system failure.
- Known Error: A problem that is documented with a workaround.
- Root Cause Analysis: The analysis or systemic investigation performed to identify the fundamental cause of a problem. Organizations use various techniques to perform root cause analyses. Depending on the problem, they may be used alone or in conjunction with one another. Some of these techniques include the following:
- Brainstorming
- Flowcharting
- Affinity Diagram
- Chronology of events
- Fault detection and isolation
- Rapid Problem Resolution (RPR)
The following techniques also include a free, downloadable template to help you get started.
Pareto Analysis Template
A Pareto analysis can be used to determine how frequently each type of problem occurs in a process. This template includes a Pareto diagram, bar chart, and line graph for analysis.
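For teams that want to run the same kind of frequency analysis on raw incident data, the following is a minimal sketch in Python. The function name `pareto_table` and the sample categories are hypothetical and are not part of the template; the sketch simply counts incidents per category and reports the cumulative percentage, which is the figure a Pareto line graph plots.

```python
from collections import Counter


def pareto_table(categories):
    """Return (category, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


# Hypothetical incident categories, e.g. exported from a service desk tool.
incidents = ["network", "network", "email", "network", "printer",
             "email", "network", "database", "network", "email"]

for category, count, cumulative in pareto_table(incidents):
    print(f"{category:<10} {count:>3} {cumulative:>6}%")
```

In this made-up sample, two categories account for 80 percent of the incidents, which is the kind of concentration a Pareto analysis is meant to surface.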
The 5 Whys
The 5 Whys root cause analysis template is used to ask a series of questions until the root cause can be uncovered. Use this template as a framework for asking “why” questions and noting corrective actions to prevent recurrence.
Kepner Tregoe Rational Model
Created by Charles Kepner and Benjamin Tregoe, this model provides a method for gathering, evaluating, and prioritizing information to identify the root cause of a problem and prevent it in the future. There are four major steps: appraise the situation, analyze the problem, analyze decisions, and analyze potential problems. When analyzing decisions, it is important to identify alternatives and perform a risk analysis for each by using a weighted decision matrix.
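A weighted decision matrix can be illustrated with a short Python sketch. The criteria, weights, ratings, and the function name `weighted_scores` below are all hypothetical; the sketch only shows the general arithmetic (each alternative's score is the sum of weight times rating across criteria), not the official Kepner Tregoe worksheet.

```python
def weighted_scores(weights, ratings):
    """Score each alternative as the sum of weight * rating over its criteria."""
    return {
        alternative: sum(weights[criterion] * score
                         for criterion, score in scores.items())
        for alternative, scores in ratings.items()
    }


# Hypothetical criteria weights (higher means more important) and 1-10 ratings,
# where a higher rating is more favorable for that criterion.
weights = {"cost": 3, "risk": 5, "time_to_fix": 2}
ratings = {
    "patch_server":   {"cost": 8, "risk": 6, "time_to_fix": 9},
    "replace_server": {"cost": 3, "risk": 9, "time_to_fix": 4},
}

scores = weighted_scores(weights, ratings)
best = max(scores, key=scores.get)
print(scores, "->", best)
```

The highest total is only a starting point for discussion; the risk analysis for each alternative still needs human judgment.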
Ishikawa Fishbone Diagram
Also known as a cause and effect diagram, the fishbone diagram is a visual compilation of information that helps teams brainstorm to find the cause of an issue.
Six Sigma DMAIC
DMAIC focuses on incrementally improving existing processes. It stands for the five phases of a Six Sigma improvement cycle: Define, Measure, Analyze, Improve, and Control. This template can be used as a DMAIC roadmap.
Problem Management Process Workflow
Often, an organization detects a problem when users report the same or similar incidents to the service desk in a short time frame. For example, if one user reports that their email is not working, it’s likely an isolated incident or user error that can be quickly resolved. However, if the service desk receives five reports of email errors within 30 minutes, it’s likely a more impactful problem that requires analysis to resolve.
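The email example can be sketched as a simple sliding-window check. This is an illustrative Python sketch, not a feature of any particular service desk product: the function name `exceeds_threshold`, the default threshold of five reports, and the 30-minute window are assumptions taken from the example above.

```python
from datetime import datetime, timedelta


def exceeds_threshold(timestamps, threshold=5, window=timedelta(minutes=30)):
    """Return True if at least `threshold` reports fall within any `window` span."""
    ordered = sorted(timestamps)
    start = 0
    for end, current in enumerate(ordered):
        # Shrink the window from the left until it spans at most `window`.
        while current - ordered[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


# Hypothetical report times for similar email incidents.
base = datetime(2019, 1, 7, 9, 0)
reports = [base + timedelta(minutes=m) for m in (0, 5, 12, 20, 28)]
print(exceeds_threshold(reports))
```

A real deployment would also need a rule for deciding which incidents count as "the same or similar," which this sketch leaves out.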
The problem management process includes the following workflow stages:
- Problem Detection: The organization identifies problems from a user’s incident report, analysis of existing incidents, or an automated event monitoring solution.
- Problem Logging: The organization logs problems with all relevant details, including the reporting user’s information, date and time, category, priority, severity, description, configuration item impacted, linked incidents, and resolution.
- Problem Investigation: The service desk team examines the root cause of the problem. The service desk typically investigates problems based on their priority (high-priority issues have the greatest impact on IT services).
- Problem Diagnosis: The business identifies the cause of the problem based on the results of the investigation.
- Workaround: The team takes temporary measures to restore services until the problem is resolved.
- Known Error Creation: The team logs the problem as a known error (in the known error database, or KEDB) so future related incidents can be linked to it and addressed quickly.
- Problem Resolution: When a business addresses the underlying cause of the problem and restores normal service operation, it prevents recurring incidents and the problem is considered resolved.
- Problem Closure: Once the problem is confirmed effectively resolved, the business can close the problem and associated incidents.
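The workflow stages above can be sketched as a small record type that moves through the stages in order. This is a simplified Python illustration: the class name `ProblemRecord`, the stage identifiers, and the strictly linear progression are assumptions, and real tools typically allow stages to be skipped or revisited.

```python
from dataclasses import dataclass, field

# Stage names mirror the workflow list above; the identifiers are hypothetical.
STAGES = ["detected", "logged", "investigating", "diagnosed",
          "workaround", "known_error", "resolved", "closed"]


@dataclass
class ProblemRecord:
    problem_id: str
    description: str
    priority: str
    linked_incidents: list = field(default_factory=list)
    stage: str = "detected"

    def advance(self):
        """Move the record to the next workflow stage and return it."""
        index = STAGES.index(self.stage)
        if index == len(STAGES) - 1:
            raise ValueError("problem is already closed")
        self.stage = STAGES[index + 1]
        return self.stage


problem = ProblemRecord("PRB-001", "Recurring email outage", "high",
                        linked_incidents=["INC-101", "INC-102"])
problem.advance()
print(problem.stage)
```

Keeping the linked incidents on the record reflects the logging stage, where incidents are attached so they can be closed together with the problem.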
Reactive vs. Proactive Problem Management
Problem management takes on different forms depending on the organization’s culture, technology resources, and the skill set of the IT team. Most ITIL-focused teams take both a reactive and a proactive approach to problem management.
Reactive problem management takes place after the incident has been reported. It is a reaction to a problem that already exists and follows the workflow stages described in the previous section.
Proactive problem management is a preventative approach that aims to stop incidents from occurring in the first place by identifying IT infrastructure weaknesses. Proactive problem management can be difficult for many organizations because it requires both the resources and the skill set to perform extensive trend analysis and identify an incident before it occurs. Preventative activities include ongoing maintenance (especially for hardware and software reaching the end of their lifecycle), regulatory audits, automated performance monitoring, capacity planning, disaster recovery and service continuity planning, release management planning and testing, a change management process, and a documented security management policy.
One activity that may aid in proactive problem management is major problem review. Organizations classify major problems based on their impact to the business. By reviewing major problems, organizations can help identify what they did correctly and incorrectly, as well as areas of improvement that, when fixed, can improve overall problem management and prevent the recurrence of problems.
Problem Management Team Members
Who is involved in the problem management process? The answer varies, but most large organizations that follow the ITIL framework employ the following team members:
- Problem manager/process owner
- Analysts
- Network operations staff
- Engineers
- Change management team
- Configuration management team
- Service desk technicians (first-, second-, and third-level support)
- Call center staff
The problem manager owns the problem management process, but also relies on other IT staff. For example, an engineer may analyze a problem in order to identify the root cause, and a change management team member will work to implement the fix.
Benefits of Problem Management
When an organization succeeds at problem management, the entire business benefits from fewer technology issues. Unfortunately, the constantly changing technology landscape means 100 percent protection from downtime is impossible, but problem management minimizes the disruptions that occur. Additional benefits of problem management include the following:
- Reduced downtime and disruption
- Improved service availability
- Decreased workload and stress on service desk staff
- Improved customer satisfaction (CSAT)
- Decreased costs associated with downtime
- Decreased incident impact
- Improved service quality
- Faster resolution time and increased first call resolution rate (FCR)
- Fewer high-priority incidents
- Decreased business disruption
- Increased staff productivity
- Improved training and learning documentation for new and existing staff
- Prevents incident recurrence
- Supports ISO/IEC 20000 certification requirements
The Challenges of Problem Management
As with any new or existing process, day-to-day problem management can cause a headache for the process manager, staff, and even ancillary business associates. Below are some of the challenges that may arise:
- Lack of Knowledge and Training: ITIL can be extremely helpful to an organization, but that comes with the challenge of understanding the framework and its terminology. This is where ITIL training and certification can help, especially for team members who will be directly involved in the problem management process.
- Leadership Buy-In/Commitment and Resistance to Change: Change is difficult, and without the buy-in of upper management, team members may be reluctant to take on a new process workflow.
- Competing Priorities: Different IT managers may not have the same priorities when dealing with a service disruption.
- Missing Information: Problem management depends on thorough documentation. If the organization doesn’t collect essential information during the initial incident report, problem managers may have difficulty resolving the problem quickly. Thus, the service desk is responsible for receiving and documenting all incident reports in order to ensure technicians have full access to critical information. For example, assigning the proper priority and severity helps determine the true impact to the organization.
- Reliance on Other Teams for Accountability and Review: The problem management process must rely on team members across IT during the analysis and resolution phases. It is difficult for problem managers to hold accountable staff who do not report to them. In addition, KPIs must be applied and managed across all teams involved in the process.
How to Choose a Problem Management Software Solution
ITIL-verified software is a great way to guide and manage your problem management process. Consider the following features as you evaluate software solutions:
- Customer self-service portal
- Service-level agreements
- Reporting and key performance indicators (KPIs)
- ITIL-verified
- Internal and/or third-party knowledge management with the ability to search
- Customer satisfaction surveys
- Assignments and escalations to individuals and/or teams
- Templates for recurring issues
- Unique identifier assigned to each record
- Time tracking
The ideal problem management software solution should also have the following abilities:
- Adhere to the appropriate problem management process (create, categorize, edit, resolve, and close)
- Classify priority and severity to determine impact
- Be configurable to meet the organization’s unique requirements (processes, forms, categories, fields, user permissions, etc.)
- Integrate with other ITIL processes (incident, change, knowledge, and configuration management)
- Link incidents to problems and resolve all incidents when the problem is resolved
- Differentiate, but allow links between, incidents, problems, known errors, knowledge articles, and changes
- Document root cause
- Integrate with configuration management for easy visibility into impacted configuration items (CI)
Key Performance Indicators and Critical Success Factors in Problem Management
In addition to the above-mentioned features, the ability to measure performance is critical to problem management success. KPIs and critical success factors (CSF) are specific to each organization, but when applied, they help identify areas for improvement. Some common KPIs and CSFs include the following:
- Number of problems per time frame/category/department/user
- Number of incidents per problem
- Reduced time to problem resolution
- Met and breached SLAs
- Average time from incident report to problem root cause
- Reduction/growth of problem backlog
- Reduced costs associated with problem management
- Root cause analysis trends
- Service quality improvements
- Increase in proactive change submission
- Decrease in number of incidents over time
- Reduced problem impact
- Reduction in problem backlog
- Increased first call resolution rate
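A few of these KPIs can be computed directly from problem records. The following Python sketch is illustrative only: the record fields (`opened_at`, `resolved_at`, `incident_ids`), the function name `problem_kpis`, and the sample data are hypothetical, and the function assumes a non-empty list of problems.

```python
from datetime import datetime
from statistics import mean


def problem_kpis(problems):
    """Compute three example KPIs from a non-empty list of problem records."""
    resolved = [p for p in problems if p["resolved_at"] is not None]
    hours = [(p["resolved_at"] - p["opened_at"]).total_seconds() / 3600
             for p in resolved]
    return {
        "incidents_per_problem": mean(len(p["incident_ids"]) for p in problems),
        "mean_hours_to_resolution": mean(hours) if hours else None,
        "backlog": len(problems) - len(resolved),
    }


# Hypothetical problem records; one problem is still open.
problems = [
    {"opened_at": datetime(2019, 1, 7, 9), "resolved_at": datetime(2019, 1, 7, 15),
     "incident_ids": ["INC-1", "INC-2", "INC-3", "INC-4"]},
    {"opened_at": datetime(2019, 1, 8, 8), "resolved_at": datetime(2019, 1, 8, 18),
     "incident_ids": ["INC-5", "INC-6"]},
    {"opened_at": datetime(2019, 1, 9, 10), "resolved_at": None,
     "incident_ids": ["INC-7", "INC-8", "INC-9"]},
]

print(problem_kpis(problems))
```

Tracking the same figures per time frame shows whether resolution time and backlog are trending in the right direction.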
Tips for Implementing Effective ITIL Problem Management
Implementing ITIL processes is not simple, but it can make a huge difference in IT success. To implement any ITIL process, start by gaining a thorough understanding of the ITIL methodology. This will help team members understand the ITIL processes and the business value. The goal of ITIL is not to offer a prescriptive, step-by-step implementation process, but a flexible framework to guide IT departments in improving their processes. There is no requirement to implement all processes. Rather, you want to choose the processes that best fit the needs of the organization.
Although ITIL requires a time and resource commitment, it does not have to bring business to a halt. Third-party organizations offer training and certification classes around the world. Send a dedicated representative to a training course to begin the implementation process. As you progress, you have the option to train the entire team.
Teams implementing ITIL problem management can also follow these tips:
- Gain IT and senior leadership support.
- Create a clear vision and purpose with clearly defined processes and goals.
- Define the relationship between various processes, specifically incident, problem, and change management.
- Take a process-by-process approach to minimize disruption and gain quick wins.
- Dedicate resources to both reactive and proactive problem management.
- Take a preventative approach to problem management.
- Communicate successes with the entire organization.
- Implement incident management prior to implementing problem management.
- Implement a software solution that supports your specific problem management requirements.
- Dedicate time to problem management efforts.
- Train and educate service desk staff.
IT success is not simply a result of keeping computers and printers running within your organization, but also of aligning IT with the goals of the business as a whole. IT is the heart and soul of business and a key contributor to revenue and competitive differentiation, and ITIL provides the framework to reach those goals.
ITIL is constantly evolving as technology becomes more complex and business needs change. More than 150 industry experts have been involved in ITIL version 4, which will be released in Q1 2019 and is expected to integrate with DevOps, Agile, and Lean.