Incident Management

manager Sun, 12/05/2021 - 14:46

Definition and scope

An Incident is any disturbance of business continuity or services, in the broadest sense of the word. This could be a system malfunction, an outage, blocked access, or the unavailability of any kind of system (infrastructure, application, telecom).

The Incident Management Process is an end-to-end process that handles the following activities for incidents:

  • receiving
  • capturing
  • classifying
  • resolving
  • closing

The Incident Management Process relates to the following other processes managed by healthdata.be:

  • Request Management Process
  • Change Management Process
  • Problem Management Process
  • Configuration Management Process
  • SLO/SLA Management Process

As for the relation with Request Management Process:

  • Requests that are entered in the system, but are in fact Incidents, are transferred from Request to Incident.

As for the relation with Change Management Process:

  • If resolving an Incident requires a Change, the Incident moves to the “Awaiting Change” phase and the Change Management Process is invoked. Once the Change is completed, the Incident Management Process resumes.

NOTE: the linkage to the Problem, Configuration, and SLO/SLA Management Processes will be established once these processes are defined and implemented.

The Incident Management Process is implemented in the following applications used by healthdata.be:

  • ServiceNow

The Incident Management Process interfaces with the following applications used by healthdata.be:

  • DB2 Reporting Process/Tool
  • ServiceNow ServicePortal

The Incident Management Process is owned by:

  • Team lead “Services & Support” of healthdata.be.

Overall Process

Diagram

This diagram describes the major process-related activities for each of the major steps. For each step, the responsibilities of all applicable roles are set out in a RACI matrix. Each step is also explained in the sections that follow.

Roles & Responsibilities

Role | Description
Incident Owner | The Incident Owner is responsible for ensuring that all activities defined within the process are undertaken and that the process achieves its goals and objectives.
Incident Coordinator | The Incident Coordinator is responsible for managing all incidents assigned to their group, within the defined SLA.
Incident Manager | The Incident Manager is responsible for process design and for the day-to-day management of the process. The Incident Manager has the authority to manage Incidents effectively through First, Second, and Third Level Support.
End User | The End User is the person using an IT resource. This role is responsible for reporting all Incidents and for making all IT requests and contacts through the Service Desk.
Service Desk Agent | The Service Desk Agent is responsible for the day-to-day communication with all End Users and for facilitating the resolution and fulfillment of Incidents.
L2/L3 Incident Analyst | The Incident Analyst is responsible for implementing and executing the Incident process as defined by the Incident Owner/Manager, and for acting as a point of contact for escalated issues, questions, or concerns.
Major Incident Team | The Major Incident Team is a group of individuals brought together to manage a Major Incident. This team includes the Service Desk function, the IT organization, and third-party companies.

Implementation of the major roles in the healthdata.be team:

Role | Healthdata function
End User | Anybody who is not part of the healthdata organization but uses its services (scientists, Sciensano staff, hospital/laboratory staff, …)
Service Desk Agent | Part of the role of Support Engineer/Service Desk Officer in the Services & Support Team
L2/3 Incident Analyst | Part of the engineer/developer functions in all HD teams (IAT, DC, DWH, SOB), as well as the DPO, EA, and other architects
Incident Coordinator | Part of the role in all HD teams
Incident Manager | Part of the role of Incident management in the Services & Support Team

RACI matrix

Ref | Functional Process Item | End User | Service Desk Agent | L2/3 Incident Analyst | Incident Coordinator | Incident Manager
I01 | Incident Identification | I | R | R | - | R, A
I02 | Incident Logging | I | R | - | - | A
I03 | Incident Categorization & Prioritization | I | R | - | R | A
I04 | Known error | - | R | - | R | A
I05 | Incident diagnosis | - | R | - | R | A
I06 | Investigate & resolve | - | R | - | I | A
I07 | Consult end-user | C | R | - | R | A
I08 | Set incident as resolved | - | R | - | R | A
I09 | L2 needed? | - | R | - | R | A
I10 | Assign incident to L2/L3 | - | R | I | R | A, I
I11 | Investigate incident | - | I | R | I | A
I12 | Change required? | - | - | R | R | A
I13 | Incident resolved? | C | I | R | R | A
I14 | Reassign ticket to Service Desk | - | C | R | I | A, I
I15 | Apply workaround | C | R | I | C | A, I

RACI | Description
A = Accountable | The single owner who is accountable for the final outcome of the activity.
R = Responsible | The executor(s) of the activity step.
C = Consulted | The expert(s) providing information for the activity step.
I = Informed | The stakeholder(s) who must be notified of the activity step.
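
Because "Accountable" is defined as a single owner, each matrix row can be checked mechanically. The following is a minimal sketch, not part of the healthdata.be tooling: the names `ROLES`, `RACI_ROWS`, `roles_with`, and `check_single_accountable` are hypothetical, and only three rows of the matrix (I13, I14, I15) are copied in for illustration.

```python
# Column order of the RACI matrix.
ROLES = [
    "End User",
    "Service Desk Agent",
    "L2/3 Incident Analyst",
    "Incident Coordinator",
    "Incident Manager",
]

# Three rows copied from the matrix above; an empty string means an empty cell.
RACI_ROWS = {
    "I13": ["C", "I", "R", "R", "A"],
    "I14": ["", "C", "R", "I", "A, I"],
    "I15": ["C", "R", "I", "C", "A, I"],
}


def roles_with(code: str, row: list[str]) -> list[str]:
    """Return the roles whose cell in `row` contains the given RACI code."""
    return [
        role
        for role, cell in zip(ROLES, row)
        if code in [part.strip() for part in cell.split(",")]
    ]


def check_single_accountable(rows: dict[str, list[str]]) -> list[str]:
    """Return the refs of rows that do not have exactly one Accountable role."""
    return [ref for ref, row in rows.items() if len(roles_with("A", row)) != 1]


print(roles_with("A", RACI_ROWS["I15"]))    # ['Incident Manager']
print(check_single_accountable(RACI_ROWS))  # []
```

An empty result from `check_single_accountable` means every checked row names exactly one Accountable role.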

Process activity steps

I01. Incident Identification

Input(s): A ticket can be initiated by phone call, email, portal, walk-in, or via a monitoring event to the Service Desk.
Output(s): Incident ticket is identified in ServiceNow.
Status: New
Description: The End User can initiate an incident via:
  • Portal: the preferred way of reporting an incident.
  • Email: a user sends an email to Support.Healthdata@sciensano.be. The Service Desk has one day to pick this up and create a ticket on behalf of that user.
  • Phone: a user contacts the Service Desk by phone to report an incident. The Service Desk will create a ticket on behalf of that user immediately, while on the phone.
  • Walk-in: a user can visit the Service Desk in person to report an incident. The Service Desk will immediately create a ticket on behalf of that user.
  • Monitoring event: an alert or event can trigger the automatic creation of an incident.

I02. Incident Logging

Input(s): Details gathered from the End User are added to the ticket.
Output(s): Ticket is enriched with information in the work notes.
Status: Work in Progress
Description: The Service Desk will perform a first analysis of the incident ticket:
  • It is not an incident, but a request: the incident ticket will be closed and the Service Desk will create a Service Request.
  • It is an incident: the ticket will be enriched with the first analysis.

I03. Incident Categorization

Objective: To categorize every new Service Desk Record for assignment, diagnosis, and reporting purposes.
Input(s): Open Incident Record
Output(s): Categorized Incident Record
Status: Work in Progress
Description: The Service Desk will verify or modify the category under which the incident has been opened:

I03. (2) Incident Prioritization

Objective: To set an appropriate Priority for scheduling and handling the Incident.
Input(s): Open, Categorized Incident Record
Output(s): Open, Categorized and Prioritized Incident Record
Status: Work in Progress
Description: The Service Desk will verify or modify the priority with which the incident has been opened. In accordance with the Master Service Agreement of the healthdata.be platform, the priority is determined by both Impact and Business Importance. The Impact is determined using the following table.

Impact | Situation
High | The incident affects all end-users
Medium | The incident affects a group of end-users
Low | The incident affects one or a limited number of end-users
None | No degradation of the Service

Description (cont.): When the situation changes over time, Impact and Priority will be adapted accordingly. The priority is calculated as follows:

Business Importance Level | Impact: HIGH | Impact: MEDIUM | Impact: LOW | Impact: NONE
GOLD | Priority 1 (P01) | Priority 2 (P02) | Priority 8 (P08) | Priority 40 (P40)
SILVER | Priority 2 (P02) | Priority 4 (P04) | Priority 16 (P16) | Priority 40 (P40)
BRONZE | Priority 4 (P04) | Priority 8 (P08) | Priority 40 (P40) | Priority 40 (P40)
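
The priority table above can be expressed as a simple lookup. The following is a minimal sketch that assumes the table is read by Business Importance Level (row) and Impact (column). The names `PRIORITY_MATRIX` and `priority_for` are illustrative and do not refer to anything in ServiceNow.

```python
# Priority lookup mirroring the table above.
# Outer key: Business Importance Level; inner key: Impact.
PRIORITY_MATRIX = {
    "GOLD":   {"HIGH": "P01", "MEDIUM": "P02", "LOW": "P08", "NONE": "P40"},
    "SILVER": {"HIGH": "P02", "MEDIUM": "P04", "LOW": "P16", "NONE": "P40"},
    "BRONZE": {"HIGH": "P04", "MEDIUM": "P08", "LOW": "P40", "NONE": "P40"},
}


def priority_for(business_importance: str, impact: str) -> str:
    """Return the priority label for a Business Importance Level and an Impact."""
    try:
        return PRIORITY_MATRIX[business_importance.upper()][impact.upper()]
    except KeyError:
        raise ValueError(
            f"unknown combination: {business_importance!r}, {impact!r}"
        ) from None


print(priority_for("Gold", "High"))   # P01
print(priority_for("silver", "low"))  # P16
```

Since Impact and Priority are adapted when the situation changes, such a lookup would be re-evaluated whenever the Impact of a ticket is updated.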

I04. Known error?

Objective: To identify whether a solution for the incident is already known.
Input(s): Open, Categorized and Prioritized Incident Record
Output(s): Open, Categorized and Prioritized Incident Record
Status: Work in Progress
Description: The Service Desk will check whether a solution is already known in the knowledge base or in a problem record. If one is found, the Service Desk will apply it.

I05. Incident Diagnosis

Objective: To determine whether an incident can be solved by the Service Desk.
Input(s): Open, Categorized and Prioritized Incident Record
Output(s): Open, Categorized and Prioritized Incident Record
Status: Work in Progress
Description: Incident diagnosis will be carried out after the first analysis (I02).

I06. Investigate and resolve

Objective: To resolve as many incidents as possible at the Service Desk.
Input(s): Open, Categorized and Prioritized Incident Record
Output(s): Open, Categorized and Prioritized Incident Record
Status: Work in Progress
Description: Incident investigation will be carried out after the first analysis (I02), using all tools, skills, and techniques available to the Service Desk. This may include matching to similar Incident Records, matching to Known Errors and workarounds, and the use of knowledge bases and Frequently Asked Questions (FAQ) documents. Resolving the incident is the final step after the investigation.

I07. Consult end-user

Objective: To obtain confirmation from the end-user that the applied solution solves the incident.
Input(s): Open, Categorized and Prioritized Incident Record
Output(s): Open, Categorized and Prioritized Incident Record
Status: Awaiting caller information
Description: The Service Desk will contact the end-user, preferably by phone. If the end-user cannot be reached by phone, the Service Desk will send an email. The SOP ‘Manage awaiting tickets’ applies.

I08. Set incident as resolved

Objective: To set the incident ticket to status ‘resolved’ after confirmation by the end-user.
Input(s): Open, Categorized and Prioritized Incident Record
Output(s): Open, Categorized and Prioritized Incident Record
Status: Resolved
Description: Once the Service Desk has changed the status of the incident ticket to ‘resolved’, the end-user has 5 working days to re-open the ticket. After 5 working days, the ticket is automatically set to the status ‘closed’. From then on, the end-user can no longer re-open the ticket and has to create a new incident ticket.
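
The 5-working-day window can be illustrated with a small date calculation. This is a sketch under stated assumptions: working days are taken to be Monday to Friday, public holidays are ignored (the process description does not define a calendar), and `auto_close_date` is a hypothetical helper, not a ServiceNow function.

```python
from datetime import date, timedelta

REOPEN_WINDOW_WORKING_DAYS = 5  # taken from the process description


def auto_close_date(resolved_on: date,
                    working_days: int = REOPEN_WINDOW_WORKING_DAYS) -> date:
    """Return the date reached after counting `working_days` working days.

    Assumption: working days are Monday to Friday; public holidays are ignored.
    """
    current = resolved_on
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


# A ticket resolved on Thursday 13 January 2022:
print(auto_close_date(date(2022, 1, 13)))  # 2022-01-20
```

Whether the ticket closes on the fifth working day itself or at the start of the next one depends on how the automatic closure is configured, which the description above leaves open.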

I09. L2 needed?

Objective: To determine, after diagnosis (I05), whether the Service Desk can solve the incident or an L2 group has to handle it.
Input(s): Initial Diagnosed Incident Record
Output(s): Open, Categorized and Prioritized Incident Record
Status: Work in Progress
Description: If the Service Desk is able to solve the incident, the ticket remains with the Service Desk group and continues with I06. If the Service Desk cannot resolve the incident, the ticket is assigned to an L2 group.

I10. Assign incident to L2/L3

Objective: To assign the incident ticket to an L2/L3 group.
Input(s): Investigated and Diagnosed Incident Record
Output(s): Open, Categorized and Prioritized Incident Record
Status: Work in Progress
Description: Once the ticket is assigned to an L2/L3 group, the Incident Coordinator of that group has to manage the ticket, under the supervision of the Incident Manager.

I11. Investigate incident

Objective: To solve, as soon as possible, an incident that could not be fixed by the Service Desk.
Input(s): Incident Record investigated, diagnosed and documented by the Service Desk
Output(s): Open, Categorized and Prioritized Incident Record
Status: Work in Progress
Description: Building on the investigation and diagnosis by the Service Desk, the L2 group will investigate the incident further, possibly with the help of an L3 group (external party).

I12. Change required?

Objective: To determine whether a change is required to solve the incident.
Input(s): Documentation from Level 2/3
Output(s): Fully Updated Incident Record
Status: Work in Progress/On hold, Awaiting change
Description: After further investigation, the L2 group has to decide whether a change (functional or infrastructure-related) is needed to solve the incident. If not, step I13 applies and the status remains ‘Work in Progress’. If so, the change management process starts and the status is set to ‘On hold’, ‘Awaiting change’.

I13. Incident resolved?

Objective: To ensure that L2 was able to resolve the incident.
Input(s): Fully Updated Incident Record
Output(s): Updated Incident Record
Status: Work in Progress/On hold, Awaiting problem
Description: If the incident is solved, the incident ticket will be updated. If no change is required but the incident is still not solved:
  • the problem management process starts
  • the status is set to ‘On hold’, ‘Awaiting problem’
  • a workaround has to be found to solve the incident temporarily
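
The two decisions in I12 and I13 can be combined into one small function. This is a sketch only: it assumes that ‘On hold’ is the status and that ‘Awaiting change’ / ‘Awaiting problem’ are the accompanying on-hold reasons, and that a solved incident proceeds to I14. The names `Outcome` and `l2_outcome` are hypothetical.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    status: str
    on_hold_reason: Optional[str]
    next_step: str


def l2_outcome(change_required: bool, resolved: bool) -> Outcome:
    """Map the I12 and I13 decisions to a ticket status and a next step."""
    if change_required:
        # I12: the change management process takes over.
        return Outcome("On hold", "Awaiting change", "Change Management Process")
    if resolved:
        # I13: solved without a change (assumed to continue with I14).
        return Outcome("Work in Progress", None, "I14")
    # I13: not solved and no change required, so problem management starts
    # and a workaround is needed (I15).
    return Outcome("On hold", "Awaiting problem", "I15")


print(l2_outcome(change_required=False, resolved=False).on_hold_reason)  # Awaiting problem
```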

I14. Reassign ticket to Service Desk

Objective: To ensure that the solution is validated by the end-user.
Input(s): Fully Updated Incident Record
Output(s): Updated Incident Record
Status: Work in Progress
Description: When the solution (permanent, or via the workaround of I15) has been applied, the incident ticket is reassigned to the Service Desk, which continues with step I07.

I15. Apply workaround

Objective: To ensure that L2 was able to resolve the incident.
Input(s): Fully Updated Incident Record
Output(s): Updated Incident Record
Status: Work in Progress/On hold, Awaiting problem
Description: If the incident cannot be solved permanently:
  • A workaround has to be applied to solve the incident temporarily.
  • The ticket is reassigned to the Service Desk (I14).
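
The cross-references between steps can be collected into a small transition map. The sketch below covers only the transitions that the step descriptions mention; the complete flow is defined by the process diagram. The targets I10 (from I09) and I14/I15 (from I13) are inferred from the step titles, and `NEXT_STEPS` and `reachable` are hypothetical names.

```python
# Transitions mentioned in the step descriptions above (partial flow only).
NEXT_STEPS = {
    "I09": ["I06", "I10"],  # Service Desk resolves, or the ticket goes to L2
    "I12": ["I13"],         # no change required
    "I13": ["I14", "I15"],  # solved, or a workaround is needed (inferred)
    "I15": ["I14"],         # workaround applied, back to the Service Desk
    "I14": ["I07"],         # Service Desk consults the end-user
}


def reachable(start: str) -> list[str]:
    """Breadth-first walk over NEXT_STEPS, returning steps in visit order."""
    seen = [start]
    queue = [start]
    while queue:
        step = queue.pop(0)
        for nxt in NEXT_STEPS.get(step, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


print(reachable("I12"))  # ['I12', 'I13', 'I14', 'I15', 'I07']
```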