Wednesday, July 4, 2012

Network Operation Center (NOC) Best Practices – Part 1: Tools

Today, Network Operation Centers (NOCs) are under great pressure to meet their IT organizations’ demands. However, many NOCs struggle to meet these demands because they lack sufficient tools, knowledge or skills.

In this three-part blog post series, we will provide tips on how to ensure you have the right tools, knowledge and processes in place to improve and manage your NOC’s performance and response time.

This first part of our ‘NOC best practices’ series is dedicated to tools, which are an essential element of NOC management and a key lever for improvement.

A ticketing system

A ticketing system enables you to keep track of all open issues by severity, urgency and the person assigned to handle each task. Knowing all pending issues will help you prioritize the shift’s tasks and provide the best service to your customers.
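To make the prioritization idea concrete, here is a minimal sketch in Python. It is not tied to any particular ticketing product: the `Ticket` fields, the severity and urgency scales, and the function names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical scales (assumptions): a lower number means more pressing.
SEVERITY = {"critical": 0, "major": 1, "minor": 2}
URGENCY = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Ticket:
    ticket_id: int
    summary: str
    severity: str
    urgency: str
    assignee: str


def prioritize(tickets):
    """Return tickets sorted by severity, then urgency, then ticket id."""
    return sorted(
        tickets,
        key=lambda t: (SEVERITY[t.severity], URGENCY[t.urgency], t.ticket_id),
    )


def by_assignee(tickets):
    """Group the prioritized tickets by the person assigned to them."""
    queues = {}
    for ticket in prioritize(tickets):
        queues.setdefault(ticket.assignee, []).append(ticket)
    return queues


if __name__ == "__main__":
    open_tickets = [
        Ticket(1, "Slow intranet page", "minor", "high", "dana"),
        Ticket(2, "Backup job failed", "critical", "low", "lee"),
        Ticket(3, "Core router down", "critical", "high", "dana"),
    ]
    for person, queue in by_assignee(open_tickets).items():
        print(person, [t.summary for t in queue])
```

A shift leader could run something like this at the start of a shift to see each person's queue with the most pressing issue first.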

Knowledge-base system

Keep one centralized source for all knowledge and documentation, accessible to your entire team. This knowledge base should be a living information source, continuously updated with experiences and lessons learned for future reference and improvement.

Reporting and measurements

Create reports on a daily and monthly basis. A daily report should include all major incidents of the past 24 hours and a root cause for every resolved incident. This report is essential for shift leaders and NOC managers. It also keeps the rest of the IT department informed about NOC activities and major incidents. Compiling the daily reports into a monthly report will help measure the team’s progress. It will also show areas where improvements can be made and reveal positive or negative trends in performance.
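As a rough illustration of how daily incident records could roll up into a monthly view, consider the sketch below. The `Incident` fields and the summary keys are assumptions made for this example, not a prescribed report format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    day: str                 # ISO date, e.g. "2012-07-03" (assumed format)
    service: str
    root_cause: str
    minutes_to_resolve: int


def daily_report(incidents, day):
    """Return the incidents recorded for a single day."""
    return [i for i in incidents if i.day == day]


def monthly_summary(incidents, month):
    """Summarize one month ("YYYY-MM") of incidents."""
    in_month = [i for i in incidents if i.day.startswith(month)]
    if not in_month:
        return {"incidents": 0, "mean_minutes_to_resolve": None, "top_root_causes": []}
    total_minutes = sum(i.minutes_to_resolve for i in in_month)
    causes = Counter(i.root_cause for i in in_month)
    return {
        "incidents": len(in_month),
        "mean_minutes_to_resolve": total_minutes / len(in_month),
        "top_root_causes": causes.most_common(3),
    }
```

Comparing the mean resolution time and the most frequent root causes from month to month is one simple way to spot the trends mentioned above.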

Monitoring

There are two types of monitoring relevant to a NOC: (1) infrastructure monitoring and (2) user experience monitoring.

Infrastructure monitoring can cover the servers, the network or the data center environment. User experience monitoring involves simulating user behavior and activities in order to replicate problems and find the most effective solutions. Implementing a service tree model that connects each monitored infrastructure component to the services it affects will allow your team to alert other areas that may be affected by a problem.
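The service tree idea can be sketched as a small dependency map that is walked upward from a failing component. The component and service names below are invented for illustration, and a real NOC would likely load this mapping from its monitoring or configuration management tool rather than hard-code it.

```python
# Hypothetical service tree: each key depends on the items in its list.
DEPENDS_ON = {
    "web-shop": ["app-server", "load-balancer"],
    "app-server": ["db-server", "core-switch"],
    "reporting": ["db-server"],
    "load-balancer": ["core-switch"],
}


def affected_by(component, depends_on=DEPENDS_ON):
    """Return every node that directly or indirectly depends on component."""
    affected = set()
    frontier = [component]
    while frontier:
        current = frontier.pop()
        for node, dependencies in depends_on.items():
            if current in dependencies and node not in affected:
                affected.add(node)
                frontier.append(node)
    return sorted(affected)


if __name__ == "__main__":
    # An alert on the database server should also notify owners of these nodes.
    print(affected_by("db-server"))
```

With a mapping like this, an alert on one infrastructure component can be translated into the list of services whose owners should be notified.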

IT Process Automation

Implementing IT Process Automation significantly reduces mean time to recovery (MTTR) and helps NOCs meet SLAs by putting a procedure in place to handle incident resolution and to provide a consistently high-quality response regardless of the complexity of the process. IT Process Automation empowers a Level-one team to deal with tasks that might otherwise require a Level-two team. Some examples include password resets, disk space clean-up and service restarts. IT Process Automation also helps reduce the number of manual, routine IT tasks and frees up time for more strategic projects.
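To give a feel for what an automated routine task might look like, here is a minimal sketch of a disk space clean-up step using only the Python standard library. The function names, the age threshold and the dry-run default are assumptions for this example; a dedicated IT Process Automation product would wrap a step like this in approvals, logging and ticket updates.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(directory, max_age_days, now=None):
    """Return files directly inside directory last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def clean_up(directory, max_age_days, dry_run=True, now=None):
    """Delete stale files, or only list them when dry_run is True."""
    stale = find_stale_files(directory, max_age_days, now)
    if not dry_run:
        for path in stale:
            path.unlink()
    return [p.name for p in stale]
```

Defaulting to a dry run lets a Level-one operator review what would be deleted before the procedure removes anything.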
