We needed a vulnerability management system for our servers, one that could pull together a ton of different data from multiple systems and relate it all in a way that would provide a “scorecard” for each server, rolled up by business unit.
So we built it.
I want to document a bit of this, partially so I can remember how we did it, but also so that others can hopefully learn from our mistakes.
When the process first started, I was approached with a request to build a “health check” report for our servers. It was practically impossible for us to understand the overall security status of a particular server, given all of the variables and the different systems that each held part of the data. To understand the “health” of a server, we needed to know:
- What high-level business applications run on it?
- What software is installed on the server to support those business applications?
- Does the application fall in scope of any of our security and regulatory compliance programs (e.g. S-Ox, PCI, PII, GLBA)? And if so, what are the algorithms that determine whether this server falls into scope?
- What basic tools does the server need installed for day-to-day management and monitoring?
- What additional tools does the server need installed for security and regulatory compliance (e.g. HIDS for PCI)?
- Are those tools reporting correctly, and are they configured in the right way?
- Are any of the tools reporting conflicting information? For example, is the software asset management tool reporting an installation of a monitoring tool, while the console for that tool has not received any communication from that agent? That can imply misconfiguration (or simple disabling) of a particular tool. (There’s a sketch of this kind of cross-check just after this list.)
- What vulnerabilities exist on the server? And are they:
  - missing patches
  - configuration file issues
  - missing tools
  - incorrect group memberships
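
Since everything lands in MS SQL, the conflicting-information check above boils down to a cross-reference query. Here’s a minimal sketch; the table and column names (SamInstalls, AgentCheckins) and the 30-day window are illustrative, not our actual schema:

```sql
-- Hypothetical tables: SamInstalls holds what the software asset
-- management tool says is installed; AgentCheckins holds the last
-- time each tool's console heard from its agent on that server.
SELECT s.ServerName,
       s.ProductName,
       a.LastCheckin
FROM   SamInstalls s
       LEFT JOIN AgentCheckins a
              ON a.ServerName  = s.ServerName
             AND a.ProductName = s.ProductName
-- Flag agents the asset tool sees but the console hasn't heard
-- from in 30 days (or ever): likely misconfigured or disabled.
WHERE  a.LastCheckin IS NULL
   OR  a.LastCheckin < DATEADD(day, -30, GETDATE());
```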
At the end of the day, there are two outputs from collecting and understanding this pile of data:
- The “health check” report, which can be algorithmically converted into a “risk score” for each server (a rough sketch of that conversion follows this list)
- The “activity list” report, which is the list of things that need to be done to the server to reduce the “risk score”.
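
To give a feel for the “algorithmically converted” part: at its simplest it’s a weighted sum over a server’s open findings. This is a simplified sketch with made-up table names and weighting (Findings, FindingWeights), not our production scoring logic:

```sql
-- Illustrative only: Findings has one row per open finding per
-- server; FindingWeights maps each finding category (missing patch,
-- config issue, missing tool, bad group membership) to a weight.
SELECT   f.ServerName,
         SUM(w.Weight * f.Severity) AS RiskScore
FROM     Findings f
         JOIN FindingWeights w
           ON w.Category = f.Category
WHERE    f.Status = 'Open'
GROUP BY f.ServerName
ORDER BY RiskScore DESC;
```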
To build this, we leveraged:
- MS SQL (database to store all the collected data)
- SQL Reporting Services (to produce the two reports listed above, as well as a metric buttload of other reports)
- SQL Integration Services (to import and aggregate all the data from the multiple sources; there’s a sketch of the import pattern after this list)
- Iron Speed Designer (for the interface)
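
The SSIS packages generally follow a land-then-merge pattern: each feed goes into a staging table, then gets upserted into the consolidated tables. A sketch of the merge step, with hypothetical table and column names (Staging_Servers, Servers):

```sql
-- Hypothetical merge step run after SSIS lands a feed in
-- Staging_Servers: upsert the rows into the consolidated table.
MERGE Servers AS tgt
USING Staging_Servers AS src
   ON tgt.ServerName = src.ServerName
WHEN MATCHED THEN
    UPDATE SET tgt.OperatingSystem = src.OperatingSystem,
               tgt.LastSeen        = src.LastSeen
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ServerName, OperatingSystem, LastSeen)
    VALUES (src.ServerName, src.OperatingSystem, src.LastSeen);
```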
All of this to bring in data from (currently):
- Our Application Portfolio Manager (to understand the relationship between servers and business apps, and the scopes for those applications; the core tables that relate these are sketched after this list)
- Service Center (the quasi-CMDB and server asset management tool, to get basic data on the servers themselves)
- Our event logging tool
- Our HIDS tool
- Multiple A/V tools (including different versions of McAfee and Symantec agents)
- The database monitoring and encryption tool
- Multiple vulnerability management and patch deployment systems
- Our internal vulnerability assessment tools, which assign categories, security severities, and overall importance to the discovered vulnerabilities
- The software asset management tools
- The reporting tools from the supplier/vendor supporting the server hardware itself
- Several other smaller utilities and consoles that provide additional required data: financials, business unit ownership, and responsibility and ownership hierarchies
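
At the center of all that data is a small set of relationships: servers host applications, and applications carry compliance scopes, which the servers then inherit. A minimal, illustrative sketch of that core (all names hypothetical, not our actual schema):

```sql
-- Illustrative core schema: a server hosts many applications,
-- and an application can fall into many compliance scopes.
CREATE TABLE Servers      (ServerId INT PRIMARY KEY, ServerName NVARCHAR(128));
CREATE TABLE Applications (AppId    INT PRIMARY KEY, AppName    NVARCHAR(128));
CREATE TABLE AppScopes    (AppId INT REFERENCES Applications(AppId),
                           Scope VARCHAR(16));  -- e.g. 'PCI', 'S-Ox'
CREATE TABLE ServerApps   (ServerId INT REFERENCES Servers(ServerId),
                           AppId    INT REFERENCES Applications(AppId));

-- A server inherits scope from the applications it hosts:
SELECT DISTINCT s.ServerName, sc.Scope
FROM   Servers s
       JOIN ServerApps sa ON sa.ServerId = s.ServerId
       JOIN AppScopes  sc ON sc.AppId    = sa.AppId;
```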
More details in coming posts.