We had a need to put in place a vulnerability management system for our servers, and it needed to contain a ton of different data from multiple systems, bringing it all together in a way that was relatable in order to provide a “scorecard” for each server that could be rolled up by business unit.

So we built it.

I want to document a bit of this, partially so I can remember how we did it, but also so that others can hopefully learn from our mistakes.

When the process first started, I was approached with a request to build a “health check” report for our servers.  It was practically impossible for us to understand the overall security status of a particular server, considering all of the variables and different systems that held part of the data.  In order to understand the “health” of a server, we need to be able to know:

  • What high-level business applications run on it?
  • What software is installed on the server to support that business application?
  • Does the application fall in scope of any of our security and regulatory compliance programs (e.g. S-Ox, PCI, PII, GLBA)?  And if so, what are the algorithms that determine whether this server falls into scope?
  • What basic tools does the server need installed for day-to-day management and monitoring?
  • What additional tools does the server need installed for compliance and regulatory compliance (e.g. HIDS for PCI)?
  • Are those tools reporting correctly, and are they configured in the right way?
  • Are all the tools reporting conflicting information?  For example, is the software asset management tool reporting an installation of a monitoring tool, but the console for that tool has not received any communication from that agent?  That can imply misconfiguration (or simple disabling) of a particular tool.
  • What vulnerabilities exist on the server?  And are they:
    • missing patches
    • configuration file issues
    • missing tools
    • incorrect group memberships

At the end of the day, there are two outputs from collecting and understanding this pile of data

  1. The “health check” report, which can algorithmically be converted into a “risk score” for each server
  2. The “activity list” report, which is the list of things that need to be done to this server to reduce the “risk score”.

To build this, we leveraged:

  • MS  SQL (database to store all the collected data)
  • SQL Reporting Services (to produce the two reports listed above, as well as a metric buttload of other reports)
  • SQL Integration Services (to import and aggregate all the data from the multiple sources)
  • Iron Speed Designer (for the interface)

All of this to bring in data from (currently)

  • Our Application Portfolio Manager (to understand the relationship between servers and business apps, and the scopes for those applications)
  • Service Center (the quasi-CMDB and server asset management tool, to get basic data on the servers themselves)
  • Our event logging tool
  • Our HIDS tool
  • Multiple A/V tools (including different versions of McAfee and Symantec agents)
  • The database monitoring and encryption tool)
  • Multiple vulnerability management and patch deployment systems
  • Our internal vulnerability assessment tools, which assign categories and overall security severities and importance to the discovered vulnerabilities
  • The software asset management tools
  • The reporting tools from the supplier/vendor supporting the server hardware itself
  • Several other smaller utilities and consoles to provide additional required data: financial, business unit ownership, responsibility and ownership hierarchies

More details in coming posts.

Advertisements