I’ve been updating some of our security documentation, which explains what we do to ensure our product suits the security models of regulated industries such as finance and healthcare. Talking to our security guys, I was flabbergasted to find out that there are monitoring products out there that go against what is not only an industry best practice but also simply the right thing to do: their agents open and listen on fixed TCP ports!
Manager/Agent Architecture is Widely Used
Most monitoring products use a manager-agent architecture with either TCP or UDP protocols being used for communication. TCP is connection-oriented and guarantees delivery of messages while UDP is connection-less and does not offer guaranteed delivery.
SNMP, one of the most widely used protocols for network management for years, uses UDP, but most monitoring systems for applications, servers, and storage today rely on the TCP protocol.
In the monitoring system architecture, “agents” are installed on key servers/locations in the target infrastructure and collect performance metrics. For instance, an agent could be installed on a guest VM of a VMware environment and can collect metrics such as CPU usage of the VM, memory usage of the VM, disk activity on the disk drives of the VM, etc. Sometimes, monitoring can be agentless. For example, network devices are typically monitored using SNMP, with a “proxy” agent or an “agentless data collector” responsible for collecting metrics about them.
Data collected by the agent is communicated to a “management server” for analysis, storage, and for alerting IT operations teams when abnormal situations are detected. Figure 1 shows the communication architecture of eG Enterprise. The management server can be installed on-premises, or it could be in the cloud and offered as SaaS.
There are several types of messages that must be communicated between the agents and the manager:
- The agent may perform discovery and indicate the types of applications, servers, and resources to be monitored.
- The manager tells the agent what metrics to collect, how often to collect these metrics, and the thresholds against which it should compare the metrics it collects.
- The agent needs to send metrics collected periodically to the manager.
Communications may also be needed for automatic upgrade of the agents. The manager may also issue commands to the agents based on administrator inputs. For example, if an administrator wants to stop and restart a service on the remote system, corresponding commands will be sent from the manager to the agents.
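To make these message types concrete, here is a minimal Python sketch of how they might be modelled. The class names, fields, and JSON envelope are all hypothetical and chosen purely for illustration; they are not the eG Enterprise wire format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DiscoveryReport:
    """Agent -> manager: what the agent found on its host."""
    agent_host: str
    components: List[str]


@dataclass
class MonitoringConfig:
    """Manager -> agent: what to measure, how often, and the alert threshold."""
    metric: str
    interval_seconds: int
    warn_threshold: float


@dataclass
class MetricReport:
    """Agent -> manager: one periodic measurement."""
    agent_host: str
    metric: str
    value: float
    timestamp: float


@dataclass
class Command:
    """Manager -> agent: an administrator-initiated action."""
    action: str
    target: str


def encode(message) -> str:
    """Serialize a message with a type tag so the receiver can dispatch on it."""
    return json.dumps({"type": type(message).__name__, "body": asdict(message)})


def breaches_threshold(report: MetricReport, config: MonitoringConfig) -> bool:
    """Compare a collected metric against the threshold the manager supplied."""
    return report.metric == config.metric and report.value > config.warn_threshold
```

In this sketch, the agent would hold the `MonitoringConfig` objects it received and call `breaches_threshold` on each new measurement before reporting it.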
TCP Ports and Sockets Explained
Each system in a network is identified by its IP address. However, an IP address alone is not sufficient as a computer can run multiple applications and/or services. A port on a computer identifies the application or service running on the computer.
A port is a communication endpoint. At the software level, within an operating system, a port is a logical construct that identifies a specific process or a type of network service. A port is identified for each transport protocol and address combination by a 16-bit unsigned number, known as the port number.
Agents and the management server communicate with each other over ports. A server application listens on a pre-defined/well-known TCP port, while a client application connects to the server application on this port and communicates with it. The connection is initiated by the client, but once it is established, communication can flow in either direction.
Typically, server applications listen on well-known ports. For example, a web server listens on port 80 or 443 while a mail server listens on port 25 or 587.
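The listen/connect relationship is easy to see with Python's standard `socket` module. The sketch below is only an illustration: it runs both sides on the loopback address and lets the operating system pick a free port instead of a well-known one, and the function names are made up for this example.

```python
import socket
import threading


def read_until_closed(conn: socket.socket) -> bytes:
    """Read from a connection until the peer closes its sending side."""
    chunks = []
    while True:
        chunk = conn.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def serve_once(listener: socket.socket) -> None:
    """Server side: accept one client and reply on the same connection."""
    conn, _addr = listener.accept()
    with conn:
        request = read_until_closed(conn)
        conn.sendall(b"ack:" + request)


def demo() -> bytes:
    # The server binds to an address and port, then listens for connections.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    listener.listen(1)
    host, port = listener.getsockname()

    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    # The client initiates the connection; data then flows in both directions.
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(b"hello")
        client.shutdown(socket.SHUT_WR)  # signal that the request is complete
        reply = read_until_closed(client)

    worker.join()
    listener.close()
    return reply
```

Only the server needs a port that is known in advance. The client's own port is assigned by the operating system when it connects.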
Should a Pull or Push Model be Used?
Manager/agent architectures can use a push model or a pull model:
- In a pull model, the agent listens on one or more well-known TCP ports and waits for instructions from the manager. The manager can request metrics from the agent, tell the agent to run a script, or instruct the agent to update the sampling interval used for metrics collection. The advantage of this model is that the manager has greater control over the agents.
- In a push model, the agent does not listen on any TCP ports. It connects to the manager on a well-known TCP port and gets instructions from the manager. So, in this case, the manager does not have as tight a control over the agents as it does in the pull model.
The challenge with a pull model is that TCP ports are opened on all the systems on which agents are installed.
An open port does not automatically mean a security issue. However, it gives attackers a pathway to the application listening on that port. Attackers can then exploit shortcomings such as weak credentials, absence of two-factor authentication, or even vulnerabilities in the application itself, particularly if an agent is allowed to run scripts supplied from the manager. Monitoring and securing open ports is an overhead, so it is a best practice to simply avoid unnecessarily open ports and listening agents, minimizing the attack surface available to malicious third parties.
As indicated by Lifars, it is important to monitor and rapidly patch applications that listen on TCP ports. This can be a significant overhead if you are monitoring hundreds of systems with such agents.
Open TCP ports could be a bigger issue if the systems being monitored are in the cloud/Internet. Attackers can use open ports as an initial attack vector. Furthermore, listening ports on a local network can be used for lateral movement. It is a good practice to close ports or at least limit them to a local network.
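If you want to check whether a monitored system exposes a listening TCP port, a simple connection probe is enough for a first look. The helper below is a minimal sketch using Python's standard `socket` module. The function name is our own, and it should only be run against systems you are authorized to test.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Pass an IP address where possible: a hostname that cannot be resolved
    still raises an exception instead of returning False.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0
```

A port that answers from outside the local network is the case to worry about most. A firewall that silently drops packets simply shows up here as a closed port once the timeout expires.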
There are other considerations when evaluating a pull vs. push model for communication. A pull model requires that the manager have a direct communication path to the agents. If the agents are within a private network, the manager has to be in the same network as well, which is a severe restriction. To work around it, additional firewall rules need to be created to allow manager-to-agent communication, which will not please your IT security teams!
Because of the above considerations, eG Enterprise uses a push model for communication from the agents to the manager. Our agents do not listen on any TCP ports and are hence highly secure. While the manager does not open connections to the agents (because the agents do not listen on TCP ports), it sends instructions to an agent whenever that agent connects to it (remember, communication can be bi-directional after a connection is established).
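The idea of delivering instructions on a connection the agent opened can be sketched with Python's standard library. Everything here is an assumption made for illustration: the `/checkin` path, the JSON fields, the `PENDING` queue, and the function names are hypothetical, and this is not how eG Enterprise itself is implemented. The toy manager listens on a port; the agent only makes outbound requests and reads its instructions from the response.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical instructions the manager has queued for each agent.
PENDING = {"agent-1": [{"action": "set_interval", "seconds": 30}]}
RECEIVED = []  # metric reports the manager has accepted


class ManagerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        report = json.loads(self.rfile.read(length))
        RECEIVED.append(report)
        # Piggyback any queued instructions on the response to the agent.
        instructions = PENDING.pop(report["agent"], [])
        body = json.dumps({"instructions": instructions}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def agent_check_in(manager_url, agent_id, metrics):
    """Agent side: push metrics out and return the instructions sent back."""
    payload = json.dumps({"agent": agent_id, "metrics": metrics}).encode()
    request = urllib.request.Request(
        manager_url, data=payload, headers={"Content-Type": "application/json"}
    )
    # Bypass any system proxy settings for this loopback-only demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())["instructions"]


def demo():
    manager = HTTPServer(("127.0.0.1", 0), ManagerHandler)  # only the manager listens
    threading.Thread(target=manager.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{manager.server_address[1]}/checkin"
    try:
        first = agent_check_in(url, "agent-1", {"cpu_percent": 42.0})
        second = agent_check_in(url, "agent-1", {"cpu_percent": 40.0})
    finally:
        manager.shutdown()
        manager.server_close()
    return first, second
```

The trade-off is latency: an instruction waits in the queue until the agent's next check-in, so the check-in interval bounds how quickly the manager's changes take effect.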
eG Enterprise’s Secure Architecture for IT Monitoring
Beyond choosing a product that avoids unnecessarily open ports, communication between the agents and the manager should be secured using modern industry standards for protocols, authentication, and encryption. All of these are built into the eG Enterprise architecture. The key aspects of the eG Enterprise architecture are as follows:
- No listening ports open on the agents: eG agents “poll” the manager for information such as what tests the agent should run, how frequently it should run them, etc. The agent itself does not listen on any specific port for requests. This minimizes the security risk to the systems on which the agents are deployed.
- 100% web-based communication with the eG manager: IT infrastructures typically include multiple demilitarized zones. From a security perspective, most IT infrastructure operators view SNMP and other proprietary protocols suspiciously. On the other hand, HTTP/HTTPS traffic is generally not viewed as a serious security threat. Consequently, the agent uses HTTP/HTTPS for all communications with the manager. HTTPS support is particularly useful for remote monitoring across multiple locations, where the manager may be at a central location and the agents at remote locations that use the Internet to communicate with the manager.
- Unidirectional connection initiation only – from the agent to the manager: All HTTP/HTTPS communications in the eG Enterprise architecture originate from the agent. Consequently, any firewalls between the eG agent and the manager need to allow HTTP/HTTPS connections to be initiated in one direction only – i.e., from the eG agent to the manager. The manager never attempts to communicate directly to the agents.
- Authenticated communication: The manager/agent communication protocol has authentication built in. When it receives a request from an agent, the manager validates the agent based on the IP address of the host from which it is communicating. This authentication mechanism ensures that only eG agents can communicate with the manager. Furthermore, the manager checks the list of managed servers to make sure that the agent connecting to it is running on one of these servers.
- Support for private networks, NATed environments: As the eG manager never initiates communication with the eG agents directly, the eG Enterprise system supports even private networks that may not be directly accessible from the eG manager system. For example, the network being managed could be behind a Network Address Translator (NAT) and the server being managed may have a private IP address. As the eG Enterprise architecture allows secure web-based communication, there is no explicit need to set up VPNs (Virtual Private Networks) just for the purpose of monitoring.
- Support for communication through web proxy servers: In some IT environments, administrators may require that all communications out of their network be passed through a central server. Because eG Enterprise uses web protocols for communication, the agents can be configured to communicate to the manager through a web proxy. These communications can be secured through standard web authentication mechanisms.
- Encrypted communication: The manager/agent communication can be encrypted using industry-standard SSL/TLS technology. Encryption prevents any third party from snooping on and decoding the data transmitted between the manager and the agents.
- Failover capabilities: If the agent-to-manager communication link goes down, data is temporarily stored on the agent system for a pre-defined period of time. When the network link comes up again, the agent uploads the stored data to the eG manager for storage and analysis. This makes the eG agent architecture robust against sporadic network failures.
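The failover behavior described in the last point is a store-and-forward pattern. Here is a minimal in-memory Python sketch of it. The class name, the `send` callable, and the retention handling are assumptions for illustration only; a real agent would typically persist the backlog to disk, and this is not eG Enterprise's actual implementation.

```python
import time
from collections import deque


class StoreAndForwardBuffer:
    """Hold measurements while the link is down and upload them when it returns."""

    def __init__(self, send, retention_seconds, clock=time.time):
        self._send = send                  # callable; raises OSError when the link is down
        self._retention = retention_seconds
        self._clock = clock
        self._pending = deque()            # (timestamp, measurement), oldest first

    def submit(self, measurement):
        """Queue a new measurement and try to upload the whole backlog."""
        self._pending.append((self._clock(), measurement))
        return self.flush()

    def flush(self):
        """Upload the backlog oldest-first; return True once it is empty."""
        # Discard data that has outlived the pre-defined retention period.
        cutoff = self._clock() - self._retention
        while self._pending and self._pending[0][0] < cutoff:
            self._pending.popleft()
        # Stop at the first failure and keep the remaining data for a later retry.
        while self._pending:
            try:
                self._send(self._pending[0][1])
            except OSError:
                return False
            self._pending.popleft()
        return True

    def __len__(self):
        return len(self._pending)
```

Uploading oldest-first keeps the data in time order on the manager, and removing an item only after a successful send means a failure mid-upload loses nothing.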
So, if you are evaluating any monitoring product, please do ask the vendor these key questions about port security, request a port security overview, and be wary, especially if two-way connection initiation is involved. An open port listening on every agent certainly isn’t wise and, moreover, is unnecessary as better architectural models exist.