In this post, we discuss a recent emergency call from one of our customers at a large surface mine. It highlights the importance of proactively monitoring your wireless network and putting a proper process around configuration management.
The customer, a large, well-connected, and technologically advanced mine, had started noticing network performance degradation that appeared out of the blue on an otherwise high-performing network.
Troubleshooting a typically high-performing wireless network
The customer had been working on the problem for about four days when they phoned us, still unable to identify the root cause. It was getting worse, and they were beginning to lose significant amounts of production time.
The issue presented as high utilization on all network links supplying the wireless network, along with high frame utilization and throughput on all of the Point-to-Multipoint (PMP) bridges providing backhaul to the pit network. Ultimately, this resulted in missed and delayed communications between their Dispatch system and vehicles in the pit.
The initial, tentative diagnosis was a network loop, and the switch logs showed configuration problems, so the first place we looked was their switch, access point, and endpoint configurations. The wireless network was configured well and was passing traffic (loads of it) just as it should. There was just way too much of it.
We did, however, notice inconsistencies in their wired network configurations. These were not causing the immediate problem, but they were adding overhead to the network and making troubleshooting difficult for the technicians on site. None of this really explained the high network utilization, but at best it wasn’t helping, and at worst it was adding to the confusion about the root cause.
The importance of controlling the configuration of your network devices
After an initial inspection of the configuration, symptoms, and logs, our first question was “What has been changed or added to the network recently?”
In our experience, a sudden increase in utilization on a previously well-performing network is almost always caused by a configuration change or by the addition of a misconfigured device.
The customer team was aware of nothing new added to the network and there had been no communicated configuration changes, so we started to pull some packet captures from a few locations on the network.
In reviewing the packet captures pulled from the wired and wireless network, it was evident there was a device flooding the network with multicast packets. We identified the MAC address and asked the customer to see if that matched any of their records.
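As a rough illustration of this kind of triage, the sketch below counts multicast frames per source MAC address and reports any source responsible for a large share of them. It assumes the capture has already been exported to a list of (source MAC, destination MAC) pairs; the function names, the sample addresses, and the 50% threshold are all hypothetical and are not taken from the customer's environment or from any particular capture tool.

```python
from collections import Counter


def is_multicast(mac: str) -> bool:
    """Return True when the group bit (lowest bit of the first octet) is set.

    Broadcast (ff:ff:ff:ff:ff:ff) sets the same bit, so it is matched too.
    """
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x01)


def multicast_talkers(frames, min_share=0.5):
    """List (source MAC, frame count) for heavy multicast senders.

    frames: iterable of (src_mac, dst_mac) string pairs (assumed input format).
    min_share: hypothetical cutoff, as a fraction of all multicast frames seen.
    """
    counts = Counter(src.lower() for src, dst in frames if is_multicast(dst))
    total = sum(counts.values())
    if total == 0:
        return []
    return [(mac, n) for mac, n in counts.most_common() if n / total >= min_share]


# Made-up sample data: one source sends most of the multicast traffic.
sample = (
    [("00:11:22:33:44:55", "01:00:5e:00:00:fb")] * 8
    + [("00:aa:bb:cc:dd:ee", "01:00:5e:00:00:01")] * 2
    + [("00:aa:bb:cc:dd:ee", "00:11:22:33:44:55")] * 5
)
suspects = multicast_talkers(sample)
```

A source that dominates the multicast count is only a lead, not a verdict; the address still has to be matched against asset records, as it was here.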
The customer e-mailed back shortly afterward to let us know they had identified the rogue device as one that had been unknowingly plugged in on a haul truck. It was misconfigured and was causing most of their issues.
Access control: The key to guaranteeing reliability of your network
- Edge network configuration is important, even though it is often outside the scope of supporting a wireless network, because it directly influences traffic patterns and the reliability of data transport.
- Configuration consistency and governance are vital for any operation, and even more so in larger, advanced operations. Multiple teams are necessary to scale and operate in large environments, but multiple teams bring complex lines of communication and responsibility, increasing the potential for inconsistencies, misconfigurations, and lost production time.
- Access control is necessary even on secured or air-gapped networks, both for consistent performance and for reliable troubleshooting. There can be little confidence about network topology or integrity unless there are access controls at the hardware and software level.
Access control is often thought of as a security function to protect against bad actors (and it is), but it also protects the network from those with the best of intentions: the electrical team, operator, or technician who sees something unplugged and plugs it in to help, or to troubleshoot.
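To make the idea concrete, here is a minimal sketch of an inventory check: it compares the MAC addresses observed on switch ports with a list of approved devices and reports anything unknown. The data structures, port names, and asset labels are hypothetical. Real network access control is enforced on the switches themselves, so this only illustrates the audit side of the idea.

```python
def unknown_devices(observed, inventory):
    """Return sorted (port, mac) pairs whose MAC is not in the approved inventory.

    observed: iterable of (port_name, mac) pairs seen on the network (assumed format).
    inventory: dict mapping approved MAC addresses to asset descriptions.
    """
    approved = {mac.lower() for mac in inventory}
    return sorted(
        (port, mac.lower()) for port, mac in observed if mac.lower() not in approved
    )


# Made-up example: one approved radio and one device nobody has on record.
inventory = {"00:11:22:33:44:55": "haul truck radio"}
observed = [
    ("port-1", "00:11:22:33:44:55"),
    ("port-7", "00:AA:BB:CC:DD:EE"),
]
flagged = unknown_devices(observed, inventory)
```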
At its core, this scenario is one that 3D-P’s Managed Services offering can help prevent.
Configuration management would have kept device and network configurations consistent and alerted us to any unapproved changes. Network monitoring would have alerted us to the presence of a multicast storm, and proper network access control would have prevented the rogue device from operating on the network in the first place.
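The configuration management piece can be sketched in a few lines: compare each device's running configuration against an approved baseline and report any line that was added or removed. This is only an illustration using Python's standard `difflib` module. The function name and the sample configuration lines are hypothetical, and a production setup would also need to collect configurations from devices and route alerts, which is not shown.

```python
import difflib


def config_drift(approved: str, running: str) -> list:
    """Return lines that differ between the approved and running configuration.

    Lines prefixed "+ " exist only in the running config; lines prefixed "- "
    exist only in the approved baseline. An empty list means no drift.
    """
    diff = difflib.ndiff(approved.splitlines(), running.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop both.
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Made-up baseline and running configs for a single switch.
approved = "hostname sw1\nvlan 10\n"
running = "hostname sw1\nvlan 10\nvlan 20\n"
drift = config_drift(approved, running)
```

Any non-empty result is a prompt to ask the question from earlier in this post: what changed, and who approved it?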
To learn more about our Managed Services, contact us today.