This article covers troubleshooting for the most common issues: devices staying offline, no data or logs being shown (PoE, CPU usage, data usage, logs, etc.), configurations not being pushed (firmware management), and using the event logs.
We are very happy to see our Nebula Cloud Solution gaining momentum in the field and becoming an increasingly important focus as we expand our portfolio. What began as a "spin-off" solution has become, in line with our investments and predictions, a cornerstone of our operations and products. With this shift in focus, Nebula-capable equipment is increasingly being used in professional networking environments. This raises a question for many of our customers: "What can I do to analyze my issues before reaching out to someone else?"
This guide provides some ideas on what to check if your devices do not function properly. Some of them are generic (best-practice examples); others are more specific to particular technology fields.
As a prerequisite, we already have a guide on how to access Nebula-capable firewalls via SSH. Together with the CLI Reference Guides, it might help you gain some insight into what you can check when connecting to the respective Nebula devices via SSH:
Accessing your Nebula Firewall via SSH and doing a Packet Capture / Packet Trace from the internet
Note: Nebula-managed devices are not meant to be configured in any way via a console connection. Please use SSH access to Nebula devices only for monitoring and, in cooperation with our support/R&D department, for debugging.
Devices are staying offline
Picture this: you have registered your device successfully, maybe even configured some settings to your liking, and you are now deploying it. However, for reasons unknown to you, the device simply does not come online: it is neither operating on site nor showing as online and responsive in Nebula.
This is one of the most common issues when deploying Nebula devices. In short, the reason is that by default the devices try to resolve either s.nebula.zyxel.com or d.nebula.zyxel.com. This can cause two problems:
- A subdomain nested several levels deep (e.g. this.that.andthis.alsothis.domain.com) can cause issues with some DNS providers.
- According to newer standards, a subdomain can also consist of a single character, whereas it used to require a minimum of three characters. Some DNS servers have issues with this.
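To check whether the DNS server used on your site can resolve the two Nebula hostnames, you can run a quick lookup from a computer on the same network that uses the same DNS server as the device. The following Python sketch uses only the standard library's system resolver (`socket.gethostbyname`), so it tests whichever DNS server that computer is configured with. The helper name `check_resolution` is hypothetical and not part of any Nebula tool.

```python
import socket
from typing import Callable, Dict, Iterable

# Default hostnames the devices try to resolve (as named in this article).
NEBULA_HOSTS = ("s.nebula.zyxel.com", "d.nebula.zyxel.com")


def check_resolution(
    hosts: Iterable[str] = NEBULA_HOSTS,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Map each hostname to True if the resolver returned an address, else False.

    `resolve` defaults to the operating system's resolver, i.e. whatever DNS
    server the machine running this script is configured to use.
    """
    results: Dict[str, bool] = {}
    for host in hosts:
        try:
            resolve(host)
            results[host] = True
        except OSError:
            # socket.gaierror (name resolution failure) is a subclass of OSError.
            results[host] = False
    return results
```

Calling `print(check_resolution())` shows which of the two names resolved. If either shows `False`, that matches the DNS problem described here. The `resolve` parameter only exists so the check can be tested without network access.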
We very often see internet service providers' DNS servers having issues with this. Our recommendation: change the default DNS server to either 8.8.8.8 or 1.1.1.1. More information can be found here:
I cannot see any data!
Sometimes customers reach out to us complaining that, after logging in for the first time in a long while, their data is either gone or "nothing is shown". This may be because, after a long period of inactivity on the portal, the organization has been set to Cloud-Saving Mode, which essentially stops all collection of information after an idle period of 30 days. Cloud-Saving Mode is indicated by a prompt and by a light blue banner. The solution: turn off Cloud-Saving Mode and wait a few minutes; the data should then reappear shortly.
More information can be seen here: Cloud-Saving Mode (Nebula BASE Pack)
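The 30-day rule above is simple date arithmetic. As an illustration only, the following Python sketch checks whether an organization has been idle long enough to be expected in Cloud-Saving Mode. The helper names and the assumption that an idle time of exactly 30 days already counts are ours, not taken from Nebula.

```python
from datetime import datetime, timedelta

# Idle period after which Cloud-Saving Mode applies, per this article.
IDLE_PERIOD = timedelta(days=30)


def expect_cloud_saving(last_activity: datetime, now: datetime) -> bool:
    """Return True if the portal has been idle for at least IDLE_PERIOD.

    Assumption: an idle time of exactly 30 days already counts.
    """
    return now - last_activity >= IDLE_PERIOD


def days_until_cloud_saving(last_activity: datetime, now: datetime) -> int:
    """Whole days left before the idle period is reached (0 if already reached)."""
    remaining = IDLE_PERIOD - (now - last_activity)
    return max(remaining.days, 0)
```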
Configurations are not being pushed
Is your device showing online, but configuration changes do not seem to apply? Check the firmware! Some changes may require the latest firmware to function properly, and changes are sometimes held back because the firmware is outdated:
This is a perfect example of what we do not want to see: the firewall firmware is outdated, which can prevent new blocks of configuration from being applied, while all other stats (CPU and RAM usage, number of clients, online/offline state) look fine. Configuration chunks are held back especially, but not exclusively, for new features introduced in the most recent firmware. So, apart from the DNS issue presented above, checking the firmware is a valuable tip.
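If you track firmware versions in your own inventory, a simple version comparison can flag devices that are behind. The Python sketch below assumes version strings of the form `V5.37(ABFW.0)` (major.minor, then a model code and a patch number in parentheses). That format and the helper names are assumptions for illustration, so adjust the pattern to the strings your devices actually report.

```python
import re
from typing import Tuple

# Assumed firmware string format, e.g. "V5.37(ABFW.0)": major.minor(MODEL.patch).
_VERSION_RE = re.compile(r"^V(\d+)\.(\d+)\(([A-Z0-9]+)\.(\d+)\)")


def parse_version(text: str) -> Tuple[str, Tuple[int, int, int]]:
    """Split a firmware string into (model_code, (major, minor, patch))."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised firmware string: {text!r}")
    major, minor, model, patch = match.groups()
    return model, (int(major), int(minor), int(patch))


def is_outdated(installed: str, latest: str) -> bool:
    """True if `installed` is older than `latest` for the same model code."""
    inst_model, inst = parse_version(installed)
    new_model, new = parse_version(latest)
    if inst_model != new_model:
        raise ValueError("firmware strings belong to different model codes")
    # Tuples compare element by element: major, then minor, then patch.
    return inst < new
```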
Monitoring via Nebula - Useful insights at your fingertips!
The monitoring features add real value to your monitoring and analytics toolset. To begin with something simple: is a device showing offline, but you have the feeling it is still communicating properly with the Nebula Control Center? Just go to the device's main page; for gateways, this would be:
Devices -> Firewall
and run a ping test. If the ping test successfully instructs the device to "ping destination W.X.Y.Z", and the device receives this command and reports back successful pings, the device is in fact still communicating with the NCC, and the offline state may be related to your browser cache or cookies.
Another real-life example: multiple switch ports constantly show event log messages indicating that the link is toggling between 10 Mb/s and 1 Gb/s, and you are unsure whether the switch is causing the issue?
Check the MAC table via:
Devices > Switches > (select Switch) > Live Tools > Switch Tables > Run-Button
You now have information that might help you. In this real-life example, it turned out that all switch ports showing this behavior were connected to clients from one very specific MAC vendor. This points toward a specific client type having the issue rather than the switch. A possible solution here is to set a fixed speed on these clients and on the respective switch ports.
There are many other examples of how monitoring can help resolve errors, but you get the point.
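If you export the port speed events and the MAC table, the same analysis can be done in a few lines. The Python sketch below assumes two hypothetical inputs: a chronological list of `(port, speed_in_Mbps)` link events, and the MAC table as `(port, mac)` pairs. It flags ports whose speed changed repeatedly and then counts the vendor prefixes (the OUI, i.e. the first three octets of a MAC address) seen on those ports. Mapping an OUI to a vendor name requires a lookup against the IEEE OUI registry, which is left out here.

```python
import re
from collections import Counter
from typing import Iterable, Set, Tuple


def flapping_ports(events: Iterable[Tuple[int, int]], min_changes: int = 2) -> Set[int]:
    """Ports whose reported link speed changed at least `min_changes` times.

    `events` must be in chronological order as (port, speed_in_mbps) pairs.
    """
    last_speed = {}
    changes: Counter = Counter()
    for port, speed in events:
        if port in last_speed and last_speed[port] != speed:
            changes[port] += 1
        last_speed[port] = speed
    return {port for port, count in changes.items() if count >= min_changes}


def oui(mac: str) -> str:
    """Return the vendor prefix (first three octets) of a MAC address as 'AA:BB:CC'.

    Accepts colon, hyphen or dot separators, e.g. 00:11:22:33:44:55,
    00-11-22-33-44-55 or 0011.2233.4455.
    """
    digits = re.sub(r"[:\-.]", "", mac).upper()
    if not re.fullmatch(r"[0-9A-F]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 6, 2))


def vendor_prefixes_on(ports: Set[int], mac_table: Iterable[Tuple[int, str]]) -> Counter:
    """Count the MAC vendor prefixes learned on the given ports."""
    return Counter(oui(mac) for port, mac in mac_table if port in ports)
```

If `vendor_prefixes_on(flapping_ports(events), mac_table)` returns a single dominant prefix, that points at one client type rather than at the switch, as in the example above.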
Using the event logs to get a better understanding of processes "under the hood"
The event logs, of which each technology field has its own, can be a tremendous help in understanding why things do not work, especially for WiFi-related issues.
Let's look at an example from a WiFi network showing three different entries:
Let's analyze the three entries one after another:
- The first entry tells us that a station with a certain MAC address has connected to the respective access point. It connected with a signal strength of -81 dBm, which is possibly too weak (for reference, check this tutorial: Improving WiFi roaming on my Nebula Access Points? )
- The second entry shows that a station has left a specific SSID, AP and channel with WiFi reason code 3. WiFi reason codes are standardized codes used to identify and understand WiFi behavior. For reference, please check this article: What is the meaning of 802.11 Deauthentication Reason Codes?
  WiFi reason code 3 simply states that the station proactively decided to leave the SSID.
- The third entry shows a roaming process (moving from one AP to another) of a certain client.
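When going through many log entries, it can help to translate reason codes and signal values automatically. The following Python sketch maps a few 802.11 reason codes to their standard descriptions (a small subset of the IEEE 802.11 reason code table) and sorts RSSI values into rough categories. The dBm thresholds are illustrative assumptions, not values defined by Nebula; pick thresholds that match your own coverage and roaming design.

```python
from typing import Dict

# Subset of IEEE 802.11 reason codes (deauthentication/disassociation).
REASON_CODES: Dict[int, str] = {
    1: "Unspecified reason",
    2: "Previous authentication no longer valid",
    3: "Deauthenticated because sending STA is leaving (or has left) IBSS or ESS",
    4: "Disassociated due to inactivity",
    5: "Disassociated because AP is unable to handle all currently associated STAs",
    6: "Class 2 frame received from nonauthenticated STA",
    7: "Class 3 frame received from nonassociated STA",
    8: "Disassociated because sending STA is leaving (or has left) BSS",
    15: "4-way handshake timeout",
}

# Illustrative RSSI thresholds in dBm (assumptions; adjust to your design).
GOOD_RSSI_DBM = -67
FAIR_RSSI_DBM = -75


def describe_reason(code: int) -> str:
    """Standard description for a reason code, if it is in the subset above."""
    return REASON_CODES.get(code, f"Reason code {code} (not in this subset)")


def classify_rssi(rssi_dbm: int) -> str:
    """Rough signal category; values closer to 0 dBm are stronger."""
    if rssi_dbm >= GOOD_RSSI_DBM:
        return "good"
    if rssi_dbm >= FAIR_RSSI_DBM:
        return "fair"
    return "weak"
```

For the first entry above, `classify_rssi(-81)` returns `"weak"` under these thresholds, and `describe_reason(3)` gives the standard text behind the second entry.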
Of course, this is only a glimpse of what event logs can tell you, but it shows that they can be a powerful tool for understanding what is actually happening in your network. For WiFi issues specifically, there are additional tools to evaluate WiFi quality, such as Wireless Health, client lists, etc. The following articles may give you more insight on this topic as well: