I am working on the JANOS v2.4.1 release candidate and hoping to add some capabilities to quantify and then diagnose network connectivity issues. You know, from the perspective of the JNIORs. I am wondering if you guys have any ideas?
Searches lead you to PING. In fact, later versions of JANOS have the PING -F flood command, which pings continuously until you stop it, leaving a trail of decimal points on the screen wherever pings or their responses are lost. That tests the connection, but only at the moment. It doesn't help if you had a network issue a couple of hours ago. There is also PING -V, which pings all of the configured servers (like the gateway) so you can see whether you are connected and your configuration is good to go. It also pings us here to test Internet connectivity (or lack thereof). None of this tells you what happened in the past or what will happen in the future. A bandwidth (BW) test is obviously not the right thing either.
So far I have implemented 3 things. First, while you can tell when the JNIOR restarts by checking the log, and can see from it whether that was specifically due to a reboot, there is no indication of periods of power down. Now there will be a table of the last 32 operational sessions. Why? Because if you can't reach the JNIOR over the network, is it the network that is down or the JNIOR?
Note that the syslog tab in the WebUI displays events from most recent to oldest. These screen captures are from that. At the command line CAT displays from oldest to latest (unless you use the newer CAT -R command to reverse the line order).
shutdown.png
This was an update and the unit came right back up. If it had been powered off for a while you would see:
powered_down.png
My development unit has seen some reboots, eh?
And if that isn't enough there is the (currently undocumented) command PS -H that displays the past. This shows boot time and duration of operation.
On the network side I am adding round-trip and retransmission information to the NETSTAT connection table.
In the NETSTAT output, 'rtt' is the SRTT (smoothed round-trip time) and 'var' is the RTTVAR (round-trip time variation) as defined by RFC 6298, both in milliseconds. Along with these there is a count of packet retransmissions for that connection ('rtx'). So you can see if something is slow or struggling. This has already been used to identify a poor network bridge and have it replaced.
Um, this unit is using MODBUS (port 502) to monitor 4 solar inverters.
And, last (so far) but not least, I have added 4 new network availability and bandwidth statistics to the NETSTAT -A adapter statistics. This includes a packet retransmit rate per hour. These new statistics are accumulated over the last 24 hours of operation and are logged by the minute. The other adapter stats are since power-up.
There is a non-volatile bandwidth table covering the past 24 hours by the minute. It seems that a plot of this would be helpful. You could see network down/dead times. Or periods of high retransmission (RTX)? But there is no plot yet. Some of these statistics you will likely never get from any other TCP/IP stack. Just saying.
Code:
jrbarn /> ps -h
Process History
  0  07/28/23 14:14:39  1 Day 20 Hours 04:48.940
  1  07/30/23 10:19:55  6 Hours 03:10.303
  2  07/30/23 16:30:20  41:01.713
  3  07/30/23 17:11:45  18 Hours 48:55.328
  4  07/31/23 12:01:03  58:26.291
  5  07/31/23 12:59:53  21 Hours 52:07.940
  6  08/01/23 10:52:26  22 Hours 49:53.002
  7  08/02/23 09:42:44  1 Hour 08:40.623
jrbarn />
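If you want to turn that history into actual downtime numbers, a rough Python sketch like the following could work. It is not part of JANOS. It assumes the PS -H output has been captured as text with one session per line in the layout shown above, that dates are MM/DD/YY, and that the time between the end of one session and the next boot is time the unit was not running. The names `parse_sessions` and `downtime_gaps` are made up for this example.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout: index, boot date, boot time, then an uptime written as
# optional "N Day(s)", optional "N Hour(s)", and MM:SS.mmm.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\d{2}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s+"
    r"(?:(\d+)\s+Days?\s+)?(?:(\d+)\s+Hours?\s+)?(\d+):(\d{2}(?:\.\d+)?)\s*$"
)

SAMPLE = """\
jrbarn /> ps -h
Process History
  0  07/28/23 14:14:39  1 Day 20 Hours 04:48.940
  1  07/30/23 10:19:55  6 Hours 03:10.303
  2  07/30/23 16:30:20  41:01.713
  3  07/30/23 17:11:45  18 Hours 48:55.328
"""


def parse_sessions(text):
    """Return a list of (boot_time, uptime) tuples; non-matching lines are skipped."""
    sessions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # prompt, heading, or blank line
        boot = datetime.strptime(f"{m.group(2)} {m.group(3)}", "%m/%d/%y %H:%M:%S")
        uptime = timedelta(
            days=int(m.group(4) or 0),
            hours=int(m.group(5) or 0),
            minutes=int(m.group(6)),
            seconds=float(m.group(7)),
        )
        sessions.append((boot, uptime))
    return sessions


def downtime_gaps(sessions):
    """Return (session_end, gap) for each pair of consecutive sessions."""
    gaps = []
    for (boot, uptime), (next_boot, _) in zip(sessions, sessions[1:]):
        session_end = boot + uptime
        gaps.append((session_end, next_boot - session_end))
    return gaps


if __name__ == "__main__":
    for session_end, gap in downtime_gaps(parse_sessions(SAMPLE)):
        print(f"down from {session_end} for {gap.total_seconds():.1f} s")
```

A gap of a few tens of seconds would be consistent with a plain reboot, while a gap of minutes or hours suggests the unit sat without power. Comparing those gaps against the times you could not reach the unit is one way to answer the "network or JNIOR?" question.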
Code:
jrbarn /> netstat
LAN connection active (100 Mbps)
Server/Connection count: 12
LocPrt RemPrt Remote IP rtt var rtx State
1: 21 ---- --------- LISTEN FTP
2: 9220 ---- --------- LISTEN JMP Service
3: 9200 ---- --------- LISTEN JNIOR Protocol
4: 80 ---- --------- LISTEN HTTP
5: 443* ---- --------- LISTEN HTTPS
6: 23 ---- --------- LISTEN Telnet
7: 80 38046 192.168.2.100 3 ESTABLISHED
8: 62623 502 192.168.2.91 18 1 ESTABLISHED
9: 64604 502 192.168.2.92 18 1 ESTABLISHED
10: 54604 502 192.168.2.93 18 3 ESTABLISHED
11: 50936 502 192.168.2.94 16 2 ESTABLISHED
12: 80 49247 209.195.188.17 152 61 ESTABLISHED
* TLSv1.2 encrypted socket
jrbarn />
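For reference, the 'rtt' and 'var' values correspond to the SRTT and RTTVAR estimator in RFC 6298. The sketch below follows the RFC's update rules (alpha = 1/8, beta = 1/4, K = 4). It is only an illustration in Python, not the JANOS code, and the class name `RttEstimator` and its method names are invented here.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8  # SRTT gain from RFC 6298
BETA = 1 / 4   # RTTVAR gain from RFC 6298
K = 4          # RTTVAR multiplier used for the retransmission timeout


@dataclass
class RttEstimator:
    """Illustrative RFC 6298 estimator; all times are in milliseconds."""
    srtt: Optional[float] = None
    rttvar: Optional[float] = None

    def update(self, r: float) -> None:
        """Feed one round-trip measurement R.

        Per the RFC, samples taken from retransmitted segments should not
        be fed in (Karn's algorithm); the caller has to filter those out.
        """
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r

    def rto(self, granularity_ms: float = 1.0, min_rto_ms: float = 1000.0) -> float:
        """RTO = SRTT + max(G, K * RTTVAR), with the RFC's 1 second floor by default."""
        if self.srtt is None:
            raise ValueError("no RTT measurement yet")
        return max(min_rto_ms, self.srtt + max(granularity_ms, K * self.rttvar))


if __name__ == "__main__":
    est = RttEstimator()
    for sample in (16.0, 24.0, 18.0, 150.0):
        est.update(sample)
        print(f"R={sample:6.1f}  srtt={est.srtt:7.2f}  var={est.rttvar:6.2f}")
```

Because RTTVAR reacts to how far each sample lands from the smoothed value, a connection with a large 'var' relative to its 'rtt' is one whose latency is jumping around, which together with a climbing 'rtx' count is what points at a struggling link.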
Code:
jrbarn /> netstat -a
LAN connection active (100 Mbps)
Connects : 1
Packets Received : 23481
Packets Sent : 19506
Packets Captured : 34021
Multicast Frames : 8951
Bytes Received : 4255.3 KB
Bytes Sent : 2870.6 KB
Ping Replies : 0
Receive Errors : 0
Overruns : 0
Availability : 99.9966 %
Average BW : 9.0 Kbps
Peak BW : 186.7 Kbps
Retransmit Rate : 0.2 per hour
jrbarn />
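How JANOS actually derives the last four numbers is not spelled out here, so the following is only a guess at the kind of arithmetic involved, written in Python. It assumes a hypothetical per-minute record (`MinuteSample`) holding the seconds the link was up, the bytes moved, and the retransmissions counted in that minute. None of these names or fields come from JANOS.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MinuteSample:
    """Hypothetical one-minute log entry (not the JANOS table format)."""
    up_seconds: float   # seconds of this minute the link was usable (0..60)
    bytes_total: int    # bytes sent plus received during this minute
    retransmits: int    # TCP retransmissions counted during this minute


def summarize(samples: List[MinuteSample]) -> Dict[str, float]:
    """Roll per-minute samples up into the four adapter statistics."""
    if not samples:
        raise ValueError("need at least one minute of data")
    total_seconds = 60.0 * len(samples)
    total_bits = 8 * sum(s.bytes_total for s in samples)
    return {
        # share of the logged period during which the link was up
        "availability_pct": 100.0 * sum(s.up_seconds for s in samples) / total_seconds,
        # mean throughput over the whole period, in kilobits per second
        "average_kbps": total_bits / total_seconds / 1000.0,
        # busiest single minute, in kilobits per second
        "peak_kbps": max(8 * s.bytes_total / 60.0 / 1000.0 for s in samples),
        # retransmissions normalized to a per-hour rate
        "retransmits_per_hour": sum(s.retransmits for s in samples) / (total_seconds / 3600.0),
    }


if __name__ == "__main__":
    # One hour of made-up data: 59 quiet minutes plus one bad minute.
    log = [MinuteSample(60.0, 75_000, 0) for _ in range(59)]
    log.append(MinuteSample(30.0, 150_000, 3))
    for name, value in summarize(log).items():
        print(f"{name}: {value:.4f}")
```

The same per-minute list is also what a plot would be drawn from: minutes with low `up_seconds` would show up as dead time and minutes with high `retransmits` as trouble spots.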
So if you have any other ideas or thoughts I am open to suggestions.