Thank you very much for that @Liz. I am eager to hear what comes of their work.
My system here received a new radio earlier this month. Yesterday, I received and installed a new satellite modem. The jury is still out on how well it is doing because we have been hit with some rough weather.
I wonder if you are familiar with the Ethernet concept that handles packet collision. This little discussion is not related to satellite communications per se, just the transmissions between multiple stations connected by a single Ethernet cable which daisy chains from one station to the next, to the next, etc. It is a typical setup that might be used for the network in a company with a lot of offices, for example. Assume there are multiple stations on an Ethernet circuit and two of them decide to send a data packet at essentially the same moment. There is collision detection circuitry used to detect that multiple stations are sending at the same time. It then puts a long pulse onto the Ethernet cable that blocks both messages from getting through. The transmitting stations each recognize this and respond by transmitting their packets again after a random delay period. The assumption is that the two stations will choose different delay periods so their retransmissions do not collide again.
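For anyone who wants to play with the retry-after-a-random-delay idea, here is a minimal Python sketch. It assumes the truncated binary exponential backoff that classic shared Ethernet uses, and it simplifies things by saying two stations collide again only when they pick the same number of slot times. The function names are made up for illustration.

```python
import random

def backoff_slots(attempt, rng):
    """Pick a random wait, in slot times, after the given collision count.

    The range doubles with every collision and stops growing after 10.
    """
    exponent = min(attempt, 10)
    return rng.randrange(2 ** exponent)

def resolve_collision(rng, max_attempts=16):
    """Two stations have just collided. Each picks a random delay and retries.

    Returns how many backoff rounds it took before the two delays differed,
    or None if they still matched after max_attempts (simplified model).
    """
    for attempt in range(1, max_attempts + 1):
        delay_a = backoff_slots(attempt, rng)
        delay_b = backoff_slots(attempt, rng)
        if delay_a != delay_b:
            return attempt
    return None

if __name__ == "__main__":
    rng = random.Random()
    rounds = [resolve_collision(rng) for _ in range(10_000)]
    print("average backoff rounds needed:", sum(rounds) / len(rounds))
```

Because the range of possible delays doubles after each collision, two stations rarely need more than a couple of rounds to get out of each other's way.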
I suspect that the HughesNet satellite communications system works in a similar manner with regard to the many ground stations sending packets to the satellite. As long as each ground station's transmitted packet does not overlap that of another ground station, all goes well. If multiple ground stations try to transmit at the same moment, the satellite senses the packet collision and transmits a signal telling those stations their packets failed to get through and to retransmit them after some suitable delays.
I run an application program much of the time that keeps sending PING messages and it keeps track of how many of them succeed and how many fail to make the round trip to and from a selected server on the internet. Under typical conditions when weather is not an issue and we are not in the midst of one of these outages that we have been fighting, a usual success rate for the PING messages hovers around 50%. It does vary in a rather random way, a little higher or a little lower for a minute or two at a time but generally hangs near that 50% mark.
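The bookkeeping behind a success-rate figure like that can be sketched as below. This is not the app I actually run, just a minimal Python illustration that keeps the most recent results in a fixed-size window. The class name and the window size are assumptions.

```python
from collections import deque

class PingTracker:
    """Keep the most recent ping outcomes and report a rolling success rate."""

    def __init__(self, window=100):
        # Old results fall off the left end once the window is full.
        self.results = deque(maxlen=window)

    def record(self, success):
        self.results.append(bool(success))

    def success_rate(self):
        """Percentage of successful pings in the window, or None if empty."""
        if not self.results:
            return None
        return 100.0 * sum(self.results) / len(self.results)
```

A short window reacts quickly to changes but jumps around more; a longer one smooths out the minute-to-minute variation.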
Wednesday evening, we had a lot of heavy rain cells passing through the area, replete with several tornado warnings. Not surprisingly, we had spells where no PING messages were succeeding at all. My app flagged those periods as outages but it was clear that such outages were caused by the local weather.
However, I noticed something else that happened on at least one occasion when there was no rain in my location. There was a period when the success rate for the PING messages rose considerably above the usual rate, to well above 75%, and stayed there for a while. Later, it returned to its usual behavior.
I kept wondering what could have caused that. Now I can propose a scenario about what caused this that is related to my theory about the satellite handling those packet collisions that I mentioned above.
I suspect some of the heavy rain that was around the state but not in my own area was blocking transmissions from many of the other HughesNet ground stations. That gave my PING messages a much better chance of getting through.
Of course, with many people having their attention drawn to the weather, perhaps they were not so likely to be using the internet in the first place. Either way, it is a theory.
@Matt_Is_My_Name, What do you think? Can you propose another explanation?
After at least 2 full months of poor service, and after forking out $125 to have my radio replaced and then replacing the modem and then the problem remaining just as bad as ever for several days ever since, just a few minutes ago my app that tracks success rates on PING messages started registering 100% success all the time! Somebody somewhere fixed something for sure. I suspect one of those engineers did something at the gateway that really made a difference.
Well now I am really confused. It turns out that my PING messages only enjoy a 100% success rate if I connect my Belkin WiFi router to the Ethernet circuit. Even though the computer itself is still connected by Ethernet (and not wirelessly) to the router, the router is doing something that causes the 100% PING success rate that I do not get when the computer is connected directly to the modem by Ethernet cable without the router involved at all. Perhaps the router provides a layer of retransmission protocol that covers up the dropped PING messages. Or perhaps the router is providing better Ethernet drivers or transmission cable terminations, thereby improving the success rate between my computer and the modem. In any case, I think I shall stop blaming HughesNet for my low PING message success rates, and I shall keep my router connected unless I need to remove it for a test.
I will still be monitoring for the long dropouts of HughesNet service. I have been gathering some information about timing that may help the engineers. Assuming the long dropouts continue, I will be back with that information soon.
Let's look at what a "ping" is:
It is a request for a response from your local computer or device to a specified server somewhere in the world that is connected to the Internet.
That is usually a more or less straightforward thing when using a ground-based ISP.
Path: yours to theirs:
Looks like this:
Simple, isn't it? Your ISP is connected to the Internet backbone at its "head-end", and your ping is then routed through a number of routing switches to finally reach the desired server, which responds when it can, depending on its load.
That constitutes a "loop". Let's call this Loop #3.
Let's look a little deeper:
You connect wirelessly to your router ... that connection is all "local" ... that is a "loop". Let's call this Loop #1. Loop #1 is all on the user end.
Your router passes the info to the Hughes modem.
The modem also has a loop: Modem > Satellite > Gateway > Head-end. Let's call this Loop #2. Loop #2 is all on the Hughes end.
Your data then is passed onto Loop #3 ... the Real Internet.
You have total control of Loop #1, wireless included, with the possible exception of the wireless portion of an HT2000W.
Loop #2 is totally Hughes.
Loop #3 is not under Hughes control.
The server you task with a request for a response is beyond Hughes's control.
You have to break things down to determine just where the issue is.
The Hughes modem offers a Gateway Continuity test via the SCC at 192.168.0.1.
This is a test of Loop #2 ... the one that Hughes has control of.
Saw something new in the event log. I have no idea what it means, but I figured I would share anyway.
|05/29/2017 12:41:59||12305||4||1||Command initiated ranging: Range every rate in the trajectory table|
|05/29/2017 12:42:07||12802||2||1||Start Range Process Sym 512 Code 2 Process id 8 Reason NOC INITIATED Frame 376866550 Group 25|
|05/29/2017 12:42:13||12001||2||1||Uplink DOWN | Started on frame 376866551 | Current State Code is 12.8.2 | SQF=111 Symcod=6 IGID=25|
|05/29/2017 12:42:13||101||2||1||System StateCode DOWN | Started at 05/29/2017 12:42:08 | Current State Code is 12.8.2 | SQF=111 Symcod=6|
|05/29/2017 12:42:20||12803||2||1||Ranging Final State: Frame 376866817 Group 25 SymRate 512 Code Rate 2 MinEsNo 46 TargetEsNo 70|
|05/29/2017 12:42:20||12803||2||1||Ranging Final State: InitialEsNo 152 FinalEsNo 70 Final BackOff 144 Outcome SUCCESS - MET TARGET ESNO|
|05/29/2017 12:42:20||12504||3||1||AIS/CLPC Filter Reset due to Ranging|
|05/29/2017 12:42:25||12002||2||1||Uplink RECOVERED | Recovered on frame 376866835 | Outage Time 000:00:00:12 | Last State Code was 12.8.3 | SQF=111 Symcod=6 IGID=25|
|05/29/2017 12:42:25||102||2||1||System StateCode UP | Recovered at 05/29/2017 12:42:20 | Outage Time 000:00:00:12 | Last State Code was 12.8.2 | SQF=111 Symcod=6|
|05/29/2017 12:42:50||12802||2||1||Start Range Process Sym 4096 Code 2 Process id 9 Reason BASELINE RATE SET Frame 376867488 Group 26|
|05/29/2017 12:42:55||12001||2||1||Uplink DOWN | Started on frame 376867489 | Current State Code is 12.8.2 | SQF=112 Symcod=21 IGID=26|
|05/29/2017 12:42:56||101||2||1||System StateCode DOWN | Started at 05/29/2017 12:42:51 | Current State Code is 12.8.2 | SQF=111 Symcod=21|
|05/29/2017 12:42:59||12803||2||1||Ranging Final State: Frame 376867700 Group 26 SymRate 4096 Code Rate 2 MinEsNo 47 TargetEsNo 54|
|05/29/2017 12:42:59||12803||2||1||Ranging Final State: InitialEsNo 145 FinalEsNo 51 Final BackOff 85 Outcome SUCCESS - MET TARGET ESNO|
|05/29/2017 12:43:00||12504||3||1||AIS/CLPC Filter Reset due to Ranging|
|05/29/2017 12:43:05||12002||2||1||Uplink RECOVERED | Recovered on frame 376867718 | Outage Time 000:00:00:10 | Last State Code was 12.8.3 | SQF=112 Symcod=10 IGID=25|
System StateCode UP | Recovered at 05/29/2017 12:43:01 | Outage Time 000:00:00:10 | Last State Code was 12.8.3 | SQF=111 Symcod=10
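In case it helps anyone compare these entries against their own timings, here is a rough Python sketch for pulling the timestamp, event code, and outage duration out of rows like the ones above. The meaning of the third and fourth columns is unknown to me, so the code just skips them, and reading "Outage Time" as days:hours:minutes:seconds is an assumption based on how the values look.

```python
import re
from datetime import datetime

def parse_event(line):
    """Split one event-log row into (timestamp, code, message).

    Returns None for rows that do not have all five '||' separated columns,
    such as a row that lost its leading columns when it was pasted.
    """
    parts = line.strip().strip("|").split("||", 4)
    if len(parts) != 5:
        return None
    stamp, code, _col3, _col4, message = parts  # columns 3 and 4: meaning unknown
    when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return when, int(code), message.strip()

def outage_seconds(message):
    """Read 'Outage Time ddd:hh:mm:ss' (assumed layout) as a number of seconds."""
    match = re.search(r"Outage Time (\d+):(\d+):(\d+):(\d+)", message)
    if match is None:
        return None
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

Feeding each pasted row through `parse_event` and then `outage_seconds` gives a list of when each recovery was logged and how long the modem thought it was down.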
All I can do is speculate how the communications between the satellite and ground gateway stations is handled. I imagine the finer details are considered very sensitive information for any of the satellite Internet providers.
I do understand packet collision, but I'm not entirely sure that is something the satellite should have to worry about. I imagine communications are handled in a similar way as cell phones communicating with towers on the ground. Everything has specific time frames to get their data across. This would mean the data has to be sent early so it arrives at the tower at the correct time. I'm sure this would be far easier with stationary ground stations rather than cell phones travelling in cars and whatnot.
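To put a rough number on the "send early" idea, here is a small Python sketch. It only does the propagation-delay arithmetic for a time-slotted scheme and says nothing about how HughesNet actually schedules traffic. The 35,786 km figure is the altitude of a geostationary orbit, which is the shortest possible path; the real slant range from a given dish is longer. The function names are made up.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
GEO_ALTITUDE_KM = 35_786  # geostationary altitude above the equator

def propagation_delay_s(distance_km):
    """One-way travel time of a radio signal over the given distance."""
    return distance_km / SPEED_OF_LIGHT_KM_S

def send_time_s(slot_start_s, distance_km):
    """When a station must start transmitting so the data arrives at slot start."""
    return slot_start_s - propagation_delay_s(distance_km)

if __name__ == "__main__":
    delay_ms = propagation_delay_s(GEO_ALTITUDE_KM) * 1000
    print(f"one-way delay to a geostationary satellite: about {delay_ms:.0f} ms")
```

A fixed ground station only has to work out its distance once, which is why this should be easier than doing the same thing for a phone in a moving car.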
I would also guess that there is a lot of caching that goes on so the satellite's upstream and downstream don't have to run at the exact same speed.
Again these are just speculations, but I wouldn't be surprised if it operates basically like a $118-million router.
Good morning folks,
Our engineers have isolated the issue and will make adjustments for you to address your concerns. Once I get an ETA on the adjustments, I'll let you know.
Your patience and understanding are much appreciated.
In working on a reply to your post, I discovered something very important. I will get to that in a moment.
First, let me point out that during these tests, at no time have I used the wireless WiFi facilities of the Belkin router. I normally connect the router only so that we have WiFi access via a Samsung tablet elsewhere in the house. All of my PING tests run on my PC which is hard wired via an Ethernet cable to either the router or directly to the satellite modem, depending on the scenario being tested.
I would suggest that there are, in fact, 3 loops when my PC is connected directly to the modem and 4 loops when the PC is connected to the router which is then connected to the modem.
What I found hard to understand was that there are far fewer, if any, dropped PING messages when the router is added to the scenario. Therefore, I was postulating that perhaps there is something about the electrical connection between my PC and the satellite modem that is aided by having the router in between them.
While writing this reply, I thought more about that and performed a PING test between the PC and the satellite modem itself with them being directly connected via an Ethernet cable. Then the scenario contains only 1 loop.
To perform this test, I typed the following statement into a command box:
PING /n 1000 192.168.0.1
Of the 1000 PING messages sent, 37% were dropped, a really poor result for a direct hard wired Ethernet connection. More than a third of all PING messages couldn't even make the round trip just across the Ethernet cable to the satellite modem and back to the PC. By the way, the longest round trip time of those messages that did make the round trip successfully was 1 millisecond.
Then I repeated the test with the router in the circuit. This constituted a 2 loop test.
This time, 0% of the PING messages were dropped. Every PING message from the PC made the full round trip. However, while the average round trip time was still 1 millisecond, the maximum round trip time was 40 milliseconds. Some further testing showed that only 1.3% of the PING messages are ever taking longer than 1 millisecond in this 2 loop test.
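For anyone repeating these two tests, the loss figure can be pulled out of the summary that the Windows PING command prints at the end of a run. The Python sketch below assumes the usual English-language summary line of the form "Packets: Sent = ..., Received = ..., Lost = ..."; other languages or other operating systems word it differently. The function name is made up.

```python
import re

def loss_percent(ping_output):
    """Return the percentage of lost pings from a Windows ping summary.

    Returns None if the expected summary line is not found.
    """
    match = re.search(r"Sent = (\d+), Received = (\d+), Lost = (\d+)", ping_output)
    if match is None:
        return None
    sent, _received, lost = (int(g) for g in match.groups())
    if sent == 0:
        return None
    return 100.0 * lost / sent
```

Saving the output of each run to a text file and passing its contents to `loss_percent` makes it easy to compare the direct connection against the one through the router.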
So while it is desirable to remove my router to eliminate it as a source of problems when troubleshooting some issues, I find that my apparent overall performance is improved by having the router between my PC and the satellite modem, even when I am not using the WiFi functionality the router provides.
I am already well aware that there is a connectivity test available for testing the link between the satellite modem and the gateway, but that does not test the communications in a way that truly emulates normal traffic timing such as takes place during a PING test. It would be desirable to know the IP address of the server at the gateway which is serving my communication channel. It would be valuable to me to be able to perform PING tests between my PC and that gateway server, thereby eliminating the internet itself as a source of issues. Is there any chance I can get the IP address of that gateway server? Where would I look for it?
Thanks, @Liz. I will keep my app running that notes the longer dropouts and I will report when they appear to have stopped happening.
By the way, the one detail that I have noticed that I have not yet reported is that the ability to bounce PING messages off of 22.214.171.124 stops roughly a minute before the satellite modem logs any difficulties during a dropout. I first became aware of a discrepancy between how long a dropout appeared to be from the PC's point of view versus the duration of the dropout as logged by the satellite modem. So I started comparing some logged events with the times when the dropouts began as detected by failing PING messages. If it would be of any value, I could provide more detailed timing for a few dropouts.
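The comparison itself is simple arithmetic. Here is a minimal Python sketch of it, assuming the PC clock and the modem clock agree. The function names are made up, and any times fed to it are whatever you have recorded yourself.

```python
from datetime import datetime

def first_failure(ping_log):
    """Return the timestamp of the first failed ping, or None if all succeeded.

    ping_log is a list of (timestamp, succeeded) pairs in time order.
    """
    for stamp, succeeded in ping_log:
        if not succeeded:
            return stamp
    return None

def lead_seconds(ping_log, modem_down_time):
    """Seconds between the first failed ping and the modem's logged DOWN event."""
    start = first_failure(ping_log)
    if start is None:
        return None
    return (modem_down_time - start).total_seconds()
```

A positive result means the PINGs started failing before the modem logged anything, which is the pattern described above.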