Hughesnet Community

Intermittent System Outage returns.

Matt_Is_My_Name
Sophomore


I am creating a new topic since I marked the last one as solved. My previous topic still has the same information, but I will give a new, consolidated version of what is happening and the steps that have been taken to resolve it.

 

I admit I was too quick to mark the thread as solved and should have waited for the weather to clear up. Lately the weather has been great around here, and the weather around the gateway (Albuquerque, NM) does not seem to have an effect either. I waited two weeks to let things settle in, and the issue is exactly the same as before.

 

The issue:

  • The system goes offline once every 10 to 15 minutes. The outage usually lasts between one and five minutes.
  • An identical sequence of events happens during every outage. First, the System light on the modem goes off. When the System light comes back on, the LAN light goes off. Once the LAN light comes back on, service returns to normal. How long the System light stays out varies, while the LAN light is consistently out for about 10 seconds.
  • Every outage is logged as the terminal being disassociated from the gateway. The log entries state that this was caused by the terminal missing keep-alive messages. Example from the logs: "ASSOC: Terminal Dis-Associated Reason=IPGW 'ALB23HNSIGW72A003' not reachable - missed keep alive messages"
  • The modem sometimes shows TCP acceleration as down or in a degraded state before the System light goes out.

The logs show that this occurs even when signal quality is good. From the recent history, I can see that it has occurred at SQF values between 109 and 115.

 

Steps taken:

 

By me

  • Removed router from the network. The PC is connected directly to the modem.
  • Tried a different computer. In addition to the Windows PC, I tried a Linux laptop and a second Windows laptop. All of them were connected directly to the modem, and the issue persists.
  • Used different Ethernet cables. I even cut a section of CAT 6 Mohawk cable out of a box I was given and put new connectors on it, although that was mainly to see if I could remember how to do it.

By Hughesnet

  • Sent a tech out to my house.
    • New modem
    • New outside radio
    • Realigned dish
    • Replaced coaxial connector at the ground block which had water infiltration.

 

I am not sure what else can be done. I have been noticing more topics about this issue showing up in the tech support forum.

 

Beam ID 23
Outroute ID 39
Gateway ID 2

  

I know it's a long shot, but can anyone with these IDs confirm that their service is working normally?

 

Thanks in advance!

1 ACCEPTED SOLUTION

Good morning folks,

 

Just received an update that the network adjustments to address this concern have been implemented. Please let me know if the intermittent connectivity persists for you today.

 

If you have a tech or billing question and need help, please start a new thread in the appropriate board. Unsolicited Private Messages may not get replies.

Slow performance? Click me!


26 REPLIES
Liz
Moderator

Hi Matt,

 

Thanks so much for your well-organized post, this helps a lot. I've run diagnostics on your site and nothing of concern is jumping out at me so far. Please let me investigate further to see what we may do next.

 

Your cooperation, patience, and understanding are much appreciated.

 


ebjoew
Sophomore

Matt,

 

This surely sounds a lot like what is happening on my system. However, due to the physical layout of my house, the location of the modem, and the location of my computer, I almost never get to see the lights on the modem as the problem occurs. (For one thing, I am 71 years old and I just don't move that fast any more.)

 

My system is a Gen4 system that is 3 years old. It worked well for most of that time, but around the early part of April I became concerned about apparent occasional outages, typically lasting about 4 minutes. I detect them about 3 or 4 times per day, and they seem to occur at random times.

 

Like you, I simplified the system by removing my WiFi router and running an Ethernet cable directly from the modem to my computer, which is a Windows 10 system.


I have become so fed up with this problem that I am right on the verge of throwing in the towel and just junking the whole thing. I have been fighting with tech support over this for days.

 

Very similar to your case, a service tech was sent to my house who replaced the transmitter/receiver electronics on the dish itself. Because it is an intermittent problem, I could not tell right away whether this resolved the issue, but by the very next day it was clear that the problem was still there. I called the service tech for some advice. He suspects the modem is faulty but is unable to authorize its replacement himself. He also suggested that I try powering the modem from a different power circuit in the house and check that all connections were good and secure. With the power off, I disconnected and reconnected each connector several times to be sure good contact was being made. Then I restored the power.

One other thing that seemed very odd: the service tech had suggested temporarily substituting a barrel connector for the grounding block in the RG6 coax between the dish and the modem. It turns out there is no grounding block present. I suspect the original installer did not do this part of his job properly.

 

Attempts to get tech support to provide a replacement modem are always met with stalling. They resist replacing it until they find direct evidence themselves that the modem is faulty.

 

I know that HughesNet has launched a new satellite (EchoStar 19), which their new Gen5 service uses. Since my system is a Gen4 system, it uses the aging EchoStar 17 satellite. I am beginning to wonder if the outages are being triggered by something to do with EchoStar 17.

 

I have been satisfied with the speed and data limits of my Gen4 service and have no wish to switch to Gen5. Even with the introductory offers, the monthly cost will be higher in the long run. I have grown weary of everyone at HughesNet trying to sell me the upgrade to Gen5. I just want my system to work like it always has.

 

I do not know where to find a log of the events recorded by the HughesNet modem. My modem is an HT1100 model. I have been told that the gateway I am assigned to is Albuquerque, NM. I have verified that when these outages occur, the weather is not the cause, either at my location or at the gateway.

 

I get frustrated with the tech support folks because many of them seem to know very little about the use of Ping commands. If they do know of them, they tend to focus on the elapsed time of the Ping messages and direct me to think about how far a message must travel: from my house up to the satellite, down to the gateway, through the internet to the destination, and back along the same path. I keep having to explain that I am not concerned about the elapsed time but rather about the fact that more than half of the messages are being dropped altogether.
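For a sense of scale, the propagation floor for a ping over a geostationary link can be worked out from geometry alone. The sketch below is a back-of-the-envelope model, not anything HughesNet publishes: it assumes a bent-pipe path (terminal to satellite to gateway and back again), a spherical Earth, and a satellite at the nominal geostationary altitude of 35,786 km; the dish elevation angles are parameters you would have to supply. It ignores processing, queuing and the Internet leg, so real replies (such as the 611 to 928 ms times reported later in this thread) should always come in above it.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458
EARTH_RADIUS_KM = 6_378.137      # equatorial radius; spherical-Earth assumption
GEO_ALTITUDE_KM = 35_786.0       # nominal geostationary altitude above the equator


def slant_range_km(elevation_deg: float, altitude_km: float = GEO_ALTITUDE_KM) -> float:
    """Distance from a ground station to the satellite, given the dish elevation angle."""
    e = math.radians(elevation_deg)
    r = EARTH_RADIUS_KM
    return math.sqrt((r + altitude_km) ** 2 - (r * math.cos(e)) ** 2) - r * math.sin(e)


def min_ping_rtt_ms(terminal_elev_deg: float = 90.0, gateway_elev_deg: float = 90.0) -> float:
    """Propagation-only lower bound for a ping that crosses the satellite twice.

    Outbound: terminal -> satellite -> gateway. Return: gateway -> satellite -> terminal.
    Processing, queuing and the Internet leg beyond the gateway are ignored.
    """
    one_way_km = slant_range_km(terminal_elev_deg) + slant_range_km(gateway_elev_deg)
    return 2 * one_way_km / SPEED_OF_LIGHT_KM_S * 1000.0


if __name__ == "__main__":
    print(f"Best case (both dishes pointing straight up): {min_ping_rtt_ms():.0f} ms")
    print(f"Both dishes at 35 degrees elevation: {min_ping_rtt_ms(35, 35):.0f} ms")
```

Even the best case works out to roughly 477 ms, and lower elevation angles only add to it. Distance explains slow replies, though, not missing ones.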

 

That is, if I go to a CMD window and enter a command such as this:

 

PING /n 1000 8.8.8.8

 

... a thousand Ping messages will be sent to 8.8.8.8, and a reply is expected for each one. This test takes a little while, but when it is over, I typically find that more than 50% of the messages were lost and never returned.

 

My current method of detecting the beginning of these intermittent outages is a program I wrote that sends a Ping message every 5 seconds and waits to see whether a response returns or the message is dropped. When one of the outages occurs, Ping messages start getting dropped not some of the time but all of the time. When my program detects this, it announces over the computer speakers that I have been disconnected.
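A program like the one described can be quite small. Below is a hedged Python sketch, not the poster's actual program: it calls the system `ping` command once per interval and flags an outage after a run of consecutive failures. The run length of 6 (about 30 seconds at a 5-second interval) is an assumption: with roughly half of all pings normally lost, short runs of failures happen by chance, while six independent losses in a row at a 50% loss rate has a probability of only 1/64. The `ping` flags are the standard Windows (`-n`, `-w` in milliseconds) and Linux iputils (`-c`, `-W` in seconds) ones; other platforms may differ.

```python
import platform
import subprocess
import time
from typing import List, Optional, Sequence, Tuple


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one echo request with the system ping command; True if it reported success.

    Caveat: Windows ping can exit with status 0 on some "Destination host unreachable"
    replies, so treat this as a rough signal.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:  # Linux iputils syntax assumed
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def find_outages(results: Sequence[bool], threshold: int = 6) -> List[Tuple[int, int]]:
    """Return (first_index, end_index_exclusive) for every run of >= threshold failures."""
    outages: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, ok in enumerate(results):
        if not ok:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= threshold:
                outages.append((run_start, i))
            run_start = None
    if run_start is not None and len(results) - run_start >= threshold:
        outages.append((run_start, len(results)))
    return outages


def monitor(host: str = "8.8.8.8", interval_s: float = 5.0, threshold: int = 6) -> None:
    """Ping forever; report when `threshold` pings in a row fail and when replies return.

    The sleep comes after each ping, so real spacing is interval_s plus the ping time.
    """
    failures = 0
    while True:
        if ping_once(host):
            if failures >= threshold:
                print(time.strftime("%H:%M:%S"), "connection restored")
            failures = 0
        else:
            failures += 1
            if failures == threshold:
                # Stand-in for the spoken alert; a text-to-speech call could go here.
                print(time.strftime("%H:%M:%S"), f"outage suspected: no reply from {host}")
        time.sleep(interval_s)
```

`find_outages` is the same rule applied after the fact to a saved list of results, which makes it easy to try different thresholds on a day's log.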

 

Having been shown by tech support how to see state codes for the HT1100, I have recorded the sequence of these codes that occur during the outage. This is a typical sequence of what I see...

 

0.0.0 Fully operational

0.0.0 Fully operational

0.0.0 Fully operational

0.0.0 Fully operational

23.1.4 TCP acceleration operating in degraded state

21.1.5 Connecting to gateway

(connection error - cannot display the web page)

21.1.4 Discovering a gateway

0.0.0 Fully operational

0.0.0 Fully operational

0.0.0 Fully operational

 

Then communication is restored.

 

NOTE: I am not sure if the connection error occurs before or after the 21.1.4 Discovering a gateway code. (It is hard to hurriedly record these codes using pencil and paper.)
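Recording codes by hand during an outage is error-prone. If the state codes can be captured with timestamps (by screenshots or any other means; how to read the SCC automatically is not covered in this thread, and this sketch assumes nothing about it), a few lines of Python can collapse the repeated entries into transitions and the time spent in each state. The sample data below is hypothetical, shaped like the sequence above, with `None` standing in for the "cannot display the web page" entries. Durations are only as precise as the spacing between samples.

```python
from typing import List, Optional, Tuple

# (seconds since logging started, state code or None when the SCC page would not load)
Sample = Tuple[float, Optional[str]]


def transitions(samples: List[Sample]) -> List[Sample]:
    """Keep only the samples whose state differs from the one before it."""
    changes: List[Sample] = []
    previous: object = object()  # sentinel that never equals a real state
    for t, code in samples:
        if code != previous:
            changes.append((t, code))
            previous = code
    return changes


def durations(samples: List[Sample]) -> List[Tuple[Optional[str], float]]:
    """Time spent in each state, measured from one transition to the next.

    The final state has no known end, so it is measured to the last sample.
    """
    changes = transitions(samples)
    if not changes:
        return []
    end_time = samples[-1][0]
    result = []
    for (t, code), (t_next, _) in zip(changes, changes[1:] + [(end_time, None)]):
        result.append((code, t_next - t))
    return result


hypothetical_log: List[Sample] = [
    (0, "0.0.0"), (5, "0.0.0"), (10, "23.1.4"), (15, "21.1.5"),
    (20, None), (25, "21.1.4"), (30, "0.0.0"), (35, "0.0.0"),
]
for code, seconds in durations(hypothetical_log):
    print(f"{code or 'SCC unreachable':>16}  {seconds:>5.0f} s")
```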

 

On one occasion, I was able to run to the modem when an outage was underway and see that the System light was out, the Power light was on steady, and the LAN, Transmit and Receive lights were on but blinking a bit. After a few seconds, the System light came back on as the system appeared to recover.

 

On another occasion, I kept a CMD window open while waiting for an outage to begin, with this command typed in and ready to run:

PING /n 40 8.8.8.8

When an outage began, I started the PING command. I was a bit surprised to see this sequence of results:

 

Request timed out  (26 times)

General Failure  (7 times)

Request timed out  (1 time)

Reply received  (4 times with times of 631, 786, 928, and 611 milliseconds)

Request timed out (1 time)
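Tallying a long run of results by hand is tedious. Below is a minimal sketch that classifies the lines of saved Windows `ping` output (for example from `PING /n 40 8.8.8.8 > ping.txt`). It assumes English-language Windows output, where replies look like `Reply from 8.8.8.8: bytes=32 time=631ms TTL=117` and failures print as `Request timed out.` or `General failure.`; other languages and platforms word these differently. A `time<1ms` reply is recorded as 1 ms.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# English-language Windows reply format assumed, e.g. "Reply from 8.8.8.8: bytes=32 time=631ms TTL=117"
REPLY_RE = re.compile(r"^Reply from \S+: bytes=\d+ time[=<](\d+)ms", re.IGNORECASE)


def classify(line: str) -> str:
    text = line.strip()
    if REPLY_RE.match(text):
        return "reply"
    lowered = text.rstrip(".").lower()
    if lowered == "request timed out":
        return "timeout"
    if lowered == "general failure":
        return "general failure"
    if "destination host unreachable" in lowered:
        return "unreachable"
    return "other"  # header, statistics and blank lines


def summarize(lines: Iterable[str]) -> Tuple[Counter, List[int]]:
    """Count each kind of result and collect the reply times in milliseconds."""
    counts: Counter = Counter()
    times: List[int] = []
    for line in lines:
        kind = classify(line)
        counts[kind] += 1
        if kind == "reply":
            match = REPLY_RE.match(line.strip())
            times.append(int(match.group(1)))  # "time<1ms" is recorded as 1
    return counts, times
```

To use it on a saved log, pass the file's lines, e.g. `summarize(open("ping.txt"))`.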

 

What do you suppose "General Failure" means? I suspect it corresponds to the moments when the modem could not be reached for a state code (the connection error in my state code list). I bet the Ping program cannot reach the modem for a short time there.

 

So how do you see the event log for the modem?

 

I am glad that I am not the only person in the world who is fighting this problem!

 

 

Gwalk900
Honorary Alumnus

Hello ebjoew,

 

You really don't need to run to the modem to get a picture of what is going on. The modem itself has a great internal diagnostic readout on several levels.

A couple of pointers before we begin:

If you post a screenshot of your SCC (System Control Center) readouts, be sure to blank out your SAN, which is displayed near the top left center.

 

Also, the modem's easiest-to-read logs are wiped out if you power off the modem, so it's a good idea to check them and take a screenshot before powering down the modem.

 

On the SCC:

You can open the modem's internal System Control Center by entering 192.168.0.1 into your browser's address bar. That will open the SCC main page:

[Screenshot: SCC main page]

 

If you look at the top center, you will see icons that I have marked as #1 and #2 (as well as the removed SAN just to the left of and under icon #1).

The colors of these two icons will give you a quick visual of your system condition.

 

Clicking on icon #3, however, will lead us to a more detailed area:

From the menu on the left, click General, then click State Code Monitor:

That will provide the following:

[Screenshot: SCC Advanced page, State Code Monitor]

 

A screenshot of that page will give us a much clearer picture of what is going on.

Please remember to crop out or obliterate your SAN, which usually starts with DSS followed by numbers (DSS xxxxxxxx).

 

 

 

Amanda
Moderator

Hello Matt and ebjoew,

 

Just a quick heads up on this subject. Our engineers believe they have pinpointed this to one particular gateway and are currently investigating. We will provide you updates as they come through.

 

Thank you,

Amanda

@ebjoew

It seems you like poking around and troubleshooting things as much as I do. But I think this issue is out of our hands! I am willing to guess that the issue may be with the Albuquerque, NM gateway, as that seems to be a common trend. 

 

At least we have a great community here to allow us to easily do some arm waving for attention. 

 

@Amanda

Thanks for the update! As always if I notice anything new which may help I will make sure to let you know.

@Matt_Is_My_Name

 

I also see that not only are we serviced through the same gateway (Gateway ID 2 which is Albuquerque, NM) but we are both on the same beam (Beam ID 23 which covers much of Ohio where we are both located). Perhaps the issue is with Beam 23 of the satellite.

 

I certainly hope things pan out the way @Amanda said they are expecting. I am hanging a lot of hope on her statement. I was right on the very edge of terminating my HughesNet service. I think their tech support scripts need to be updated so that when tech support cannot resolve a recurring problem, the issue is escalated to the upper levels of technical support much more quickly. Intermittent problems can be just as upsetting to end users as solid failures. Solid failures tend to get fixed quickly because they are much easier to diagnose, but intermittent issues nag at you with little hope of being found. The tech support people don't really want to believe you, it seems. It may be a cultural difference thing.

 

I am eagerly waiting for the all-clear update from @Amanda.

@Matt_Is_My_Name,

Of course, we have seen nothing more about any changes to the gateway that might resolve our issue, and I am still having dropouts as often as ever, both short and long ones. If anything, they are getting more frequent. I expect that you are still having the dropouts as well.

Tier Level 4 Tech Support called me today. After reviewing my situation, they decided to ship me a new modem. I will report on any difference it makes when it arrives in a few days. Given that we are both experiencing the same problem, that you have had both your radio and modem replaced without it being resolved, and that I have already had my radio replaced, I am not too optimistic about a new modem taking care of my issues. Either way, it will eliminate one more piece of older equipment and add to our information about where the issue truly lies. I will report back when I have the new modem running.

Personally, I am still rather expecting the gateway issue which @Amanda mentioned to be the true source of our issues. As I have noted before, not only are we both on the same gateway, we are both on the same beam of the satellite as well. We are both on Gateway 2 and Beam 23.

@Matt_Is_My_Name and @Amanda,

 

Looking around the forum, I discovered that another user (@dtominski) who is also having intermittent service issues is on Gateway 2 and Beam 23, just like @Matt_Is_My_Name and me. I left a post in the conversation @dtominski has been in (Connection drops intermittently) inviting him to join this one, since we have so much in common.

Hi folks,

 

Just got an update from engineering that they are still working on this, no ETA yet for resolution, but as soon as we have more news to share, we'll post back.

 

Your patience and understanding are much appreciated.

 


Thank you very much for that @Liz. I am eager to hear what comes of their work.

 

My system here had received a new radio earlier this month. Yesterday, I received and installed a new satellite modem. The jury is still out here on how well it is doing because we have been hit with some rough weather.

 

@Matt_Is_My_Name,

 

I wonder if you are familiar with how Ethernet handles packet collisions. This is not related to satellite communications per se; it concerns transmissions between multiple stations sharing a single Ethernet cable that daisy-chains from one station to the next, and so on. That is a typical setup for the network in a company with a lot of offices, for example. Assume there are multiple stations on an Ethernet segment and two of them decide to send a data packet at essentially the same moment. Collision detection circuitry detects that multiple stations are sending at the same time, a jam signal is put onto the cable so that every station notices the collision, and neither message gets through. The transmitting stations each recognize this and respond by retransmitting their packets after a random delay. The assumption is that the two stations will choose different delays, so their retransmissions do not collide again.
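In classic shared-cable Ethernet (IEEE 802.3 CSMA/CD), this retransmission rule is called truncated binary exponential backoff. After the nth collision on a frame, a station waits a random whole number of slot times chosen from 0 to 2^min(n, 10) - 1, and it gives up after 16 attempts. The Python sketch below models just that rule, plus a toy loop counting how many collisions two stations need before they pick different delays. It illustrates the Ethernet mechanism only and says nothing about how HughesNet schedules its ground stations.

```python
import random

SLOT_TIME_US = 51.2   # 512 bit times on 10 Mb/s Ethernet
BACKOFF_LIMIT = 10    # the random range stops doubling after 10 collisions
ATTEMPT_LIMIT = 16    # the frame is discarded after 16 attempts


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Slot times to wait after the given number of collisions on one frame."""
    if collisions < 1:
        raise ValueError("backoff only applies after at least one collision")
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame dropped")
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform over 0 .. 2**k - 1


def collisions_until_resolved(rng: random.Random) -> int:
    """Two stations that just collided keep backing off until they choose different slots."""
    collisions = 1
    while backoff_slots(collisions, rng) == backoff_slots(collisions, rng):
        collisions += 1
    return collisions


rng = random.Random(42)
trials = [collisions_until_resolved(rng) for _ in range(10_000)]
print("average collisions before the two stations separate:", sum(trials) / len(trials))
print("worst-case wait after 3 collisions:", (2 ** 3 - 1) * SLOT_TIME_US, "microseconds")
```

Doubling the range after each collision is what keeps repeated collisions rare even when many stations are busy: each retry halves the chance that two stations pick the same slot.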

 

I suspect that the HughesNet satellite communications system works in a similar manner with regard to the many ground stations sending packets to the satellite. As long as each ground station's transmitted packet does not overlap that of another ground station, all goes well. If multiple ground stations try to transmit at the same moment, the satellite senses the packet collision and transmits a signal telling those stations their packets failed to get through and to retransmit them after some suitable delays.

 

I run an application much of the time that keeps sending PING messages and tracks how many of them succeed and how many fail to make the round trip to and from a selected server on the internet. Under typical conditions, when weather is not an issue and we are not in the midst of one of these outages, the success rate hovers around 50%. It varies somewhat randomly, a little higher or a little lower for a minute or two at a time, but generally stays near the 50% mark.
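A success rate that hovers around a value but drifts for a minute or two at a time is naturally measured over a sliding window. Here is a minimal Python sketch of that bookkeeping (not the poster's app); the window size is an assumption, for example 12 samples for one minute at one ping every 5 seconds.

```python
from collections import deque
from typing import Deque, Optional


class RollingSuccessRate:
    """Fraction of successful pings among the most recent `window` attempts."""

    def __init__(self, window: int = 12) -> None:
        if window < 1:
            raise ValueError("window must hold at least one result")
        self._results: Deque[bool] = deque(maxlen=window)  # oldest entries fall off automatically

    def add(self, ok: bool) -> None:
        self._results.append(ok)

    def rate(self) -> Optional[float]:
        """None until at least one result has been recorded."""
        if not self._results:
            return None
        return sum(self._results) / len(self._results)


tracker = RollingSuccessRate(window=4)
for ok in [True, False, True, True, False, False]:
    tracker.add(ok)
print(tracker.rate())  # last four results: True, True, False, False -> 0.5
```

A short window reacts quickly to an outage; a longer one shows the slow drifts above and below the usual level more clearly.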

Wednesday evening, we had a lot of heavy rain cells passing through the area, replete with several tornado warnings. Not surprisingly, we had spells where no PING messages were succeeding at all. My app flagged those periods as outages but it was clear that such outages were caused by the local weather.

 

However, I noticed something else that happened on at least one occasion when there was no rain at my location. For a while, the success rate for the PING messages rose well above its usual level, above 75%, and stayed there. Later, it returned to its usual behavior.

 

I kept wondering what could have caused that. Now I can propose a scenario, related to my theory above about the satellite handling packet collisions.

 

I suspect some of the heavy rain that was around the state but not in my own area was blocking transmissions from many of the other HughesNet ground stations. That gave my PING messages a much better chance of getting through.

 

Of course, with many people having their attention drawn to the weather, perhaps they were not so likely to be using the internet in the first place. Either way, it is a theory.
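The theory can be put into numbers with a textbook random-access model (slotted-ALOHA style). This is an assumption for illustration only: the thread does not say how HughesNet actually allocates uplink time, and a scheduled system, as suggested in a later reply, would not behave this way. If each of n other terminals independently transmits in a given slot with probability p, your packet gets through only when none of them do, so the success probability is (1 - p)^n. Fewer active terminals, whether rain has blocked them or their owners are watching the weather, means a smaller n and a higher success probability.

```python
import random


def success_probability(other_stations: int, p_transmit: float) -> float:
    """Chance that none of `other_stations` transmits in the same slot as you."""
    if other_stations < 0 or not 0.0 <= p_transmit <= 1.0:
        raise ValueError("need other_stations >= 0 and 0 <= p_transmit <= 1")
    return (1.0 - p_transmit) ** other_stations


def simulate(other_stations: int, p_transmit: float, slots: int, seed: int = 0) -> float:
    """Monte Carlo check of the formula: fraction of slots with no competing transmission."""
    rng = random.Random(seed)
    clear = 0
    for _ in range(slots):
        if all(rng.random() >= p_transmit for _ in range(other_stations)):
            clear += 1
    return clear / slots


# Hypothetical numbers: 100 competing terminals, each busy in 1% of slots,
# versus half of them knocked out by rain.
print(success_probability(100, 0.01), success_probability(50, 0.01))
print(simulate(100, 0.01, 20_000))
```

With these made-up numbers the success probability goes from roughly 0.37 to roughly 0.6, the same direction as the jump described above; the model cannot say whether that is the real cause.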

 

@Matt_Is_My_Name, What do you think? Can you propose another explanation?

 

 

@Matt_Is_My_Name @Amanda @Liz,

 

WooHoo!

 

After at least 2 full months of poor service, after paying $125 to have my radio replaced, and after replacing the modem, with the problem remaining just as bad as ever for several days since, just a few minutes ago my app that tracks PING success rates started registering 100% success all the time! Somebody somewhere definitely fixed something. I suspect one of those engineers did something at the gateway that really made a difference.

 

@Matt_Is_My_Name @Amanda @Liz,

 

Well now I am really confused. It turns out that my PING messages only enjoy a 100% success rate if I connect my Belkin WiFi router to the Ethernet circuit. Even though the computer itself is still connected by Ethernet (and not wirelessly) to the router, the router is doing something that causes the 100% PING success rate that I do not get when the computer is connected directly to the modem by Ethernet cable without the router involved at all. Perhaps the router provides a layer of retransmission protocol that covers up the dropped PING messages. Or perhaps the router is providing better Ethernet drivers or transmission cable terminations, thereby improving the success rate between my computer and the modem. In any case, I think I shall stop blaming HughesNet for my low PING message success rates, and I shall keep my router connected unless I need to remove it for a test.

I will still be monitoring for the long dropouts of HughesNet service. I have been gathering some information about timing that may help the engineers. Assuming the long dropouts continue, I will be back with that information soon.

 

Gwalk900
Honorary Alumnus

Let's look at what a "ping" is:

It is a request for a response from your local computer or device to a specified server somewhere in the world that is connected to the Internet.
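Under the hood, that request is an ICMP Echo Request (type 8, defined in RFC 792), and the server answers with an Echo Reply (type 0). The sketch below only builds the request bytes, including the RFC 1071 Internet checksum, to show what actually travels through each loop. Sending it would need a raw socket and usually administrator rights, which is why everyday testing uses the `ping` command instead. The identifier, sequence number and 32-byte payload are arbitrary example values.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(identifier: int, sequence: int,
                 payload: bytes = b"abcdefghijklmnopqrstuvwabcdefghi") -> bytes:
    """Type 8 (Echo Request), code 0, checksum, identifier, sequence number, then payload."""
    unsigned = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence) + payload
    checksum = internet_checksum(unsigned)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = echo_request(identifier=0x1234, sequence=1)
print(len(packet), packet[:8].hex())
```

A receiver checks the packet by running the same checksum over all of it; a correct packet sums to zero.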

That is usually a more or less straightforward thing when using a ground-based ISP.

Path: yours to theirs:

Looks like this:

[Diagram: Internet backbone]

 

Simple, isn't it? Your ISP is connected to the Internet backbone at its "head-end," and your ping is routed through a number of routing switches to finally reach the desired server, which responds when it can, depending on its load.

That constitutes a "loop." Let's call this Loop #3.

 

Let's look a little deeper:

You connect wirelessly to your router ... that connection is all "local" ... that is a "loop." Let's call this Loop #1. Loop #1 is all on the user end.

Your router passes the info to the Hughes modem.

The modem also has a loop: Modem>Satellite>Gateway>Headend. Let's call this Loop #2. Loop #2 is all on the Hughes end.

Your data is then passed on to Loop #3 ... the real Internet.

 

[Diagram: Loops #1, #2 and #3]

You have total control of Loop #1, with the possible exception of the wireless portion of an HT2000w.

 

Loop #2 is totally Hughes.

 

Loop #3 is not under Hughes control.

The server you ask for a response is beyond Hughes's control.

You have to break things down to determine just where the issue is.
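Breaking things down can be done mechanically: ping a target at each step along the path (for example the router, then the modem at 192.168.0.1, then an Internet server such as 8.8.8.8), note the loss to each, and look for the first step where loss jumps. The sketch below does only that comparison; the loss figures in the example are hypothetical, and the 5-percentage-point threshold is an assumption meant to absorb normal noise.

```python
from typing import Optional, Sequence, Tuple


def first_lossy_hop(losses: Sequence[Tuple[str, float]], threshold: float = 0.05) -> Optional[str]:
    """Return the first target whose loss exceeds the previous target's loss by more than threshold.

    `losses` is ordered from the nearest target to the farthest; each loss is the
    fraction (0..1) of pings to that target that got no reply.
    """
    previous = 0.0
    for name, loss in losses:
        if loss - previous > threshold:
            return name
        previous = loss
    return None


# Hypothetical measurements, nearest target first.
measurements = [
    ("router", 0.00),
    ("modem (192.168.0.1)", 0.01),
    ("8.8.8.8", 0.48),
]
print(first_lossy_hop(measurements))  # -> 8.8.8.8: the loss appears somewhere past the modem
```

A result past the modem still leaves Loop #2 and Loop #3 to tell apart, which is where the modem's own tests come in.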

The Hughes modem offers a Gateway Continuity test in the SCC at 192.168.0.1.

This is a test of Loop #2 ... the one Hughes has control of.

 

 

 

 

@Gwalk900,

 

(@Amanda and @Liz may find the latter part of this of interest as well.)

 

In working on a reply to your post, I discovered something very important. I will get to that in a moment.

 

First, let me point out that during these tests, at no time have I used the wireless WiFi facilities of the Belkin router. I normally connect the router only so that we have WiFi access via a Samsung tablet elsewhere in the house. All of my PING tests run on my PC which is hard wired via an Ethernet cable to either the router or directly to the satellite modem, depending on the scenario being tested.

 

I would suggest that there are, in fact, 3 loops when my PC is connected directly to the modem and 4 loops when the PC is connected to the router, which is then connected to the modem.

[Diagram: 3 loops]

[Diagram: 4 loops]

What I found hard to understand was that there are far fewer dropped PING messages, if any, when the router is added to the scenario. Therefore, I postulated that perhaps something about the electrical connection between my PC and the satellite modem is aided by having the router between them.

 

While writing this reply, I thought more about that and performed a PING test between the PC and the satellite modem itself, directly connected via an Ethernet cable. That scenario contains only 1 loop.

 

[Diagram: 1 loop]

To perform this test, I typed the following statement into a command box:

 

PING /n 1000 192.168.0.1

 

Of the 1000 PING messages sent, 37% were dropped, a really poor result for a direct hard wired Ethernet connection. More than a third of all PING messages couldn't even make the round trip just across the Ethernet cable to the satellite modem and back to the PC. By the way, the longest round trip time of those messages that did make the round trip successfully was 1 millisecond.

 

Then I repeated the test with the router in the circuit. This constituted a 2 loop test.
[Diagram: 2 loops]

This time, 0% of the PING messages were dropped. Every PING message from the PC made the full round trip. However, while the average round trip time was still 1 millisecond, the maximum round trip time was 40 milliseconds. Some further testing showed that only 1.3% of the PING messages are ever taking longer than 1 millisecond in this 2 loop test.
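The figures quoted in these tests (percentage dropped, average and maximum round-trip time, share of replies slower than 1 ms) can all be computed from a list of per-ping results. A minimal sketch, assuming each result is a round-trip time in milliseconds or `None` for a lost ping; the sample list is hypothetical, not the poster's data.

```python
import statistics
from typing import Dict, List, Optional


def ping_stats(rtts_ms: List[Optional[float]], slow_ms: float = 1.0) -> Dict[str, Optional[float]]:
    """Summarize one test run; None entries are pings that never came back."""
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    stats: Dict[str, Optional[float]] = {
        "sent": float(sent),
        "loss_pct": 100.0 * (sent - len(replies)) / sent if sent else None,
        "avg_ms": None,
        "max_ms": None,
        "slow_pct": None,  # share of *replies* slower than slow_ms
    }
    if replies:
        stats["avg_ms"] = statistics.mean(replies)
        stats["max_ms"] = max(replies)
        stats["slow_pct"] = 100.0 * sum(r > slow_ms for r in replies) / len(replies)
    return stats


hypothetical_run = [1, 1, None, 40, 1, None, 1, 1]
print(ping_stats(hypothetical_run))
```

Note that `slow_pct` is measured against the replies that came back; when no pings are lost, as in the 2-loop test, that is the same as measuring against all pings sent.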

 

So while it is desirable to remove my router to eliminate it as a source of problems when troubleshooting some issues, I find that my apparent overall performance is improved by having the router between my PC and the satellite modem, even when I am not using the WiFi functionality the router provides.

 

I am already well aware that there is a connectivity test for the link between the satellite modem and the gateway, but it does not test the communications in a way that truly emulates normal traffic timing, such as takes place during a PING test. It would be desirable to know the IP address of the server at the gateway that serves my communication channel. Being able to run PING tests between my PC and that gateway server would let me eliminate the internet itself as a source of issues. Is there any chance I can get the IP address of that gateway server? Where would I look for it?

@ebjoew

 

All I can do is speculate about how the communications between the satellite and the ground gateway stations are handled. I imagine the finer details are considered very sensitive information by any of the satellite Internet providers.

 

I do understand packet collision, but I'm not entirely sure that is something the satellite should have to worry about. I imagine communications are handled much like cell phones communicating with towers on the ground. Everything has specific time frames to get its data across. This would mean the data has to be sent early so it arrives at the tower at the correct time. I'm sure this would be far easier with stationary ground stations than with cell phones travelling in cars and whatnot.

 

I would also guess that there is a lot of caching that goes on so the satellite's upstream and downstream don't have to run at the exact same speed.

 

Again these are just speculations, but I wouldn't be surprised if it operates basically like a $118-million router.

Good morning folks, 

 

Our engineers have isolated the issue and will make adjustments for you to address your concerns. Once I get an ETA on the adjustments, I'll let you know.

 

Your patience and understanding are much appreciated.

 


Thanks, @Liz. I will keep my app running that notes the longer dropouts and I will report when they appear to have stopped happening. 

 

By the way, one detail I have noticed but not yet reported is that the ability to bounce PING messages off of 8.8.8.8 stops roughly a minute before the satellite modem logs any difficulties during a dropout. I first became aware of a discrepancy between how long a dropout appeared to be from the PC's point of view and the duration of the dropout as logged by the satellite modem. So I started comparing some logged events with the times when the dropouts began, as detected by failing PING messages. If it would be of any value, I could provide more detailed timing for a few dropouts.
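If the detailed timing would help, lining the two sources up is a simple matching problem. A minimal sketch, assuming both the PC's outage start times (the first failed ping of each run) and the modem's log entries have been converted to seconds on a common clock; whether the PC and modem clocks actually agree is itself an assumption worth checking. For each outage the PC detected, it finds the next modem log entry and reports how much later it came. The times below are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def modem_lag_s(pc_outage_starts: Sequence[float], modem_log_times: Sequence[float],
                max_gap_s: float = 300.0) -> List[Optional[float]]:
    """For each PC-detected outage start, seconds until the next modem log entry.

    None means no modem entry followed within max_gap_s, so the two are probably unrelated.
    """
    log = sorted(modem_log_times)
    lags: List[Optional[float]] = []
    for start in sorted(pc_outage_starts):
        i = bisect_left(log, start)  # first log entry at or after the PC's detection time
        if i < len(log) and log[i] - start <= max_gap_s:
            lags.append(log[i] - start)
        else:
            lags.append(None)
    return lags


print(modem_lag_s([100, 1000, 2000], [160, 1055, 5000]))  # -> [60, 55, None]
```

A consistent lag of about a minute across many dropouts would be a much stronger report than a single example.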

 

Gwalk900
Honorary Alumnus

When you have the Belkin connected, is DHCP enabled in the router?

Have you checked the computer's adapter power settings to ensure the operating system doesn't power down the adapter?

 

Good morning folks,

 

Just received an update that the network adjustments to address this concern have been implemented. Please let me know if the intermittent connectivity persists for you today.

 
