Forum Discussion

Matt_Is_My_Name
8 years ago

Intermittent System Outage returns.

I am creating a new topic since I marked the last one as having a solution. My previous topic still has the same information, but I will give a new consolidated version of what is happening and the steps which have been taken to resolve it.

 

I admit I was too quick to mark the thread as solved, and should have waited for the weather to clear up. Lately the weather has been great around here, and weather around the gateway (Albuquerque NM) does not seem to have an effect either. I waited two weeks to allow things to settle in, and the issue exists the same as before.

 

The issue:

  • The system goes offline once every 10 - 15 minutes. The outage usually lasts between one minute and five minutes.
  • An identical sequence of events happens every outage. First the system light on the modem goes off. When the system light comes back on, the LAN light then goes off. Once the LAN light comes back on, service is back to normal. How long the system light stays out varies, but the LAN light is out for a fairly constant 10 seconds or so.
  • Every outage is shown as the terminal being disassociated from the gateway. The log entries state that this was caused by the terminal missing keep-alive messages. Example from the logs: "ASSOC: Terminal Dis-Associated Reason=IPGW 'ALB23HNSIGW72A003' not reachable - missed keep alive messages"
  • The modem sometimes shows TCP acceleration as down or in a degraded state before the system light goes out.

The logs show that this occurs even when signal quality is good. From the recent history I can see that it has occurred at SQF values between 109 and 115.

 

Steps taken:

 

By me

  • Removed router from the network. The PC is connected directly to the modem.
  • Attempted using a different computer. In addition to the Windows PC I tried a Linux laptop and a second Windows laptop. All were connected directly to the modem, and the issue persisted.
  • Used different ethernet cables. I even cut a section of CAT 6 Mohawk out of a box I was given and put new connectors on it. Although that was mainly to see if I could remember how to do it.

By Hughesnet

  • Sent a tech out to my house.
    • New modem
    • New outside radio
    • Realigned dish
    • Replaced coaxial connector at the ground block which had water infiltration.

 

I am not sure what else can be done. I have been noticing more topics about this issue showing up in the tech support forum.

 

Beam ID: 23
Outroute ID: 39
Gateway ID: 2

  

I know it's a long shot, but can anyone with these IDs confirm that their service is working normally?

 

Thanks in advance!

  • Liz
    8 years ago

    Good morning folks,

     

    Just received an update that the network adjustments to address this concern have been implemented. Please let me know if the intermittent connectivity persists for you today.

     

  • Hi Matt,

     

    Thanks so much for your well-organized post; this helps a lot. I've run diagnostics on your site and nothing of concern is jumping out at me so far. Please let me investigate further to see what we may do next.

     

    Your cooperation, patience, and understanding are much appreciated.

     

  • Matt,

     

    This surely sounds a lot like what is happening on my system. However, due to the physical layout of my house, the location of the modem, and the location of my computer, I almost never get to see the lights on the modem as the problem occurs. (For one thing, I am 71 years old and I just don't move that fast any more.)

     

    My system is a Gen4 system that is 3 years old. It seemed to work well for most of that time but perhaps around the early part of April, I began to be concerned about apparent occasional outages lasting typically about 4 minutes or so. I detect them about 3 or 4 times per day but they certainly seem to occur at random times.

     

    Like you, I simplified the system by removing my WiFi router and running an Ethernet cable directly from the modem to my computer, which is a Windows 10 system.


    I have become so fed up with this problem that I am right on the verge of throwing in the towel and just junking the whole thing. I have been fighting with tech support over this for days.

     

    Very similar to your case, a service tech was sent to my house who replaced the transmitter/receiver electronics on the dish itself. Because it is an intermittent problem, I was unable to know if this resolved the issue or not but by the very next day, it was clear that the problem was still there. I called on the service tech for some advice. He suspects the modem is faulty but is unable to authorize its replacement himself. He also suggested that I try powering the modem from a different power circuit in the house and checking that all connections were good and secure. With the power off, I disconnected each connector several times to be sure good contact was being made. Then I restored the power.

    One other thing that seemed very odd was that the service tech had suggested temporarily substituting a barrel connector for the grounding block in the RG6 coax between the dish and the modem. It turns out there is no grounding block present. I suspect the original installer did not do this part of his job properly.

     

    Attempts to get tech support to agree to provide a replacement modem are always met with stalling. They resist replacing it until they find direct evidence themselves that the modem is faulty.

     

    I know that HughesNet has launched a new satellite (EchoStar 19) which their new Gen5 uses. Since my system is a Gen4 system, it is using the aging EchoStar 17 satellite. I am beginning to wonder if the outages are being triggered by something to do with the old EchoStar 17 satellite.

     

    I have been satisfied with the speed and data limits imposed by my Gen4 service and have no wish to switch to Gen5. Even with the introductory offers, the cost per month in the long run will be more expensive. I have grown weary of everyone at HughesNet trying to sell me the upgrade to Gen5. I just want my system to work like it always has.

     

    I am not aware of where to see a log of events recorded by the HughesNet modem. My modem is an HT1100 model. I have been told that the gateway I was assigned to is Albuquerque, NM. I have verified that when these outages occur, the weather is not the cause either at my location or at the gateway.

     

    I get frustrated with the tech support folks because many of them seem to know very little about the use of Ping commands. If they do know of them, they tend to want to focus on the elapsed time associated with the Ping messages and direct me to think about how far a message must travel from my house up to the satellite, down to the gateway, through the internet to the destination, and the return trip backwards through the same path. I keep having to explain to them that I am not concerned about the elapsed time but rather the fact that more than half of them are being dropped altogether.

     

    That is, if I go to a CMD window and enter a command such as this:

     

    PING /n 1000 8.8.8.8

     

    ... a thousand Ping messages will be sent to 8.8.8.8 and each will be expected to return. This test takes a little while but when it is over, I find that typically more than 50% of all the messages get totally lost and never return.

     

    My current method of detecting the beginning of these intermittent outages is a program I wrote that sends a Ping message every 5 seconds and waits to see if a response returns or is dropped. When one of the outages occurs, Ping messages start getting dropped not some of the time but all of the time. When my program detects this, it announces over the computer speakers that I have been disconnected.
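A minimal sketch of that kind of monitor in Python, for anyone who wants to try the same thing. It is not the program described above. The function names (`ping_once`, `find_outages`, `monitor`), the threshold of three consecutive misses, and the use of the ping command's exit code as the success test are all assumptions made for illustration, and it prints a message instead of speaking.

```python
import platform
import subprocess
import time


def ping_once(host: str, timeout_s: int = 4) -> bool:
    """Send one ping; True if the ping command exits with code 0.

    Assumption: exit code 0 means a reply came back. Some ping
    implementations also return 0 for certain error replies.
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        done = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return done.returncode == 0


def find_outages(results, threshold=3):
    """Return (start, end) index pairs for every run of at least
    `threshold` consecutive failed pings in a list of booleans.
    `end` is the index of the last failed ping in the run."""
    outages = []
    run_start = None
    for i, ok in enumerate(results):
        if not ok and run_start is None:
            run_start = i                      # a run of failures begins
        elif ok and run_start is not None:
            if i - run_start >= threshold:     # long enough to count
                outages.append((run_start, i - 1))
            run_start = None
    # Handle a run of failures that is still going at the end of the list.
    if run_start is not None and len(results) - run_start >= threshold:
        outages.append((run_start, len(results) - 1))
    return outages


def monitor(host="8.8.8.8", interval_s=5, threshold=3):
    """Ping forever; print when an outage starts and when it ends."""
    failures = 0
    while True:
        if ping_once(host):
            if failures >= threshold:
                print("Connection restored")
            failures = 0
        else:
            failures += 1
            if failures == threshold:
                print("Disconnected")
        time.sleep(interval_s)
```

Calling `monitor()` starts the loop. Requiring several misses in a row keeps a single dropped Ping from being reported as an outage, and `find_outages` applies the same rule to a recorded list of results.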

     

    Having been shown by tech support how to see state codes for the HT1100, I have recorded the sequence of these codes that occur during the outage. This is a typical sequence of what I see...

     

    0.0.0 Fully operational

    0.0.0 Fully operational

    0.0.0 Fully operational

    0.0.0 Fully operational

    23.1.4 TCP acceleration operating in degraded state

    21.1.5 Connecting to gateway

    (connection error - cannot display the web page)

    21.1.4 Discovering a gateway

    0.0.0 Fully operational

    0.0.0 Fully operational

    0.0.0 Fully operational

     

     Then communication is restored.

     

    NOTE: I am not sure if the connection error occurs before or after the 21.1.4 Discovering a gateway code. (It is hard to hurriedly record these codes using pencil and paper.)

     

    On one occasion, I was able to run to the modem when an outage was underway and see that the System light was out, the Power light was on steady, and the LAN, Transmit and Receive lights were on but blinking a bit. After a few seconds, the System light came back on as the system appeared to recover.

     

    On one occasion, I had kept a CMD window open, waiting for an outage to commence. I had a:

    PING /n 40 8.8.8.8

     

    .... command standing by ready to be initiated. When an outage began, I started the PING command running. I was a bit surprised to see this resulting sequence:

     

    Request timed out  (26 times)

    General Failure  (7 times)

    Request timed out  (1 time)

    Reply received  (4 times with times of 631, 786, 928, and 611 milliseconds)

    Request timed out (1 time)

     

    What do you suppose "General Failure" is? I rather suspect it corresponds to being unable to reach the modem for a state code in my state code list. I bet the Ping program cannot access the modem for a short time there.
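For reference, the sequence above can be reduced to a loss figure. This is a small Python sketch that uses only the counts and reply times listed above; the variable names are illustrative. Note that the listed results add up to 39, not the 40 requested.

```python
# Counts transcribed from the PING sequence listed above.
timed_out = 26 + 1 + 1          # three separate "Request timed out" runs
general_failure = 7
reply_times_ms = [631, 786, 928, 611]

total = timed_out + general_failure + len(reply_times_ms)
lost = timed_out + general_failure
loss_percent = 100 * lost / total
average_reply_ms = sum(reply_times_ms) / len(reply_times_ms)

print(f"{lost} of {total} pings lost ({loss_percent:.1f}%)")
print(f"average reply time: {average_reply_ms:.0f} ms")
```

That works out to 35 of 39 lost (89.7%), with an average reply time of 739 ms for the four that returned.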

     

    So how do you see the event log for the modem?

     

    I am glad that I am not the only person in the world who is fighting this problem!

     

     

    • Gwalk900
      Honorary Alumnus

      Hello ebjoew,

       

      You really don't need to run to the modem to get a picture of what is going on. The Modem itself has a great internal diagnostic readout on several levels.

      A couple of pointers before we begin:

      If you post a screenshot of your SCC (System Control Center) readouts, be sure to blank out your SAN, which is displayed near the top left center.

       

      Also, the modem's easiest-to-read logs are wiped if you power off the modem, so it's a good idea to check those and get a screenshot before powering down the modem.

       

      On the SCC.

      You can open the modem's internal System Control Center by entering 192.168.0.1 into your browser's address bar. That will open the SCC main page:

       

      If you look at the top center you will see icons that I have marked as numbers #1 & #2 (as well as the removed SAN just to the left of and under icon #1)

      The colors of these two icons will give you a quick visual of your system condition.

       

      Clicking on icon #3 however will lead us to a more detailed area:

      From the menu on the left, click on General, click on State Code Monitor:

      That will provide the following:

       

      A screenshot of that page will give us a much clearer picture of what is going on.

      Please remember to crop out or obliterate your SAN; it usually appears as DSS followed by numbers (DSS xxxxxxxx).

       

       

       

  • Hello Matt and ebjoew,

     

    Just a quick heads up on this subject. Our engineers believe they have pinpointed this to one particular gateway and are currently investigating. We will provide you updates as they come through.

     

    Thank you,

    Amanda

    • Matt_Is_My_Name
      Sophomore

      ebjoew

      It seems you like poking around and troubleshooting things as much as I do. But I think this issue is out of our hands! I am willing to guess that the issue may be with the Albuquerque, NM gateway, as that seems to be a common trend. 

       

      At least we have a great community here to allow us to easily do some arm waving for attention. 

       

      Amanda

      Thanks for the update! As always if I notice anything new which may help I will make sure to let you know.

      • ebjoew
        Sophomore

        Matt_Is_My_Name

         

        I also see that not only are we serviced through the same gateway (Gateway ID 2 which is Albuquerque, NM) but we are both on the same beam (Beam ID 23 which covers much of Ohio where we are both located). Perhaps the issue is with Beam 23 of the satellite.

         

        I certainly hope things pan out the way @Amanda said they are expecting. I am hanging a lot of hope on her statement. I was right on the very edge of terminating my use of HughesNet. I think their tech support people need to have their scripts updated such that when the tech support people cannot provide good resolution to a recurring problem, the issue gets reported to the upper levels of technical support much more quickly. Intermittent problems can be just as disturbing emotionally to end users as solid failures. Solid failures tend to get fixed quickly because they are much easier to diagnose but intermittent issues nag at you with little hope of being found. The tech support people don't really want to believe you it seems. It may be a cultural difference thing.

         

        I am eagerly waiting for the all-clear update from @Amanda.

    • Matt_Is_My_Name
      Sophomore

      Saw something new in the event log. I have no idea what it means, but I figured I would share anyway.

       

      05/29/2017 12:41:59 1230541 Command initiated ranging: Range every rate in the trajectory table
      05/29/2017 12:42:07 1280221 Start Range Process Sym 512 Code 2 Process id 8 Reason NOC INITIATED Frame 376866550 Group 25
      05/29/2017 12:42:13 1200121 Uplink DOWN | Started on frame 376866551 | Current State Code is 12.8.2 | SQF=111 Symcod=6 IGID=25
      05/29/2017 12:42:13 10121 System StateCode DOWN | Started at 05/29/2017 12:42:08 | Current State Code is 12.8.2 | SQF=111 Symcod=6
      05/29/2017 12:42:20 1280321 Ranging Final State: Frame 376866817 Group 25 SymRate 512 Code Rate 2 MinEsNo 46 TargetEsNo 70
      05/29/2017 12:42:20 1280321 Ranging Final State: InitialEsNo 152 FinalEsNo 70 Final BackOff 144 Outcome SUCCESS - MET TARGET ESNO
      05/29/2017 12:42:20 1250431 AIS/CLPC Filter Reset due to Ranging
      05/29/2017 12:42:25 1200221 Uplink RECOVERED | Recovered on frame 376866835 | Outage Time 000:00:00:12 | Last State Code was 12.8.3 | SQF=111 Symcod=6 IGID=25
      05/29/2017 12:42:25 10221 System StateCode UP | Recovered at 05/29/2017 12:42:20 | Outage Time 000:00:00:12 | Last State Code was 12.8.2 | SQF=111 Symcod=6
      05/29/2017 12:42:50 1280221 Start Range Process Sym 4096 Code 2 Process id 9 Reason BASELINE RATE SET Frame 376867488 Group 26
      05/29/2017 12:42:55 1200121 Uplink DOWN | Started on frame 376867489 | Current State Code is 12.8.2 | SQF=112 Symcod=21 IGID=26
      05/29/2017 12:42:56 10121 System StateCode DOWN | Started at 05/29/2017 12:42:51 | Current State Code is 12.8.2 | SQF=111 Symcod=21
      05/29/2017 12:42:59 1280321 Ranging Final State: Frame 376867700 Group 26 SymRate 4096 Code Rate 2 MinEsNo 47 TargetEsNo 54
      05/29/2017 12:42:59 1280321 Ranging Final State: InitialEsNo 145 FinalEsNo 51 Final BackOff 85 Outcome SUCCESS - MET TARGET ESNO
      05/29/2017 12:43:00 1250431 AIS/CLPC Filter Reset due to Ranging
      05/29/2017 12:43:05 1200221 Uplink RECOVERED | Recovered on frame 376867718 | Outage Time 000:00:00:10 | Last State Code was 12.8.3 | SQF=112 Symcod=10 IGID=25
      05/29/2017 12:43:06 10221 System StateCode UP | Recovered at 05/29/2017 12:43:01 | Outage Time 000:00:00:10 | Last State Code was 12.8.3 | SQF=111 Symcod=10
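
In case it helps anyone compare logs, here is a rough Python sketch that pulls the outage durations out of entries like the ones above. It is only a guess at the layout: it assumes each entry starts with a date and time followed by a numeric code (with or without a space between them), and that "Outage Time" is written as days:hours:minutes:seconds. The names `ENTRY`, `OUTAGE`, and `outage_seconds` are made up for this example.

```python
import re

# Timestamp, then a run of digits (event code), then the message text.
# \s* lets the pattern accept entries with or without separating spaces.
ENTRY = re.compile(
    r"(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s*(?P<code>\d+)\s*(?P<text>.*)"
)
# Assumed layout of the duration field: days:hours:minutes:seconds.
OUTAGE = re.compile(r"Outage Time (\d+):(\d+):(\d+):(\d+)")


def outage_seconds(line):
    """Return the outage length in seconds for an entry that reports
    an "Outage Time", or None for any other entry."""
    entry = ENTRY.match(line.strip())
    if entry is None:
        return None
    found = OUTAGE.search(entry.group("text"))
    if found is None:
        return None
    days, hours, minutes, seconds = (int(part) for part in found.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Two "Uplink RECOVERED" entries copied from the log above.
SAMPLE = [
    "05/29/2017 12:42:251200221Uplink RECOVERED | Recovered on frame 376866835 | "
    "Outage Time 000:00:00:12 | Last State Code was 12.8.3 | SQF=111 Symcod=6 IGID=25",
    "05/29/2017 12:43:051200221Uplink RECOVERED | Recovered on frame 376867718 | "
    "Outage Time 000:00:00:10 | Last State Code was 12.8.3 | SQF=112 Symcod=10 IGID=25",
]

durations = [outage_seconds(line) for line in SAMPLE]
print(durations, "total:", sum(durations), "seconds")
```

For the two sample entries this gives 12 and 10 seconds, which matches the brief uplink drops around each ranging event in the log.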