TLS handshake failure between IoT product and AWS
Hello,
I am a software developer building an IoT water treatment product (https://dropconnect.com) that is currently in soft release. The core of the system is a WiFi-enabled hub that communicates with Amazon's AWS IoT Core service. Several of our customers use HughesNet Gen5 HT2000W equipment, and at those sites our product cannot communicate with AWS. The TLS handshake gets about 80% of the way through and then stalls when an expected response from AWS never arrives. The problem only happens over a HughesNet satellite link, so we purchased Gen5 equipment and entered a two-year HughesNet contract specifically to diagnose it.
So far, I've confirmed that network latency is not a contributing factor, that DNS lookups are not involved, and that enabling or disabling the firewall and web acceleration features on the HT2000W has no effect. I've tested our device on a HughesNet Gen4 connection and it works fine, so the problem appears to be specific to the Gen5 platform. Basic communication with the Amazon server is not a problem; by the time the TLS handshake fails, there have already been several rounds of back-and-forth with the server. The handshake is performed with the Mbed TLS library (https://tls.mbed.org/), which is solid and widely used for encryption on embedded IoT products. I can provide much more detail about exactly what happens during the handshake, but for now I'll save that for anyone who is interested...
Can anyone suggest features or behavior of the HughesNet Gen5 service that might be contributing to this TLS handshake failure? What is the best method for getting the attention of technical engineers at HughesNet that could help diagnose and solve this problem? Any ideas that could help me chase down and solve this problem would be appreciated.
Thanks,
Patrick Frazer
Chandler Systems, Inc.
Just adding an update for closure on this thread...
In the end, the problem was corruption of the TLS handshake caused by a default buffer size in Microchip's TCP/IP library that was too small. I haven't fully studied the cause yet, but it appears that traffic received over a HughesNet link uses a larger-than-typical MTU, or something along those lines. Simply enlarging that buffer made the problem disappear.
Patrick