Forum Discussion
TLS handshake failure between IoT product and AWS
Hello,
I am a software developer building an IoT water treatment product that is currently in soft release (https://dropconnect.com). The core of the system is a WiFi-enabled hub that communicates with Amazon's AWS IoT Core service. Several of our customers use HughesNet Gen5 HT2000W equipment, and on those connections our product cannot communicate with AWS. The TLS handshake gets about 80% of the way through and then stalls when an expected response from AWS never arrives. The problem only happens over a HughesNet satellite link, so we purchased Gen5 equipment and entered a two-year HughesNet contract specifically to diagnose it.
So far, I've ruled out network latency and DNS lookups as contributing factors, and the state of the firewall and web acceleration features on the HT2000W has no effect on the problem. I've tested our device on a HughesNet Gen4 connection and it works just fine, so the hang appears to be specific to the Gen5 platform. Basic communication with the server at Amazon is not a problem; by the time the TLS handshake fails, there have already been several rounds of back-and-forth communication with the server. The TLS handshake is performed using the Mbed TLS library (https://tls.mbed.org/), which is solid and commonly used for encryption on embedded IoT products. I can provide much more information about exactly what is happening during the TLS handshake, but for now I'll save that for anyone who is interested...
Can anyone suggest features or behavior of the HughesNet Gen5 service that might be contributing to this TLS handshake failure? What is the best way to get the attention of engineers at HughesNet who could help diagnose and solve this problem? Any ideas that could help me chase this down would be appreciated.
Thanks,
Patrick Frazer
Chandler Systems, Inc.
Just adding an update for closure on this thread...
In the end, the problem was corruption of the TLS handshake caused by a default buffer size in Microchip's TCP/IP library being too small. I haven't completely studied the cause yet, but it appears that traffic received via a HughesNet link uses a larger-than-typical MTU setting or something along those lines. Simply enlarging that buffer made the problem disappear.
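For anyone who finds this thread later, here is a rough simulation of how an undersized receive buffer can stall a handshake when segments arrive larger than expected. It is only a sketch under stated assumptions: `RxBuffer`, `deliver`, and the sizes 512 and 1460 are hypothetical names and numbers chosen for illustration, not Microchip's API or the actual values involved, and the real stack may fail in a different way than silently dropping bytes. The only part taken from the TLS specification is the 5-byte record header (type, version, length).

```python
import struct

TLS_HANDSHAKE = 22      # TLS record content type for handshake messages
TLS_VERSION = 0x0303    # record-layer version field used by TLS 1.2


def make_record(payload: bytes) -> bytes:
    """Build a TLS record: 1-byte type, 2-byte version, 2-byte length, payload."""
    return struct.pack("!BHH", TLS_HANDSHAKE, TLS_VERSION, len(payload)) + payload


class RxBuffer:
    """Hypothetical fixed-size receive buffer that silently drops overflow bytes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = bytearray()

    def push(self, segment: bytes) -> int:
        """Store as much of the segment as fits; return the number of bytes kept."""
        room = self.capacity - len(self.data)
        accepted = segment[:room]
        self.data += accepted
        return len(accepted)

    def drain(self) -> bytes:
        """Hand everything buffered so far to the application."""
        out = bytes(self.data)
        self.data.clear()
        return out


def deliver(stream: bytes, segment_size: int, capacity: int) -> bytes:
    """Send the stream in segments through the buffer; return what the app receives."""
    buf = RxBuffer(capacity)
    received = bytearray()
    for offset in range(0, len(stream), segment_size):
        buf.push(stream[offset:offset + segment_size])
        received += buf.drain()
    return bytes(received)


def parse_records(stream: bytes) -> list:
    """Return the payloads of all complete records found at the start of the stream."""
    records = []
    pos = 0
    while pos + 5 <= len(stream):
        ctype, version, length = struct.unpack_from("!BHH", stream, pos)
        if ctype != TLS_HANDSHAKE or version != TLS_VERSION:
            raise ValueError("record framing is corrupted")
        if pos + 5 + length > len(stream):
            break  # record is incomplete: the TLS layer keeps waiting for more bytes
        records.append(stream[pos + 5:pos + 5 + length])
        pos += 5 + length
    return records


# Two handshake records, 1200 and 60 bytes of payload (arbitrary sizes).
wire = make_record(b"A" * 1200) + make_record(b"B" * 60)

print(len(parse_records(deliver(wire, segment_size=512, capacity=512))))   # 2
print(len(parse_records(deliver(wire, segment_size=1460, capacity=512))))  # 0
```

With segments no larger than the buffer, both records arrive intact. With a larger segment, the tail is lost, the first record never completes, and the TLS layer sits waiting for bytes that will not come, which looks a lot like "an expected response never arrives." Raising `capacity` above the segment size makes the second case pass as well.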
Patrick
- Liz (Moderator)
Hi Patrick,
I'm glad you found the community, and thank you for posting. Wow, I admire your dedication to finding a solution. Let me send this over to our engineers for their input. I'll post back once I hear anything.
- Liz (Moderator)
Hi Patrick,
Good news, I got a quick turnaround from engineering and they are interested in looking at this. They'd like to get in touch with you. Although I have your SAN, it looks like it's for Mr. Chandler. Please privately message me your contact information so an engineer can reach out.
- MarkJFine (Professor)
Liz,
If I could chime in...
I noticed something similar when going to my bank's web page yesterday (just didn't have time to report it) which also hung on a TLS handshake to AWS. To add to this, it seems like it's happening very sporadically (or transitionally, as I explain later) and eventually clears.
AWS switches their IPs around very often (a pure annoyance from a web security standpoint, imo). It's very possible that when they do that, the IP caching used in the DNS acceleration may get confused and try to handshake with the wrong IP, thus causing a TLS error. If that's the case, there might need to be exceptions made for AWS and any other cloud/server farms that tend to do the same thing, like DigitalOcean, etc.
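To make the stale-cache idea concrete, here is a minimal sketch of a name-to-IP cache that either honors or ignores a record's TTL. Everything in it is an assumption for illustration: `DnsCache` is a hypothetical class, the host name and addresses are placeholders from reserved documentation ranges, and nothing in this thread confirms how HughesNet's DNS acceleration actually behaves.

```python
from typing import Dict, Optional, Tuple


class DnsCache:
    """Toy name-to-IP cache. honor_ttl=False models a cache that never expires entries."""

    def __init__(self, honor_ttl: bool = True) -> None:
        self.honor_ttl = honor_ttl
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, ip: str, ttl: float, now: float) -> None:
        """Store an answer together with the time at which it expires."""
        self._entries[name] = (ip, now + ttl)

    def get(self, name: str, now: float) -> Optional[str]:
        """Return the cached IP, or None if a fresh lookup is needed."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self.honor_ttl and now >= expires_at:
            del self._entries[name]  # expired: force a new lookup
            return None
        return ip


# Placeholder host and address (hypothetical; 192.0.2.0/24 is reserved for documentation).
HOST = "endpoint.example"

strict = DnsCache(honor_ttl=True)
sticky = DnsCache(honor_ttl=False)
for cache in (strict, sticky):
    cache.put(HOST, "192.0.2.10", ttl=60, now=0)

# Two minutes later the provider has moved the service to another address.
print(strict.get(HOST, now=120))  # None
print(sticky.get(HOST, now=120))  # 192.0.2.10
```

The cache that ignores the TTL keeps handing out the old address, so a client would open its TLS connection to whatever host now sits behind that IP, and the handshake could fail or hang until the entry is finally refreshed.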
I'd venture to guess this is part of the problem people were having going to amazon.com recently, as well.
- Liz (Moderator)
Hi Mark!
Thanks for chiming in. I just noticed your post. Let me also send this over to the engineers for their information.