Forum Discussion
TLS handshake failure between IoT product and AWS
- 6 years ago
Just adding an update for closure on this thread...
In the end, the problem was corruption of the TLS handshake caused by a default buffer size in Microchip's TCP/IP library being too small. I haven't completely studied the cause yet, but it appears that traffic received via a HughesNet link uses a larger-than-typical MTU setting, or something along those lines. Simply resizing that buffer made the problem disappear.
Patrick
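As a rough illustration of the failure mode described in this update, here is a toy Python model of a fixed-size receive buffer that is smaller than some incoming segments. It is not Microchip's code, and the root cause was never fully studied in this thread. The buffer size, segment sizes, and function names are all hypothetical, chosen only to show how overflow bytes could go missing and corrupt a reassembled handshake message.

```python
# Toy model only: this is NOT Microchip's TCP/IP stack. All sizes are hypothetical.

def receive_into_buffer(segment: bytes, buffer_size: int) -> bytes:
    """Copy one incoming segment into a fixed-size buffer, dropping any overflow."""
    return segment[:buffer_size]


def reassemble(segments, buffer_size):
    """Concatenate whatever survives after each segment passes through the buffer."""
    return b"".join(receive_into_buffer(s, buffer_size) for s in segments)


def split(message: bytes, segment_size: int):
    """Split a message into segments of at most segment_size bytes."""
    return [message[i:i + segment_size] for i in range(0, len(message), segment_size)]


# A made-up 3000-byte handshake message delivered over two different paths.
message = bytes(i % 251 for i in range(3000))
small_segments = split(message, 536)    # path that delivers small segments
large_segments = split(message, 1460)   # path that delivers larger segments

BUFFER_SIZE = 1024  # hypothetical library default

intact = reassemble(small_segments, BUFFER_SIZE) == message
corrupted = reassemble(large_segments, BUFFER_SIZE) != message
print(intact, corrupted)
```

In this model the message survives when every segment fits in the buffer and is corrupted as soon as one does not. Enlarging `BUFFER_SIZE` to at least the largest segment restores it, which matches the "resize the buffer" fix in spirit only.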
Hi Patrick,
Good news, I got a quick turnaround from engineering and they are interested in looking at this. They'd like to get in touch with you. Although I have your SAN, it looks like it's for Mr. Chandler. Please privately message me your contact information so an engineer can reach out.
- MarkJFine, 6 years ago, Professor
Liz,
If I could chime in...
I noticed something similar when going to my bank's web page yesterday (just didn't have time to report it) which also hung on a TLS handshake to AWS. To add to this, it seems like it's happening very sporadically (or transitionally, as I explain later) and eventually clears.
AWS switches their IPs around very often (a pure annoyance from a web security standpoint, imo). It's very possible that when they do that, the IP caching used in the DNS acceleration may get confused and try to handshake with the wrong IP, thus causing a TLS error. If that's the case, there might need to be exceptions made for AWS and any other cloud/server farms that tend to do the same thing, like DigitalOcean, etc.
I'd venture to guess this is part of the problem people were having going to amazon.com recently, as well.
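One way to probe the stale-IP idea above is to compare an address that was resolved earlier against a fresh DNS answer. The Python sketch below does that with the standard `socket.getaddrinfo` call. The helper names are made up for illustration. An address missing from a fresh answer is not proof that it stopped working, because load-balanced services often return only a subset of their addresses.

```python
import socket


def resolve_ipv4(hostname: str, port: int = 443) -> set:
    """Return the set of IPv4 addresses the local resolver currently gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def is_possibly_stale(cached_ip: str, fresh_ips: set) -> bool:
    """True if a previously cached address no longer appears in a fresh DNS answer."""
    return cached_ip not in fresh_ips
```

Running `resolve_ipv4` on the same hostname a few minutes apart and feeding an older address to `is_possibly_stale` would show whether the endpoint's addresses are rotating as suggested.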
- Liz, 6 years ago, Moderator
Hi Mark!
Thanks for chiming in, I just noticed your post. Let me also send this over to the engineers for their information.
- Liz, 6 years ago, Moderator
Good morning Patrick,
Thank you for PMing me your contact info. One of our engineers informed me he'll be reaching out to you soon. Looking forward to some productive findings!
- pfrazer, 6 years ago, Freshman
Mark,
I have been in communication with an engineer at the ARM Mbed TLS group, and his latest response after looking at the detailed logging of a failed connection over a HughesNet link is that the server side at Amazon likely silently terminated the connection due to an inconsistency in the TLS handshake packet. His guess was it might be a bad MAC or something, and my interpretation is that it might be due to some sort of caching or acceleration mechanism that's getting in the way. (No conclusive evidence at all, of course.)
However, one of the tests I performed was to look up a valid IP for our API endpoint at Amazon and then hard-code that IP into one device trying to connect over a HughesNet link. It spent the better part of a day trying to connect to that one IP and it never succeeded. (I proved at the beginning and the end that the IP was functional by switching to a non-HughesNet internet link; it connected immediately.) Also, if it were a caching issue, you'd think that we would see at least an occasional success while trying to connect. I have two units in the field on HughesNet connections and they've been trying to connect to Amazon every few minutes for over two months. In that timeframe, neither one of them has ever finished a single TLS handshake. That 100% failure rate makes my issue feel different than what you've observed...
Patrick
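For anyone wanting to repeat the pinned-IP test from a PC rather than an embedded device, a minimal Python sketch might look like the following. It connects to a fixed IP address, bypassing DNS, and then attempts the TLS handshake using the real hostname for SNI and certificate verification. The function name and return strings are made up for illustration, and this uses Python's standard `ssl` module, not the Mbed TLS stack on the device, so it only approximates the original experiment.

```python
import socket
import ssl


def try_handshake(ip: str, hostname: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a TCP connection to a fixed IP, then attempt a TLS handshake.

    `hostname` is used for SNI and certificate checks, so DNS is never consulted.
    Returns a short description of the outcome instead of raising.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake before returning.
            with context.wrap_socket(raw, server_hostname=hostname) as tls:
                return f"ok: {tls.version()}"
    except ssl.SSLError as exc:
        # Handshake-level failure (alert, bad record, certificate problem).
        return f"tls error: {exc}"
    except OSError as exc:
        # TCP-level failure (refused, reset, timed out).
        return f"socket error: {exc}"
```

Calling `try_handshake` with the pinned address and the endpoint's hostname, once over the satellite link and once over another link, would separate "the handshake itself fails on this path" from "DNS handed out a bad address".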