I just got caught out by a “reliable” internal service that started timing out.
I had never configured a timeout on the connection (the default was many minutes), so the stalled calls jammed the whole program.
It’s important to set aggressive timeouts in prod: better to fail fast and find a way to handle the error than to just wait.
Perhaps the next step is to make my program internally defensive in order to combat my poor coding skills.
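
A minimal sketch of the idea in Python, using only the standard library. The function name `fetch_or_fallback`, the 2-second default, and the fallback value are illustrative assumptions, not details of the service described above.

```python
import urllib.request


def fetch_or_fallback(url, timeout_s=2.0, fallback=None):
    """Return the response body, or `fallback` if the call is slow or fails.

    Assumption: the caller can live with a fallback value (cached data,
    an empty result, etc.) when the service does not answer in time.
    """
    try:
        # An explicit timeout replaces the library/OS default, which can be
        # very long. It applies to each blocking socket operation (connect,
        # read), so it is not a hard deadline for the whole request.
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read()
    except OSError:
        # TimeoutError, socket.timeout and urllib.error.URLError are all
        # subclasses of OSError, so timeouts, refused connections and DNS
        # failures all land here instead of hanging the program.
        return fallback
```

A call might look like `fetch_or_fallback("http://internal-service.example/status", timeout_s=1.0, fallback=b"")`, where the URL is hypothetical. Swallowing every `OSError` is a simplification; real code would probably log the failure and decide per error type whether to retry or give up.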