Patterns for designing self-healing apps

Self-healing is an important concern when developing an application. For example, if a downstream service is unavailable, how should the app handle it? Should it keep retrying a service that is already down, or should it recognize the situation and stop hammering the failing service? A failure in one subsystem can also cascade: a thread or socket that is not released in a timely manner can lead to cascading failures elsewhere. All too often the success path is well tested but the failure path is not. There are many patterns for handling these failures; here are a few must-have patterns for handling them gracefully.

Each pattern below is described by its premise, an informal motto (aka), and how it mitigates the failure.
Retry failed operations with a retry strategy (Retry Pattern)
Premise: Many faults are transient and may self-correct after a short delay.
Aka: “Maybe it’s just a blip”
How it mitigates: Allows configuring automatic retries, ideally with a strategy of increasing delay between attempts.

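As a minimal sketch of retrying with an increasing delay, the Python function below retries an operation on exceptions assumed to be transient (here `ConnectionError` and `TimeoutError`), doubling the delay after each failure up to a cap and adding random jitter. The names `retry_with_backoff`, `base_delay` and `max_delay` are illustrative, not from any particular library; `sleep` is injectable so the delays can be observed in tests.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying failures assumed to be transient with growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # still failing after the last attempt: not just a blip
            # Exponential backoff (0.1s, 0.2s, 0.4s, ...) capped at max_delay, scaled by
            # random jitter so many clients don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
    raise ValueError("max_attempts must be at least 1")
```

Capping both the number of attempts and the delay matters: unbounded retries against a service that is actually down turn a blip into exactly the hammering described above.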
Protect a failing app with a circuit breaker (Circuit Breaker)
Premise: When a system is seriously struggling, failing fast is better than making users/callers wait. Protecting a faulting system from overload can help it recover.
Aka: “Stop doing it if it hurts”; “Give that system a break”
How it mitigates: Breaks the circuit (blocks executions) for a period when faults exceed a pre-configured threshold.

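A minimal, single-threaded circuit breaker sketch, assuming a simple consecutive-failure threshold and a fixed break duration (`failure_threshold` and `reset_timeout` are hypothetical parameter names). While the circuit is open, calls fail fast without touching the downstream service; once the break is over, the next call is let through as a trial, and its outcome closes or reopens the circuit. A production breaker would also need locking for concurrent callers.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the downstream service while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # break is over: let the next call through as a trial
        return "open"

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast: don't make callers wait, and give the failing system a break.
            raise CircuitOpenError("circuit is open; not calling downstream")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        # Success closes the circuit and resets the consecutive-failure count.
        self.failures = 0
        self.opened_at = None
        return result
```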
Improve the caller experience with a timeout (Timeout)
Premise: Beyond a certain wait, a success result is unlikely.
Aka: “Don’t wait forever”
How it mitigates: Guarantees the caller won’t have to wait beyond the timeout.

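In Python's `asyncio`, `asyncio.wait_for` enforces a timeout on an awaitable and cancels it when the deadline passes. The sketch below assumes a hypothetical `downstream_call` that simply sleeps to simulate a slow dependency.

```python
import asyncio


async def downstream_call(delay: float) -> str:
    # Hypothetical dependency that takes `delay` seconds to respond.
    await asyncio.sleep(delay)
    return "result"


async def fetch_with_timeout(delay: float, timeout_seconds: float) -> str:
    # wait_for cancels the pending call once the deadline passes and raises
    # asyncio.TimeoutError, so the caller never waits much beyond timeout_seconds.
    return await asyncio.wait_for(downstream_call(delay), timeout=timeout_seconds)


async def main() -> None:
    try:
        print(await fetch_with_timeout(delay=5.0, timeout_seconds=0.1))
    except asyncio.TimeoutError:
        print("downstream call timed out after 0.1s; failing fast instead of waiting")


asyncio.run(main())
```

Cancelling the abandoned call, rather than just ignoring it, also frees the resources it was holding, which ties the timeout back to the cascading-failure concern in the introduction.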
Isolate critical resources (Bulkhead Isolation)
Premise: A ship should not sink because there is a hole in one place. When a process faults, multiple failing calls can stack up (if unbounded) and easily swamp resources (threads, CPU, memory) in a host. This can degrade performance more widely by starving other operations of resources, bringing down the host, or causing cascading failures upstream.
Aka: “One fault shouldn’t sink the whole ship”
How it mitigates: Constrains the governed actions to a fixed-size resource pool, isolating their potential to affect others.

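A minimal thread-based bulkhead sketch: a `threading.BoundedSemaphore` caps the number of concurrent calls to one dependency, and calls beyond the cap are rejected immediately rather than queued. The class and exception names, the limits, and the choice to reject instead of queueing are assumptions for illustration.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(Exception):
    """Raised when every slot in the bulkhead is already in use."""


class Bulkhead:
    """Caps how many calls to one dependency may run at the same time."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, operation: Callable[[], T]) -> T:
        # Non-blocking acquire: if every slot is busy, reject right away instead of
        # letting callers pile up and tie up threads/memory the rest of the host needs.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("too many concurrent calls to this dependency")
        try:
            return operation()
        finally:
            self._slots.release()


# One bulkhead per dependency (hypothetical limits), so a slow payment service
# can occupy at most 10 concurrent calls and cannot starve search of threads.
payments_bulkhead = Bulkhead(max_concurrent=10)
search_bulkhead = Bulkhead(max_concurrent=5)
```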
Throttle clients with a rate limit (Rate Limit)
Premise: Limiting the rate at which a system handles requests is another way to control load. This can apply to the way your system accepts incoming calls, to the way you call downstream services, or to both.
Aka: “Slow down a bit, will you?”
How it mitigates: Constrains executions so they do not exceed a certain rate.

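The section doesn't prescribe an algorithm; one common way to enforce a rate limit is a token bucket, sketched below under that assumption. Each call spends a token, tokens refill at a fixed rate up to a burst capacity, and a call that finds the bucket empty is rejected. `TokenBucket` and its parameters are illustrative names, the clock is injectable for testing, and the class is not thread-safe as written.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = float(capacity)
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last_refill = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: reject the call (e.g. HTTP 429) or delay it
```

The same class works in either direction mentioned above: wrap incoming requests to protect your own service, or wrap outgoing calls to stay within a downstream service's limits.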
Provide a fallback (Fallback)
Premise: Things will still fail; plan what you will do when that happens.
Aka: “Degrade gracefully”
How it mitigates: Defines an alternative value to be returned (or action to be executed) on failure.
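A minimal fallback sketch: run the operation and, if it raises, return an alternative value computed from the error instead. The recommendation service, `fetch_recommendations` and `POPULAR_ITEMS` below are hypothetical, standing in for any call that can fail and a sensible default.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_fallback(operation: Callable[[], T], fallback: Callable[[Exception], T]) -> T:
    """Run `operation`; if it raises, degrade gracefully by returning `fallback(error)`."""
    try:
        return operation()
    except Exception as error:  # in real code, catch only the failures you expect
        return fallback(error)


# Hypothetical example: show popular items when personalised recommendations fail.
POPULAR_ITEMS: List[str] = ["bestseller-1", "bestseller-2"]


def fetch_recommendations(user_id: str) -> List[str]:
    raise ConnectionError("recommendation service unavailable")  # simulated outage


items = with_fallback(lambda: fetch_recommendations("user-42"), lambda error: POPULAR_ITEMS)
print(items)
```

In practice the fallback usually sits outermost, so it only takes effect after retries, timeouts, and the circuit breaker have all given up.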

What’s wrong with just hashing a password?

Storing passwords safely is critical for any application. If you do not take the right precautions, you can lose your users' passwords to an attacker. Storing passwords in plain text in the database is certainly a bad design. Hashing passwords is well known, but unfortunately hashing alone is not enough.

When a user tries to log in, the hash of the password they entered is compared with the hash of their password stored in the database. If the hashes match, the user gains access to the account. If an attacker gains access to the password database, they can use a rainbow table attack: compare the stolen hashes to the precomputed hashes in the table, which gives plain-text password candidates for each hash that the attacker can then use to access accounts. For example, if the attacker's rainbow table contains the hash of the password “welcome123”, every user who chose that password has the same stored hash, so all of those accounts can easily be cracked.
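To make the problem concrete, the sketch below stores unsalted SHA-256 hashes for a few hypothetical users and cracks them with a dictionary of precomputed hashes of common passwords. The dictionary is a simplified stand-in for a rainbow table (a real rainbow table uses hash chains to trade storage for computation), but the weakness it exposes is the same: identical passwords produce identical hashes.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    # Plain, unsalted SHA-256: the same password always yields the same digest.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# A toy user table (hypothetical users) storing only unsalted hashes.
users = {
    "alice": unsalted_hash("welcome123"),
    "bob": unsalted_hash("welcome123"),
    "carol": unsalted_hash("S7r0ng&Unique!"),
}

# The attacker's precomputed hash -> password lookup for common passwords.
common_passwords = ["123456", "password", "welcome123", "qwerty"]
lookup = {unsalted_hash(p): p for p in common_passwords}

# One dictionary lookup per stolen hash recovers every account using a listed password.
cracked = {user: lookup[h] for user, h in users.items() if h in lookup}
print(cracked)  # {'alice': 'welcome123', 'bob': 'welcome123'}
```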

To mitigate this attack, we salt passwords. According to OWASP, “a salt is a unique, randomly generated string that is added to each password as part of the hashing process”.

The password can then be stored in the database in the format Hash(password + salt), with the salt stored alongside it so the same hash can be recomputed when the user logs in. Because the salt adds random data that is unique to each user before hashing, even the same password produces a different hash for each user. If someone compared the hashes in a rainbow table to those in the database, none of them would match, even for users with the same password.
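A minimal sketch of salted hashing using only Python's standard library. Instead of literally computing a fast hash such as SHA-256 over `password + salt`, it swaps in PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`), which takes the salt as a separate input and is deliberately slow to compute. The function names and the iteration count are illustrative assumptions; check current OWASP guidance for an appropriate work factor.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 600_000  # illustrative PBKDF2 work factor; check current guidance


def hash_password(password: str) -> Tuple[bytes, bytes]:
    salt = secrets.token_bytes(16)  # unique, randomly generated salt for this user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest  # store both; the salt is not a secret


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    # Recompute with the user's stored salt, then compare in constant time.
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, stored_digest)


# Two users choosing the same password still end up with different stored hashes.
alice_salt, alice_hash = hash_password("welcome123")
bob_salt, bob_hash = hash_password("welcome123")
print(alice_hash != bob_hash)
```

Because each salt is different, a precomputed table like the one in the previous example would have to be rebuilt for every single user, which is what defeats the rainbow table attack.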

That said, rainbow tables may not be the biggest threat organizations face today. They are still a real threat, however, and should be considered and accounted for as part of an overall security strategy.