Error Handling: From Crash-Prone Code to Bulletproof Applications
It was my third week at Amazon, and I had just deployed what I thought was bulletproof code to production. Within minutes, our monitoring dashboard lit up like a Christmas tree. The culprit? A single unhandled exception that brought down the entire order processing system during Black Friday traffic.
My manager pulled me aside and said, "Maya, your code works great when everything goes right. But software engineering is about what happens when everything goes wrong." That moment taught me that error handling isn't just about preventing crashes - it's about building systems that gracefully degrade under pressure.
Table Of Contents
- The Million-Dollar Bug That Changed Everything
- Exception Handling Patterns Across Languages
- Error Monitoring and Alerting
- Error Recovery Strategies
- Final Thoughts: Embracing Failure as a Design Requirement
The Million-Dollar Bug That Changed Everything
Here's the code that cost us (literally) thousands of dollars in lost orders:
# My naive approach - what could go wrong?
def process_order(order_data):
    user = get_user(order_data['user_id'])
    product = get_product(order_data['product_id'])
    # Calculate total
    total = product['price'] * order_data['quantity']
    # Process payment
    payment_result = charge_card(user['payment_method'], total)
    # Create order record
    order = create_order({
        'user_id': user['id'],
        'product_id': product['id'],
        'total': total,
        'payment_id': payment_result['transaction_id']
    })
    return order
What happens when get_user() returns None? Or when the payment service is down? Or when the database connection fails? The entire application crashes, taking down hundreds of concurrent orders with it.
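To make the first of those failure modes concrete, here is a minimal sketch. The get_user below is a hypothetical stand-in (an in-memory dict, not the real lookup) and payment_method_for is a helper invented for this example; it only reproduces the access pattern of the naive code.

```python
# Hypothetical stand-in for the real lookup: returns None for unknown users.
def get_user(user_id):
    users = {1: {"id": 1, "payment_method": "card_123"}}
    return users.get(user_id)


def payment_method_for(user_id):
    user = get_user(user_id)
    # Same access pattern as the naive process_order: no None check.
    return user["payment_method"]


try:
    payment_method_for(999)  # unknown user
    failure = None
except TypeError as exc:
    # Subscripting None raises TypeError, which the naive code never catches.
    failure = exc

print(type(failure).__name__)
```

The happy path (user 1) works fine, which is exactly why this kind of bug survives testing and only shows up in production.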
Here's how I rewrote it after learning proper error handling:
# Bulletproof version with comprehensive error handling
import logging
import time
from typing import Optional, Dict, Any
from dataclasses import dataclass

@dataclass
class OrderResult:
    success: bool
    order: Optional[Dict[str, Any]] = None
    error_message: Optional[str] = None
    error_code: Optional[str] = None

def process_order(order_data: Dict[str, Any]) -> OrderResult:
    logger = logging.getLogger(__name__)
    try:
        # Validate input data
        if not order_data or not isinstance(order_data, dict):
            return OrderResult(
                success=False,
                error_message="Invalid order data",
                error_code="INVALID_INPUT"
            )
        required_fields = ['user_id', 'product_id', 'quantity']
        for field in required_fields:
            if field not in order_data:
                return OrderResult(
                    success=False,
                    error_message=f"Missing required field: {field}",
                    error_code="MISSING_FIELD"
                )
        # Get user with error handling
        try:
            user = get_user(order_data['user_id'])
            if not user:
                return OrderResult(
                    success=False,
                    error_message="User not found",
                    error_code="USER_NOT_FOUND"
                )
        except DatabaseConnectionError:
            logger.error(f"Database connection failed while fetching user {order_data['user_id']}")
            return OrderResult(
                success=False,
                error_message="Service temporarily unavailable",
                error_code="DATABASE_ERROR"
            )
        except Exception as e:
            logger.error(f"Unexpected error fetching user: {str(e)}")
            return OrderResult(
                success=False,
                error_message="Internal server error",
                error_code="INTERNAL_ERROR"
            )
        # Get product with error handling
        try:
            product = get_product(order_data['product_id'])
            if not product:
                return OrderResult(
                    success=False,
                    error_message="Product not found",
                    error_code="PRODUCT_NOT_FOUND"
                )
            if not product.get('available', False):
                return OrderResult(
                    success=False,
                    error_message="Product not available",
                    error_code="PRODUCT_UNAVAILABLE"
                )
        except Exception as e:
            logger.error(f"Error fetching product {order_data['product_id']}: {str(e)}")
            return OrderResult(
                success=False,
                error_message="Product service error",
                error_code="PRODUCT_SERVICE_ERROR"
            )
        # Calculate total with validation
        try:
            quantity = int(order_data['quantity'])
            if quantity <= 0:
                return OrderResult(
                    success=False,
                    error_message="Quantity must be positive",
                    error_code="INVALID_QUANTITY"
                )
            total = product['price'] * quantity
            if total > 10000:  # Business rule: max order $10k
                return OrderResult(
                    success=False,
                    error_message="Order total exceeds maximum allowed",
                    error_code="TOTAL_TOO_HIGH"
                )
        except (ValueError, TypeError):
            return OrderResult(
                success=False,
                error_message="Invalid quantity value",
                error_code="INVALID_QUANTITY_TYPE"
            )
        # Process payment with retries
        payment_result = None
        max_retries = 3
        for attempt in range(max_retries):
            try:
                payment_result = charge_card(user['payment_method'], total)
                break  # Success, exit retry loop
            except PaymentDeclinedError:
                # A declined card will not succeed on retry
                return OrderResult(
                    success=False,
                    error_message="Payment declined",
                    error_code="PAYMENT_DECLINED"
                )
            except PaymentServiceTimeoutError:
                if attempt == max_retries - 1:  # Last attempt
                    return OrderResult(
                        success=False,
                        error_message="Payment service timeout",
                        error_code="PAYMENT_TIMEOUT"
                    )
                logger.warning(f"Payment timeout, retrying... (attempt {attempt + 1})")
                time.sleep(0.5 * (2 ** attempt))  # Exponential backoff: 0.5s, then 1s
            except Exception as e:
                logger.error(f"Payment processing error: {str(e)}")
                return OrderResult(
                    success=False,
                    error_message="Payment processing failed",
                    error_code="PAYMENT_ERROR"
                )
        # Create order record
        try:
            order = create_order({
                'user_id': user['id'],
                'product_id': product['id'],
                'quantity': quantity,
                'total': total,
                'payment_id': payment_result['transaction_id']
            })
            logger.info(f"Order created successfully: {order['id']}")
            return OrderResult(success=True, order=order)
        except Exception as e:
            # If order creation fails, we need to refund the payment
            logger.error(f"Order creation failed, initiating refund: {str(e)}")
            try:
                refund_payment(payment_result['transaction_id'])
            except Exception as refund_error:
                logger.critical(f"Refund failed for transaction {payment_result['transaction_id']}: {str(refund_error)}")
                # Alert operations team
                send_alert("CRITICAL_REFUND_FAILURE", {
                    'transaction_id': payment_result['transaction_id'],
                    'amount': total,
                    'error': str(refund_error)
                })
            return OrderResult(
                success=False,
                error_message="Order creation failed",
                error_code="ORDER_CREATION_ERROR"
            )
    except Exception as e:
        # Catch-all for any unexpected errors
        logger.critical(f"Unexpected error in process_order: {str(e)}", exc_info=True)
        return OrderResult(
            success=False,
            error_message="Internal server error",
            error_code="UNEXPECTED_ERROR"
        )
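Returning an OrderResult instead of raising pays off at the call site, because the caller can branch on error_code without any try/except. Here is a rough sketch of what that might look like at an HTTP boundary. The to_http_response helper and the status code mapping are my own assumptions for illustration, not part of the original system.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class OrderResult:
    success: bool
    order: Optional[Dict[str, Any]] = None
    error_message: Optional[str] = None
    error_code: Optional[str] = None


# Assumed mapping from a few of the error codes above to HTTP statuses.
# Anything not listed falls back to 500.
STATUS_BY_ERROR_CODE = {
    "INVALID_INPUT": 400,
    "MISSING_FIELD": 400,
    "INVALID_QUANTITY": 400,
    "USER_NOT_FOUND": 404,
    "PRODUCT_NOT_FOUND": 404,
    "PAYMENT_DECLINED": 402,
    "DATABASE_ERROR": 503,
}


def to_http_response(result: OrderResult) -> Tuple[int, Dict[str, Any]]:
    """Translate an OrderResult into a (status, body) pair."""
    if result.success:
        return 201, {"order": result.order}
    status = STATUS_BY_ERROR_CODE.get(result.error_code, 500)
    return status, {"error": result.error_code, "message": result.error_message}


declined = OrderResult(success=False, error_message="Payment declined",
                       error_code="PAYMENT_DECLINED")
print(to_http_response(declined))
```

The error codes become a small contract between the order logic and whatever sits in front of it, so it is worth keeping the list short and stable.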
Exception Handling Patterns Across Languages
Python: Pythonic Error Handling
# Context managers for resource management
# (connect, DATABASE_URL, and DatabaseError are assumed to come from
# your database driver and configuration)
import logging

class DatabaseConnection:
    def __init__(self, connection_string):
        self.connection_string = connection_string
        self.connection = None

    def __enter__(self):
        try:
            self.connection = connect(self.connection_string)
            return self.connection
        except ConnectionError as e:
            logging.error(f"Failed to connect to database: {e}")
            raise

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.connection:
            try:
                self.connection.close()
            except Exception as e:
                logging.warning(f"Error closing database connection: {e}")
        # Return False to propagate any exception
        return False

# Usage
def get_user_orders(user_id):
    try:
        with DatabaseConnection(DATABASE_URL) as db:
            cursor = db.cursor()
            cursor.execute(
                "SELECT * FROM orders WHERE user_id = %s",
                (user_id,)
            )
            return cursor.fetchall()
    except DatabaseError as e:
        logging.error(f"Database error fetching orders for user {user_id}: {e}")
        return []
    except Exception as e:
        logging.error(f"Unexpected error: {e}")
        return []
# Custom exceptions for better error categorization
class CoffeeShopError(Exception):
    """Base exception for coffee shop application"""
    pass

class InsufficientInventoryError(CoffeeShopError):
    """Raised when product is out of stock"""
    def __init__(self, product_name, requested, available):
        self.product_name = product_name
        self.requested = requested
        self.available = available
        super().__init__(
            f"Insufficient inventory for {product_name}: "
            f"requested {requested}, available {available}"
        )

class PaymentError(CoffeeShopError):
    """Base class for payment-related errors"""
    pass

class PaymentDeclinedError(PaymentError):
    """Raised when payment is declined"""
    def __init__(self, reason):
        self.reason = reason
        super().__init__(f"Payment declined: {reason}")

# Using custom exceptions
def create_order(items):
    try:
        for item in items:
            check_inventory(item['product_id'], item['quantity'])
        total = calculate_total(items)
        payment_result = process_payment(total)
        return complete_order(items, payment_result)
    except InsufficientInventoryError as e:
        logging.warning(f"Inventory check failed: {e}")
        return {
            'success': False,
            'error': 'insufficient_inventory',
            'message': str(e),
            'product': e.product_name
        }
    except PaymentDeclinedError as e:
        logging.warning(f"Payment declined: {e}")
        return {
            'success': False,
            'error': 'payment_declined',
            'message': str(e),
            'reason': e.reason
        }
    except Exception as e:
        logging.error(f"Order creation failed: {e}")
        return {
            'success': False,
            'error': 'internal_error',
            'message': 'Order processing failed'
        }
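One thing the hierarchy above buys you is the ability to catch at different levels of detail, but only if the except clauses run from most specific to least specific. Python picks the first clause that matches, so a base class listed early shadows its subclasses. Here is a small sketch of that ordering; the classify helper is hypothetical and the exception classes are trimmed copies of the ones above.

```python
class CoffeeShopError(Exception):
    """Base exception for the application."""


class PaymentError(CoffeeShopError):
    """Base class for payment-related errors."""


class PaymentDeclinedError(PaymentError):
    def __init__(self, reason):
        self.reason = reason
        super().__init__(f"Payment declined: {reason}")


def classify(exc):
    """Return a coarse category for an exception (hypothetical helper)."""
    try:
        raise exc
    except PaymentDeclinedError:   # most specific first
        return "payment_declined"
    except PaymentError:           # then its parent
        return "payment_error"
    except CoffeeShopError:        # then the application base class
        return "application_error"
    except Exception:              # anything else is unexpected
        return "internal_error"


print(classify(PaymentDeclinedError("insufficient funds")))
print(classify(PaymentError("gateway misconfigured")))
```

If the PaymentError clause came first, a declined card would be reported as a generic payment error and the more useful branch would never run.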
JavaScript: Promise-Based Error Handling
// Modern async/await error handling
class OrderService {
constructor() {
this.logger = new Logger('OrderService');
}
async processOrder(orderData) {
try {
// Input validation
this.validateOrderData(orderData);
// Parallel data fetching with error handling
const [user, product] = await Promise.all([
this.getUser(orderData.userId).catch(error => {
this.logger.error('Failed to fetch user', {
userId: orderData.userId,
error: error.message
});
throw new UserNotFoundError(`User ${orderData.userId} not found`);
}),
this.getProduct(orderData.productId).catch(error => {
this.logger.error('Failed to fetch product', {
productId: orderData.productId,
error: error.message
});
throw new ProductNotFoundError(`Product ${orderData.productId} not found`);
})
]);
// Business logic validation
if (!product.available) {
throw new ProductUnavailableError(`Product ${product.name} is not available`);
}
const total = product.price * orderData.quantity;
// Payment processing with timeout
const paymentResult = await Promise.race([
this.processPayment(user.paymentMethod, total),
new Promise((_, reject) =>
setTimeout(() => reject(new PaymentTimeoutError()), 10000)
)
]);
// Create order
const order = await this.createOrder({
userId: user.id,
productId: product.id,
quantity: orderData.quantity,
total: total,
paymentId: paymentResult.transactionId
});
this.logger.info('Order created successfully', { orderId: order.id });
return {
success: true,
order: order
};
} catch (error) {
return this.handleOrderError(error);
}
}
handleOrderError(error) {
if (error instanceof ValidationError) {
this.logger.warn('Order validation failed', { error: error.message });
return {
success: false,
error: 'validation_error',
message: error.message,
statusCode: 400
};
}
if (error instanceof UserNotFoundError || error instanceof ProductNotFoundError) {
this.logger.warn('Resource not found', { error: error.message });
return {
success: false,
error: 'not_found',
message: error.message,
statusCode: 404
};
}
// Check the subclass first: PaymentTimeoutError extends PaymentError,
// so testing PaymentError first would make this branch unreachable.
if (error instanceof PaymentTimeoutError) {
this.logger.error('Payment timeout', { error: error.message });
return {
success: false,
error: 'payment_timeout',
message: 'Payment processing timed out',
statusCode: 408
};
}
if (error instanceof PaymentError) {
this.logger.error('Payment processing failed', { error: error.message });
return {
success: false,
error: 'payment_error',
message: 'Payment processing failed',
statusCode: 402
};
}
// Unknown error
this.logger.error('Unexpected error processing order', {
error: error.message,
stack: error.stack
});
return {
success: false,
error: 'internal_error',
message: 'Order processing failed',
statusCode: 500
};
}
validateOrderData(orderData) {
if (!orderData || typeof orderData !== 'object') {
throw new ValidationError('Invalid order data');
}
const requiredFields = ['userId', 'productId', 'quantity'];
for (const field of requiredFields) {
if (!(field in orderData)) {
throw new ValidationError(`Missing required field: ${field}`);
}
}
if (!Number.isInteger(orderData.quantity) || orderData.quantity <= 0) {
throw new ValidationError('Quantity must be a positive integer');
}
}
}
// Custom error classes
class ValidationError extends Error {
constructor(message) {
super(message);
this.name = 'ValidationError';
}
}
class UserNotFoundError extends Error {
constructor(message) {
super(message);
this.name = 'UserNotFoundError';
}
}
class ProductNotFoundError extends Error {
constructor(message) {
super(message);
this.name = 'ProductNotFoundError';
}
}
class ProductUnavailableError extends Error {
constructor(message) {
super(message);
this.name = 'ProductUnavailableError';
}
}
class PaymentError extends Error {
constructor(message) {
super(message);
this.name = 'PaymentError';
}
}
class PaymentTimeoutError extends PaymentError {
constructor(message = 'Payment processing timed out') {
super(message);
this.name = 'PaymentTimeoutError';
}
}
// Usage with proper error handling
async function handleOrderRequest(req, res) {
try {
const orderService = new OrderService();
const result = await orderService.processOrder(req.body);
if (result.success) {
res.status(201).json(result);
} else {
res.status(result.statusCode || 500).json({
error: result.error,
message: result.message
});
}
} catch (error) {
console.error('Unhandled error in order endpoint:', error);
res.status(500).json({
error: 'internal_error',
message: 'Internal server error'
});
}
}
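If you are following along in Python rather than JavaScript, the Promise.race timeout in processOrder corresponds roughly to asyncio.wait_for. This is a minimal sketch under that assumption; charge and charge_with_timeout are placeholder names, and the sleep stands in for a real payment call.

```python
import asyncio


class PaymentTimeoutError(Exception):
    """Hypothetical error type for this sketch."""


async def charge(delay: float) -> dict:
    # Stand-in for a payment call; sleeps to simulate network latency.
    await asyncio.sleep(delay)
    return {"transaction_id": "tx_1"}


async def charge_with_timeout(delay: float, timeout: float) -> dict:
    try:
        # wait_for cancels the inner call if the timeout expires.
        return await asyncio.wait_for(charge(delay), timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise PaymentTimeoutError("Payment processing timed out") from exc


async def main():
    fast = await charge_with_timeout(delay=0.01, timeout=1.0)
    try:
        await charge_with_timeout(delay=1.0, timeout=0.05)
        slow = "completed"
    except PaymentTimeoutError:
        slow = "timed_out"
    return fast, slow


fast_result, slow_result = asyncio.run(main())
print(fast_result, slow_result)
```

One difference worth knowing: Promise.race leaves the losing promise running in the background, while asyncio.wait_for cancels the awaited call when the timeout fires. Either way, a timeout does not tell you whether the charge went through on the provider's side.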
Java: Enterprise Error Handling
// Spring Boot exception handling
@RestController
@RequestMapping("/api/orders")
public class OrderController {
private final OrderService orderService;
private final Logger logger = LoggerFactory.getLogger(OrderController.class);
@PostMapping
public ResponseEntity<OrderResponse> createOrder(@Valid @RequestBody OrderRequest request) {
try {
OrderResult result = orderService.processOrder(request);
return ResponseEntity.ok(new OrderResponse(result));
} catch (ValidationException e) {
logger.warn("Validation error: {}", e.getMessage());
return ResponseEntity.badRequest()
.body(new OrderResponse("validation_error", e.getMessage()));
} catch (ResourceNotFoundException e) {
logger.warn("Resource not found: {}", e.getMessage());
return ResponseEntity.notFound().build();
} catch (PaymentException e) {
logger.error("Payment error: {}", e.getMessage());
return ResponseEntity.status(HttpStatus.PAYMENT_REQUIRED)
.body(new OrderResponse("payment_error", "Payment processing failed"));
} catch (Exception e) {
logger.error("Unexpected error processing order", e);
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(new OrderResponse("internal_error", "Order processing failed"));
}
}
}
// Global exception handler
@ControllerAdvice
public class GlobalExceptionHandler {
private final Logger logger = LoggerFactory.getLogger(GlobalExceptionHandler.class);
@ExceptionHandler(ValidationException.class)
public ResponseEntity<ErrorResponse> handleValidation(ValidationException e) {
logger.warn("Validation error: {}", e.getMessage());
return ResponseEntity.badRequest()
.body(new ErrorResponse("validation_error", e.getMessage()));
}
@ExceptionHandler(ResourceNotFoundException.class)
public ResponseEntity<ErrorResponse> handleNotFound(ResourceNotFoundException e) {
logger.warn("Resource not found: {}", e.getMessage());
return ResponseEntity.notFound().build();
}
@ExceptionHandler(PaymentException.class)
public ResponseEntity<ErrorResponse> handlePayment(PaymentException e) {
logger.error("Payment error: {}", e.getMessage());
return ResponseEntity.status(HttpStatus.PAYMENT_REQUIRED)
.body(new ErrorResponse("payment_error", "Payment processing failed"));
}
@ExceptionHandler(Exception.class)
public ResponseEntity<ErrorResponse> handleGeneral(Exception e) {
logger.error("Unexpected error", e);
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(new ErrorResponse("internal_error", "Internal server error"));
}
}
// Service layer with proper exception handling
@Service
@Transactional
public class OrderService {
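private final UserRepository userRepository;
private final ProductRepository productRepository;
// orderRepository and alertService are used below; their type names are assumed
private final OrderRepository orderRepository;
private final PaymentService paymentService;
private final AlertService alertService;
private final Logger logger = LoggerFactory.getLogger(OrderService.class);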
public OrderResult processOrder(OrderRequest request) {
// Validate input
validateOrderRequest(request);
// Get user
User user = userRepository.findById(request.getUserId())
.orElseThrow(() -> new ResourceNotFoundException("User not found: " + request.getUserId()));
// Get product
Product product = productRepository.findById(request.getProductId())
.orElseThrow(() -> new ResourceNotFoundException("Product not found: " + request.getProductId()));
// Business validation
if (!product.isAvailable()) {
throw new BusinessException("Product not available: " + product.getName());
}
if (request.getQuantity() > product.getStockQuantity()) {
throw new InsufficientStockException(
"Insufficient stock for " + product.getName() +
": requested " + request.getQuantity() +
", available " + product.getStockQuantity()
);
}
BigDecimal total = product.getPrice().multiply(BigDecimal.valueOf(request.getQuantity()));
// Process payment with retry logic
PaymentResult paymentResult = processPaymentWithRetry(user, total);
try {
// Create order
Order order = new Order();
order.setUserId(user.getId());
order.setProductId(product.getId());
order.setQuantity(request.getQuantity());
order.setTotal(total);
order.setPaymentId(paymentResult.getTransactionId());
order.setStatus(OrderStatus.CONFIRMED);
order = orderRepository.save(order);
// Update inventory
product.setStockQuantity(product.getStockQuantity() - request.getQuantity());
productRepository.save(product);
logger.info("Order created successfully: {}", order.getId());
return new OrderResult(true, order);
} catch (Exception e) {
// If order creation fails, refund payment
try {
paymentService.refund(paymentResult.getTransactionId());
} catch (Exception refundError) {
logger.error("Refund failed for transaction {}",
paymentResult.getTransactionId(), refundError);
// Alert operations team
alertService.sendCriticalAlert("REFUND_FAILURE",
Map.of("transactionId", paymentResult.getTransactionId(),
"amount", total.toString(),
"error", refundError.getMessage()));
}
throw new OrderCreationException("Failed to create order", e);
}
}
private PaymentResult processPaymentWithRetry(User user, BigDecimal amount) {
int maxRetries = 3;
Exception lastException = null;
for (int attempt = 1; attempt <= maxRetries; attempt++) {
try {
return paymentService.processPayment(user.getPaymentMethod(), amount);
} catch (PaymentDeclinedException e) {
// Don't retry declined payments
throw e;
} catch (PaymentServiceUnavailableException e) {
lastException = e;
if (attempt < maxRetries) {
logger.warn("Payment service unavailable, retrying... (attempt {})", attempt);
try {
Thread.sleep(1000L << (attempt - 1)); // Exponential backoff: 1s, then 2s
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
throw new PaymentException("Payment processing interrupted", ie);
}
} else {
logger.error("Payment service unavailable after {} attempts", maxRetries);
throw new PaymentException("Payment service unavailable", e);
}
}
}
throw new PaymentException("Payment processing failed after retries", lastException);
}
private void validateOrderRequest(OrderRequest request) {
if (request == null) {
throw new ValidationException("Order request is required");
}
if (request.getUserId() == null) {
throw new ValidationException("User ID is required");
}
if (request.getProductId() == null) {
throw new ValidationException("Product ID is required");
}
if (request.getQuantity() == null || request.getQuantity() <= 0) {
throw new ValidationException("Quantity must be positive");
}
}
}
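The retry loop in processPaymentWithRetry has the same shape as the one in the Python process_order earlier: retry only the transient error, give up immediately on everything else, and wait longer between attempts. That shape can be pulled out into a helper. Here is a sketch in Python; retry_with_backoff, its parameters, and the jitter are my own additions for illustration, not something from the original system.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying only on the exception types in retry_on."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("max_attempts must be at least 1")


# Demo: fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "charged"


delays = []  # record the delays instead of actually sleeping
result = retry_with_backoff(flaky, retry_on=(TimeoutError,), sleep=delays.append)
print(result, calls["count"], len(delays))
```

Exceptions that are not listed in retry_on (a declined card, for example) propagate on the first attempt. Retrying a charge is only safe if the payment call is idempotent, so check how your provider handles duplicate requests before wrapping it like this.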
Error Monitoring and Alerting
Structured Logging for Better Debugging
import logging
import json
from datetime import datetime, timezone
import traceback

class StructuredLogger:
    def __init__(self, name):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(logging.INFO)
        # Create handler with JSON formatter (only once per logger name,
        # otherwise each new StructuredLogger would duplicate every line)
        if not self.logger.handlers:
            handler = logging.StreamHandler()
            handler.setFormatter(self.JSONFormatter())
            self.logger.addHandler(handler)

    class JSONFormatter(logging.Formatter):
        def format(self, record):
            log_entry = {
                # Time the record was created, in UTC
                'timestamp': datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
                'level': record.levelname,
                'logger': record.name,
                'message': record.getMessage(),
                'module': record.module,
                'function': record.funcName,
                'line': record.lineno
            }
            # Add exception info if present (exc_info is a tuple of Nones
            # when error() is called outside an except block)
            if record.exc_info and record.exc_info[0] is not None:
                log_entry['exception'] = {
                    'type': record.exc_info[0].__name__,
                    'message': str(record.exc_info[1]),
                    'traceback': traceback.format_exception(*record.exc_info)
                }
            # Add extra fields
            if hasattr(record, 'user_id'):
                log_entry['user_id'] = record.user_id
            if hasattr(record, 'order_id'):
                log_entry['order_id'] = record.order_id
            if hasattr(record, 'trace_id'):
                log_entry['trace_id'] = record.trace_id
            return json.dumps(log_entry)

    def info(self, message, **kwargs):
        self.logger.info(message, extra=kwargs)

    def warning(self, message, **kwargs):
        self.logger.warning(message, extra=kwargs)

    def error(self, message, **kwargs):
        self.logger.error(message, extra=kwargs, exc_info=True)

# Usage
logger = StructuredLogger('order_service')

def process_order(order_data):
    trace_id = generate_trace_id()
    try:
        logger.info("Order processing started",
                    user_id=order_data['user_id'],
                    trace_id=trace_id)
        # Process order...
        logger.info("Order processing completed",
                    order_id=order.id,
                    trace_id=trace_id)
    except Exception as e:
        logger.error("Order processing failed",
                     user_id=order_data['user_id'],
                     trace_id=trace_id)
        raise
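To see what a structured log line actually looks like, here is a cut-down sketch that writes to an in-memory buffer and parses the result back. The formatter is a trimmed variant of the one above, and the CONTEXT_FIELDS tuple and the demo logger name are assumptions made for this example.

```python
import io
import json
import logging


class JSONFormatter(logging.Formatter):
    # Hypothetical field list for this sketch; extend as needed.
    CONTEXT_FIELDS = ("user_id", "order_id", "trace_id")

    def format(self, record):
        entry = {"level": record.levelname, "message": record.getMessage()}
        # Values passed via extra= become attributes on the record.
        for field in self.CONTEXT_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JSONFormatter())

demo_logger = logging.getLogger("order_service_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo output out of the root logger
demo_logger.addHandler(handler)

demo_logger.info("Order processing started",
                 extra={"user_id": 42, "trace_id": "abc123"})

parsed = json.loads(stream.getvalue())
print(parsed)
```

Because every line is valid JSON with a trace_id, you can filter one request's journey out of a busy log with a single query instead of grepping free text.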
Circuit Breaker Pattern
import time
from enum import Enum
from typing import Callable, Any

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Circuit is open, failing fast
    HALF_OPEN = "half_open"  # Testing if service is back

class CircuitBreaker:
    def __init__(self,
                 failure_threshold: int = 5,
                 recovery_timeout: int = 60,
                 expected_exception: type = Exception):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.expected_exception = expected_exception
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func: Callable, *args, **kwargs) -> Any:
        if self.state == CircuitState.OPEN:
            if self._should_attempt_reset():
                self.state = CircuitState.HALF_OPEN
            else:
                raise CircuitOpenException("Circuit breaker is open")
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except self.expected_exception:
            self._on_failure()
            raise

    def _should_attempt_reset(self) -> bool:
        return (time.time() - self.last_failure_time) >= self.recovery_timeout

    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

class CircuitOpenException(Exception):
    pass

# Usage
payment_circuit = CircuitBreaker(
    failure_threshold=3,
    recovery_timeout=30,
    expected_exception=PaymentServiceError
)

def process_payment_with_circuit_breaker(payment_data):
    try:
        return payment_circuit.call(payment_service.charge, payment_data)
    except CircuitOpenException:
        # Fallback: queue payment for later processing
        payment_queue.add(payment_data)
        return {'status': 'queued', 'message': 'Payment queued for processing'}
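The state transitions are easier to trust once you have watched them happen. Here is a condensed copy of the breaker with one change that is my own assumption: an injectable clock, so the walkthrough can jump past the recovery timeout without sleeping.

```python
import time
from enum import Enum


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenException(Exception):
    pass


class CircuitBreaker:
    """Condensed breaker; the clock parameter is added for this sketch."""

    def __init__(self, failure_threshold=2, recovery_timeout=30, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if self.clock() - self.last_failure_time < self.recovery_timeout:
                raise CircuitOpenException("Circuit breaker is open")
            self.state = CircuitState.HALF_OPEN  # allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = self.clock()
            if self.failure_count >= self.failure_threshold:
                self.state = CircuitState.OPEN
            raise
        self.failure_count = 0
        self.state = CircuitState.CLOSED
        return result


now = [0.0]  # fake clock, advanced by hand
breaker = CircuitBreaker(failure_threshold=2, recovery_timeout=30, clock=lambda: now[0])


def failing():
    raise ConnectionError("payment service down")


states = []
for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
states.append(breaker.state)          # OPEN after two failures

try:
    breaker.call(lambda: "ok")
    rejected = False
except CircuitOpenException:
    rejected = True                   # fails fast without calling the function

now[0] = 31.0                         # recovery timeout has elapsed
outcome = breaker.call(lambda: "ok")  # trial call succeeds
states.append(breaker.state)          # back to CLOSED
print(states, rejected, outcome)
```

Notice that the third call never reaches the wrapped function. That is the point of the pattern: while the payment service is struggling, you stop adding load to it and answer your own callers immediately.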
Error Recovery Strategies
Graceful Degradation
import logging

class CoffeeShopService:
    def __init__(self):
        self.primary_db = PrimaryDatabase()
        self.cache = RedisCache()
        self.backup_db = BackupDatabase()

    def get_menu(self):
        # Try cache first (fastest)
        try:
            menu = self.cache.get('menu')
            if menu:
                return menu
        except Exception as e:
            logging.warning(f"Cache unavailable: {e}")
        # Try primary database
        try:
            menu = self.primary_db.get_menu()
            # Update cache for next time
            try:
                self.cache.set('menu', menu, ttl=300)
            except Exception:
                pass  # Cache failure shouldn't break the response
            return menu
        except Exception as e:
            logging.error(f"Primary database unavailable: {e}")
        # Fallback to backup database
        try:
            menu = self.backup_db.get_menu()
            logging.info("Using backup database for menu")
            return menu
        except Exception as e:
            logging.error(f"Backup database unavailable: {e}")
        # Last resort: return static menu
        logging.warning("All data sources unavailable, returning static menu")
        return self.get_static_menu()

    def get_static_menu(self):
        return {
            'drinks': [
                {'name': 'Coffee', 'price': 3.00},
                {'name': 'Tea', 'price': 2.50},
                {'name': 'Espresso', 'price': 2.00}
            ],
            'message': 'Limited menu - some items may be unavailable'
        }
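The get_menu method above is really one idea repeated three times: try a source, log the failure, move to the next. If you end up with several methods like it, the chain can be written once. Here is a simplified sketch; first_available is a hypothetical helper, and unlike get_menu it does not treat an empty cache result as a miss.

```python
import logging
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def first_available(
    sources: Iterable[Tuple[str, Callable[[], T]]],
    default: T,
) -> Tuple[str, T]:
    """Return (source_name, value) from the first source that does not raise."""
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:
            logging.warning("Source %s unavailable: %s", name, exc)
    # Every source failed: fall back to the static default.
    return "static", default


def broken():
    raise ConnectionError("down")


source, menu = first_available(
    [("primary", broken), ("backup", lambda: {"drinks": ["Coffee"]})],
    default={"drinks": []},
)
print(source, menu)
```

Returning the source name alongside the value makes it easy to tell users (and your dashboards) when they are looking at degraded data.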
Final Thoughts: Embracing Failure as a Design Requirement
That Black Friday disaster taught me that errors aren't bugs to be eliminated - they're requirements to be handled. Good software doesn't just work when everything goes right; it fails gracefully when things go wrong.
Error handling is like defensive driving in Seattle traffic. You assume that other drivers might make mistakes, that the roads might be slippery, and that your car might break down. You plan for these scenarios, not because you're pessimistic, but because you're prepared.
Whether you're building a simple web app or a distributed system handling millions of requests, thoughtful error handling separates professional software from amateur scripts. It's the difference between an application that crashes under pressure and one that degrades gracefully while alerting you to fix the underlying issues.
Start with the basics: validate inputs, handle expected errors, and always have a fallback plan. As your system grows, add circuit breakers, retry logic, and monitoring. Your future self (and your users) will thank you when your application stays running during the unexpected.
Remember: the goal isn't to prevent all errors - it's to handle them so well that your users barely notice when things go wrong.
Currently writing this from Victrola Coffee on Capitol Hill, where I'm debugging a gnarly exception while enjoying my usual cortado. Share your error handling war stories @maya_codes_pnw - we've all learned the hard way! 🚨☕