Table of Contents
- When Numbers Go Missing
- Detecting and Managing NaN
- NaN Propagation and Pitfalls
- Smart NaN Strategies
- Pro Tips
- Dive Deeper
When Numbers Go Missing
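Real-world data is messy. NumPy represents missing values as NaN (Not a Number), a special floating-point value that silently propagates through arithmetic: a single NaN is enough to turn an entire sum or mean into NaN. Here's how to detect and handle it gracefully.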
Detecting and Managing NaN
```python
import numpy as np

# Creating arrays with NaN
data = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
matrix = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, np.nan],
                   [7.0, 8.0, 9.0]])

# Detecting NaN values
print(np.isnan(data))             # [False False  True False False]
print(np.any(np.isnan(matrix)))   # True - contains NaN
print(np.sum(np.isnan(matrix)))   # 2 - count of NaNs

# Finding NaN positions as (row, column) pairs
nan_indices = np.argwhere(np.isnan(matrix))
print(nan_indices)                # [[0 1], [1 2]]

# NaN-safe computations
print(np.nanmean(data))    # 3.0 (ignores NaN)
print(np.nanstd(matrix))   # 2.657... (std of the 7 non-NaN values)
print(np.nansum(data))     # 12.0
```
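The nan-functions also accept an `axis` argument, which is usually what you want for tabular data. A short sketch reusing the `matrix` from above (redefined so the snippet runs on its own). It also shows an edge case worth knowing: a slice with no valid values gives `nan` from `np.nanmean` (with a RuntimeWarning), but `0.0` from `np.nansum`.

```python
import warnings
import numpy as np

matrix = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, np.nan],
                   [7.0, 8.0, 9.0]])

# Column-wise and row-wise statistics, ignoring NaN in each slice
col_means = np.nanmean(matrix, axis=0)   # [4.  6.5 6. ]
row_sums = np.nansum(matrix, axis=1)     # [ 4.  9. 24.]

# A slice with no valid values: nanmean has nothing to average
all_nan = np.array([np.nan, np.nan])
with warnings.catch_warnings():
    # np.nanmean warns "Mean of empty slice" here; silenced for the demo
    warnings.simplefilter("ignore", category=RuntimeWarning)
    empty_mean = np.nanmean(all_nan)     # nan

# ...whereas the sum of nothing is defined as zero
empty_sum = np.nansum(all_nan)           # 0.0

print(col_means, row_sums, empty_mean, empty_sum)
```

If an all-NaN slice is possible in your data, check for it explicitly rather than assuming the nan-functions always return a number.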
```python
# Replace NaN with a specific value
cleaned = np.nan_to_num(data, nan=0.0)
print(cleaned)      # [1. 2. 0. 4. 5.]

# Custom replacement via a boolean mask
data_copy = data.copy()
data_copy[np.isnan(data_copy)] = -999
print(data_copy)    # [1. 2. -999. 4. 5.]
```
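`np.nan_to_num` does more than replace NaN: by default it also swaps `inf` and `-inf` for the largest and most negative finite values of the dtype, which can quietly distort later statistics. A small sketch passing explicit `posinf`/`neginf` replacements instead; the `1e6` bounds are arbitrary values chosen for illustration.

```python
import numpy as np

values = np.array([1.0, np.nan, np.inf, -np.inf])

# Defaults: NaN -> 0.0, +inf -> largest float64, -inf -> most negative float64
default = np.nan_to_num(values)

# Explicit replacement for each special value (bounds chosen arbitrarily)
custom = np.nan_to_num(values, nan=0.0, posinf=1e6, neginf=-1e6)

# nan_to_num returns a copy by default, so `values` is left unchanged
print(default)
print(custom)
```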
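```python
# Forward fill (propagate last valid value) for a 1-D array.
# Leading NaNs stay NaN because there is no earlier value to copy.
def forward_fill(arr):
    mask = np.isnan(arr)
    # Position of each valid element, 0 where the element is NaN
    idx = np.where(~mask, np.arange(mask.size), 0)
    # Running maximum carries the last valid position forward
    np.maximum.accumulate(idx, out=idx)
    return arr[idx]

series = np.array([1, np.nan, np.nan, 4, np.nan])
filled = forward_fill(series)   # fancy indexing already returns a new array
print(filled)                   # [1. 1. 1. 4. 4.]
```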
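The `forward_fill` helper leaves leading NaNs untouched. One common remedy, sketched here under the assumption that filling from the *next* valid value is acceptable for your data, is a backward fill built by reversing the array, forward-filling, and reversing back. `backward_fill` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def forward_fill(arr):
    # Same index trick as above: carry the last valid position forward
    mask = np.isnan(arr)
    idx = np.where(~mask, np.arange(mask.size), 0)
    np.maximum.accumulate(idx, out=idx)
    return arr[idx]

def backward_fill(arr):
    # Hypothetical helper: reverse, forward-fill, reverse back,
    # so each NaN takes the next valid value instead of the previous one
    return forward_fill(arr[::-1])[::-1]

series = np.array([np.nan, 2.0, np.nan, np.nan, 5.0, np.nan])
ffilled = forward_fill(series)               # leading NaN survives
bfilled = backward_fill(series)              # trailing NaN survives
both = backward_fill(forward_fill(series))   # no NaN left
print(ffilled, bfilled, both)
```

Chaining forward fill and then backward fill only touches the NaNs that forward fill could not reach, so interior gaps still take the previous value.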
NaN Propagation and Pitfalls
```python
# NaN propagates through regular operations
arr = np.array([1, 2, np.nan, 4])
print(arr.sum())    # nan
print(arr.mean())   # nan

# Comparisons with NaN are tricky: every ordered comparison is False
print(np.nan == np.nan)   # False!
print(np.nan > 5)         # False
print(np.nan < 5)         # False
```
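These rules have practical consequences. The sketch below collects a few of them: `== np.nan` never finds NaN, plain reductions such as `np.max` return NaN, whole-array equality fails unless you opt in with `equal_nan=True` (available in NumPy 1.19 and newer), and integer arrays cannot hold NaN at all.

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])

# == never matches NaN, so this mask finds nothing
eq_mask = arr == np.nan          # all False
nan_mask = np.isnan(arr)         # the reliable test

# NaN is the only value that is not equal to itself
self_unequal = np.nan != np.nan  # True

# Plain reductions propagate NaN; nan-aware ones skip it
plain_max, safe_max = np.max(arr), np.nanmax(arr)   # nan, 4.0

# Whole-array equality also fails on NaN unless you opt in
other = arr.copy()
same = np.array_equal(arr, other)                       # False
same_nan = np.array_equal(arr, other, equal_nan=True)   # True

# NaN is a float: an integer array cannot store it
ints = np.array([1, 2, 3])
int_error = None
try:
    ints[0] = np.nan
except ValueError as exc:
    int_error = exc   # "cannot convert float NaN to integer"
```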
Smart NaN Strategies
- Deletion: Remove rows/columns with NaN
- Imputation: Replace with mean, median, or interpolated values
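- Masking: Use masked arrays (numpy.ma) so invalid entries are skipped by later operations
- NaN-aware functions: Use NumPy's nan-prefixed functions, such as np.nanmean, np.nansum, and np.nanmedian, which ignore NaN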
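The four strategies can be compared side by side on the `matrix` from earlier (redefined so the snippet is self-contained). This is a sketch rather than a recipe: column-mean imputation and linear interpolation via `np.interp` are only two of many possible imputation choices, and which one fits depends on your data.

```python
import numpy as np

matrix = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, np.nan],
                   [7.0, 8.0, 9.0]])

# 1. Deletion: keep only rows that contain no NaN
complete_rows = matrix[~np.isnan(matrix).any(axis=1)]     # [[7. 8. 9.]]
# ...or keep only columns that contain no NaN
complete_cols = matrix[:, ~np.isnan(matrix).any(axis=0)]  # column 0 only

# 2. Imputation: replace each NaN with its column mean
col_means = np.nanmean(matrix, axis=0)                    # [4.  6.5 6. ]
imputed = np.where(np.isnan(matrix), col_means, matrix)   # col_means broadcasts across rows

# 2b. Imputation by linear interpolation for a 1-D series
series = np.array([1.0, np.nan, np.nan, 4.0, np.nan])
positions = np.arange(series.size)
valid = ~np.isnan(series)
interpolated = series.copy()
# np.interp holds the last known value constant beyond the valid range
interpolated[~valid] = np.interp(positions[~valid], positions[valid], series[valid])

# 3. Masking: masked arrays skip invalid entries in later operations
masked = np.ma.masked_invalid(matrix)
masked_col_means = masked.mean(axis=0).filled(np.nan)     # [4.  6.5 6. ]

# 4. NaN-aware functions: skip NaN without modifying the data
nan_aware_median = np.nanmedian(matrix, axis=0)           # [4.  6.5 6. ]
```

Deletion is the simplest but discards the valid values in each dropped row; on this small matrix only one of three rows survives.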
Pro Tips
- Check for NaN before aggregating: one NaN is enough to turn a plain sum or mean into NaN
- Use np.isfinite() to catch both NaN and infinity in a single check
- Consider pandas for more sophisticated missing-data handling
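A quick sketch of why `np.isfinite()` is the broader guard: `np.isnan()` misses infinities (which can appear, for example, after dividing by zero), while `np.isfinite()` flags NaN, `inf`, and `-inf` alike. The `readings` array is made up for illustration.

```python
import numpy as np

readings = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])

nan_mask = np.isnan(readings)        # True only for the NaN
inf_mask = np.isinf(readings)        # True only for +inf and -inf
finite_mask = np.isfinite(readings)  # False for NaN and both infinities

# Keep only ordinary numbers before computing statistics
finite_values = readings[finite_mask]   # [1. 5.]
print(finite_values.mean())             # 3.0
```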
Dive Deeper
Explore NumPy masked arrays, master data cleaning techniques, and learn about scientific data processing.