How to Use NumPy where() for Conditional Selection

Master NumPy's where() function to efficiently select and replace array elements based on conditions - your conditional logic accelerator.

Conditional Logic at Warp Speed

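Element-by-element if-else loops are slow on large arrays because every comparison runs in the Python interpreter. NumPy's where() evaluates the condition over the whole array in compiled code, so a million-element selection typically finishes in milliseconds rather than the tenths of a second a Python loop needs.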

Basic Conditional Selection

import numpy as np

# Simple condition: positive vs negative
data = np.array([-3, -1, 0, 2, 5, -2])
result = np.where(data > 0, data, 0)  # Keep positive values, replace everything else with 0
print(result)  # [0 0 0 2 5 0]

# Boolean mask alternative
positive_mask = data > 0
result_mask = np.where(positive_mask, data, 0)
print(result_mask)  # Same result

# Three-way condition using nested where
temp_data = np.array([15, 25, 35, 5, 45])
comfort = np.where(temp_data < 20, 'Cold', 
                  np.where(temp_data > 30, 'Hot', 'Perfect'))
print(comfort)  # ['Cold' 'Perfect' 'Hot' 'Cold' 'Hot']
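
Nested where() calls get hard to read once there are more than two or three branches. As a hedged alternative (not something the example above requires), the same three-way labelling can be sketched with np.select, which takes a list of conditions and a matching list of values. The variable names conditions and labels are illustrative.

```python
import numpy as np

temp_data = np.array([15, 25, 35, 5, 45])

# Conditions are checked in order; the first one that matches wins.
conditions = [temp_data < 20, temp_data > 30]
labels = ['Cold', 'Hot']

# default covers every element that matches none of the conditions.
comfort = np.select(conditions, labels, default='Perfect')
print(comfort)  # ['Cold' 'Perfect' 'Hot' 'Cold' 'Hot']
```

Adding a fourth category then means appending one condition and one label instead of wrapping another where() around the expression.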

Advanced Selection Patterns

# Matrix conditional replacement
matrix = np.array([[1, -2, 3], 
                   [-4, 5, -6], 
                   [7, -8, 9]])

# Replace negatives with their absolute value
abs_matrix = np.where(matrix < 0, -matrix, matrix)
print(abs_matrix)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

# Complex conditions with multiple arrays
scores = np.array([85, 92, 78, 95, 88])
attempts = np.array([1, 2, 3, 1, 2])

# Bonus points for first attempt high scores
final_scores = np.where((scores > 90) & (attempts == 1), 
                       scores + 5, scores)
print(final_scores)  # [ 85  92  78 100  88]
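
One caveat with patterns like these: both replacement arguments are ordinary NumPy expressions, so they are computed for every element before where() picks between them. That matters when one branch is invalid for some elements, such as a division by zero. A minimal sketch of one way around it, assuming float inputs and using the where= and out= arguments of np.divide (the array names are illustrative):

```python
import numpy as np

numerator = np.array([10.0, 4.0, 6.0])
denominator = np.array([2.0, 0.0, 3.0])

# np.where(denominator != 0, numerator / denominator, 0.0) would still divide
# by zero for the middle element and emit a RuntimeWarning.

# The ufunc's where= argument skips the masked-out positions instead;
# out= supplies the values those positions keep.
safe = np.divide(numerator, denominator,
                 out=np.zeros_like(numerator),
                 where=denominator != 0)
print(safe.tolist())  # [5.0, 0.0, 2.0]
```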

Getting Indices Instead of Values

# Find indices where condition is true
data = np.array([10, 25, 30, 15, 35, 20])
high_indices = np.where(data > 20)
print(high_indices[0])  # [1 2 4] - indices where data > 20

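# 2D arrays: where() returns one index array per dimension
matrix = np.random.randint(1, 10, (4, 4))
row_indices, col_indices = np.where(matrix > 5)
# tolist() converts NumPy integers to plain ints so the pairs print cleanly
positions = list(zip(row_indices.tolist(), col_indices.tolist()))
print(f"Elements > 5 at positions: {positions}")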

# Get the actual values at those positions
high_values = matrix[row_indices, col_indices]
print(f"Values > 5: {high_values}")
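
Two related functions are worth knowing here. The one-argument form of where() returns the same result as np.nonzero, and np.argwhere returns the same positions grouped as coordinate pairs. The sketch below uses a small fixed matrix (chosen for illustration) so the output is reproducible.

```python
import numpy as np

matrix = np.array([[1, 8, 3],
                   [7, 2, 9]])

# One-argument where() and nonzero() return the same tuple of index arrays.
rows, cols = np.where(matrix > 5)
nz_rows, nz_cols = np.nonzero(matrix > 5)
assert np.array_equal(rows, nz_rows) and np.array_equal(cols, nz_cols)

# argwhere() groups the same indices into one (row, col) pair per match.
coords = np.argwhere(matrix > 5)
print(coords.tolist())  # [[0, 1], [1, 0], [1, 2]]
```

The tuple form is the one to use for indexing back into the array, while the argwhere() form is easier to loop over or print.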

Performance Comparison

import time

# Large array performance test
large_array = np.random.randint(-100, 100, 1000000)

# NumPy where() approach (fast)
# perf_counter() is a high-resolution clock meant for timing short intervals
start = time.perf_counter()
np_result = np.where(large_array > 0, large_array, 0)
np_time = time.perf_counter() - start

# Pure Python approach (slow)
start = time.perf_counter()
py_result = [x if x > 0 else 0 for x in large_array]
py_time = time.perf_counter() - start

print(f"NumPy where(): {np_time:.4f}s")
print(f"Python loop: {py_time:.4f}s")
print(f"Speedup: {py_time/np_time:.1f}x")
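
A single timed run can be noisy, so the numbers will vary from machine to machine and from run to run. As a hedged sketch of a steadier measurement, the standard-library timeit module can repeat each approach and keep the best time. The array size, seed, and repeat counts below are arbitrary choices for illustration.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption) for repeatable data
values = rng.integers(-100, 100, size=100_000)

def with_where():
    return np.where(values > 0, values, 0)

def with_loop():
    return [x if x > 0 else 0 for x in values]

# Both approaches must agree before their timings are worth comparing.
assert np.array_equal(with_where(), np.array(with_loop()))

# Keep the best of several runs to reduce noise from other processes.
where_time = min(timeit.repeat(with_where, number=10, repeat=3)) / 10
loop_time = min(timeit.repeat(with_loop, number=1, repeat=3))

print(f"np.where: {where_time * 1e3:.3f} ms per call")
print(f"loop:     {loop_time * 1e3:.3f} ms per call")
```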

Real-World Applications

# Data cleaning: replace out-of-range readings with the mean of the valid ones
sensor_data = np.array([22.1, 23.5, 150.0, 21.8, 22.9, -50.0, 23.1])
# One mask defines "valid", so readings of exactly 0 or 100 are treated consistently
valid = (sensor_data >= 0) & (sensor_data <= 100)
cleaned = np.where(valid, sensor_data, sensor_data[valid].mean())

# Financial data: profit/loss categorization
returns = np.array([0.05, -0.02, 0.08, -0.01, 0.12])
categories = np.where(returns > 0.05, 'High Gain',
                     np.where(returns > 0, 'Small Gain', 'Loss'))

# Image processing: threshold application
image_data = np.random.rand(100, 100)  # Simulated grayscale image
binary_image = np.where(image_data > 0.5, 255, 0)  # Black and white

Pro Tips for where()

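  • Wrap each comparison in parentheses when combining conditions: (cond1) & (cond2). The & and | operators bind more tightly than comparisons, and Python's and/or do not work element-wise on arrays.
  • where(condition) with no replacement arguments returns a tuple of index arrays, one per dimension.
  • Combine with boolean indexing when you need only the matching values rather than a same-shaped array.
  • The condition and both replacement arguments broadcast against each other, so scalars and smaller arrays can stand in for full-size ones as long as the shapes are compatible.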
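
The tips above can be sketched in a few lines. The grid and col_limits arrays are made up for illustration.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
col_limits = np.array([2, 4, 6])  # one limit per column, shape (3,)

# Parentheses: each comparison is wrapped before combining with &.
in_range = np.where((grid > 1) & (grid < 6), grid, 0)
print(in_range.tolist())  # [[0, 2, 3], [4, 5, 0]]

# Broadcasting: the (3,) limits are compared against every row of the
# (2, 3) grid, and the scalar -1 fills every position that fails the test.
capped = np.where(grid <= col_limits, grid, -1)
print(capped.tolist())  # [[1, 2, 3], [-1, -1, 6]]

# Boolean indexing returns only the matching values as a 1D array,
# while where() with replacements keeps the original shape.
print(grid[grid > 4].tolist())  # [5, 6]
```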

Explore Advanced Techniques

Dive into advanced NumPy indexing, master array manipulation techniques, and explore data analysis workflows.
