Table Of Contents
- Transform Data Like a Pro
- The Core Differences Explained
- Series.apply() - The Flexible Transformer
- DataFrame.apply() - Row and Column Operations
- Series.map() - Dictionary and Function Mapping
- DataFrame.applymap() - Element-wise Transformation
- Performance Comparison
- Advanced Use Cases
- Real-World Business Applications
- When to Use Which Function
- Master Data Transformation
Transform Data Like a Pro
Confused by pandas' transformation functions? Understanding the distinct purposes of apply(), map(), and applymap() transforms you from a struggling beginner to a data manipulation expert.
The Core Differences Explained
import pandas as pd
import numpy as np
# Sample data for demonstrations
df = pd.DataFrame({
    'name': ['Alice Johnson', 'Bob Smith', 'Charlie Brown'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 75000],
    'department': ['IT', 'HR', 'Engineering']
})
print("Original DataFrame:")
print(df)
# apply(): Works on Series or DataFrame, can return Series or DataFrame
# map(): Works only on Series, returns Series (1-to-1 mapping)
# applymap(): Works on entire DataFrame element-wise, returns DataFrame (deprecated since pandas 2.1 in favor of DataFrame.map())
print("\n=== Key Differences ===")
print("apply(): Series/DataFrame → Series/DataFrame (flexible)")
print("map(): Series → Series (1-to-1 mapping)")
print("applymap(): DataFrame → DataFrame (element-wise)")
Series.apply() - The Flexible Transformer
# Series apply() examples
print("\n=== Series.apply() Examples ===")
# Simple function on salary column
def categorize_salary(salary):
    if salary < 55000:
        return 'Low'
    elif salary < 70000:
        return 'Medium'
    else:
        return 'High'
# Apply function to Series
salary_categories = df['salary'].apply(categorize_salary)
print("Salary categories:")
print(salary_categories)
# Apply with lambda
name_lengths = df['name'].apply(lambda x: len(x))
print("\nName lengths:")
print(name_lengths)
# Apply returning multiple values (Series)
def name_analysis(name):
    return pd.Series({
        'first_name': name.split()[0],
        'last_name': name.split()[-1],
        'full_length': len(name),
        'word_count': len(name.split())
    })
# This returns a DataFrame
name_details = df['name'].apply(name_analysis)
print("\nName analysis (returns DataFrame):")
print(name_details)
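apply() can also forward extra positional and keyword arguments to the function, which keeps parameterized transformations out of lambdas. A minimal sketch (the scale helper and the 1.1 factor below are made up for illustration):
# Extra keyword arguments are passed straight through to the function
def scale(value, factor=1.0):
    return value * factor
print("\nSalaries scaled by a 1.1 factor via apply():")
print(df['salary'].apply(scale, factor=1.1))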
DataFrame.apply() - Row and Column Operations
print("\n=== DataFrame.apply() Examples ===")
# Apply along columns (axis=0) - operates on each column
column_means = df[['age', 'salary']].apply(np.mean)
print("Column means:")
print(column_means)
# Apply along rows (axis=1) - operates on each row
def create_profile(row):
    return f"{row['name']} ({row['age']} years old) works in {row['department']}"
profiles = df.apply(create_profile, axis=1)
print("\nEmployee profiles:")
print(profiles)
# Apply returning multiple columns
def salary_analysis(row):
    return pd.Series({
        'salary_category': categorize_salary(row['salary']),
        'salary_per_year_age': row['salary'] / row['age'],
        'is_senior': row['age'] > 30
    })
# Add multiple columns at once
analysis_df = df.apply(salary_analysis, axis=1)
df_with_analysis = pd.concat([df, analysis_df], axis=1)
print("\nDataFrame with analysis:")
print(df_with_analysis)
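When the row-wise function returns a list, DataFrame.apply() can expand it into separate columns with result_type='expand'. A short sketch on the same df (the 'first'/'last' column names are illustrative):
# result_type='expand' turns list-like return values into columns
split_names = df.apply(lambda row: row['name'].split(), axis=1, result_type='expand')
split_names.columns = ['first', 'last']
print("\nNames split into columns via result_type='expand':")
print(split_names)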
Series.map() - Dictionary and Function Mapping
print("\n=== Series.map() Examples ===")
# Dictionary mapping (most common use case)
department_codes = {
    'IT': 'TECH',
    'HR': 'PEOPLE',
    'Engineering': 'ENG',
    'Finance': 'FIN'
}
dept_codes = df['department'].map(department_codes)
print("Department codes:")
print(dept_codes)
# Function mapping (similar to apply but more restrictive)
age_groups = df['age'].map(lambda x: 'Young' if x < 30 else 'Experienced')
print("\nAge groups:")
print(age_groups)
# Series mapping (use another Series as lookup)
salary_lookup = pd.Series([45000, 55000, 65000, 75000],
                          index=['Junior', 'Mid', 'Senior', 'Lead'])
# This would work if we had matching values
# mapped_salaries = df['level'].map(salary_lookup)
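# A small illustrative lookup: the 'levels' Series below is made up for this sketch
levels = pd.Series(['Junior', 'Senior', 'Lead'])
print("\nSeries-as-lookup example:")
print(levels.map(salary_lookup))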
# map() with NA handling
incomplete_mapping = {'IT': 'Technology', 'HR': 'Human Resources'}
mapped_depts = df['department'].map(incomplete_mapping)
print("\nIncomplete mapping (NaN for unmapped):")
print(mapped_depts)
# Handle missing mappings
mapped_depts_filled = df['department'].map(incomplete_mapping).fillna('Other')
print("With NaN filled:")
print(mapped_depts_filled)
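map() also accepts na_action='ignore', which propagates missing values in the Series itself instead of passing them to the mapper. A minimal sketch with a made-up Series that contains a missing department:
# na_action='ignore' skips NaN/None values instead of calling the function on them
dept_with_nan = pd.Series(['IT', None, 'HR'])
print("\nmap() with na_action='ignore':")
print(dept_with_nan.map(lambda x: x.lower(), na_action='ignore'))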
DataFrame.applymap() - Element-wise Transformation
print("\n=== DataFrame.applymap() Examples ===")
# Create DataFrame with mixed data for demonstration
numeric_df = pd.DataFrame({
    'A': [1.23456, 2.34567, 3.45678],
    'B': [10.1234, 20.2345, 30.3456],
    'C': [100.567, 200.678, 300.789]
})
print("Original numeric DataFrame:")
print(numeric_df)
# Round all values to 2 decimal places
rounded_df = numeric_df.applymap(lambda x: round(x, 2))
print("\nRounded to 2 decimal places:")
print(rounded_df)
# Apply string formatting to all elements
formatted_df = numeric_df.applymap(lambda x: f"${x:,.2f}")
print("\nFormatted as currency:")
print(formatted_df)
# Conditional transformation on all elements
def threshold_transform(x):
    if x > 50:
        return 'High'
    elif x > 10:
        return 'Medium'
    else:
        return 'Low'
categorized_df = numeric_df.applymap(threshold_transform)
print("\nCategorized values:")
print(categorized_df)
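Note that applymap() is deprecated since pandas 2.1 in favor of DataFrame.map(), which takes the same element-wise function. A minimal sketch, assuming pandas 2.1 or newer:
# DataFrame.map() is the modern replacement for applymap()
rounded_modern = numeric_df.map(lambda x: round(x, 2))
print("\nSame rounding with DataFrame.map() (pandas 2.1+):")
print(rounded_modern)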
Performance Comparison
import time
# Create large dataset for performance testing
np.random.seed(42)
large_df = pd.DataFrame({
    'values': np.random.randn(100000),
    'categories': np.random.choice(['A', 'B', 'C'], 100000)
})
# Test function
def square_plus_one(x):
    return x**2 + 1
print("\n=== Performance Comparison ===")
# Method 1: apply()
start = time.time()
result_apply = large_df['values'].apply(square_plus_one)
apply_time = time.time() - start
# Method 2: map()
start = time.time()
result_map = large_df['values'].map(square_plus_one)
map_time = time.time() - start
# Method 3: Vectorized operation (fastest)
start = time.time()
result_vectorized = large_df['values']**2 + 1
vectorized_time = time.time() - start
print(f"apply() time: {apply_time:.4f}s")
print(f"map() time: {map_time:.4f}s")
print(f"Vectorized time: {vectorized_time:.4f}s")
print(f"Vectorized is {apply_time/vectorized_time:.1f}x faster than apply()")
print(f"Vectorized is {map_time/vectorized_time:.1f}x faster than map()")
# Dictionary mapping performance
category_mapping = {'A': 1, 'B': 2, 'C': 3}
start = time.time()
map_dict_result = large_df['categories'].map(category_mapping)
map_dict_time = time.time() - start
start = time.time()
apply_dict_result = large_df['categories'].apply(lambda x: category_mapping[x])
apply_dict_time = time.time() - start
print(f"\nDictionary mapping:")
print(f"map() time: {map_dict_time:.4f}s")
print(f"apply() time: {apply_dict_time:.4f}s")
print(f"map() is {apply_dict_time/map_dict_time:.1f}x faster for dictionary mapping")
Advanced Use Cases
# Complex apply() example: moving window calculations
def rolling_statistics(series, window=3):
    """Calculate rolling statistics for a series.
    rolling().apply() must return a single scalar per window, so multiple
    statistics are computed with rolling().agg() instead."""
    return series.rolling(window=window).agg(['mean', 'std', 'min', 'max'])
# Time series data
time_series = pd.DataFrame({
    'date': pd.date_range('2025-01-01', periods=10),
    'price': [100, 102, 98, 105, 107, 103, 109, 111, 108, 115]
})
print("\n=== Advanced apply() Example ===")
print("Time series data:")
print(time_series)
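# Usage sketch for the rolling_statistics helper defined above (default window of 3)
print("\nRolling statistics (window=3):")
print(rolling_statistics(time_series['price']))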
# Custom aggregation with apply()
def price_analysis(group):
    return pd.Series({
        'avg_price': group['price'].mean(),
        'price_volatility': group['price'].std(),
        'price_trend': 'up' if group['price'].iloc[-1] > group['price'].iloc[0] else 'down',
        'max_price': group['price'].max(),
        'min_price': group['price'].min()
    })
# Group by week and apply analysis
time_series['week'] = time_series['date'].dt.isocalendar().week
weekly_analysis = time_series.groupby('week').apply(price_analysis)
print("\nWeekly price analysis:")
print(weekly_analysis)
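For flat per-group statistics like these, named aggregation with groupby().agg() is an equivalent and usually faster alternative to groupby().apply(); a minimal sketch (the trend column is left out because agg() works on one column-statistic pair at a time):
weekly_agg = time_series.groupby('week').agg(
    avg_price=('price', 'mean'),
    price_volatility=('price', 'std'),
    max_price=('price', 'max'),
    min_price=('price', 'min')
)
print("\nSame statistics via named aggregation:")
print(weekly_agg)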
Real-World Business Applications
# Sales data transformation
sales_data = pd.DataFrame({
    'product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'region': ['North', 'South', 'East', 'West', 'North'],
    'sales': [15000, 8000, 12000, 18000, 9500],
    'quarter': ['Q1', 'Q1', 'Q2', 'Q2', 'Q3']
})
# Business logic with apply()
def sales_performance(row):
    base_target = {'Laptop': 16000, 'Phone': 10000, 'Tablet': 8000}
    target = base_target.get(row['product'], 10000)
    performance = row['sales'] / target
    return pd.Series({
        'target': target,
        'performance_ratio': performance,
        'performance_category': 'Excellent' if performance > 1.2
                                else 'Good' if performance > 1.0
                                else 'Needs Improvement'
    })
sales_analysis = sales_data.apply(sales_performance, axis=1)
full_sales_data = pd.concat([sales_data, sales_analysis], axis=1)
print("\n=== Business Application ===")
print("Sales performance analysis:")
print(full_sales_data)
# Region mapping with map()
region_managers = {
    'North': 'Alice Johnson',
    'South': 'Bob Smith',
    'East': 'Charlie Brown',
    'West': 'Diana Prince'
}
full_sales_data['manager'] = full_sales_data['region'].map(region_managers)
print("\nWith manager assignments:")
print(full_sales_data[['region', 'manager', 'performance_category']])
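For the category column specifically, pd.cut() is a vectorized alternative to the nested conditional inside sales_performance(); a sketch with the same thresholds (the new column name is illustrative):
full_sales_data['category_via_cut'] = pd.cut(
    full_sales_data['performance_ratio'],
    bins=[0, 1.0, 1.2, np.inf],
    labels=['Needs Improvement', 'Good', 'Excellent']
)
print("\nCategories via pd.cut():")
print(full_sales_data[['product', 'performance_ratio', 'category_via_cut']])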
When to Use Which Function
print("\n=== Decision Guide ===")
print("""
USE map() WHEN:
✅ Simple 1-to-1 value mapping
✅ Dictionary/Series lookup
✅ Performance is critical for simple transformations
✅ Working with categorical data
USE apply() WHEN:
✅ Complex logic or calculations
✅ Need to return multiple values
✅ Working with grouped data
✅ Need access to multiple columns (axis=1)
✅ Custom aggregations
USE applymap() WHEN:
✅ Same transformation on ALL DataFrame elements
✅ Element-wise formatting/conversion
✅ Simple mathematical operations on entire DataFrame
✅ Note: Consider vectorized operations first!
✅ Note: Deprecated since pandas 2.1 - prefer DataFrame.map()
AVOID applymap() FOR:
❌ Large DataFrames (use vectorized operations)
❌ Column-specific transformations (use apply())
❌ Complex logic (usually apply() is better)
""")
Master Data Transformation
Explore advanced pandas vectorization, learn high-performance data processing, and discover functional programming patterns.