How to Use apply() vs map() vs applymap()

Learn what pandas' three transformation functions, apply(), map(), and applymap(), each do, and when each is the right choice for clarity and speed.

Transform Data Like a Pro

Confused by pandas' transformation functions? apply(), map(), and applymap() look similar, but each has a distinct job, and knowing which one fits saves both code and run time.

The Core Differences Explained

import pandas as pd
import numpy as np

# Sample data for demonstrations
df = pd.DataFrame({
    'name': ['Alice Johnson', 'Bob Smith', 'Charlie Brown'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 75000],
    'department': ['IT', 'HR', 'Engineering']
})

print("Original DataFrame:")
print(df)

# apply(): Works on Series or DataFrame, can return Series or DataFrame
# map(): Works only on Series, returns Series (1-to-1 mapping)
# applymap(): Works on entire DataFrame element-wise, returns DataFrame

print("\n=== Key Differences ===")
print("apply(): Series/DataFrame → Series/DataFrame (flexible)")
print("map(): Series → Series (1-to-1 mapping)")
print("applymap(): DataFrame → DataFrame (element-wise)")
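
To make the input and output shapes concrete, here is a minimal sketch on a tiny frame. One caveat: `DataFrame.applymap()` was deprecated in pandas 2.1 in favor of `DataFrame.map()`, so the sketch picks whichever method the installed version provides. The `demo` frame and the `elementwise` helper are illustrative names invented for this example, not part of pandas.

```python
import pandas as pd

demo = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

def elementwise(frame, func):
    """Apply func to every cell of frame.

    Uses DataFrame.map() when it exists (pandas >= 2.1) and falls back
    to DataFrame.applymap() on older versions.
    """
    method = getattr(frame, 'map', None) or frame.applymap
    return method(func)

# DataFrame.apply(): the function receives a whole column (axis=0) ...
col_range = demo.apply(lambda col: col.max() - col.min())
# ... or a whole row (axis=1); both calls return one value per column/row
row_sums = demo.apply(lambda row: row['x'] + row['y'], axis=1)

# Series.map(): the function receives one value at a time
doubled_x = demo['x'].map(lambda v: v * 2)

# Element-wise over the full DataFrame
doubled_all = elementwise(demo, lambda v: v * 2)

print(col_range, row_sums, doubled_x, doubled_all, sep="\n\n")
```

The rest of the examples keep the `applymap()` spelling; on recent pandas versions it may emit a deprecation warning.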

Series.apply() - The Flexible Transformer

# Series apply() examples
print("\n=== Series.apply() Examples ===")

# Simple function on salary column
def categorize_salary(salary):
    if salary < 55000:
        return 'Low'
    elif salary < 70000:
        return 'Medium'
    else:
        return 'High'

# Apply function to Series
salary_categories = df['salary'].apply(categorize_salary)
print("Salary categories:")
print(salary_categories)

# Apply with a built-in function (no lambda wrapper needed)
name_lengths = df['name'].apply(len)
print("\nName lengths:")
print(name_lengths)

# Apply returning multiple values (Series)
def name_analysis(name):
    return pd.Series({
        'first_name': name.split()[0],
        'last_name': name.split()[-1],
        'full_length': len(name),
        'word_count': len(name.split())
    })

# This returns a DataFrame
name_details = df['name'].apply(name_analysis)
print("\nName analysis (returns DataFrame):")
print(name_details)
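
Several of the Series.apply() calls above have built-in vectorized counterparts, which are usually worth trying first on larger data. The following is a sketch, assuming the same salary thresholds as categorize_salary; the `people` frame is a cut-down copy of the sample data, created here so the block runs on its own.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    'name': ['Alice Johnson', 'Bob Smith', 'Charlie Brown'],
    'salary': [50000, 60000, 75000],
})

# Same thresholds as categorize_salary: < 55000, < 70000, otherwise High.
# right=False makes each bin closed on the left: [low, high).
salary_bins = pd.cut(
    people['salary'],
    bins=[-np.inf, 55000, 70000, np.inf],
    labels=['Low', 'Medium', 'High'],
    right=False,
)

# String accessor equivalents of the name helpers
name_lengths = people['name'].str.len()
name_parts = people['name'].str.split()
first_names = name_parts.str[0]
last_names = name_parts.str[-1]

print(salary_bins)
print(first_names.tolist(), last_names.tolist(), name_lengths.tolist())
```

Note that `pd.cut` returns a categorical Series rather than plain strings; call `.astype(str)` if plain strings are needed.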

DataFrame.apply() - Row and Column Operations

print("\n=== DataFrame.apply() Examples ===")

# Apply along columns (axis=0) - operates on each column
column_means = df[['age', 'salary']].apply(np.mean)
print("Column means:")
print(column_means)

# Apply along rows (axis=1) - operates on each row
def create_profile(row):
    return f"{row['name']} ({row['age']} years old) works in {row['department']}"

profiles = df.apply(create_profile, axis=1)
print("\nEmployee profiles:")
print(profiles)

# Apply returning multiple columns
def salary_analysis(row):
    return pd.Series({
        'salary_category': categorize_salary(row['salary']),
        'salary_per_year_age': row['salary'] / row['age'],
        'is_senior': row['age'] > 30
    })

# Add multiple columns at once
analysis_df = df.apply(salary_analysis, axis=1)
df_with_analysis = pd.concat([df, analysis_df], axis=1)
print("\nDataFrame with analysis:")
print(df_with_analysis)
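
Row-wise apply() with axis=1 calls a Python function once per row, so when the logic is plain arithmetic or comparison, column expressions are a common alternative. A minimal sketch of two of the derived columns and the profile string, written without apply(); the `staff` frame repeats the sample data so the block is self-contained.

```python
import pandas as pd

staff = pd.DataFrame({
    'name': ['Alice Johnson', 'Bob Smith', 'Charlie Brown'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 75000],
    'department': ['IT', 'HR', 'Engineering'],
})

# Whole-column arithmetic and comparison instead of a per-row function
staff_with_analysis = staff.assign(
    salary_per_year_age=staff['salary'] / staff['age'],
    is_senior=staff['age'] > 30,
)

# String concatenation across columns replaces the f-string in create_profile
profiles = (
    staff['name'] + ' (' + staff['age'].astype(str)
    + ' years old) works in ' + staff['department']
)

print(staff_with_analysis)
print(profiles)
```

A side effect of this approach is that each new column keeps a natural dtype (float for the ratio, bool for the flag).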

Series.map() - Dictionary and Function Mapping

print("\n=== Series.map() Examples ===")

# Dictionary mapping (most common use case)
department_codes = {
    'IT': 'TECH',
    'HR': 'PEOPLE',
    'Engineering': 'ENG',
    'Finance': 'FIN'
}

dept_codes = df['department'].map(department_codes)
print("Department codes:")
print(dept_codes)

# Function mapping (similar to apply but more restrictive)
age_groups = df['age'].map(lambda x: 'Young' if x < 30 else 'Experienced')
print("\nAge groups:")
print(age_groups)

# Series mapping (use another Series as lookup)
salary_lookup = pd.Series([45000, 55000, 65000, 75000], 
                         index=['Junior', 'Mid', 'Senior', 'Lead'])
# df has no 'level' column, so the lookup is shown commented out:
# mapped_salaries = df['level'].map(salary_lookup)

# map() with NA handling
incomplete_mapping = {'IT': 'Technology', 'HR': 'Human Resources'}
mapped_depts = df['department'].map(incomplete_mapping)
print("\nIncomplete mapping (NaN for unmapped):")
print(mapped_depts)

# Handle missing mappings
mapped_depts_filled = df['department'].map(incomplete_mapping).fillna('Other')
print("With NaN filled:")
print(mapped_depts_filled)
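
Two variations on the mapping examples above, sketched under stated assumptions. The `levels` Series is a hypothetical column invented here so the Series lookup can actually run. The second part relies on documented Series.map() behavior: a dict subclass that defines `__missing__`, such as `collections.defaultdict`, supplies its default instead of NaN.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical 'level' values, invented for this example
levels = pd.Series(['Junior', 'Senior', 'Lead', 'Intern'], name='level')
salary_lookup = pd.Series(
    [45000, 55000, 65000, 75000],
    index=['Junior', 'Mid', 'Senior', 'Lead'],
)

# Values are matched against the lookup's index; 'Intern' has no match -> NaN
mapped_salaries = levels.map(salary_lookup)

# defaultdict provides a fallback, so no fillna() step is needed
departments = pd.Series(['IT', 'HR', 'Engineering'])
with_default = defaultdict(
    lambda: 'Other',
    {'IT': 'Technology', 'HR': 'Human Resources'},
)
mapped_with_default = departments.map(with_default)

print(mapped_salaries)
print(mapped_with_default)
```

The `fillna('Other')` approach also replaces values that were missing before the mapping, while the default-dict approach only decides what unmapped keys become; which one is preferable depends on the data.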

DataFrame.applymap() - Element-wise Transformation

print("\n=== DataFrame.applymap() Examples ===")

# Create DataFrame with mixed data for demonstration
numeric_df = pd.DataFrame({
    'A': [1.23456, 2.34567, 3.45678],
    'B': [10.1234, 20.2345, 30.3456],
    'C': [100.567, 200.678, 300.789]
})

print("Original numeric DataFrame:")
print(numeric_df)

# Round all values to 2 decimal places
rounded_df = numeric_df.applymap(lambda x: round(x, 2))
print("\nRounded to 2 decimal places:")
print(rounded_df)

# Apply string formatting to all elements
formatted_df = numeric_df.applymap(lambda x: f"${x:,.2f}")
print("\nFormatted as currency:")
print(formatted_df)

# Conditional transformation on all elements
def threshold_transform(x):
    if x > 50:
        return 'High'
    elif x > 10:
        return 'Medium'
    else:
        return 'Low'

categorized_df = numeric_df.applymap(threshold_transform)
print("\nCategorized values:")
print(categorized_df)
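
For numeric frames, two of the element-wise examples above have vectorized equivalents. This sketch uses `DataFrame.round()` for the rounding and `numpy.select()` for the threshold labels, with the same cut-offs as threshold_transform; it is one possible rewrite, not the only one.

```python
import numpy as np
import pandas as pd

numeric_df = pd.DataFrame({
    'A': [1.23456, 2.34567, 3.45678],
    'B': [10.1234, 20.2345, 30.3456],
    'C': [100.567, 200.678, 300.789],
})

# Rounding the whole frame without a per-element Python call
rounded_df = numeric_df.round(2)

# Conditions are checked in order, like the if/elif chain:
# > 50 -> 'High', > 10 -> 'Medium', everything else -> 'Low'
values = numeric_df.to_numpy()
labels = np.select([values > 50, values > 10], ['High', 'Medium'], default='Low')
categorized_df = pd.DataFrame(
    labels, index=numeric_df.index, columns=numeric_df.columns
)

print(rounded_df)
print(categorized_df)
```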

Performance Comparison

import time

# Create large dataset for performance testing
np.random.seed(42)
large_df = pd.DataFrame({
    'values': np.random.randn(100000),
    'categories': np.random.choice(['A', 'B', 'C'], 100000)
})

# Test function
def square_plus_one(x):
    return x**2 + 1

print("\n=== Performance Comparison ===")

# time.perf_counter() is a high-resolution clock intended for timing code.
# time.time() can be too coarse for the fast vectorized case and may report
# 0 elapsed seconds, which would break the ratio calculations below.

# Method 1: apply()
start = time.perf_counter()
result_apply = large_df['values'].apply(square_plus_one)
apply_time = time.perf_counter() - start

# Method 2: map()
start = time.perf_counter()
result_map = large_df['values'].map(square_plus_one)
map_time = time.perf_counter() - start

# Method 3: Vectorized operation (fastest)
start = time.perf_counter()
result_vectorized = large_df['values']**2 + 1
vectorized_time = time.perf_counter() - start

print(f"apply() time: {apply_time:.4f}s")
print(f"map() time: {map_time:.4f}s")
print(f"Vectorized time: {vectorized_time:.4f}s")
print(f"Vectorized is {apply_time/vectorized_time:.1f}x faster than apply()")
print(f"Vectorized is {map_time/vectorized_time:.1f}x faster than map()")

# Dictionary mapping performance
category_mapping = {'A': 1, 'B': 2, 'C': 3}

start = time.perf_counter()
map_dict_result = large_df['categories'].map(category_mapping)
map_dict_time = time.perf_counter() - start

start = time.perf_counter()
apply_dict_result = large_df['categories'].apply(lambda x: category_mapping[x])
apply_dict_time = time.perf_counter() - start

print(f"\nDictionary mapping:")
print(f"map() time: {map_dict_time:.4f}s")
print(f"apply() time: {apply_dict_time:.4f}s")
print(f"map() is {apply_dict_time/map_dict_time:.1f}x faster for dictionary mapping")
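
A single timed run can be noisy, so the numbers above are best read as rough indications. One way to steady them is to repeat each measurement and keep the fastest run, sketched here with the standard library's `timeit` module. The `best_time` helper is a hypothetical name for this example, and the actual timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
values = pd.Series(rng.standard_normal(100_000))

def square_plus_one(x):
    return x**2 + 1

def best_time(func, repeat=3):
    """Run func once per repetition and return the fastest time in seconds."""
    return min(timeit.repeat(func, number=1, repeat=repeat))

timings = {
    'apply()': best_time(lambda: values.apply(square_plus_one)),
    'map()': best_time(lambda: values.map(square_plus_one)),
    'vectorized': best_time(lambda: values**2 + 1),
}

for label, seconds in timings.items():
    print(f"{label:<12}{seconds:.4f}s")
```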

Advanced Use Cases

# Rolling-window statistics
# Rolling.apply() must return a single scalar per window, so returning a
# pd.Series from it fails. agg() computes several statistics in one pass.
def rolling_statistics(series, window=3):
    """Calculate rolling mean, std, min, and max for a series"""
    # Positions with fewer than `window` observations come back as NaN
    return series.rolling(window=window).agg(['mean', 'std', 'min', 'max'])

# Time series data
time_series = pd.DataFrame({
    'date': pd.date_range('2025-01-01', periods=10),
    'price': [100, 102, 98, 105, 107, 103, 109, 111, 108, 115]
})

print("\n=== Advanced apply() Example ===")
print("Time series data:")
print(time_series)

# Custom aggregation with apply()
def price_analysis(group):
    return pd.Series({
        'avg_price': group['price'].mean(),
        'price_volatility': group['price'].std(),
        'price_trend': 'up' if group['price'].iloc[-1] > group['price'].iloc[0] else 'down',
        'max_price': group['price'].max(),
        'min_price': group['price'].min()
    })

# Group by week and apply analysis
time_series['week'] = time_series['date'].dt.isocalendar().week
weekly_analysis = time_series.groupby('week').apply(price_analysis)
print("\nWeekly price analysis:")
print(weekly_analysis)
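
As a complement to the examples in this section, here is a sketch of the same two ideas using built-in aggregation. It assumes that `Rolling.apply()` is reserved for custom statistics that reduce each window to one number, and it swaps the per-group function for named aggregation. The `first_price` and `last_price` columns are helper names introduced for this example.

```python
import numpy as np
import pandas as pd

time_series = pd.DataFrame({
    'date': pd.date_range('2025-01-01', periods=10),
    'price': [100, 102, 98, 105, 107, 103, 109, 111, 108, 115],
})

# Several rolling statistics at once: agg() returns one column per statistic
rolling_stats = time_series['price'].rolling(window=3).agg(
    ['mean', 'std', 'min', 'max']
)

# A custom statistic that reduces each window to one scalar suits apply();
# raw=True passes each window as a NumPy array
rolling_range = time_series['price'].rolling(window=3).apply(
    lambda w: w.max() - w.min(), raw=True
)

# Named aggregation per ISO week; 'first'/'last' rely on rows sorted by date
time_series['week'] = time_series['date'].dt.isocalendar().week
weekly = time_series.groupby('week').agg(
    avg_price=('price', 'mean'),
    price_volatility=('price', 'std'),
    first_price=('price', 'first'),
    last_price=('price', 'last'),
    max_price=('price', 'max'),
    min_price=('price', 'min'),
)
weekly['price_trend'] = np.where(
    weekly['last_price'] > weekly['first_price'], 'up', 'down'
)

print(rolling_stats)
print(weekly)
```

The first two rolling rows are NaN because a window of 3 is not yet full there.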

Real-World Business Applications

# Sales data transformation
sales_data = pd.DataFrame({
    'product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'region': ['North', 'South', 'East', 'West', 'North'],
    'sales': [15000, 8000, 12000, 18000, 9500],
    'quarter': ['Q1', 'Q1', 'Q2', 'Q2', 'Q3']
})

# Business logic with apply()
def sales_performance(row):
    base_target = {'Laptop': 16000, 'Phone': 10000, 'Tablet': 8000}
    target = base_target.get(row['product'], 10000)
    performance = row['sales'] / target
    
    return pd.Series({
        'target': target,
        'performance_ratio': performance,
        'performance_category': 'Excellent' if performance > 1.2 
                               else 'Good' if performance > 1.0 
                               else 'Needs Improvement'
    })

sales_analysis = sales_data.apply(sales_performance, axis=1)
full_sales_data = pd.concat([sales_data, sales_analysis], axis=1)

print("\n=== Business Application ===")
print("Sales performance analysis:")
print(full_sales_data)

# Region mapping with map()
region_managers = {
    'North': 'Alice Johnson',
    'South': 'Bob Smith', 
    'East': 'Charlie Brown',
    'West': 'Diana Prince'
}

full_sales_data['manager'] = full_sales_data['region'].map(region_managers)
print("\nWith manager assignments:")
print(full_sales_data[['region', 'manager', 'performance_category']])
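
The row-wise sales_performance function can also be expressed with column operations, which may scale better on large sales tables. The sketch below combines map() for the target lookup with `numpy.select()` for the category, using the same targets, fallback of 10000, and thresholds as above; `sales_vectorized` is a name chosen for this example.

```python
import numpy as np
import pandas as pd

sales_data = pd.DataFrame({
    'product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'region': ['North', 'South', 'East', 'West', 'North'],
    'sales': [15000, 8000, 12000, 18000, 9500],
    'quarter': ['Q1', 'Q1', 'Q2', 'Q2', 'Q3'],
})

base_target = {'Laptop': 16000, 'Phone': 10000, 'Tablet': 8000}

# Dictionary lookup per product; unknown products fall back to 10000
target = sales_data['product'].map(base_target).fillna(10000)
ratio = sales_data['sales'] / target

# Conditions are evaluated in order, mirroring the chained conditional
ratio_values = ratio.to_numpy()
category = np.select(
    [ratio_values > 1.2, ratio_values > 1.0],
    ['Excellent', 'Good'],
    default='Needs Improvement',
)

sales_vectorized = sales_data.assign(
    target=target,
    performance_ratio=ratio,
    performance_category=category,
)
print(sales_vectorized)
```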

When to Use Which Function

print("\n=== Decision Guide ===")
print("""
USE map() WHEN:
✅ Simple 1-to-1 value mapping
✅ Dictionary/Series lookup
✅ Performance is critical for simple transformations
✅ Working with categorical data

USE apply() WHEN:
✅ Complex logic or calculations
✅ Need to return multiple values
✅ Working with grouped data
✅ Need access to multiple columns (axis=1)
✅ Custom aggregations

USE applymap() WHEN:
✅ Same transformation on ALL DataFrame elements
✅ Element-wise formatting/conversion
✅ Simple mathematical operations on entire DataFrame
Note: consider vectorized operations first!

AVOID applymap() FOR:
❌ Large DataFrames (use vectorized operations)
❌ Column-specific transformations (use apply())
❌ Complex logic (usually apply() is better)
""")

Master Data Transformation

Explore advanced pandas vectorization, learn high-performance data processing, and discover functional programming patterns.
