How to Save and Load NumPy Arrays

Master NumPy's I/O operations to efficiently save and load arrays in various formats for data persistence and sharing.

Data Persistence Made Simple

Your carefully crafted NumPy arrays shouldn't vanish when your program ends. Learn to save them efficiently and load them back exactly as they were.

NumPy's Native Formats

import numpy as np

# Create sample data
data = np.random.rand(1000, 100)
labels = np.resize(['cat', 'dog', 'bird'], 1000)  # one label per row of data

# Save single array (.npy format)
np.save('my_data.npy', data)
loaded_data = np.load('my_data.npy')
print(np.array_equal(data, loaded_data))  # True

# Save multiple arrays (.npz format)
np.savez('dataset.npz', features=data, labels=labels)
loaded = np.load('dataset.npz')
print(loaded['features'].shape)  # (1000, 100)
print(loaded['labels'][:3])      # ['cat' 'dog' 'bird']

# Compressed format (saves space)
np.savez_compressed('compressed_data.npz', 
                   large_array=np.random.rand(10000, 1000))

# With a context manager (closes the file automatically)
with np.load('dataset.npz') as archive:  # avoid shadowing the 'data' array
    features = archive['features']
    labels = archive['labels']
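
One caveat worth knowing: arrays with dtype=object are pickled under the hood, so np.load rejects them unless you opt in. A minimal sketch (the filename objects.npy is just an example):

# Object arrays fall back to pickle internally, so loading them
# requires allow_pickle=True
obj_array = np.array([{'id': 1}, {'id': 2}], dtype=object)
np.save('objects.npy', obj_array)

loaded_obj = np.load('objects.npy', allow_pickle=True)
print(loaded_obj[0]['id'])  # 1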

Text-Based Formats for Human Readability

# Save as text (CSV-like)
small_array = np.random.rand(5, 3)
np.savetxt('data.txt', small_array, delimiter=',', fmt='%.4f')

# Load text data
loaded_text = np.loadtxt('data.txt', delimiter=',')

# Custom formatting
np.savetxt('formatted.txt', small_array, 
           fmt='%.2e',          # Scientific notation
           delimiter='\t',      # Tab separated
           header='col1\tcol2\tcol3',  # Header row
           comments='# ')       # Comment prefix

# Handling mixed data types
mixed_data = np.array([('Alice', 25, 1.75), ('Bob', 30, 1.80)], 
                     dtype=[('name', 'U10'), ('age', 'i4'), ('height', 'f4')])
np.savetxt('mixed.txt', mixed_data, fmt='%s %d %.2f')
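
To read the mixed file back, np.genfromtxt handles strings more gracefully than np.loadtxt. A short sketch, assuming the space-separated layout written above:

# The default delimiter is any whitespace, matching the file we just wrote
loaded_mixed = np.genfromtxt('mixed.txt',
                             dtype=[('name', 'U10'), ('age', 'i4'), ('height', 'f4')],
                             encoding='utf-8')
print(loaded_mixed['name'])  # ['Alice' 'Bob']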

Memory Mapping for Large Arrays

# Memory-mapped arrays for huge datasets
huge_array = np.random.rand(100000, 1000)

# Save as memory-mapped file
mmap_array = np.memmap('huge_data.dat', dtype='float64', mode='w+', 
                       shape=(100000, 1000))
mmap_array[:] = huge_array[:]  # Copy data into the mapped file
mmap_array.flush()             # Write pending changes to disk
del mmap_array                 # Release the mapping

# Load as memory-mapped (doesn't load into RAM immediately)
loaded_mmap = np.memmap('huge_data.dat', dtype='float64', mode='r', 
                        shape=(100000, 1000))
print(loaded_mmap[0, :5])  # Access specific parts without loading all
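
Memory-mapped files can also be edited in place. A minimal sketch using mode='r+' (read-write access to an existing file):

# Only the touched pages move through RAM
editable = np.memmap('huge_data.dat', dtype='float64', mode='r+',
                     shape=(100000, 1000))
editable[0, 0] = 42.0
editable.flush()  # push the pending change to disk explicitly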

Binary Data with Pickle

import pickle

# Complex objects with metadata
class DataContainer:
    def __init__(self, data, metadata):
        self.data = data
        self.metadata = metadata

container = DataContainer(np.random.rand(100, 50), 
                         {'created': '2024-01-01', 'version': 1.0})

# Save with pickle
with open('container.pkl', 'wb') as f:
    pickle.dump(container, f)

# Load with pickle
with open('container.pkl', 'rb') as f:
    loaded_container = pickle.load(f)
    print(loaded_container.metadata)
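
Keep in mind that pickle runs arbitrary code at load time, so never unpickle files from untrusted sources. A safer pattern for sharing, sketched here with hypothetical filenames, is to keep the arrays in .npz and the metadata in JSON:

import json

# Arrays go to .npz, metadata to plain JSON
np.savez('container.npz', data=container.data)
with open('container_meta.json', 'w') as f:
    json.dump(container.metadata, f)

with np.load('container.npz') as archive, open('container_meta.json') as f:
    restored = DataContainer(archive['data'], json.load(f))
print(restored.metadata)  # {'created': '2024-01-01', 'version': 1.0}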

Cross-Language Compatibility

# HDF5 format (requires h5py)
# Great for large datasets and cross-language compatibility
try:
    import h5py
    
    with h5py.File('data.h5', 'w') as f:
        f.create_dataset('array1', data=np.random.rand(1000, 100))
        f.create_dataset('array2', data=np.random.rand(500, 200))
        f.attrs['description'] = 'My dataset'
    
    with h5py.File('data.h5', 'r') as f:
        loaded_array = f['array1'][:]
        description = f.attrs['description']
        
except ImportError:
    print("Install h5py for HDF5 support: pip install h5py")

Performance and Format Comparison

Format | Speed | Size  | Cross-platform | Human Readable
.npy   | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐  | ⭐⭐⭐          | n/a (binary)
.npz   | ⭐⭐⭐⭐  | ⭐⭐⭐⭐⭐ | ⭐⭐⭐          | n/a (binary)
.txt   | ⭐⭐    | ⭐⭐    | ⭐⭐⭐⭐⭐        | ⭐⭐⭐⭐⭐
.h5    | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐        | n/a (binary)
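
Star ratings gloss over hardware and data effects, so it is worth measuring on your own arrays. A hypothetical micro-benchmark (note that random data compresses poorly, so real-world .npz savings are usually better than this shows):

import os
import time

array = np.random.rand(2000, 500)

start = time.perf_counter()
np.save('bench.npy', array)
print(f".npy: {time.perf_counter() - start:.3f}s, "
      f"{os.path.getsize('bench.npy') / 1e6:.1f} MB")

start = time.perf_counter()
np.savez_compressed('bench.npz', array=array)
print(f".npz (compressed): {time.perf_counter() - start:.3f}s, "
      f"{os.path.getsize('bench.npz') / 1e6:.1f} MB")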

Best Practices

  • Use .npy for single arrays, .npz for multiple arrays (see the sketch after this list)
  • Choose compressed formats for storage efficiency
  • Use memory mapping for arrays larger than RAM
  • Consider HDF5 for complex, structured datasets
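
A hypothetical helper that applies the first rule automatically; the name save_arrays and its keyword interface are illustrative, not part of NumPy:

def save_arrays(path, **arrays):
    """Save one array as .npy, several as .npz."""
    if len(arrays) == 1:
        np.save(path + '.npy', next(iter(arrays.values())))
    else:
        np.savez(path + '.npz', **arrays)

save_arrays('single', data=np.arange(10))           # -> single.npy
save_arrays('multi', a=np.arange(10), b=np.eye(3))  # -> multi.npz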

Explore More

Dive into large-scale data processing, master data serialization techniques, and explore scientific data workflows.
