Navigation

Go

Go String Manipulation and the `strings` Package 2025

Master Go string manipulation with comprehensive guide covering the strings package, performance optimization, Unicode handling, and advanced text processing techniques in Go applications 2025.

Table Of Contents

Introduction

String manipulation is one of the most common tasks in programming, and Go provides powerful tools through its built-in strings package. Whether you're processing user input, parsing configuration files, building APIs, or handling text data, understanding Go's string manipulation capabilities is essential for writing efficient and maintainable code.

Many developers underestimate the complexity and performance implications of string operations. In Go, strings are immutable, which means every modification creates a new string. This characteristic, combined with Go's Unicode support and UTF-8 encoding, requires careful consideration when designing string-heavy applications.

In this comprehensive guide, we'll explore the strings package in depth, examine performance characteristics of different string operations, learn about Unicode handling, and discover advanced techniques for efficient text processing. You'll gain practical knowledge that will help you write faster, more reliable string manipulation code.

Understanding Strings in Go

String Fundamentals

Go strings are immutable sequences of bytes, typically containing UTF-8 encoded text:

package main

import (
    "fmt"
    "unsafe"
)

func demonstrateStringBasics() {
    // String literals
    s1 := "Hello, World!"
    s2 := `Multi-line
string with
raw literals`
    s3 := "Unicode: 👋🌍"
    
    fmt.Printf("String 1: %s (len: %d bytes)\n", s1, len(s1))
    fmt.Printf("String 2: %s (len: %d bytes)\n", s2, len(s2))
    fmt.Printf("String 3: %s (len: %d bytes)\n", s3, len(s3))
    
    // String immutability
    original := "Hello"
    modified := original + ", World!"
    
    fmt.Printf("Original: %s\n", original)
    fmt.Printf("Modified: %s\n", modified)
    fmt.Printf("Same memory address: %t\n", 
        unsafe.StringData(original) == unsafe.StringData(modified))
    
    // Byte vs rune access
    text := "Hello, 世界"
    fmt.Printf("Byte length: %d\n", len(text))
    fmt.Printf("Rune count: %d\n", len([]rune(text)))
    
    // Byte iteration
    fmt.Print("Bytes: ")
    for i := 0; i < len(text); i++ {
        fmt.Printf("%02x ", text[i])
    }
    fmt.Println()
    
    // Rune iteration
    fmt.Print("Runes: ")
    for _, r := range text {
        fmt.Printf("'%c'(%U) ", r, r)
    }
    fmt.Println()
}

func main() {
    demonstrateStringBasics()
}

String vs []byte vs []rune

Understanding the different string representations:

package main

import (
    "fmt"
    "time"
    "unsafe"
)

func compareStringTypes() {
    text := "Hello, 世界! 🌍"
    
    // String (immutable, UTF-8 bytes)
    str := text
    
    // []byte (mutable, UTF-8 bytes)
    bytes := []byte(text)
    
    // []rune (mutable, Unicode code points)
    runes := []rune(text)
    
    fmt.Printf("Original: %s\n", text)
    fmt.Printf("String length: %d bytes\n", len(str))
    fmt.Printf("Bytes length: %d bytes\n", len(bytes))
    fmt.Printf("Runes length: %d runes\n", len(runes))
    
    // Memory usage
    fmt.Printf("String memory: %d bytes\n", unsafe.Sizeof(str))
    fmt.Printf("[]byte memory: %d bytes\n", unsafe.Sizeof(bytes))
    fmt.Printf("[]rune memory: %d bytes\n", unsafe.Sizeof(runes))
    
    // Conversion performance
    const iterations = 1000000
    
    // String to []byte
    start := time.Now()
    for i := 0; i < iterations; i++ {
        _ = []byte(text)
    }
    stringToBytesTime := time.Since(start)
    
    // String to []rune
    start = time.Now()
    for i := 0; i < iterations; i++ {
        _ = []rune(text)
    }
    stringToRunesTime := time.Since(start)
    
    // []byte to string
    start = time.Now()
    for i := 0; i < iterations; i++ {
        _ = string(bytes)
    }
    bytesToStringTime := time.Since(start)
    
    fmt.Printf("String to []byte: %v\n", stringToBytesTime)
    fmt.Printf("String to []rune: %v\n", stringToRunesTime)
    fmt.Printf("[]byte to string: %v\n", bytesToStringTime)
}

func main() {
    compareStringTypes()
}

Core strings Package Functions

Searching and Checking

Essential functions for finding and validating string content:

package main

import (
    "fmt"
    "strings"
)

func demonstrateSearchingFunctions() {
    text := "The quick brown fox jumps over the lazy dog"
    
    // Contains functions
    fmt.Printf("Contains 'fox': %t\n", strings.Contains(text, "fox"))
    fmt.Printf("Contains any 'xyz': %t\n", strings.ContainsAny(text, "xyz"))
    fmt.Printf("Contains rune 'x': %t\n", strings.ContainsRune(text, 'x'))
    
    // Prefix and suffix checking
    fmt.Printf("Has prefix 'The': %t\n", strings.HasPrefix(text, "The"))
    fmt.Printf("Has suffix 'dog': %t\n", strings.HasSuffix(text, "dog"))
    
    // Index functions
    fmt.Printf("Index of 'fox': %d\n", strings.Index(text, "fox"))
    fmt.Printf("Last index of 'o': %d\n", strings.LastIndex(text, "o"))
    fmt.Printf("Index of any 'xyz': %d\n", strings.IndexAny(text, "xyz"))
    fmt.Printf("Index of rune 'q': %d\n", strings.IndexRune(text, 'q'))
    
    // Count occurrences
    fmt.Printf("Count of 'o': %d\n", strings.Count(text, "o"))
    fmt.Printf("Count of 'the': %d\n", strings.Count(strings.ToLower(text), "the"))
    
    // Case-insensitive operations
    fmt.Printf("Equal fold 'THE QUICK' and 'the quick': %t\n", 
        strings.EqualFold("THE QUICK", "the quick"))
}

func demonstrateAdvancedSearching() {
    data := []string{
        "user@example.com",
        "admin@company.org",
        "test@domain.net",
        "contact@business.com",
    }
    
    // Find emails from specific domains
    targetDomains := []string{".com", ".org"}
    
    fmt.Println("Emails from target domains:")
    for _, email := range data {
        for _, domain := range targetDomains {
            if strings.HasSuffix(email, domain) {
                fmt.Printf("  %s\n", email)
                break
            }
        }
    }
    
    // Extract usernames (before @)
    fmt.Println("\nUsernames:")
    for _, email := range data {
        if idx := strings.Index(email, "@"); idx != -1 {
            username := email[:idx]
            fmt.Printf("  %s\n", username)
        }
    }
    
    // Case-insensitive filtering
    searchTerm := "ADMIN"
    fmt.Printf("\nEmails containing '%s' (case-insensitive):\n", searchTerm)
    for _, email := range data {
        if strings.Contains(strings.ToUpper(email), searchTerm) {
            fmt.Printf("  %s\n", email)
        }
    }
}

func main() {
    demonstrateSearchingFunctions()
    fmt.Println()
    demonstrateAdvancedSearching()
}

String Modification

Functions for changing string content:

package main

import (
    "fmt"
    "strings"
)

func demonstrateModificationFunctions() {
    text := "  Hello, World!  "
    
    // Case conversion
    fmt.Printf("Original: '%s'\n", text)
    fmt.Printf("ToUpper: '%s'\n", strings.ToUpper(text))
    fmt.Printf("ToLower: '%s'\n", strings.ToLower(text))
    fmt.Printf("Title: '%s'\n", strings.Title(text)) // Deprecated but still used
    fmt.Printf("ToTitle: '%s'\n", strings.ToTitle(text))
    
    // Trimming
    fmt.Printf("TrimSpace: '%s'\n", strings.TrimSpace(text))
    fmt.Printf("Trim 'H el!': '%s'\n", strings.Trim(text, " H!el"))
    fmt.Printf("TrimLeft: '%s'\n", strings.TrimLeft(text, " "))
    fmt.Printf("TrimRight: '%s'\n", strings.TrimRight(text, " !"))
    fmt.Printf("TrimPrefix: '%s'\n", strings.TrimPrefix(text, "  Hello"))
    fmt.Printf("TrimSuffix: '%s'\n", strings.TrimSuffix(text, "!  "))
    
    // Replacement
    sample := "The quick brown fox jumps over the lazy dog"
    fmt.Printf("Original: %s\n", sample)
    fmt.Printf("Replace 'o' with '0': %s\n", strings.Replace(sample, "o", "0", -1))
    fmt.Printf("Replace first 'o' with '0': %s\n", strings.Replace(sample, "o", "0", 1))
    fmt.Printf("ReplaceAll 'o' with '0': %s\n", strings.ReplaceAll(sample, "o", "0"))
    
    // Multiple replacements
    replacer := strings.NewReplacer(
        "quick", "fast",
        "brown", "red",
        "lazy", "sleepy",
    )
    fmt.Printf("Multiple replacements: %s\n", replacer.Replace(sample))
}

func demonstrateAdvancedModification() {
    // Clean and normalize user input
    userInput := "  JoHn.DoE@EXAMPLE.COM  "
    
    cleaned := strings.TrimSpace(strings.ToLower(userInput))
    fmt.Printf("Cleaned email: '%s'\n", cleaned)
    
    // Normalize whitespace
    messyText := "This   has    irregular     spacing"
    normalized := strings.Join(strings.Fields(messyText), " ")
    fmt.Printf("Normalized: '%s'\n", normalized)
    
    // Remove unwanted characters
    phoneNumber := "(123) 456-7890 ext. 123"
    digitsOnly := strings.Map(func(r rune) rune {
        if r >= '0' && r <= '9' {
            return r
        }
        return -1 // Remove character
    }, phoneNumber)
    fmt.Printf("Digits only: '%s'\n", digitsOnly)
    
    // Custom cleaning function
    cleanText := func(s string) string {
        // Remove leading/trailing whitespace
        s = strings.TrimSpace(s)
        // Replace multiple spaces with single space
        s = strings.Join(strings.Fields(s), " ")
        // Remove special characters except letters, numbers, and spaces
        return strings.Map(func(r rune) rune {
            if (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || 
               (r >= '0' && r <= '9') || r == ' ' {
                return r
            }
            return -1
        }, s)
    }
    
    dirtyText := "  Hello@#$   World!!!   123  "
    fmt.Printf("Clean text: '%s'\n", cleanText(dirtyText))
}

func main() {
    demonstrateModificationFunctions()
    fmt.Println()
    demonstrateAdvancedModification()
}

String Splitting and Joining

Working with string arrays and delimiters:

package main

import (
    "fmt"
    "strings"
)

func demonstrateSplittingJoining() {
    // Basic splitting
    csv := "apple,banana,orange,grape"
    fruits := strings.Split(csv, ",")
    fmt.Printf("Split CSV: %v\n", fruits)
    
    // Split with limit
    text := "a,b,c,d,e"
    parts := strings.SplitN(text, ",", 3)
    fmt.Printf("SplitN (limit 3): %v\n", parts)
    
    // Split after delimiter
    parts = strings.SplitAfter(text, ",")
    fmt.Printf("SplitAfter: %v\n", parts)
    
    // Fields (split on whitespace)
    sentence := "The   quick brown   fox"
    words := strings.Fields(sentence)
    fmt.Printf("Fields: %v\n", words)
    
    // Custom field function
    customSplit := strings.FieldsFunc(sentence, func(r rune) bool {
        return r == ' ' || r == '\t' || r == '\n'
    })
    fmt.Printf("Custom fields: %v\n", customSplit)
    
    // Joining
    joined := strings.Join(fruits, " | ")
    fmt.Printf("Joined with ' | ': %s\n", joined)
    
    // Repeat
    pattern := strings.Repeat("*", 10)
    fmt.Printf("Repeated pattern: %s\n", pattern)
}

func demonstrateAdvancedSplitting() {
    // Parse configuration-like data
    config := `
    database.host=localhost
    database.port=5432
    database.name=myapp
    server.port=8080
    server.debug=true
    `
    
    settings := make(map[string]string)
    lines := strings.Split(strings.TrimSpace(config), "\n")
    
    for _, line := range lines {
        line = strings.TrimSpace(line)
        if line == "" || strings.HasPrefix(line, "#") {
            continue
        }
        
        parts := strings.SplitN(line, "=", 2)
        if len(parts) == 2 {
            key := strings.TrimSpace(parts[0])
            value := strings.TrimSpace(parts[1])
            settings[key] = value
        }
    }
    
    fmt.Println("Parsed configuration:")
    for key, value := range settings {
        fmt.Printf("  %s = %s\n", key, value)
    }
    
    // Parse CSV with quoted fields
    csvLine := `"John Doe","Software Engineer","New York, NY","$75,000"`
    
    // Simple CSV parser (for demonstration - use encoding/csv for production)
    parseCSV := func(line string) []string {
        var fields []string
        var current strings.Builder
        inQuotes := false
        
        for i, r := range line {
            switch r {
            case '"':
                if inQuotes && i+1 < len(line) && line[i+1] == '"' {
                    current.WriteRune('"')
                    i++ // Skip next quote
                } else {
                    inQuotes = !inQuotes
                }
            case ',':
                if !inQuotes {
                    fields = append(fields, current.String())
                    current.Reset()
                } else {
                    current.WriteRune(r)
                }
            default:
                current.WriteRune(r)
            }
        }
        fields = append(fields, current.String())
        return fields
    }
    
    fields := parseCSV(csvLine)
    fmt.Printf("\nParsed CSV fields: %v\n", fields)
    
    // Smart word splitting (handle punctuation)
    text := "Hello, world! How are you? I'm fine."
    smartSplit := func(s string) []string {
        return strings.FieldsFunc(s, func(r rune) bool {
            return !(r >= 'a' && r <= 'z') && !(r >= 'A' && r <= 'Z') && 
                   !(r >= '0' && r <= '9') && r != '\''
        })
    }
    
    words := smartSplit(text)
    fmt.Printf("Smart word split: %v\n", words)
}

func main() {
    demonstrateSplittingJoining()
    fmt.Println()
    demonstrateAdvancedSplitting()
}

Performance Optimization

String Building and Concatenation

Efficient ways to build strings:

package main

import (
    "fmt"
    "strings"
    "time"
)

func benchmarkStringConcatenation() {
    const iterations = 10000
    words := make([]string, iterations)
    for i := 0; i < iterations; i++ {
        words[i] = fmt.Sprintf("word%d", i)
    }
    
    // Method 1: Simple concatenation (inefficient)
    start := time.Now()
    result1 := ""
    for _, word := range words {
        result1 += word + " "
    }
    simpleTime := time.Since(start)
    
    // Method 2: strings.Builder (efficient)
    start = time.Now()
    var builder strings.Builder
    builder.Grow(len(result1)) // Pre-allocate if size is known
    for _, word := range words {
        builder.WriteString(word)
        builder.WriteString(" ")
    }
    result2 := builder.String()
    builderTime := time.Since(start)
    
    // Method 3: strings.Join (most efficient for this case)
    start = time.Now()
    result3 := strings.Join(words, " ") + " "
    joinTime := time.Since(start)
    
    // Method 4: []byte approach
    start = time.Now()
    var buf []byte
    for _, word := range words {
        buf = append(buf, word...)
        buf = append(buf, ' ')
    }
    result4 := string(buf)
    byteTime := time.Since(start)
    
    fmt.Printf("Results are equal: %t\n", 
        result1 == result2 && result2 == result3 && result3 == result4)
    
    fmt.Printf("Simple concatenation: %v\n", simpleTime)
    fmt.Printf("strings.Builder: %v\n", builderTime)
    fmt.Printf("strings.Join: %v\n", joinTime)
    fmt.Printf("[]byte approach: %v\n", byteTime)
    
    fmt.Printf("Builder vs Simple: %.2fx faster\n", 
        float64(simpleTime)/float64(builderTime))
    fmt.Printf("Join vs Simple: %.2fx faster\n", 
        float64(simpleTime)/float64(joinTime))
}

func demonstrateStringBuilder() {
    var builder strings.Builder
    
    // Pre-allocate capacity if known
    builder.Grow(100)
    
    // Various write methods
    builder.WriteString("Hello, ")
    builder.WriteRune('W')
    builder.WriteRune('o')
    builder.WriteRune('r')
    builder.WriteRune('l')
    builder.WriteRune('d')
    builder.WriteByte('!')
    
    result := builder.String()
    fmt.Printf("Built string: %s\n", result)
    fmt.Printf("Builder capacity: %d\n", builder.Cap())
    fmt.Printf("Builder length: %d\n", builder.Len())
    
    // Reset and reuse
    builder.Reset()
    builder.WriteString("Reused builder")
    fmt.Printf("After reset: %s\n", builder.String())
}

func demonstrateEfficientPatterns() {
    // Template-like string building
    buildTemplate := func(template string, values map[string]string) string {
        var builder strings.Builder
        builder.Grow(len(template) * 2) // Estimate final size
        
        i := 0
        for i < len(template) {
            if i < len(template)-1 && template[i] == '{' && template[i+1] == '{' {
                // Find closing }}
                end := strings.Index(template[i+2:], "}}")
                if end != -1 {
                    key := template[i+2 : i+2+end]
                    if value, exists := values[key]; exists {
                        builder.WriteString(value)
                    } else {
                        builder.WriteString("{{" + key + "}}")
                    }
                    i += 4 + end // Skip past }}
                    continue
                }
            }
            builder.WriteByte(template[i])
            i++
        }
        
        return builder.String()
    }
    
    template := "Hello {{name}}, welcome to {{app}}! Your ID is {{id}}."
    values := map[string]string{
        "name": "John",
        "app":  "MyApp",
        "id":   "12345",
    }
    
    result := buildTemplate(template, values)
    fmt.Printf("Template result: %s\n", result)
    
    // Efficient CSV building
    buildCSV := func(records [][]string) string {
        if len(records) == 0 {
            return ""
        }
        
        var builder strings.Builder
        
        for i, record := range records {
            if i > 0 {
                builder.WriteByte('\n')
            }
            
            for j, field := range record {
                if j > 0 {
                    builder.WriteByte(',')
                }
                
                // Quote field if it contains comma or quotes
                if strings.Contains(field, ",") || strings.Contains(field, "\"") {
                    builder.WriteByte('"')
                    builder.WriteString(strings.ReplaceAll(field, "\"", "\"\""))
                    builder.WriteByte('"')
                } else {
                    builder.WriteString(field)
                }
            }
        }
        
        return builder.String()
    }
    
    records := [][]string{
        {"Name", "Age", "City"},
        {"John Doe", "30", "New York"},
        {"Jane Smith", "25", "San Francisco"},
        {"Bob Johnson", "35", "Chicago, IL"},
    }
    
    csv := buildCSV(records)
    fmt.Printf("Generated CSV:\n%s\n", csv)
}

func main() {
    fmt.Println("String concatenation benchmark:")
    benchmarkStringConcatenation()
    
    fmt.Println("\nstrings.Builder demonstration:")
    demonstrateStringBuilder()
    
    fmt.Println("\nEfficient string building patterns:")
    demonstrateEfficientPatterns()
}

Memory-Efficient String Operations

Techniques to minimize memory allocations:

package main

import (
    "fmt"
    "strings"
    "unsafe"
)

func demonstrateZeroCopyOperations() {
    text := "The quick brown fox jumps over the lazy dog"
    
    // Efficient substring without allocation (view into original)
    // Note: This keeps the original string in memory
    quickBrown := text[4:15] // "quick brown"
    fmt.Printf("Substring: '%s'\n", quickBrown)
    
    // Check if they share memory
    fmt.Printf("Share memory: %t\n", 
        unsafe.StringData(text) == unsafe.StringData(quickBrown))
    
    // When to copy vs when to slice
    largeSentence := strings.Repeat("This is a very long sentence. ", 1000)
    
    // Bad: Keeping reference to large string for small substring
    badWord := largeSentence[0:4] // "This"
    
    // Good: Copy small substring to release large string
    goodWord := string([]byte(largeSentence[0:4]))
    
    fmt.Printf("Bad substring shares memory: %t\n", 
        unsafe.StringData(largeSentence) == unsafe.StringData(badWord))
    fmt.Printf("Good substring shares memory: %t\n", 
        unsafe.StringData(largeSentence) == unsafe.StringData(goodWord))
    
    // Efficient field extraction
    parseFields := func(line string) (string, string, string) {
        fields := strings.SplitN(line, "|", 3)
        if len(fields) != 3 {
            return "", "", ""
        }
        return strings.TrimSpace(fields[0]), 
               strings.TrimSpace(fields[1]), 
               strings.TrimSpace(fields[2])
    }
    
    data := "John Doe | Software Engineer | New York"
    name, title, city := parseFields(data)
    fmt.Printf("Name: %s, Title: %s, City: %s\n", name, title, city)
}

func demonstrateInPlaceOperations() {
    // In-place rune manipulation
    processRunes := func(s string, processor func(rune) rune) string {
        runes := []rune(s)
        for i, r := range runes {
            runes[i] = processor(r)
        }
        return string(runes)
    }
    
    text := "Hello, World!"
    
    // Convert to alternating case
    alternating := processRunes(text, func() func(rune) rune {
        upper := true
        return func(r rune) rune {
            if r >= 'a' && r <= 'z' {
                if upper {
                    r = r - 'a' + 'A'
                }
                upper = !upper
            } else if r >= 'A' && r <= 'Z' {
                if !upper {
                    r = r - 'A' + 'a'
                }
                upper = !upper
            }
            return r
        }
    }())
    
    fmt.Printf("Alternating case: %s\n", alternating)
    
    // Efficient character filtering
    filterChars := func(s string, keep func(rune) bool) string {
        var buf []byte
        for _, r := range s {
            if keep(r) {
                buf = append(buf, string(r)...)
            }
        }
        return string(buf)
    }
    
    mixed := "Hello123World456!"
    lettersOnly := filterChars(mixed, func(r rune) bool {
        return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
    })
    
    fmt.Printf("Letters only: %s\n", lettersOnly)
}

func demonstrateStringPool() {
    // String interning for memory efficiency
    type StringPool struct {
        pool map[string]string
    }
    
    func NewStringPool() *StringPool {
        return &StringPool{pool: make(map[string]string)}
    }
    
    func (sp *StringPool) Intern(s string) string {
        if interned, exists := sp.pool[s]; exists {
            return interned
        }
        sp.pool[s] = s
        return s
    }
    
    func (sp *StringPool) Size() int {
        return len(sp.pool)
    }
    
    pool := NewStringPool()
    
    // Simulate repeated string operations
    commonStrings := []string{"GET", "POST", "PUT", "DELETE", "application/json"}
    var requests []string
    
    for i := 0; i < 1000; i++ {
        method := commonStrings[i%len(commonStrings)]
        internedMethod := pool.Intern(method)
        requests = append(requests, internedMethod)
    }
    
    fmt.Printf("Pool size: %d unique strings\n", pool.Size())
    fmt.Printf("Total requests: %d\n", len(requests))
    
    // All identical strings share the same memory
    firstGET := -1
    secondGET := -1
    for i, req := range requests {
        if req == "GET" {
            if firstGET == -1 {
                firstGET = i
            } else if secondGET == -1 {
                secondGET = i
                break
            }
        }
    }
    
    if firstGET != -1 && secondGET != -1 {
        fmt.Printf("Same GET strings share memory: %t\n",
            unsafe.StringData(requests[firstGET]) == unsafe.StringData(requests[secondGET]))
    }
}

func main() {
    fmt.Println("Zero-copy string operations:")
    demonstrateZeroCopyOperations()
    
    fmt.Println("\nIn-place string operations:")
    demonstrateInPlaceOperations()
    
    fmt.Println("\nString pooling for memory efficiency:")
    demonstrateStringPool()
}

Unicode and Internationalization

Working with Unicode

Proper handling of international characters:

package main

import (
    "fmt"
    "strings"
    "unicode"
    "unicode/utf8"
)

func demonstrateUnicodeBasics() {
    // Unicode strings
    texts := []string{
        "Hello",           // ASCII
        "Héllo",          // Latin with accents
        "Привет",         // Cyrillic
        "こんにちは",        // Japanese Hiragana
        "🌍🚀🎉",         // Emojis
        "नमस्ते",          // Devanagari (Hindi)
    }
    
    for _, text := range texts {
        fmt.Printf("Text: %s\n", text)
        fmt.Printf("  Byte length: %d\n", len(text))
        fmt.Printf("  Rune count: %d\n", utf8.RuneCountInString(text))
        fmt.Printf("  Valid UTF-8: %t\n", utf8.ValidString(text))
        
        // Print individual runes
        fmt.Print("  Runes: ")
        for _, r := range text {
            fmt.Printf("U+%04X('%c') ", r, r)
        }
        fmt.Println()
        fmt.Println()
    }
}

func demonstrateUnicodeOperations() {
    text := "Café ñoño 123 🎉"
    
    // Count different character types
    var letters, digits, spaces, symbols int
    
    for _, r := range text {
        switch {
        case unicode.IsLetter(r):
            letters++
        case unicode.IsDigit(r):
            digits++
        case unicode.IsSpace(r):
            spaces++
        default:
            symbols++
        }
    }
    
    fmt.Printf("Text: %s\n", text)
    fmt.Printf("Letters: %d, Digits: %d, Spaces: %d, Symbols: %d\n", 
        letters, digits, spaces, symbols)
    
    // Unicode normalization and comparison
    text1 := "café"      // é as single character
    text2 := "cafe\u0301" // e + combining acute accent
    
    fmt.Printf("Text 1: %s (length: %d)\n", text1, len(text1))
    fmt.Printf("Text 2: %s (length: %d)\n", text2, len(text2))
    fmt.Printf("Equal: %t\n", text1 == text2)
    fmt.Printf("EqualFold: %t\n", strings.EqualFold(text1, text2))
    
    // Case conversion with Unicode
    multilingual := "Istanbul İstanbul ISTANBUL"
    fmt.Printf("Original: %s\n", multilingual)
    fmt.Printf("ToLower: %s\n", strings.ToLower(multilingual))
    fmt.Printf("ToUpper: %s\n", strings.ToUpper(multilingual))
}

func demonstrateUnicodeProcessing() {
    // Extract words from multilingual text
    extractWords := func(text string) []string {
        var words []string
        var current strings.Builder
        
        for _, r := range text {
            if unicode.IsLetter(r) || unicode.IsDigit(r) {
                current.WriteRune(r)
            } else {
                if current.Len() > 0 {
                    words = append(words, current.String())
                    current.Reset()
                }
            }
        }
        
        if current.Len() > 0 {
            words = append(words, current.String())
        }
        
        return words
    }
    
    multiText := "Hello, 世界! How are you? 元気ですか?"
    words := extractWords(multiText)
    fmt.Printf("Extracted words: %v\n", words)
    
    // Character classification
    classifyText := func(text string) map[string]int {
        stats := map[string]int{
            "latin":     0,
            "cyrillic":  0,
            "cjk":       0,
            "arabic":    0,
            "digit":     0,
            "symbol":    0,
            "other":     0,
        }
        
        for _, r := range text {
            switch {
            case unicode.IsDigit(r):
                stats["digit"]++
            case unicode.IsSymbol(r) || unicode.IsPunct(r):
                stats["symbol"]++
            case unicode.In(r, unicode.Latin):
                stats["latin"]++
            case unicode.In(r, unicode.Cyrillic):
                stats["cyrillic"]++
            case unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana):
                stats["cjk"]++
            case unicode.In(r, unicode.Arabic):
                stats["arabic"]++
            default:
                stats["other"]++
            }
        }
        
        return stats
    }
    
    mixedText := "Hello Привет 你好 مرحبا 123 🎉"
    stats := classifyText(mixedText)
    fmt.Printf("\nCharacter classification for: %s\n", mixedText)
    for script, count := range stats {
        if count > 0 {
            fmt.Printf("  %s: %d\n", script, count)
        }
    }
}

func demonstrateTextProcessing() {
    // Smart truncation that respects word boundaries and Unicode
    smartTruncate := func(text string, maxLen int) string {
        if utf8.RuneCountInString(text) <= maxLen {
            return text
        }
        
        runes := []rune(text)
        if len(runes) <= maxLen {
            return text
        }
        
        // Try to truncate at word boundary
        truncated := runes[:maxLen]
        lastSpace := -1
        
        for i := len(truncated) - 1; i >= 0; i-- {
            if unicode.IsSpace(truncated[i]) {
                lastSpace = i
                break
            }
        }
        
        if lastSpace > maxLen/2 { // Don't truncate too much
            truncated = truncated[:lastSpace]
        }
        
        return string(truncated) + "..."
    }
    
    longText := "This is a very long text with international characters like café, naïve, and résumé that needs to be truncated properly."
    
    fmt.Printf("Original: %s\n", longText)
    fmt.Printf("Truncated (30): %s\n", smartTruncate(longText, 30))
    fmt.Printf("Truncated (50): %s\n", smartTruncate(longText, 50))
    
    // Remove diacritics (simplified approach)
    removeDiacritics := func(text string) string {
        // This is a simplified version - for production use golang.org/x/text/transform
        replacements := map[rune]rune{
            'á': 'a', 'à': 'a', 'ä': 'a', 'â': 'a', 'ã': 'a',
            'é': 'e', 'è': 'e', 'ë': 'e', 'ê': 'e',
            'í': 'i', 'ì': 'i', 'ï': 'i', 'î': 'i',
            'ó': 'o', 'ò': 'o', 'ö': 'o', 'ô': 'o', 'õ': 'o',
            'ú': 'u', 'ù': 'u', 'ü': 'u', 'û': 'u',
            'ñ': 'n', 'ç': 'c',
        }
        
        var result strings.Builder
        for _, r := range text {
            if replacement, exists := replacements[r]; exists {
                result.WriteRune(replacement)
            } else {
                result.WriteRune(r)
            }
        }
        
        return result.String()
    }
    
    accentedText := "café naïve résumé piñata"
    fmt.Printf("\nOriginal: %s\n", accentedText)
    fmt.Printf("No diacritics: %s\n", removeDiacritics(accentedText))
}

func main() {
    fmt.Println("Unicode basics:")
    demonstrateUnicodeBasics()
    
    fmt.Println("Unicode operations:")
    demonstrateUnicodeOperations()
    
    fmt.Println("\nUnicode text processing:")
    demonstrateUnicodeProcessing()
    
    fmt.Println("\nAdvanced text processing:")
    demonstrateTextProcessing()
}

Real-World Applications

Text Processing and Parsing

Practical string manipulation for common tasks:

package main

import (
    "fmt"
    "strings"
    "regexp"
    "net/url"
)

// Log parser
type LogEntry struct {
    Timestamp string
    Level     string
    Message   string
    Source    string
}

func parseLogEntry(line string) (*LogEntry, error) {
    // Expected format: "2023-07-30 10:15:30 [INFO] main.go:42 - User login successful"
    parts := strings.SplitN(strings.TrimSpace(line), " ", 5)
    if len(parts) < 5 {
        return nil, fmt.Errorf("invalid log format")
    }
    
    timestamp := parts[0] + " " + parts[1]
    level := strings.Trim(parts[2], "[]")
    source := parts[3]
    message := strings.TrimPrefix(parts[4], "- ")
    
    return &LogEntry{
        Timestamp: timestamp,
        Level:     level,
        Message:   message,
        Source:    source,
    }, nil
}

func demonstrateLogParsing() {
    logLines := []string{
        "2023-07-30 10:15:30 [INFO] main.go:42 - User login successful",
        "2023-07-30 10:15:31 [ERROR] database.go:123 - Connection timeout",
        "2023-07-30 10:15:32 [DEBUG] auth.go:67 - Token validation passed",
        "2023-07-30 10:15:33 [WARN] cache.go:89 - Cache miss for key user:12345",
    }
    
    fmt.Println("Parsed log entries:")
    for _, line := range logLines {
        if entry, err := parseLogEntry(line); err == nil {
            fmt.Printf("  %s [%s] %s (%s)\n", 
                entry.Timestamp, entry.Level, entry.Message, entry.Source)
        }
    }
}

// URL parsing and manipulation
func demonstrateURLProcessing() {
    urls := []string{
        "https://api.example.com/v1/users?page=1&limit=10",
        "http://localhost:8080/admin/dashboard",
        "https://www.example.com/search?q=golang&category=programming",
    }
    
    fmt.Println("\nURL processing:")
    for _, rawURL := range urls {
        parsedURL, err := url.Parse(rawURL)
        if err != nil {
            continue
        }
        
        fmt.Printf("Original: %s\n", rawURL)
        fmt.Printf("  Scheme: %s\n", parsedURL.Scheme)
        fmt.Printf("  Host: %s\n", parsedURL.Host)
        fmt.Printf("  Path: %s\n", parsedURL.Path)
        
        if parsedURL.RawQuery != "" {
            params, _ := url.ParseQuery(parsedURL.RawQuery)
            fmt.Printf("  Query params:\n")
            for key, values := range params {
                fmt.Printf("    %s: %s\n", key, strings.Join(values, ", "))
            }
        }
        fmt.Println()
    }
}

// Email validation and extraction
func demonstrateEmailProcessing() {
    text := `
    Contact us at support@example.com or sales@company.org.
    You can also reach John Doe at john.doe+test@domain.co.uk
    or the team at team-lead@sub.domain.com.
    Invalid emails: @invalid.com, invalid@, plain.text
    `
    
    // Simple email regex (for demonstration - use proper library for production)
    emailRegex := regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)
    emails := emailRegex.FindAllString(text, -1)
    
    fmt.Println("Extracted emails:")
    uniqueEmails := make(map[string]bool)
    for _, email := range emails {
        email = strings.ToLower(email)
        if !uniqueEmails[email] {
            uniqueEmails[email] = true
            
            // Extract domain
            parts := strings.Split(email, "@")
            if len(parts) == 2 {
                domain := parts[1]
                fmt.Printf("  %s (domain: %s)\n", email, domain)
            }
        }
    }
}

// Configuration file parser
func parseINIConfig(content string) map[string]map[string]string {
    config := make(map[string]map[string]string)
    var currentSection string
    
    lines := strings.Split(content, "\n")
    for _, line := range lines {
        line = strings.TrimSpace(line)
        
        // Skip empty lines and comments
        if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
            continue
        }
        
        // Section header
        if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
            currentSection = strings.Trim(line, "[]")
            if config[currentSection] == nil {
                config[currentSection] = make(map[string]string)
            }
            continue
        }
        
        // Key-value pair
        if strings.Contains(line, "=") && currentSection != "" {
            parts := strings.SplitN(line, "=", 2)
            key := strings.TrimSpace(parts[0])
            value := strings.TrimSpace(parts[1])
            
            // Remove quotes if present
            if (strings.HasPrefix(value, "\"") && strings.HasSuffix(value, "\"")) ||
               (strings.HasPrefix(value, "'") && strings.HasSuffix(value, "'")) {
                value = value[1 : len(value)-1]
            }
            
            config[currentSection][key] = value
        }
    }
    
    return config
}

func demonstrateConfigParsing() {
    iniContent := `
    # Database configuration
    [database]
    host = localhost
    port = 5432
    name = "myapp_db"
    user = dbuser
    password = 'secret123'
    
    [server]
    port = 8080
    debug = true
    host = 0.0.0.0
    
    [cache]
    enabled = true
    ttl = 3600
    `
    
    config := parseINIConfig(iniContent)
    
    fmt.Println("\nParsed configuration:")
    for section, settings := range config {
        fmt.Printf("[%s]\n", section)
        for key, value := range settings {
            fmt.Printf("  %s = %s\n", key, value)
        }
        fmt.Println()
    }
}

func main() {
    demonstrateLogParsing()
    demonstrateURLProcessing()
    demonstrateEmailProcessing()
    demonstrateConfigParsing()
}

Data Validation and Sanitization

Input cleaning and validation patterns:

package main

import (
    "fmt"
    "strings"
    "unicode"
)

// Input validator
type Validator struct {
    errors []string
}

func NewValidator() *Validator {
    return &Validator{errors: make([]string, 0)}
}

func (v *Validator) AddError(message string) {
    v.errors = append(v.errors, message)
}

func (v *Validator) IsValid() bool {
    return len(v.errors) == 0
}

func (v *Validator) Errors() []string {
    return v.errors
}

func (v *Validator) ValidateRequired(value, fieldName string) {
    if strings.TrimSpace(value) == "" {
        v.AddError(fmt.Sprintf("%s is required", fieldName))
    }
}

func (v *Validator) ValidateLength(value, fieldName string, min, max int) {
    length := len(strings.TrimSpace(value))
    if length < min {
        v.AddError(fmt.Sprintf("%s must be at least %d characters", fieldName, min))
    }
    if length > max {
        v.AddError(fmt.Sprintf("%s must be at most %d characters", fieldName, max))
    }
}

func (v *Validator) ValidateEmail(email string) {
    email = strings.TrimSpace(strings.ToLower(email))
    if email == "" {
        v.AddError("email is required")
        return
    }
    
    parts := strings.Split(email, "@")
    if len(parts) != 2 {
        v.AddError("email format is invalid")
        return
    }
    
    local, domain := parts[0], parts[1]
    
    if local == "" || domain == "" {
        v.AddError("email format is invalid")
        return
    }
    
    if !strings.Contains(domain, ".") {
        v.AddError("email domain is invalid")
        return
    }
}

func (v *Validator) ValidatePassword(password string) {
    if len(password) < 8 {
        v.AddError("password must be at least 8 characters")
    }
    
    var hasUpper, hasLower, hasDigit, hasSpecial bool
    
    for _, r := range password {
        switch {
        case unicode.IsUpper(r):
            hasUpper = true
        case unicode.IsLower(r):
            hasLower = true
        case unicode.IsDigit(r):
            hasDigit = true
        case unicode.IsPunct(r) || unicode.IsSymbol(r):
            hasSpecial = true
        }
    }
    
    if !hasUpper {
        v.AddError("password must contain at least one uppercase letter")
    }
    if !hasLower {
        v.AddError("password must contain at least one lowercase letter")
    }
    if !hasDigit {
        v.AddError("password must contain at least one digit")
    }
    if !hasSpecial {
        v.AddError("password must contain at least one special character")
    }
}

// Input sanitizer
type Sanitizer struct{}

func NewSanitizer() *Sanitizer {
    return &Sanitizer{}
}

func (s *Sanitizer) SanitizeString(input string) string {
    // Trim whitespace
    input = strings.TrimSpace(input)
    
    // Normalize whitespace
    input = strings.Join(strings.Fields(input), " ")
    
    return input
}

func (s *Sanitizer) SanitizeName(name string) string {
    name = s.SanitizeString(name)
    
    // Remove non-letter characters except spaces, hyphens, and apostrophes
    var result strings.Builder
    for _, r := range name {
        if unicode.IsLetter(r) || r == ' ' || r == '-' || r == '\'' {
            result.WriteRune(r)
        }
    }
    
    return strings.TrimSpace(result.String())
}

func (s *Sanitizer) SanitizeEmail(email string) string {
    email = strings.TrimSpace(strings.ToLower(email))
    
    // Remove any characters that aren't valid in email addresses
    var result strings.Builder
    for _, r := range email {
        if unicode.IsLetter(r) || unicode.IsDigit(r) || 
           r == '@' || r == '.' || r == '-' || r == '_' || r == '+' {
            result.WriteRune(r)
        }
    }
    
    return result.String()
}

func (s *Sanitizer) SanitizePhone(phone string) string {
    // Extract only digits
    var result strings.Builder
    for _, r := range phone {
        if unicode.IsDigit(r) {
            result.WriteRune(r)
        }
    }
    
    return result.String()
}

func (s *Sanitizer) SanitizeHTML(input string) string {
    // Basic HTML escape (use proper library for production)
    replacer := strings.NewReplacer(
        "&", "&amp;",
        "<", "&lt;",
        ">", "&gt;",
        "\"", "&quot;",
        "'", "&#39;",
    )
    
    return replacer.Replace(input)
}

func demonstrateValidation() {
    testCases := []struct {
        name     string
        email    string
        password string
    }{
        {"", "invalid-email", "weak"},
        {"John Doe", "john@example.com", "StrongPass123!"},
        {"   ", "@invalid.com", ""},
        {"Alice Smith", "alice.smith@company.org", "MySecurePassword456#"},
    }
    
    fmt.Println("Validation results:")
    for i, tc := range testCases {
        validator := NewValidator()
        
        validator.ValidateRequired(tc.name, "name")
        validator.ValidateLength(tc.name, "name", 2, 50)
        validator.ValidateEmail(tc.email)
        validator.ValidatePassword(tc.password)
        
        fmt.Printf("\nTest case %d:\n", i+1)
        fmt.Printf("  Name: '%s'\n", tc.name)
        fmt.Printf("  Email: '%s'\n", tc.email)
        fmt.Printf("  Password: '%s'\n", tc.password)
        
        if validator.IsValid() {
            fmt.Println("  ✓ Valid")
        } else {
            fmt.Println("  ✗ Invalid:")
            for _, err := range validator.Errors() {
                fmt.Printf("    - %s\n", err)
            }
        }
    }
}

func demonstrateSanitization() {
    sanitizer := NewSanitizer()
    
    testData := map[string]string{
        "messy name":     "  John   O'Connor-Smith  123  ",
        "email":          "  JOHN.DOE+TEST@EXAMPLE.COM  ",
        "phone":          "(555) 123-4567 ext. 890",
        "html content":   "<script>alert('xss')</script>Hello & goodbye",
        "general text":   "  Multiple    spaces   normalized  ",
    }
    
    fmt.Println("\nSanitization results:")
    for label, input := range testData {
        fmt.Printf("\n%s:\n", label)
        fmt.Printf("  Input:  '%s'\n", input)
        
        switch label {
        case "messy name":
            output := sanitizer.SanitizeName(input)
            fmt.Printf("  Output: '%s'\n", output)
        case "email":
            output := sanitizer.SanitizeEmail(input)
            fmt.Printf("  Output: '%s'\n", output)
        case "phone":
            output := sanitizer.SanitizePhone(input)
            fmt.Printf("  Output: '%s'\n", output)
        case "html content":
            output := sanitizer.SanitizeHTML(input)
            fmt.Printf("  Output: '%s'\n", output)
        case "general text":
            output := sanitizer.SanitizeString(input)
            fmt.Printf("  Output: '%s'\n", output)
        }
    }
}

func main() {
    demonstrateValidation()
    demonstrateSanitization()
}

FAQ

Q: When should I use strings.Builder vs fmt.Sprintf for string building? A: Use strings.Builder for concatenating many strings or when building strings in loops, as it's much more efficient. Use fmt.Sprintf for simple formatting with a few variables where readability is more important than performance.

Q: How do I handle strings with different encodings in Go? A: Go strings are UTF-8 by default. For other encodings, use the golang.org/x/text package which provides encoding conversion utilities. Always validate that your input is valid UTF-8 using utf8.ValidString().

Q: What's the difference between len() and utf8.RuneCountInString()? A: len() returns the number of bytes in the string, while utf8.RuneCountInString() returns the number of Unicode characters (runes). For international text, these can be very different.

Q: How do I efficiently search for multiple substrings in a large text? A: For multiple exact matches, use strings.Replacer or build a trie data structure. For pattern matching, use regular expressions with alternation (|). For very large texts, consider using specialized algorithms like Aho-Corasick.

Q: Is it safe to slice strings to create substrings? A: Yes, string slicing is safe and doesn't copy the underlying data. However, be aware that the substring keeps a reference to the original string, which may prevent garbage collection of large strings.

Q: How do I handle case-insensitive string operations efficiently? A: Use strings.EqualFold() for comparisons and strings.ToLower()/ToUpper() for transformations. For repeated operations on the same strings, consider converting once and storing the result.

Conclusion

Mastering Go's string manipulation capabilities is essential for building robust, efficient applications. The key insights from this comprehensive guide include:

  • Understanding string immutability and its performance implications
  • Leveraging the strings package for common operations like searching, splitting, and modification
  • Using strings.Builder for efficient string construction and avoiding unnecessary allocations
  • Properly handling Unicode and international text with UTF-8 awareness
  • Implementing practical patterns for validation, sanitization, and text processing
  • Optimizing string operations for better memory usage and performance

The strings package provides powerful, well-optimized functions for most text processing needs. By combining these with proper Unicode handling and performance-conscious patterns, you can build applications that handle text efficiently and correctly across different languages and character sets.

Ready to optimize your string processing? Start by reviewing your current code for inefficient string concatenation patterns and opportunities to use the strings package functions. Consider implementing proper Unicode handling if your application processes international text. Share your experiences and questions in the comments below!

Share this article

Add Comment

No comments yet. Be the first to comment!

More from Go