Table Of Contents
- Introduction
- Understanding Strings in Go
- Core strings Package Functions
- Performance Optimization
- Unicode and Internationalization
- Real-World Applications
- FAQ
- Conclusion
Introduction
String manipulation is one of the most common tasks in programming, and Go provides powerful tools for it through the standard library's strings package. Whether you're processing user input, parsing configuration files, building APIs, or handling text data, understanding Go's string manipulation capabilities is essential for writing efficient and maintainable code.
Many developers underestimate the complexity and performance implications of string operations. In Go, strings are immutable, which means every modification creates a new string. This characteristic, combined with Go's Unicode support and UTF-8 encoding, requires careful consideration when designing string-heavy applications.
In this guide, we'll explore the strings package in depth, examine the performance characteristics of different string operations, cover Unicode handling, and look at advanced techniques for efficient text processing. The goal is practical knowledge that helps you write faster, more reliable string manipulation code.
Understanding Strings in Go
String Fundamentals
Go strings are immutable sequences of bytes, typically containing UTF-8 encoded text:
package main
import (
"fmt"
"unsafe"
)
func demonstrateStringBasics() {
// String literals
s1 := "Hello, World!"
s2 := `Multi-line
string with
raw literals`
s3 := "Unicode: 👋🌍"
fmt.Printf("String 1: %s (len: %d bytes)\n", s1, len(s1))
fmt.Printf("String 2: %s (len: %d bytes)\n", s2, len(s2))
fmt.Printf("String 3: %s (len: %d bytes)\n", s3, len(s3))
// String immutability
original := "Hello"
modified := original + ", World!"
fmt.Printf("Original: %s\n", original)
fmt.Printf("Modified: %s\n", modified)
fmt.Printf("Same memory address: %t\n",
unsafe.StringData(original) == unsafe.StringData(modified))
// Byte vs rune access
text := "Hello, 世界"
fmt.Printf("Byte length: %d\n", len(text))
fmt.Printf("Rune count: %d\n", len([]rune(text)))
// Byte iteration
fmt.Print("Bytes: ")
for i := 0; i < len(text); i++ {
fmt.Printf("%02x ", text[i])
}
fmt.Println()
// Rune iteration
fmt.Print("Runes: ")
for _, r := range text {
fmt.Printf("'%c'(%U) ", r, r)
}
fmt.Println()
}
func main() {
demonstrateStringBasics()
}
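Because ranging over a string yields the byte offset at which each rune starts, a string can be cut at a rune boundary without first converting it to []rune. The following is a minimal sketch of that idea; the helper name firstRunes is hypothetical and not part of the standard library.

```go
package main

import "fmt"

// firstRunes returns the prefix of s that contains at most n runes.
// It relies on range yielding the byte offset where each rune starts,
// so the slice never splits a multi-byte character.
func firstRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s {
		if count == n {
			return s[:i] // i is the byte offset of rune number n+1
		}
		count++
	}
	return s // s has n runes or fewer
}

func main() {
	text := "Hello, 世界"
	prefix := firstRunes(text, 8)
	fmt.Println(prefix)      // Hello, 世
	fmt.Println(len(prefix)) // 10 (7 ASCII bytes + one 3-byte rune)
}
```

Slicing by a raw byte count instead (for example text[:8]) could cut the first CJK character in half and produce invalid UTF-8.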
String vs []byte vs []rune
Understanding the different string representations:
package main
import (
"fmt"
"time"
"unsafe"
)
func compareStringTypes() {
text := "Hello, 世界! 🌍"
// String (immutable, UTF-8 bytes)
str := text
// []byte (mutable, UTF-8 bytes)
bytes := []byte(text)
// []rune (mutable, Unicode code points)
runes := []rune(text)
fmt.Printf("Original: %s\n", text)
fmt.Printf("String length: %d bytes\n", len(str))
fmt.Printf("Bytes length: %d bytes\n", len(bytes))
fmt.Printf("Runes length: %d runes\n", len(runes))
// Header sizes only: unsafe.Sizeof does not include the backing array
fmt.Printf("String header: %d bytes\n", unsafe.Sizeof(str))
fmt.Printf("[]byte header: %d bytes\n", unsafe.Sizeof(bytes))
fmt.Printf("[]rune header: %d bytes\n", unsafe.Sizeof(runes))
// Conversion performance
const iterations = 1000000
// String to []byte
start := time.Now()
for i := 0; i < iterations; i++ {
_ = []byte(text)
}
stringToBytesTime := time.Since(start)
// String to []rune
start = time.Now()
for i := 0; i < iterations; i++ {
_ = []rune(text)
}
stringToRunesTime := time.Since(start)
// []byte to string
start = time.Now()
for i := 0; i < iterations; i++ {
_ = string(bytes)
}
bytesToStringTime := time.Since(start)
fmt.Printf("String to []byte: %v\n", stringToBytesTime)
fmt.Printf("String to []rune: %v\n", stringToRunesTime)
fmt.Printf("[]byte to string: %v\n", bytesToStringTime)
}
func main() {
compareStringTypes()
}
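Hand-rolled timing loops like the ones above can be skewed when the compiler discards a conversion whose result is unused. As a hedged alternative, the sketch below runs the same string-to-[]byte conversion through testing.Benchmark, which picks the iteration count itself and reports allocations. The names sink and benchStringToBytes are illustrative assumptions.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable; assigning to it keeps the compiler
// from optimizing the conversion away.
var sink []byte

// benchStringToBytes measures []byte(text) using the testing package's
// benchmark harness, which can be called outside of "go test".
func benchStringToBytes(text string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = []byte(text)
		}
	})
}

func main() {
	res := benchStringToBytes("Hello, 世界! 🌍")
	fmt.Printf("%d iterations, %d ns/op, %d allocs/op\n",
		res.N, res.NsPerOp(), res.AllocsPerOp())
}
```

The numbers vary by machine and Go version, so treat them as relative indicators rather than fixed costs.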
Core strings Package Functions
Searching and Checking
Essential functions for finding and validating string content:
package main
import (
"fmt"
"strings"
)
func demonstrateSearchingFunctions() {
text := "The quick brown fox jumps over the lazy dog"
// Contains functions
fmt.Printf("Contains 'fox': %t\n", strings.Contains(text, "fox"))
fmt.Printf("Contains any 'xyz': %t\n", strings.ContainsAny(text, "xyz"))
fmt.Printf("Contains rune 'x': %t\n", strings.ContainsRune(text, 'x'))
// Prefix and suffix checking
fmt.Printf("Has prefix 'The': %t\n", strings.HasPrefix(text, "The"))
fmt.Printf("Has suffix 'dog': %t\n", strings.HasSuffix(text, "dog"))
// Index functions
fmt.Printf("Index of 'fox': %d\n", strings.Index(text, "fox"))
fmt.Printf("Last index of 'o': %d\n", strings.LastIndex(text, "o"))
fmt.Printf("Index of any 'xyz': %d\n", strings.IndexAny(text, "xyz"))
fmt.Printf("Index of rune 'q': %d\n", strings.IndexRune(text, 'q'))
// Count occurrences
fmt.Printf("Count of 'o': %d\n", strings.Count(text, "o"))
fmt.Printf("Count of 'the': %d\n", strings.Count(strings.ToLower(text), "the"))
// Case-insensitive operations
fmt.Printf("Equal fold 'THE QUICK' and 'the quick': %t\n",
strings.EqualFold("THE QUICK", "the quick"))
}
func demonstrateAdvancedSearching() {
data := []string{
"user@example.com",
"admin@company.org",
"test@domain.net",
"contact@business.com",
}
// Find emails from specific domains
targetDomains := []string{".com", ".org"}
fmt.Println("Emails from target domains:")
for _, email := range data {
for _, domain := range targetDomains {
if strings.HasSuffix(email, domain) {
fmt.Printf(" %s\n", email)
break
}
}
}
// Extract usernames (before @)
fmt.Println("\nUsernames:")
for _, email := range data {
if idx := strings.Index(email, "@"); idx != -1 {
username := email[:idx]
fmt.Printf(" %s\n", username)
}
}
// Case-insensitive filtering
searchTerm := "ADMIN"
fmt.Printf("\nEmails containing '%s' (case-insensitive):\n", searchTerm)
for _, email := range data {
if strings.Contains(strings.ToUpper(email), searchTerm) {
fmt.Printf(" %s\n", email)
}
}
}
func main() {
demonstrateSearchingFunctions()
fmt.Println()
demonstrateAdvancedSearching()
}
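The username extraction above uses strings.Index followed by manual slicing. Since Go 1.18, strings.Cut covers the same "split around the first separator" pattern in one call. Here is a small sketch; splitEmail is a hypothetical helper and it performs no real address validation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitEmail splits an address at the first "@" using strings.Cut.
// It only checks that both sides are non-empty; it is not a validator.
func splitEmail(email string) (user, domain string, ok bool) {
	user, domain, found := strings.Cut(email, "@")
	if !found || user == "" || domain == "" {
		return "", "", false
	}
	return user, domain, true
}

func main() {
	for _, e := range []string{"admin@company.org", "no-at-sign", "@missing.user"} {
		if user, domain, ok := splitEmail(e); ok {
			fmt.Printf("%s -> user=%s domain=%s\n", e, user, domain)
		} else {
			fmt.Printf("%s -> not splittable\n", e)
		}
	}
}
```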
String Modification
Functions for changing string content:
package main
import (
"fmt"
"strings"
)
func demonstrateModificationFunctions() {
text := " Hello, World! "
// Case conversion
fmt.Printf("Original: '%s'\n", text)
fmt.Printf("ToUpper: '%s'\n", strings.ToUpper(text))
fmt.Printf("ToLower: '%s'\n", strings.ToLower(text))
fmt.Printf("Title: '%s'\n", strings.Title(text)) // Deprecated since Go 1.18; golang.org/x/text/cases is the suggested replacement
fmt.Printf("ToTitle: '%s'\n", strings.ToTitle(text))
// Trimming
fmt.Printf("TrimSpace: '%s'\n", strings.TrimSpace(text))
fmt.Printf("Trim 'H el!': '%s'\n", strings.Trim(text, " H!el"))
fmt.Printf("TrimLeft: '%s'\n", strings.TrimLeft(text, " "))
fmt.Printf("TrimRight: '%s'\n", strings.TrimRight(text, " !"))
fmt.Printf("TrimPrefix: '%s'\n", strings.TrimPrefix(text, " Hello"))
fmt.Printf("TrimSuffix: '%s'\n", strings.TrimSuffix(text, "! "))
// Replacement
sample := "The quick brown fox jumps over the lazy dog"
fmt.Printf("Original: %s\n", sample)
fmt.Printf("Replace 'o' with '0': %s\n", strings.Replace(sample, "o", "0", -1))
fmt.Printf("Replace first 'o' with '0': %s\n", strings.Replace(sample, "o", "0", 1))
fmt.Printf("ReplaceAll 'o' with '0': %s\n", strings.ReplaceAll(sample, "o", "0"))
// Multiple replacements
replacer := strings.NewReplacer(
"quick", "fast",
"brown", "red",
"lazy", "sleepy",
)
fmt.Printf("Multiple replacements: %s\n", replacer.Replace(sample))
}
func demonstrateAdvancedModification() {
// Clean and normalize user input
userInput := " JoHn.DoE@EXAMPLE.COM "
cleaned := strings.TrimSpace(strings.ToLower(userInput))
fmt.Printf("Cleaned email: '%s'\n", cleaned)
// Normalize whitespace
messyText := "This   has    irregular     spacing"
normalized := strings.Join(strings.Fields(messyText), " ")
fmt.Printf("Normalized: '%s'\n", normalized)
// Remove unwanted characters
phoneNumber := "(123) 456-7890 ext. 123"
digitsOnly := strings.Map(func(r rune) rune {
if r >= '0' && r <= '9' {
return r
}
return -1 // Remove character
}, phoneNumber)
fmt.Printf("Digits only: '%s'\n", digitsOnly)
// Custom cleaning function
cleanText := func(s string) string {
// Remove leading/trailing whitespace
s = strings.TrimSpace(s)
// Replace multiple spaces with single space
s = strings.Join(strings.Fields(s), " ")
// Remove special characters except letters, numbers, and spaces
return strings.Map(func(r rune) rune {
if (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') ||
(r >= '0' && r <= '9') || r == ' ' {
return r
}
return -1
}, s)
}
dirtyText := " Hello@#$ World!!! 123 "
fmt.Printf("Clean text: '%s'\n", cleanText(dirtyText))
}
func main() {
demonstrateModificationFunctions()
fmt.Println()
demonstrateAdvancedModification()
}
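One detail in the trimming functions is easy to miss: Trim, TrimLeft, and TrimRight take a set of characters (a cutset), while TrimPrefix and TrimSuffix match a whole string. The sketch below contrasts the two on a file name; stripExt and stripExtWrong are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// stripExt removes one trailing ".json" by matching the whole suffix.
func stripExt(name string) string {
	return strings.TrimSuffix(name, ".json")
}

// stripExtWrong uses TrimRight, which treats ".json" as the character set
// {'.', 'j', 's', 'o', 'n'} and strips any trailing run of those characters.
func stripExtWrong(name string) string {
	return strings.TrimRight(name, ".json")
}

func main() {
	fmt.Println(stripExt("season.json"))      // season
	fmt.Println(stripExtWrong("season.json")) // sea
}
```

When the intent is to remove a fixed prefix or suffix, TrimPrefix and TrimSuffix are usually the safer choice.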
String Splitting and Joining
Working with string arrays and delimiters:
package main
import (
"fmt"
"strings"
)
func demonstrateSplittingJoining() {
// Basic splitting
csv := "apple,banana,orange,grape"
fruits := strings.Split(csv, ",")
fmt.Printf("Split CSV: %v\n", fruits)
// Split with limit
text := "a,b,c,d,e"
parts := strings.SplitN(text, ",", 3)
fmt.Printf("SplitN (limit 3): %v\n", parts)
// Split after delimiter
parts = strings.SplitAfter(text, ",")
fmt.Printf("SplitAfter: %v\n", parts)
// Fields (split on whitespace)
sentence := "The quick brown fox"
words := strings.Fields(sentence)
fmt.Printf("Fields: %v\n", words)
// Custom field function
customSplit := strings.FieldsFunc(sentence, func(r rune) bool {
return r == ' ' || r == '\t' || r == '\n'
})
fmt.Printf("Custom fields: %v\n", customSplit)
// Joining
joined := strings.Join(fruits, " | ")
fmt.Printf("Joined with ' | ': %s\n", joined)
// Repeat
pattern := strings.Repeat("*", 10)
fmt.Printf("Repeated pattern: %s\n", pattern)
}
func demonstrateAdvancedSplitting() {
// Parse configuration-like data
config := `
database.host=localhost
database.port=5432
database.name=myapp
server.port=8080
server.debug=true
`
settings := make(map[string]string)
lines := strings.Split(strings.TrimSpace(config), "\n")
for _, line := range lines {
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "#") {
continue
}
parts := strings.SplitN(line, "=", 2)
if len(parts) == 2 {
key := strings.TrimSpace(parts[0])
value := strings.TrimSpace(parts[1])
settings[key] = value
}
}
fmt.Println("Parsed configuration:")
for key, value := range settings {
fmt.Printf(" %s = %s\n", key, value)
}
// Parse CSV with quoted fields
csvLine := `"John Doe","Software Engineer","New York, NY","$75,000"`
// Simple CSV parser (for demonstration - use encoding/csv for production)
parseCSV := func(line string) []string {
var fields []string
var current strings.Builder
inQuotes := false
// Use an index-based loop: changing the index inside a range loop has no
// effect, so an escaped quote ("") could not be skipped that way.
// The delimiters are ASCII, so copying other bytes as-is preserves UTF-8.
for i := 0; i < len(line); i++ {
c := line[i]
switch {
case c == '"':
if inQuotes && i+1 < len(line) && line[i+1] == '"' {
current.WriteByte('"')
i++ // Skip the second quote of the escaped pair
} else {
inQuotes = !inQuotes
}
case c == ',' && !inQuotes:
fields = append(fields, current.String())
current.Reset()
default:
current.WriteByte(c)
}
}
fields = append(fields, current.String())
return fields
}
fields := parseCSV(csvLine)
fmt.Printf("\nParsed CSV fields: %v\n", fields)
// Smart word splitting (handle punctuation)
text := "Hello, world! How are you? I'm fine."
smartSplit := func(s string) []string {
return strings.FieldsFunc(s, func(r rune) bool {
return !(r >= 'a' && r <= 'z') && !(r >= 'A' && r <= 'Z') &&
!(r >= '0' && r <= '9') && r != '\''
})
}
words := smartSplit(text)
fmt.Printf("Smart word split: %v\n", words)
}
func main() {
demonstrateSplittingJoining()
fmt.Println()
demonstrateAdvancedSplitting()
}
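The hand-written CSV parser above is flagged as demonstration-only, with encoding/csv recommended for production. For comparison, here is a minimal sketch of parsing the same quoted record with the standard library reader; parseCSVLine is a hypothetical wrapper name.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseCSVLine parses a single CSV record with encoding/csv, which handles
// quoted fields, embedded commas, and doubled quotes.
func parseCSVLine(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	return r.Read()
}

func main() {
	fields, err := parseCSVLine(`"John Doe","Software Engineer","New York, NY","$75,000"`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i, f := range fields {
		fmt.Printf("%d: %s\n", i, f)
	}
}
```

For multi-line input, the reader's ReadAll method returns every record at once.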
Performance Optimization
String Building and Concatenation
Efficient ways to build strings:
package main
import (
"fmt"
"strings"
"time"
)
func benchmarkStringConcatenation() {
const iterations = 10000
words := make([]string, iterations)
for i := 0; i < iterations; i++ {
words[i] = fmt.Sprintf("word%d", i)
}
// Method 1: Simple concatenation (inefficient)
start := time.Now()
result1 := ""
for _, word := range words {
result1 += word + " "
}
simpleTime := time.Since(start)
// Method 2: strings.Builder (efficient)
start = time.Now()
var builder strings.Builder
builder.Grow(len(result1)) // Pre-allocate if size is known
for _, word := range words {
builder.WriteString(word)
builder.WriteString(" ")
}
result2 := builder.String()
builderTime := time.Since(start)
// Method 3: strings.Join (most efficient for this case)
start = time.Now()
result3 := strings.Join(words, " ") + " "
joinTime := time.Since(start)
// Method 4: []byte approach
start = time.Now()
var buf []byte
for _, word := range words {
buf = append(buf, word...)
buf = append(buf, ' ')
}
result4 := string(buf)
byteTime := time.Since(start)
fmt.Printf("Results are equal: %t\n",
result1 == result2 && result2 == result3 && result3 == result4)
fmt.Printf("Simple concatenation: %v\n", simpleTime)
fmt.Printf("strings.Builder: %v\n", builderTime)
fmt.Printf("strings.Join: %v\n", joinTime)
fmt.Printf("[]byte approach: %v\n", byteTime)
fmt.Printf("Builder vs Simple: %.2fx faster\n",
float64(simpleTime)/float64(builderTime))
fmt.Printf("Join vs Simple: %.2fx faster\n",
float64(simpleTime)/float64(joinTime))
}
func demonstrateStringBuilder() {
var builder strings.Builder
// Pre-allocate capacity if known
builder.Grow(100)
// Various write methods
builder.WriteString("Hello, ")
builder.WriteRune('W')
builder.WriteRune('o')
builder.WriteRune('r')
builder.WriteRune('l')
builder.WriteRune('d')
builder.WriteByte('!')
result := builder.String()
fmt.Printf("Built string: %s\n", result)
fmt.Printf("Builder capacity: %d\n", builder.Cap())
fmt.Printf("Builder length: %d\n", builder.Len())
// Reset and reuse
builder.Reset()
builder.WriteString("Reused builder")
fmt.Printf("After reset: %s\n", builder.String())
}
func demonstrateEfficientPatterns() {
// Template-like string building
buildTemplate := func(template string, values map[string]string) string {
var builder strings.Builder
builder.Grow(len(template) * 2) // Estimate final size
i := 0
for i < len(template) {
if i < len(template)-1 && template[i] == '{' && template[i+1] == '{' {
// Find closing }}
end := strings.Index(template[i+2:], "}}")
if end != -1 {
key := template[i+2 : i+2+end]
if value, exists := values[key]; exists {
builder.WriteString(value)
} else {
builder.WriteString("{{" + key + "}}")
}
i += 4 + end // Skip past }}
continue
}
}
builder.WriteByte(template[i])
i++
}
return builder.String()
}
template := "Hello {{name}}, welcome to {{app}}! Your ID is {{id}}."
values := map[string]string{
"name": "John",
"app": "MyApp",
"id": "12345",
}
result := buildTemplate(template, values)
fmt.Printf("Template result: %s\n", result)
// Efficient CSV building
buildCSV := func(records [][]string) string {
if len(records) == 0 {
return ""
}
var builder strings.Builder
for i, record := range records {
if i > 0 {
builder.WriteByte('\n')
}
for j, field := range record {
if j > 0 {
builder.WriteByte(',')
}
// Quote field if it contains comma or quotes
if strings.Contains(field, ",") || strings.Contains(field, "\"") {
builder.WriteByte('"')
builder.WriteString(strings.ReplaceAll(field, "\"", "\"\""))
builder.WriteByte('"')
} else {
builder.WriteString(field)
}
}
}
return builder.String()
}
records := [][]string{
{"Name", "Age", "City"},
{"John Doe", "30", "New York"},
{"Jane Smith", "25", "San Francisco"},
{"Bob Johnson", "35", "Chicago, IL"},
}
csv := buildCSV(records)
fmt.Printf("Generated CSV:\n%s\n", csv)
}
func main() {
fmt.Println("String concatenation benchmark:")
benchmarkStringConcatenation()
fmt.Println("\nstrings.Builder demonstration:")
demonstrateStringBuilder()
fmt.Println("\nEfficient string building patterns:")
demonstrateEfficientPatterns()
}
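A *strings.Builder also implements io.Writer, so formatted output can be written into it with fmt.Fprintf instead of building intermediate strings with fmt.Sprintf and concatenating them. A short sketch, with numberedList as a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
)

// numberedList renders items as "1. item" lines.
// fmt.Fprintf writes straight into the builder, avoiding a temporary
// string per line.
func numberedList(items []string) string {
	var b strings.Builder
	for i, item := range items {
		fmt.Fprintf(&b, "%d. %s\n", i+1, item)
	}
	return b.String()
}

func main() {
	fmt.Print(numberedList([]string{"apple", "banana", "orange"}))
}
```

Note that the builder is passed by pointer; a strings.Builder must not be copied after its first use.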
Memory-Efficient String Operations
Techniques to minimize memory allocations:
package main
import (
"fmt"
"strings"
"unsafe"
)
func demonstrateZeroCopyOperations() {
text := "The quick brown fox jumps over the lazy dog"
// Efficient substring without allocation (view into original)
// Note: This keeps the original string in memory
quickBrown := text[4:15] // "quick brown"
fmt.Printf("Substring: '%s'\n", quickBrown)
// Check if they share memory
fmt.Printf("Share memory: %t\n",
unsafe.StringData(text) == unsafe.StringData(quickBrown))
// When to copy vs when to slice
largeSentence := strings.Repeat("This is a very long sentence. ", 1000)
// Bad: Keeping reference to large string for small substring
badWord := largeSentence[0:4] // "This"
// Good: Copy small substring to release large string
goodWord := strings.Clone(largeSentence[0:4]) // strings.Clone (Go 1.18+) always allocates a fresh copy
fmt.Printf("Bad substring shares memory: %t\n",
unsafe.StringData(largeSentence) == unsafe.StringData(badWord))
fmt.Printf("Good substring shares memory: %t\n",
unsafe.StringData(largeSentence) == unsafe.StringData(goodWord))
// Efficient field extraction
parseFields := func(line string) (string, string, string) {
fields := strings.SplitN(line, "|", 3)
if len(fields) != 3 {
return "", "", ""
}
return strings.TrimSpace(fields[0]),
strings.TrimSpace(fields[1]),
strings.TrimSpace(fields[2])
}
data := "John Doe | Software Engineer | New York"
name, title, city := parseFields(data)
fmt.Printf("Name: %s, Title: %s, City: %s\n", name, title, city)
}
func demonstrateInPlaceOperations() {
// In-place rune manipulation
processRunes := func(s string, processor func(rune) rune) string {
runes := []rune(s)
for i, r := range runes {
runes[i] = processor(r)
}
return string(runes)
}
text := "Hello, World!"
// Convert to alternating case
alternating := processRunes(text, func() func(rune) rune {
upper := true
return func(r rune) rune {
if r >= 'a' && r <= 'z' {
if upper {
r = r - 'a' + 'A'
}
upper = !upper
} else if r >= 'A' && r <= 'Z' {
if !upper {
r = r - 'A' + 'a'
}
upper = !upper
}
return r
}
}())
fmt.Printf("Alternating case: %s\n", alternating)
// Efficient character filtering
filterChars := func(s string, keep func(rune) bool) string {
var buf []byte
for _, r := range s {
if keep(r) {
buf = append(buf, string(r)...)
}
}
return string(buf)
}
mixed := "Hello123World456!"
lettersOnly := filterChars(mixed, func(r rune) bool {
return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
})
fmt.Printf("Letters only: %s\n", lettersOnly)
}
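// StringPool interns strings so that equal values can share one backing array.
// Go does not allow named functions or methods to be declared inside another
// function, so the type and its methods live at package level.
type StringPool struct {
pool map[string]string
}
func NewStringPool() *StringPool {
return &StringPool{pool: make(map[string]string)}
}
func (sp *StringPool) Intern(s string) string {
if interned, exists := sp.pool[s]; exists {
return interned
}
sp.pool[s] = s
return s
}
func (sp *StringPool) Size() int {
return len(sp.pool)
}
func demonstrateStringPool() {
// String interning for memory efficiency
pool := NewStringPool()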
// Simulate repeated string operations
commonStrings := []string{"GET", "POST", "PUT", "DELETE", "application/json"}
var requests []string
for i := 0; i < 1000; i++ {
method := commonStrings[i%len(commonStrings)]
internedMethod := pool.Intern(method)
requests = append(requests, internedMethod)
}
fmt.Printf("Pool size: %d unique strings\n", pool.Size())
fmt.Printf("Total requests: %d\n", len(requests))
// All identical strings share the same memory
firstGET := -1
secondGET := -1
for i, req := range requests {
if req == "GET" {
if firstGET == -1 {
firstGET = i
} else if secondGET == -1 {
secondGET = i
break
}
}
}
if firstGET != -1 && secondGET != -1 {
fmt.Printf("Same GET strings share memory: %t\n",
unsafe.StringData(requests[firstGET]) == unsafe.StringData(requests[secondGET]))
}
}
func main() {
fmt.Println("Zero-copy string operations:")
demonstrateZeroCopyOperations()
fmt.Println("\nIn-place string operations:")
demonstrateInPlaceOperations()
fmt.Println("\nString pooling for memory efficiency:")
demonstrateStringPool()
}
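A map-backed pool like the one above is not safe for concurrent use, and storing a substring directly would keep its larger parent string alive. The sketch below is one possible variant that guards the map with a mutex and stores a strings.Clone copy; SafePool and its methods are hypothetical names, not an existing API.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// SafePool is a hypothetical concurrency-safe string pool.
type SafePool struct {
	mu   sync.Mutex
	pool map[string]string
}

func NewSafePool() *SafePool {
	return &SafePool{pool: make(map[string]string)}
}

// Intern returns the pooled copy of s, adding one if it is not present.
func (p *SafePool) Intern(s string) string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if v, ok := p.pool[s]; ok {
		return v
	}
	// Clone so the pool never pins a larger string that s was sliced from.
	c := strings.Clone(s)
	p.pool[c] = c
	return c
}

// Size reports the number of distinct strings in the pool.
func (p *SafePool) Size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.pool)
}

func main() {
	pool := NewSafePool()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, m := range []string{"GET", "POST", "PUT"} {
				pool.Intern(m)
			}
		}()
	}
	wg.Wait()
	fmt.Println("unique strings:", pool.Size()) // unique strings: 3
}
```

A single mutex is the simplest choice; heavily contended pools might prefer sharding or sync.Map, at the cost of more code.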
Unicode and Internationalization
Working with Unicode
Proper handling of international characters:
package main
import (
"fmt"
"strings"
"unicode"
"unicode/utf8"
)
func demonstrateUnicodeBasics() {
// Unicode strings
texts := []string{
"Hello", // ASCII
"Héllo", // Latin with accents
"Привет", // Cyrillic
"こんにちは", // Japanese Hiragana
"🌍🚀🎉", // Emojis
"नमस्ते", // Devanagari (Hindi)
}
for _, text := range texts {
fmt.Printf("Text: %s\n", text)
fmt.Printf(" Byte length: %d\n", len(text))
fmt.Printf(" Rune count: %d\n", utf8.RuneCountInString(text))
fmt.Printf(" Valid UTF-8: %t\n", utf8.ValidString(text))
// Print individual runes
fmt.Print(" Runes: ")
for _, r := range text {
fmt.Printf("U+%04X('%c') ", r, r)
}
fmt.Println()
fmt.Println()
}
}
func demonstrateUnicodeOperations() {
text := "Café ñoño 123 🎉"
// Count different character types
var letters, digits, spaces, symbols int
for _, r := range text {
switch {
case unicode.IsLetter(r):
letters++
case unicode.IsDigit(r):
digits++
case unicode.IsSpace(r):
spaces++
default:
symbols++
}
}
fmt.Printf("Text: %s\n", text)
fmt.Printf("Letters: %d, Digits: %d, Spaces: %d, Symbols: %d\n",
letters, digits, spaces, symbols)
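// Canonically equivalent strings: neither == nor EqualFold normalizes,
// so both comparisons below report false (see golang.org/x/text/unicode/norm)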
text1 := "café" // é as single character
text2 := "cafe\u0301" // e + combining acute accent
fmt.Printf("Text 1: %s (length: %d)\n", text1, len(text1))
fmt.Printf("Text 2: %s (length: %d)\n", text2, len(text2))
fmt.Printf("Equal: %t\n", text1 == text2)
fmt.Printf("EqualFold: %t\n", strings.EqualFold(text1, text2))
// Case conversion with Unicode
multilingual := "Istanbul İstanbul ISTANBUL"
fmt.Printf("Original: %s\n", multilingual)
fmt.Printf("ToLower: %s\n", strings.ToLower(multilingual))
fmt.Printf("ToUpper: %s\n", strings.ToUpper(multilingual))
}
func demonstrateUnicodeProcessing() {
// Extract words from multilingual text
extractWords := func(text string) []string {
var words []string
var current strings.Builder
for _, r := range text {
if unicode.IsLetter(r) || unicode.IsDigit(r) {
current.WriteRune(r)
} else {
if current.Len() > 0 {
words = append(words, current.String())
current.Reset()
}
}
}
if current.Len() > 0 {
words = append(words, current.String())
}
return words
}
multiText := "Hello, 世界! How are you? 元気ですか?"
words := extractWords(multiText)
fmt.Printf("Extracted words: %v\n", words)
// Character classification
classifyText := func(text string) map[string]int {
stats := map[string]int{
"latin": 0,
"cyrillic": 0,
"cjk": 0,
"arabic": 0,
"digit": 0,
"symbol": 0,
"other": 0,
}
for _, r := range text {
switch {
case unicode.IsDigit(r):
stats["digit"]++
case unicode.IsSymbol(r) || unicode.IsPunct(r):
stats["symbol"]++
case unicode.In(r, unicode.Latin):
stats["latin"]++
case unicode.In(r, unicode.Cyrillic):
stats["cyrillic"]++
case unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana):
stats["cjk"]++
case unicode.In(r, unicode.Arabic):
stats["arabic"]++
default:
stats["other"]++
}
}
return stats
}
mixedText := "Hello Привет 你好 مرحبا 123 🎉"
stats := classifyText(mixedText)
fmt.Printf("\nCharacter classification for: %s\n", mixedText)
for script, count := range stats {
if count > 0 {
fmt.Printf(" %s: %d\n", script, count)
}
}
}
func demonstrateTextProcessing() {
// Smart truncation that respects word boundaries and Unicode
smartTruncate := func(text string, maxLen int) string {
if utf8.RuneCountInString(text) <= maxLen {
return text
}
// The rune count was already checked above, so text is known to be too long
runes := []rune(text)
// Try to truncate at word boundary
truncated := runes[:maxLen]
lastSpace := -1
for i := len(truncated) - 1; i >= 0; i-- {
if unicode.IsSpace(truncated[i]) {
lastSpace = i
break
}
}
if lastSpace > maxLen/2 { // Don't truncate too much
truncated = truncated[:lastSpace]
}
return string(truncated) + "..."
}
longText := "This is a very long text with international characters like café, naïve, and résumé that needs to be truncated properly."
fmt.Printf("Original: %s\n", longText)
fmt.Printf("Truncated (30): %s\n", smartTruncate(longText, 30))
fmt.Printf("Truncated (50): %s\n", smartTruncate(longText, 50))
// Remove diacritics (simplified approach)
removeDiacritics := func(text string) string {
// This is a simplified version - for production use golang.org/x/text/transform
replacements := map[rune]rune{
'á': 'a', 'à': 'a', 'ä': 'a', 'â': 'a', 'ã': 'a',
'é': 'e', 'è': 'e', 'ë': 'e', 'ê': 'e',
'í': 'i', 'ì': 'i', 'ï': 'i', 'î': 'i',
'ó': 'o', 'ò': 'o', 'ö': 'o', 'ô': 'o', 'õ': 'o',
'ú': 'u', 'ù': 'u', 'ü': 'u', 'û': 'u',
'ñ': 'n', 'ç': 'c',
}
var result strings.Builder
for _, r := range text {
if replacement, exists := replacements[r]; exists {
result.WriteRune(replacement)
} else {
result.WriteRune(r)
}
}
return result.String()
}
accentedText := "café naïve résumé piñata"
fmt.Printf("\nOriginal: %s\n", accentedText)
fmt.Printf("No diacritics: %s\n", removeDiacritics(accentedText))
}
func main() {
fmt.Println("Unicode basics:")
demonstrateUnicodeBasics()
fmt.Println("Unicode operations:")
demonstrateUnicodeOperations()
fmt.Println("\nUnicode text processing:")
demonstrateUnicodeProcessing()
fmt.Println("\nAdvanced text processing:")
demonstrateTextProcessing()
}
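The Istanbul example above uses strings.ToLower and strings.ToUpper, which apply the default, locale-independent Unicode case mapping. For Turkish text, where dotted and dotless i are distinct letters, the standard library offers unicode.TurkishCase together with strings.ToLowerSpecial and strings.ToUpperSpecial. A brief sketch follows; the wrapper names lowerTurkish and upperTurkish are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// lowerTurkish lowercases s using Turkish casing rules (I -> ı, İ -> i).
func lowerTurkish(s string) string {
	return strings.ToLowerSpecial(unicode.TurkishCase, s)
}

// upperTurkish uppercases s using Turkish casing rules (i -> İ, ı -> I).
func upperTurkish(s string) string {
	return strings.ToUpperSpecial(unicode.TurkishCase, s)
}

func main() {
	fmt.Println(strings.ToLower("ISTANBUL")) // istanbul (default mapping)
	fmt.Println(lowerTurkish("ISTANBUL"))    // ıstanbul (dotless ı)
	fmt.Println(upperTurkish("istanbul"))    // İSTANBUL (dotted İ)
}
```

Which mapping is correct depends on the language of the text, which the program has to know from context.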
Real-World Applications
Text Processing and Parsing
Practical string manipulation for common tasks:
package main
import (
"fmt"
"net/url"
"regexp"
"strings"
)
// Log parser
type LogEntry struct {
Timestamp string
Level string
Message string
Source string
}
func parseLogEntry(line string) (*LogEntry, error) {
// Expected format: "2023-07-30 10:15:30 [INFO] main.go:42 - User login successful"
parts := strings.SplitN(strings.TrimSpace(line), " ", 5)
if len(parts) < 5 {
return nil, fmt.Errorf("invalid log format")
}
timestamp := parts[0] + " " + parts[1]
level := strings.Trim(parts[2], "[]")
source := parts[3]
message := strings.TrimPrefix(parts[4], "- ")
return &LogEntry{
Timestamp: timestamp,
Level: level,
Message: message,
Source: source,
}, nil
}
func demonstrateLogParsing() {
logLines := []string{
"2023-07-30 10:15:30 [INFO] main.go:42 - User login successful",
"2023-07-30 10:15:31 [ERROR] database.go:123 - Connection timeout",
"2023-07-30 10:15:32 [DEBUG] auth.go:67 - Token validation passed",
"2023-07-30 10:15:33 [WARN] cache.go:89 - Cache miss for key user:12345",
}
fmt.Println("Parsed log entries:")
for _, line := range logLines {
if entry, err := parseLogEntry(line); err == nil {
fmt.Printf(" %s [%s] %s (%s)\n",
entry.Timestamp, entry.Level, entry.Message, entry.Source)
}
}
}
// URL parsing and manipulation
func demonstrateURLProcessing() {
urls := []string{
"https://api.example.com/v1/users?page=1&limit=10",
"http://localhost:8080/admin/dashboard",
"https://www.example.com/search?q=golang&category=programming",
}
fmt.Println("\nURL processing:")
for _, rawURL := range urls {
parsedURL, err := url.Parse(rawURL)
if err != nil {
continue
}
fmt.Printf("Original: %s\n", rawURL)
fmt.Printf(" Scheme: %s\n", parsedURL.Scheme)
fmt.Printf(" Host: %s\n", parsedURL.Host)
fmt.Printf(" Path: %s\n", parsedURL.Path)
if parsedURL.RawQuery != "" {
params, _ := url.ParseQuery(parsedURL.RawQuery)
fmt.Printf(" Query params:\n")
for key, values := range params {
fmt.Printf(" %s: %s\n", key, strings.Join(values, ", "))
}
}
fmt.Println()
}
}
// Email validation and extraction
func demonstrateEmailProcessing() {
text := `
Contact us at support@example.com or sales@company.org.
You can also reach John Doe at john.doe+test@domain.co.uk
or the team at team-lead@sub.domain.com.
Invalid emails: @invalid.com, invalid@, plain.text
`
// Simple email regex (for demonstration - use proper library for production)
emailRegex := regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)
emails := emailRegex.FindAllString(text, -1)
fmt.Println("Extracted emails:")
uniqueEmails := make(map[string]bool)
for _, email := range emails {
email = strings.ToLower(email)
if !uniqueEmails[email] {
uniqueEmails[email] = true
// Extract domain
parts := strings.Split(email, "@")
if len(parts) == 2 {
domain := parts[1]
fmt.Printf(" %s (domain: %s)\n", email, domain)
}
}
}
}
// Configuration file parser
func parseINIConfig(content string) map[string]map[string]string {
config := make(map[string]map[string]string)
var currentSection string
lines := strings.Split(content, "\n")
for _, line := range lines {
line = strings.TrimSpace(line)
// Skip empty lines and comments
if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
continue
}
// Section header
if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
currentSection = strings.Trim(line, "[]")
if config[currentSection] == nil {
config[currentSection] = make(map[string]string)
}
continue
}
// Key-value pair
if strings.Contains(line, "=") && currentSection != "" {
parts := strings.SplitN(line, "=", 2)
key := strings.TrimSpace(parts[0])
value := strings.TrimSpace(parts[1])
// Remove quotes if present
if (strings.HasPrefix(value, "\"") && strings.HasSuffix(value, "\"")) ||
(strings.HasPrefix(value, "'") && strings.HasSuffix(value, "'")) {
value = value[1 : len(value)-1]
}
config[currentSection][key] = value
}
}
return config
}
func demonstrateConfigParsing() {
iniContent := `
# Database configuration
[database]
host = localhost
port = 5432
name = "myapp_db"
user = dbuser
password = 'secret123'
[server]
port = 8080
debug = true
host = 0.0.0.0
[cache]
enabled = true
ttl = 3600
`
config := parseINIConfig(iniContent)
fmt.Println("\nParsed configuration:")
for section, settings := range config {
fmt.Printf("[%s]\n", section)
for key, value := range settings {
fmt.Printf(" %s = %s\n", key, value)
}
fmt.Println()
}
}
func main() {
demonstrateLogParsing()
demonstrateURLProcessing()
demonstrateEmailProcessing()
demonstrateConfigParsing()
}
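The email regex above is marked as demonstration-only. One standard-library option for stricter checking is net/mail, whose ParseAddress follows the RFC 5322 address grammar. The sketch below wraps it in a hypothetical helper, isBareAddress, that also rejects inputs carrying a display name.

```go
package main

import (
	"fmt"
	"net/mail"
)

// isBareAddress reports whether s parses as an address and consists of
// nothing but that address (no display name, no angle brackets).
func isBareAddress(s string) bool {
	addr, err := mail.ParseAddress(s)
	return err == nil && addr.Address == s
}

func main() {
	candidates := []string{
		"support@example.com",
		"john.doe+test@domain.co.uk",
		"@invalid.com",
		"invalid@",
		"plain.text",
	}
	for _, c := range candidates {
		fmt.Printf("%-28s valid=%t\n", c, isBareAddress(c))
	}
}
```

Syntactic validity does not mean the mailbox exists; confirming that still requires sending a message.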
Data Validation and Sanitization
Input cleaning and validation patterns:
package main
import (
"fmt"
"strings"
"unicode"
)
// Input validator
type Validator struct {
errors []string
}
func NewValidator() *Validator {
return &Validator{errors: make([]string, 0)}
}
func (v *Validator) AddError(message string) {
v.errors = append(v.errors, message)
}
func (v *Validator) IsValid() bool {
return len(v.errors) == 0
}
func (v *Validator) Errors() []string {
return v.errors
}
func (v *Validator) ValidateRequired(value, fieldName string) {
if strings.TrimSpace(value) == "" {
v.AddError(fmt.Sprintf("%s is required", fieldName))
}
}
func (v *Validator) ValidateLength(value, fieldName string, min, max int) {
length := len([]rune(strings.TrimSpace(value))) // Count characters (runes), not bytes
if length < min {
v.AddError(fmt.Sprintf("%s must be at least %d characters", fieldName, min))
}
if length > max {
v.AddError(fmt.Sprintf("%s must be at most %d characters", fieldName, max))
}
}
func (v *Validator) ValidateEmail(email string) {
email = strings.TrimSpace(strings.ToLower(email))
if email == "" {
v.AddError("email is required")
return
}
parts := strings.Split(email, "@")
if len(parts) != 2 {
v.AddError("email format is invalid")
return
}
local, domain := parts[0], parts[1]
if local == "" || domain == "" {
v.AddError("email format is invalid")
return
}
if !strings.Contains(domain, ".") {
v.AddError("email domain is invalid")
return
}
}
func (v *Validator) ValidatePassword(password string) {
if len(password) < 8 {
v.AddError("password must be at least 8 characters")
}
var hasUpper, hasLower, hasDigit, hasSpecial bool
for _, r := range password {
switch {
case unicode.IsUpper(r):
hasUpper = true
case unicode.IsLower(r):
hasLower = true
case unicode.IsDigit(r):
hasDigit = true
case unicode.IsPunct(r) || unicode.IsSymbol(r):
hasSpecial = true
}
}
if !hasUpper {
v.AddError("password must contain at least one uppercase letter")
}
if !hasLower {
v.AddError("password must contain at least one lowercase letter")
}
if !hasDigit {
v.AddError("password must contain at least one digit")
}
if !hasSpecial {
v.AddError("password must contain at least one special character")
}
}
// Input sanitizer
type Sanitizer struct{}
func NewSanitizer() *Sanitizer {
return &Sanitizer{}
}
func (s *Sanitizer) SanitizeString(input string) string {
// Trim whitespace
input = strings.TrimSpace(input)
// Normalize whitespace
input = strings.Join(strings.Fields(input), " ")
return input
}
func (s *Sanitizer) SanitizeName(name string) string {
name = s.SanitizeString(name)
// Remove non-letter characters except spaces, hyphens, and apostrophes
var result strings.Builder
for _, r := range name {
if unicode.IsLetter(r) || r == ' ' || r == '-' || r == '\'' {
result.WriteRune(r)
}
}
return strings.TrimSpace(result.String())
}
func (s *Sanitizer) SanitizeEmail(email string) string {
email = strings.TrimSpace(strings.ToLower(email))
// Remove any characters that aren't valid in email addresses
var result strings.Builder
for _, r := range email {
if unicode.IsLetter(r) || unicode.IsDigit(r) ||
r == '@' || r == '.' || r == '-' || r == '_' || r == '+' {
result.WriteRune(r)
}
}
return result.String()
}
func (s *Sanitizer) SanitizePhone(phone string) string {
// Extract only digits
var result strings.Builder
for _, r := range phone {
if unicode.IsDigit(r) {
result.WriteRune(r)
}
}
return result.String()
}
func (s *Sanitizer) SanitizeHTML(input string) string {
// Basic HTML escape (use proper library for production)
replacer := strings.NewReplacer(
"&", "&amp;",
"<", "&lt;",
">", "&gt;",
"\"", "&quot;",
"'", "&#39;",
)
return replacer.Replace(input)
}
func demonstrateValidation() {
testCases := []struct {
name string
email string
password string
}{
{"", "invalid-email", "weak"},
{"John Doe", "john@example.com", "StrongPass123!"},
{" ", "@invalid.com", ""},
{"Alice Smith", "alice.smith@company.org", "MySecurePassword456#"},
}
fmt.Println("Validation results:")
for i, tc := range testCases {
validator := NewValidator()
validator.ValidateRequired(tc.name, "name")
validator.ValidateLength(tc.name, "name", 2, 50)
validator.ValidateEmail(tc.email)
validator.ValidatePassword(tc.password)
fmt.Printf("\nTest case %d:\n", i+1)
fmt.Printf(" Name: '%s'\n", tc.name)
fmt.Printf(" Email: '%s'\n", tc.email)
fmt.Printf(" Password: '%s'\n", tc.password)
if validator.IsValid() {
fmt.Println(" ✓ Valid")
} else {
fmt.Println(" ✗ Invalid:")
for _, err := range validator.Errors() {
fmt.Printf(" - %s\n", err)
}
}
}
}
func demonstrateSanitization() {
sanitizer := NewSanitizer()
testData := map[string]string{
"messy name": " John O'Connor-Smith 123 ",
"email": " JOHN.DOE+TEST@EXAMPLE.COM ",
"phone": "(555) 123-4567 ext. 890",
"html content": "<script>alert('xss')</script>Hello & goodbye",
"general text": " Multiple spaces normalized ",
}
fmt.Println("\nSanitization results:")
for label, input := range testData {
fmt.Printf("\n%s:\n", label)
fmt.Printf(" Input: '%s'\n", input)
switch label {
case "messy name":
output := sanitizer.SanitizeName(input)
fmt.Printf(" Output: '%s'\n", output)
case "email":
output := sanitizer.SanitizeEmail(input)
fmt.Printf(" Output: '%s'\n", output)
case "phone":
output := sanitizer.SanitizePhone(input)
fmt.Printf(" Output: '%s'\n", output)
case "html content":
output := sanitizer.SanitizeHTML(input)
fmt.Printf(" Output: '%s'\n", output)
case "general text":
output := sanitizer.SanitizeString(input)
fmt.Printf(" Output: '%s'\n", output)
}
}
}
func main() {
demonstrateValidation()
demonstrateSanitization()
}
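The comment in SanitizeHTML suggests a proper library for production use. One option in the standard library is html.EscapeString, sketched below. The wrapper name escapeForHTML is hypothetical and not part of the code above. It covers text placed in HTML element content. Attribute, URL, or script contexts generally call for html/template instead.

```go
package main

import (
	"fmt"
	"html"
)

// escapeForHTML is a hypothetical wrapper around html.EscapeString,
// which escapes the five characters <, >, &, ' and ".
func escapeForHTML(input string) string {
	return html.EscapeString(input)
}

func main() {
	fmt.Println(escapeForHTML("<script>alert('xss')</script>Hello & goodbye"))
	// Prints: &lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;Hello &amp; goodbye
}
```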
FAQ
Q: When should I use strings.Builder vs fmt.Sprintf for string building?
A: Use strings.Builder when concatenating many pieces or building a string in a loop: it appends into a growing buffer instead of allocating a new string at every step. Use fmt.Sprintf for one-off formatting with a few variables, where readability matters more than performance.
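As a rough illustration of that split, here is a minimal sketch. The function names joinIDs and greeting are made up for this example. Note that fmt.Fprintf can write straight into a Builder, because *strings.Builder implements io.Writer.

```go
package main

import (
	"fmt"
	"strings"
)

// joinIDs builds "id=1,id=2,..." in a loop using a single Builder.
func joinIDs(ids []int) string {
	var b strings.Builder
	for i, id := range ids {
		if i > 0 {
			b.WriteByte(',')
		}
		// Formats directly into the Builder's buffer.
		fmt.Fprintf(&b, "id=%d", id)
	}
	return b.String()
}

// greeting is a one-off format, where Sprintf reads more clearly.
func greeting(name string, count int) string {
	return fmt.Sprintf("Hello, %s! You have %d new messages.", name, count)
}

func main() {
	fmt.Println(joinIDs([]int{1, 2, 3}))
	fmt.Println(greeting("Ann", 2))
}
```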
Q: How do I handle strings with different encodings in Go?
A: Go source code and string literals are UTF-8, and most of the standard library assumes UTF-8, but a string value can hold arbitrary bytes. For other encodings, use the golang.org/x/text package, which provides encoding conversion utilities. Validate untrusted input with utf8.ValidString() before treating it as text.
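The validation step can be sketched with the standard library alone. The helper name cleanUTF8 is an assumption for this example, and replacing invalid bytes with U+FFFD is only one possible policy. Rejecting the input outright may suit your application better.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// cleanUTF8 returns s unchanged when it is valid UTF-8. Otherwise each
// run of invalid bytes is replaced with the Unicode replacement character.
func cleanUTF8(s string) string {
	if utf8.ValidString(s) {
		return s
	}
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	bad := "ok\xff\xfe!" // \xff and \xfe can never appear in valid UTF-8
	fmt.Println(utf8.ValidString(bad)) // false
	fmt.Printf("%q\n", cleanUTF8(bad)) // "ok\ufffd!"
}
```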
Q: What's the difference between len() and utf8.RuneCountInString()?
A: len() returns the number of bytes in the string, while utf8.RuneCountInString() returns the number of Unicode code points (runes). For non-ASCII text the two can differ substantially, and a single user-perceived character, such as some emoji or accented letters, may itself span several runes.
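A short sketch of the difference follows. The helper byteAndRuneCount is a hypothetical name used only here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount returns the byte length and the rune count of s.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"Hello", "Hello, 世界", "👋"} {
		b, r := byteAndRuneCount(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
	// "Hello": 5 bytes, 5 runes
	// "Hello, 世界": 13 bytes, 9 runes
	// "👋": 4 bytes, 1 runes
}
```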
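Q: How do I efficiently search for multiple substrings in a large text?
A: The standard library has no single call for this. For a handful of terms, loop over them with strings.Contains or strings.Index. For one-pass matching, use a regular expression with alternation (|), quoting each term with regexp.QuoteMeta. strings.Replacer is the right tool when you need to replace several substrings in one pass rather than locate them. For very large texts or term lists, consider a specialized algorithm such as Aho-Corasick.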
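The alternation approach might look like the sketch below. The function name findAny is an assumption, and the pattern is compiled on every call for brevity. In real code you would likely compile it once and reuse it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// findAny returns every occurrence of any of the literal terms in text,
// in the order they appear. Terms are quoted so regex metacharacters
// in them are matched literally.
func findAny(text string, terms []string) []string {
	if len(terms) == 0 {
		return nil
	}
	quoted := make([]string, len(terms))
	for i, t := range terms {
		quoted[i] = regexp.QuoteMeta(t)
	}
	re := regexp.MustCompile(strings.Join(quoted, "|"))
	return re.FindAllString(text, -1)
}

func main() {
	log := "error: disk full; warning: low memory"
	fmt.Println(findAny(log, []string{"error", "warning", "fatal"}))
	// Prints: [error warning]
}
```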
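Q: Is it safe to slice strings to create substrings?
A: Yes. Slicing is memory-safe and does not copy the underlying bytes. Two caveats apply: slice indices are byte offsets, so cutting in the middle of a multi-byte character yields invalid UTF-8, and the substring shares the original's backing memory, which can keep a large string from being garbage collected. strings.Clone (Go 1.18+) makes an independent copy when that matters.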
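To illustrate the retention concern, here is a minimal sketch that keeps only the first line of a larger string. The name firstLine is hypothetical, and strings.Clone requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// firstLine returns the text before the first newline as an independent
// copy, so the caller does not keep the whole input alive.
func firstLine(large string) string {
	if i := strings.IndexByte(large, '\n'); i >= 0 {
		large = large[:i] // shares memory with the input at this point
	}
	return strings.Clone(large) // fresh allocation holding only the line
}

func main() {
	doc := "header\n" + strings.Repeat("body line\n", 1000)
	fmt.Println(firstLine(doc)) // header
}
```

Whether the copy is worth it depends on how long the substring outlives the original. For short-lived values the plain slice is usually fine.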
Q: How do I handle case-insensitive string operations efficiently?
A: Use strings.EqualFold() for comparisons and strings.ToLower()/ToUpper() for transformations. For repeated operations on the same strings, consider converting once and storing the result.
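Both suggestions are sketched below. The names sameWord, buildIndex, and inIndex are assumptions for illustration. The index lower-cases each stored word once, so a lookup costs a single ToLower call on the query.

```go
package main

import (
	"fmt"
	"strings"
)

// sameWord compares two strings case-insensitively without allocating
// lower-cased copies.
func sameWord(a, b string) bool {
	return strings.EqualFold(a, b)
}

// buildIndex normalizes each word once, up front.
func buildIndex(words []string) map[string]bool {
	index := make(map[string]bool, len(words))
	for _, w := range words {
		index[strings.ToLower(w)] = true
	}
	return index
}

// inIndex reports whether w is present, ignoring case.
func inIndex(index map[string]bool, w string) bool {
	return index[strings.ToLower(w)]
}

func main() {
	fmt.Println(sameWord("GoLang", "golang")) // true
	idx := buildIndex([]string{"Admin", "ROOT"})
	fmt.Println(inIndex(idx, "admin"), inIndex(idx, "guest")) // true false
}
```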
Conclusion
Mastering Go's string manipulation capabilities is essential for building robust, efficient applications. The key insights from this comprehensive guide include:
- Understanding string immutability and its performance implications
- Leveraging the strings package for common operations like searching, splitting, and modification
- Using strings.Builder for efficient string construction and avoiding unnecessary allocations
- Properly handling Unicode and international text with UTF-8 awareness
- Implementing practical patterns for validation, sanitization, and text processing
- Optimizing string operations for better memory usage and performance
The strings package provides powerful, well-optimized functions for most text processing needs. By combining these with proper Unicode handling and performance-conscious patterns, you can build applications that handle text efficiently and correctly across different languages and character sets.
Ready to optimize your string processing? Start by reviewing your current code for inefficient string concatenation patterns and opportunities to use the strings package functions. Consider implementing proper Unicode handling if your application processes international text. Share your experiences and questions in the comments below!