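# Go Regular Expressions: Using the regexp Package

Regular expressions are a powerful tool for text processing. Whether you are validating user input, parsing log files, or extracting data from web pages, a few lines of code are often enough for a task that would otherwise be tedious. Go's standard library ships the regexp package, which covers the common use cases and offers predictable performance.

This article walks through Go regular expressions step by step, from basic matching to more advanced usage, so you can apply them directly in real projects.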

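## 1. Why choose the regexp package

Go's regexp package implements the same syntax and matching guarantees as RE2, the regular expression engine developed at Google. Compared with backtracking engines such as PCRE or PCRE2, this design has two notable advantages:

First, linear time complexity. For a given pattern, matching time grows linearly with the length of the input, so there is no exponential blow-up caused by backtracking. Second, safety for concurrent use. A compiled `Regexp` can be used by multiple goroutines at the same time without extra locking.

This means you can use regular expressions in high-concurrency services without worrying that one complex pattern will bring the whole service down. The trade-off is that features which depend on backtracking, such as backreferences and lookaround assertions, are not supported.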
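The concurrency guarantee can be illustrated with a small sketch. The helper name `countMatching` and the sample inputs are made up for this example; the only shared state that needs a lock is the counter, not the compiled pattern.

```go
package main

import (
    "fmt"
    "regexp"
    "sync"
)

// digitsRE is compiled once and shared by every goroutine below.
var digitsRE = regexp.MustCompile(`\d+`)

// countMatching reports how many inputs contain at least one run of digits,
// checking each input in its own goroutine.
func countMatching(inputs []string) int {
    var (
        wg    sync.WaitGroup
        mu    sync.Mutex
        count int
    )
    for _, s := range inputs {
        wg.Add(1)
        go func(s string) {
            defer wg.Done()
            if digitsRE.MatchString(s) { // no lock needed around the Regexp itself
                mu.Lock()
                count++
                mu.Unlock()
            }
        }(s)
    }
    wg.Wait()
    return count
}

func main() {
    fmt.Println(countMatching([]string{"abc123", "no digits", "42", "x9y"})) // 3
}
```

The mutex protects only `count`; calls to `digitsRE.MatchString` run in parallel without coordination.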

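## 2. Getting started: two ways to use it

The regexp package can be used in two ways: call a package-level function directly, or compile the pattern first and then call methods on the result.

### 2.1 Using the shortcut functions

For a one-off check, the package-level functions are the most convenient. Only matching has such shortcuts (`regexp.Match`, `regexp.MatchString`, `regexp.MatchReader`); finding and replacing are methods on a compiled `Regexp`, so the example below compiles inline for those:

```go
package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Check whether the string contains a match.
    // MatchString returns an error if the pattern is invalid.
    matched, err := regexp.MatchString(`\d+`, "abc123def")
    if err != nil {
        panic(err)
    }
    fmt.Println(matched) // true

    // Find the first match (compile inline, then call the method).
    result := regexp.MustCompile(`\w+\d+`).FindString("hello123world")
    fmt.Println(result) // hello123

    // Replace every match.
    replaced := regexp.MustCompile(`\d+`).ReplaceAllString("a1b22c333", "X")
    fmt.Println(replaced) // aXbXcX
}
```

The package-level functions compile the pattern on every call, which is fine for simple cases. If you use the same pattern repeatedly, for example inside a loop, compile it once up front; otherwise every call recompiles it and wastes CPU time.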

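### 2.2 Compile first, then use

The recommended approach for production code is to call `Compile` or `MustCompile` to obtain a `*regexp.Regexp`, then call its methods:

```go
package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Compile the pattern: a standalone run of 3 to 5 digits.
    re, err := regexp.Compile(`\b\d{3,5}\b`)
    if err != nil {
        panic(err)
    }

    // Use the compiled Regexp.
    fmt.Println(re.MatchString("hello"))  // false
    fmt.Println(re.MatchString("123"))    // true
    fmt.Println(re.MatchString("12345"))  // true
    fmt.Println(re.MatchString("123456")) // false

    // Extract matching text.
    text := "user ID: 10086, order no: 2023001"
    fmt.Println(re.FindString(text))        // 10086
    fmt.Println(re.FindAllString(text, -1)) // [10086] (2023001 has 7 digits, so it does not match)
}
```

`MustCompile` is a variant of `Compile` that panics if the pattern fails to compile. It is a good fit for patterns that are fixed in the source code, such as package-level variables initialized at startup, because it removes error handling that could never trigger at run time. If the pattern comes from user input or a configuration file, use `Compile` and handle the error.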
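As a minimal sketch of the second case, the hypothetical helper `compileUserPattern` below compiles a pattern supplied from outside the program and returns a wrapped error instead of panicking. The helper name and the error wording are assumptions for illustration.

```go
package main

import (
    "fmt"
    "regexp"
)

// compileUserPattern is a hypothetical helper: it compiles a pattern that
// came from outside the program and wraps any syntax error with context.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
    re, err := regexp.Compile(pattern)
    if err != nil {
        return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
    }
    return re, nil
}

func main() {
    // An unclosed group is reported as an error rather than a panic.
    if _, err := compileUserPattern(`(\d+`); err != nil {
        fmt.Println("rejected:", err)
    }

    re, err := compileUserPattern(`^\d{4}$`)
    if err != nil {
        panic(err)
    }
    fmt.Println(re.MatchString("2024")) // true
}
```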

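## 3. Core matching methods at a glance

A compiled `Regexp` offers a family of methods that fall into three groups by purpose: testing for a match, extracting content, and replacing content.

### 3.1 Testing for a match

| Method | Description |
|------|------|
| `Match(b []byte) bool` | Reports whether the byte slice contains a match |
| `MatchString(s string) bool` | Reports whether the string contains a match |
| `MatchReader(r io.RuneReader) bool` | Reports whether the text read from the `io.RuneReader` contains a match |

```go
re := regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
email := "user@example.com"
if re.MatchString(email) {
    fmt.Println("valid email format")
}
```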
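`MatchReader` is the least familiar of the three, so here is a short sketch of it. A `strings.Reader` stands in for a real stream, and the `ERROR <digits>` log format and the helper name `containsError` are assumptions made up for the example.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

var errorRE = regexp.MustCompile(`ERROR \d+`)

// containsError reports whether the input contains "ERROR <digits>".
// strings.Reader stands in for a file or network stream here; any
// io.RuneReader (for example a *bufio.Reader) can be passed the same way.
func containsError(input string) bool {
    return errorRE.MatchReader(strings.NewReader(input))
}

func main() {
    fmt.Println(containsError("INFO start\nERROR 42 disk full\n")) // true
    fmt.Println(containsError("INFO start\nINFO done\n"))          // false
}
```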

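### 3.2 Extracting matches

| Method | Description |
|------|------|
| `Find(b []byte) []byte` | Returns the first match as a byte slice |
| `FindString(s string) string` | Returns the first match as a string |
| `FindAll(b []byte, n int) [][]byte` | Returns all matches as byte slices; `n` limits the count |
| `FindAllString(s string, n int) []string` | Returns all matches as strings; `n` limits the count |

```go
re := regexp.MustCompile(`#[\p{Han}\w]+`) // matches a Weibo-style hashtag
text := "Go 是#最棒的编程语言#，我爱#Golang#！"

fmt.Println(re.FindString(text))        // #最棒的编程语言
fmt.Println(re.FindAllString(text, -1)) // [#最棒的编程语言 #Golang]
fmt.Println(re.FindAllString(text, 1))  // [#最棒的编程语言]
```

In Go, `\w` matches only ASCII letters, digits, and the underscore, so the pattern adds `\p{Han}` to cover Chinese characters. When `n < 0`, all matches are returned. When `n >= 0`, at most `n` matches are returned, so `n == 0` yields none.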
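The effect of `n` can be checked with a small sketch. The helper `firstN` and the sample sentence are made up for this example; the last lines also show `FindStringIndex`, one of the `Index` variants that return byte offsets instead of the matched text.

```go
package main

import (
    "fmt"
    "regexp"
)

var wordRE = regexp.MustCompile(`[a-z]+`)

// firstN returns at most n matches; n < 0 means "no limit".
func firstN(s string, n int) []string {
    return wordRE.FindAllString(s, n)
}

func main() {
    s := "one two three"

    fmt.Println(firstN(s, -1))     // [one two three]
    fmt.Println(firstN(s, 2))      // [one two]
    fmt.Println(len(firstN(s, 0))) // 0

    // The Index variants return byte offsets instead of the text itself.
    loc := wordRE.FindStringIndex(s)
    fmt.Println(loc, s[loc[0]:loc[1]]) // [0 3] one
}
```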

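### 3.3 Replacing matches

| Method | Description |
|------|------|
| `ReplaceAll(b, repl []byte) []byte` | Replaces every match in a byte slice |
| `ReplaceAllString(s, repl string) string` | Replaces every match in a string |
| `ReplaceAllFunc(b []byte, f func([]byte) []byte) []byte` | Replaces every match with the result of calling `f` on it |
| `ReplaceAllStringFunc(s string, f func(string) string) string` | String version of `ReplaceAllFunc` |

```go
re := regexp.MustCompile(`\s+`) // one or more whitespace characters
text := "this  is   a    sentence with   too many  spaces"

fmt.Println(re.ReplaceAllString(text, " ")) // this is a sentence with too many spaces

// Transform each match with a function: swap upper and lower case.
// (This snippet also needs the "strings" import.)
re2 := regexp.MustCompile(`[a-zA-Z]`)
result := re2.ReplaceAllStringFunc("Hello World", func(s string) string {
    if s >= "a" && s <= "z" {
        return strings.ToUpper(s)
    }
    return strings.ToLower(s)
})
fmt.Println(result) // hELLO wORLD
```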
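The replacement string passed to `ReplaceAllString` may also refer to capture groups. The sketch below, with a made-up helper `toDMY`, reorders a date using `${n}` references and contrasts this with `ReplaceAllLiteralString`, which performs no expansion.

```go
package main

import (
    "fmt"
    "regexp"
)

var isoDateRE = regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)

// toDMY rewrites every YYYY-MM-DD date in s as DD/MM/YYYY.
func toDMY(s string) string {
    // ${n} refers to the n-th capture group; the braces avoid ambiguity
    // when the reference is followed by a letter, digit, or underscore.
    return isoDateRE.ReplaceAllString(s, "${3}/${2}/${1}")
}

func main() {
    fmt.Println(toDMY("released 2024-01-15, patched 2024-02-03"))
    // released 15/01/2024, patched 03/02/2024

    // ReplaceAllLiteralString inserts the replacement verbatim, with no $ expansion.
    fmt.Println(isoDateRE.ReplaceAllLiteralString("due 2024-01-15", "$1"))
    // due $1
}
```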

---

## 4. Capture groups: extracting submatches

Capture groups are written with parentheses `()` and let you pull specific parts out of a match. The result contains the full match followed by each capture group.

### 4.1 Basic usage

```go
// Parse a date in YYYY-MM-DD format.
re := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
text := "Today is 2024-01-15"

match := re.FindStringSubmatch(text)
fmt.Println(match)
// Output: [2024-01-15 2024 01 15]
// match[0] is the full match, match[1] is the first capture group, and so on.

if len(match) == 4 {
    fmt.Printf("year: %s\n", match[1])  // 2024
    fmt.Printf("month: %s\n", match[2]) // 01
    fmt.Printf("day: %s\n", match[3])   // 15
}
```

### 4.2 Named capture groups

Named capture groups use the `(?P<name>...)` syntax, which lets you give each group a meaningful name. Since Go 1.22 the `(?<name>...)` form is accepted as well:

```go
re := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)
text := "Date: 2024-05-20"

match := re.FindStringSubmatch(text)
names := re.SubexpNames()

fmt.Printf("full match: %s\n", match[0]) // 2024-05-20
for i, name := range names {
    if name != "" && i < len(match) {
        fmt.Printf("%s: %s\n", name, match[i])
    }
}
// year: 2024
// month: 05
// day: 20
```

Named capture groups make code easier to read, especially when a complex pattern contains several groups.
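Two common ways to consume named groups are sketched below: collecting them into a map, and looking up a single group with `SubexpIndex` (available since Go 1.15). The helper name `namedGroups` is made up for this example.

```go
package main

import (
    "fmt"
    "regexp"
)

var dateRE = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// namedGroups returns the named captures of the first match as a map,
// or nil if s does not match.
func namedGroups(s string) map[string]string {
    match := dateRE.FindStringSubmatch(s)
    if match == nil {
        return nil
    }
    groups := make(map[string]string)
    for i, name := range dateRE.SubexpNames() {
        if name != "" {
            groups[name] = match[i]
        }
    }
    return groups
}

func main() {
    g := namedGroups("Date: 2024-05-20")
    fmt.Println(g["year"], g["month"], g["day"]) // 2024 05 20

    // SubexpIndex looks up a single group by name; it returns -1
    // for a name that does not exist in the pattern.
    match := dateRE.FindStringSubmatch("Date: 2024-05-20")
    fmt.Println(match[dateRE.SubexpIndex("month")]) // 05
    fmt.Println(dateRE.SubexpIndex("hour"))         // -1
}
```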

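### 4.3 Non-capturing groups

If you need grouping but do not want the group captured, use the `(?:...)` syntax:

```go
// Match a date; the month and day are grouped but not captured.
re := regexp.MustCompile(`\d{4}-(?:\d{2})-(?:\d{2})`)
text := "2024-01-15"

match := re.FindStringSubmatch(text)
fmt.Println(match)      // [2024-01-15]
fmt.Println(len(match)) // 1: only the full match, no submatches
```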

---

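## 5. Zero-width assertions: matching a position

A zero-width assertion consumes no characters; it only matches a position. Go's regexp package supports the simple ones (`^`, `$`, `\A`, `\z`, `\b`, `\B`), but it does not support the four lookaround forms found in PCRE-style engines, which RE2 syntax leaves out to keep its linear-time guarantee. Compiling a pattern that contains `(?=...)`, `(?!...)`, `(?<=...)`, or `(?<!...)` fails, and `MustCompile` panics. The subsections below show how to get the same result with capture groups instead.

### 5.1 Lookahead `(?=...)`

In other engines this matches a position that is followed by a given pattern. In Go, match the trailing context as well and capture only the part you want:

```go
// Digits followed by " yuan"; the capture group excludes the suffix.
re := regexp.MustCompile(`(\d+) yuan`)
text := "apple 5 yuan, banana 3 yuan, orange 4 yuan"

var prices []string
for _, m := range re.FindAllStringSubmatch(text, -1) {
    prices = append(prices, m[1])
}
fmt.Println(prices) // [5 3 4]
```

### 5.2 Negative lookahead `(?!...)`

In other engines this matches a position that is **not** followed by a given pattern. In Go, describe what is allowed to follow instead:

```go
// Lines containing a number that is not the last thing on the line:
// the digits must be followed by some non-space, non-digit character.
re := regexp.MustCompile(`\d+\s*[^\s\d]`)
lines := []string{"id 1", "42 items", "total 999"}

for _, line := range lines {
    if re.MatchString(line) {
        fmt.Println(line) // 42 items
    }
}
```

### 5.3 Lookbehind `(?<=...)`

In other engines this matches a position that is preceded by a given pattern. In Go, match the prefix and capture what follows it:

```go
// Digits that follow "price:"; the prefix is matched but not captured.
re := regexp.MustCompile(`price:(\d+)`)
text := "itemA price:100 itemB price:200"

var prices []string
for _, m := range re.FindAllStringSubmatch(text, -1) {
    prices = append(prices, m[1])
}
fmt.Println(prices) // [100 200]
```

### 5.4 Negative lookbehind `(?<!...)`

In other engines this matches a position that is not preceded by a given pattern. In Go, require either the start of the text or an explicitly allowed preceding character:

```go
// Digit runs that do not come right after "@": require the start of the
// text, or a character that is neither "@" nor a digit, before the run.
re := regexp.MustCompile(`(?:^|[^@\d])(\d+)`)
text := "email: test@123.com, port 8080"

var nums []string
for _, m := range re.FindAllStringSubmatch(text, -1) {
    nums = append(nums, m[1])
}
fmt.Println(nums) // [8080]
```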
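Lookahead is often used to require several conditions at once, as in the classic password rule. Because Go's regexp package does not accept `(?=...)`, one way to express the same idea is to run several simple patterns and combine the results. The sketch below assumes a hypothetical policy (at least 8 characters, one lowercase letter, one uppercase letter, one digit); the names are made up for the example.

```go
package main

import (
    "fmt"
    "regexp"
)

// Each requirement gets its own simple pattern instead of one pattern
// with several (?=...) lookaheads. The policy itself is hypothetical.
var passwordChecks = []*regexp.Regexp{
    regexp.MustCompile(`^.{8,}$`), // at least 8 characters (no newlines)
    regexp.MustCompile(`[a-z]`),   // at least one lowercase letter
    regexp.MustCompile(`[A-Z]`),   // at least one uppercase letter
    regexp.MustCompile(`[0-9]`),   // at least one digit
}

// validPassword reports whether s satisfies every check.
func validPassword(s string) bool {
    for _, re := range passwordChecks {
        if !re.MatchString(s) {
            return false
        }
    }
    return true
}

func main() {
    fmt.Println(validPassword("Passw0rdOK")) // true
    fmt.Println(validPassword("password"))   // false: no uppercase letter or digit
    fmt.Println(validPassword("Ab1"))        // false: too short
}
```

Separate checks also make it easy to report which rule failed, something a single combined pattern cannot do.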

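## 6. Common patterns quick reference

The following patterns come up often in day-to-day development:

| Scenario | Pattern | Notes |
|------|------|------|
| Email | `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$` | Common email format |
| Mobile number | `^1[3-9]\d{9}$` | Mainland China mobile number |
| ID card | `^[1-9]\d{5}(18\|19\|20)\d{2}(0[1-9]\|1[0-2])(0[1-9]\|[12]\d\|3[01])\d{3}(\d\|X)$` | 18-digit resident ID number |
| URL | `^https?://[^\s]+$` | HTTP/HTTPS URL |
| IP address | `^(\d{1,3}\.){3}\d{1,3}$` | IPv4 address (format only; octet range is not checked) |
| Date | `^\d{4}-\d{2}-\d{2}$` | YYYY-MM-DD format |
| Time | `^([01]\d\|2[0-3]):[0-5]\d$` | HH:MM format |

```go
// Validation function examples
func ValidatePhone(phone string) bool {
    re := regexp.MustCompile(`^1[3-9]\d{9}$`)
    return re.MatchString(phone)
}

func ValidateEmail(email string) bool {
    re := regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
    return re.MatchString(email)
}
```

---

## 7. Performance tips

How a pattern is written still matters for performance. In backtracking engines a poorly written pattern can take exponential time; Go's engine rules that out, but pattern choice and compilation cost still affect how fast your code runs. A few suggestions:

**First, avoid nested quantifiers.** A pattern such as `(a+)+` causes catastrophic backtracking in backtracking engines when matched against `aaaa...a`. Go's engine does not backtrack, so matching stays linear, but the simpler equivalent `a+` is easier to read and compiles to a smaller program.

**Second, prefer specific character classes over the dot.** `.*?` matches any character. If you know the text consists of letters, `[a-zA-Z]*?` states the intent precisely and keeps the match from running into unrelated text.

**Third, anchor when you can.** `^` and `$` pin the match to the boundaries of the input, so the engine does not need to consider every possible starting position.

**Fourth, precompile frequently used patterns.** Compile them once in package-level variables or in an `init()` function: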

```go
var (
    emailRE  = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
    phoneRE  = regexp.MustCompile(`^1[3-9]\d{9}$`)
    numberRE = regexp.MustCompile(`^\d+$`)
)

func ValidateInput(input, typeName string) bool {
    switch typeName {
    case "email":
        return emailRE.MatchString(input)
    case "phone":
        return phoneRE.MatchString(input)
    case "number":
        return numberRE.MatchString(input)
    default:
        return false
    }
}
```
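To see how Go handles the nested-quantifier case, the sketch below runs `^(a+)+$` against a long input that cannot match. The input size and the helper name `matchesAllA` are arbitrary choices for this example, and the elapsed time depends on the machine, so no figure is quoted.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
    "time"
)

// nestedRE uses a nested quantifier, the textbook trigger for
// catastrophic backtracking in backtracking engines.
var nestedRE = regexp.MustCompile(`^(a+)+$`)

// matchesAllA reports whether s consists only of one or more "a" characters.
func matchesAllA(s string) bool {
    return nestedRE.MatchString(s)
}

func main() {
    // 100,000 "a" characters followed by a "b": the match must fail.
    input := strings.Repeat("a", 100000) + "b"

    start := time.Now()
    ok := matchesAllA(input)
    fmt.Println(ok)                            // false
    fmt.Println("elapsed:", time.Since(start)) // varies by machine
}
```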

## 8. A complete example

Putting the pieces together, here is a function that parses one line of an HTTP access log:

```go
package main

import (
    "fmt"
    "regexp"
    "strconv"
    "strings"
)

// Log line format: 127.0.0.1 - - [10/Jan/2024:13:55:36 +0800] "GET /api/users HTTP/1.1" 200 1234
type LogEntry struct {
    IP        string
    Timestamp string
    Method    string
    Path      string
    Status    int
    Size      int
}

// logRE is compiled once at package level; the named groups document the pattern.
var logRE = regexp.MustCompile(
    `^(?P<IP>\d{1,3}(?:\.\d{1,3}){3}) - - \[(?P<Timestamp>[^\]]+)\] ` +
        `"(?P<Method>[A-Z]+) (?P<Path>\S+) [^"]+" ` +
        `(?P<Status>\d{3}) (?P<Size>\d+)`)

func ParseHTTPLog(line string) (*LogEntry, error) {
    match := logRE.FindStringSubmatch(line)
    if match == nil {
        return nil, fmt.Errorf("cannot parse log line: %s", strings.TrimSpace(line))
    }

    result := &LogEntry{}
    var err error
    for i, name := range logRE.SubexpNames() {
        switch name {
        case "IP":
            result.IP = match[i]
        case "Timestamp":
            result.Timestamp = match[i]
        case "Method":
            result.Method = match[i]
        case "Path":
            result.Path = match[i]
        case "Status":
            result.Status, err = strconv.Atoi(match[i])
        case "Size":
            result.Size, err = strconv.Atoi(match[i])
        }
        if err != nil {
            return nil, fmt.Errorf("invalid %s field: %w", name, err)
        }
    }

    return result, nil
}

func main() {
    logLine := `127.0.0.1 - - [10/Jan/2024:13:55:36 +0800] "GET /api/users HTTP/1.1" 200 1234`

    entry, err := ParseHTTPLog(logLine)
    if err != nil {
        fmt.Println("Error:", err)
        return
    }

    fmt.Printf("IP: %s\n", entry.IP)
    fmt.Printf("Time: %s\n", entry.Timestamp)
    fmt.Printf("Method: %s\n", entry.Method)
    fmt.Printf("Path: %s\n", entry.Path)
    fmt.Printf("Status: %d\n", entry.Status)
    fmt.Printf("Size: %d bytes\n", entry.Size)
}
```

Output:

```
IP: 127.0.0.1
Time: 10/Jan/2024:13:55:36 +0800
Method: GET
Path: /api/users
Status: 200
Size: 1234 bytes
```

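## 9. Pitfalls

A few things to watch out for when using the regexp package:

First, UTF-8 and the dot. Patterns and input are treated as UTF-8, and `.` matches one whole code point, never a single byte in the middle of a multi-byte character. By default, however, `.` does not match a newline. To let it match newlines as well, set the `s` flag, either for the rest of the pattern with `(?s)` or for one group with `(?s:.)`:

```go
re := regexp.MustCompile(`(?s).`) // . now matches any character, including newline
```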
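The behavior of `.` with multi-byte characters and newlines can be checked directly. The variable names below are made up for this sketch.

```go
package main

import (
    "fmt"
    "regexp"
)

var (
    oneCharRE = regexp.MustCompile(`^.$`)     // exactly one character, newline excluded
    anyCharRE = regexp.MustCompile(`(?s)^.$`) // exactly one character, newline included
)

func main() {
    fmt.Println(len("中"))                    // 3: the string is three bytes long
    fmt.Println(oneCharRE.MatchString("中"))  // true: . consumed the whole code point
    fmt.Println(oneCharRE.MatchString("\n")) // false
    fmt.Println(anyCharRE.MatchString("\n")) // true
}
```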

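Second, input size and timeouts. Go's engine does not backtrack, so matching time is linear in the input length, but a very large input can still take a long time, and the package has no built-in timeout or cancellation. When handling untrusted input, cap the input length first. If you also need a deadline, you can run the match in a goroutine and wait with `select`; note that the goroutine cannot be interrupted and keeps running until the match finishes:

```go
func matchWithTimeout(pattern, text string, timeout time.Duration) (bool, error) {
    // Compile, not MustCompile: the pattern may be invalid.
    re, err := regexp.Compile(pattern)
    if err != nil {
        return false, err
    }

    // Buffered so the goroutine can send and exit even after a timeout.
    done := make(chan bool, 1)
    go func() {
        done <- re.MatchString(text)
    }()

    select {
    case matched := <-done:
        return matched, nil
    case <-time.After(timeout):
        return false, fmt.Errorf("match timed out after %v", timeout)
    }
}
```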

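Third, escaping special characters. When you build a pattern dynamically and want user input to be matched literally, escape it with `regexp.QuoteMeta`:

```go
userInput := "http://example.com?foo=bar"
safePattern := regexp.QuoteMeta(userInput)
// Result: `http://example\.com\?foo=bar`
re := regexp.MustCompile(safePattern)
```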
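In practice the escaped text is usually embedded in a larger pattern. The sketch below uses a hypothetical `highlight` helper that combines a case-insensitive flag with a literal search term; the bracket markup is an arbitrary choice for the example.

```go
package main

import (
    "fmt"
    "regexp"
)

// highlight wraps every case-insensitive occurrence of term in s with
// square brackets. term is treated as literal text, not as a pattern.
func highlight(s, term string) (string, error) {
    re, err := regexp.Compile(`(?i)` + regexp.QuoteMeta(term))
    if err != nil {
        return "", err
    }
    // ${0} expands to the full match.
    return re.ReplaceAllString(s, "[${0}]"), nil
}

func main() {
    out, err := highlight("Is 1+1 the same as 1+1?", "1+1")
    if err != nil {
        panic(err)
    }
    fmt.Println(out) // Is [1+1] the same as [1+1]?

    out, _ = highlight("Go and go", "go")
    fmt.Println(out) // [Go] and [go]
}
```

Without `QuoteMeta`, the `+` in `1+1` would be read as a quantifier and the term would match `11` instead of the literal text.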

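The regexp package is one of the standard library tools that works out of the box and performs predictably. Mastering it makes text processing much more efficient. Keep a few key points in mind: precompile patterns you reuse, anchor where you can, and use named capture groups when they make the logic clearer.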
