Java Matcher Class Explained (Java Pattern and Matcher)


Java added support for regular expressions in JDK 1.4.

Java's regex tools live mainly in the java.util.regex package, which has two main classes: Pattern and Matcher.

Matcher

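Declaration: public final class Matcher extends Object implements MatchResult

The Matcher class is declared final, so it cannot be subclassed.

Meaning: the matcher class, an engine that performs match operations on a character sequence by interpreting a Pattern.

Note: instances of this class are not safe for use by multiple concurrent threads.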
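
A minimal sketch of what the thread-safety note means in practice: the compiled Pattern (documented as immutable) is shared, while every call creates its own Matcher. The class name SharedPatternDemo and the method countPairs are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SharedPatternDemo {
    // A compiled Pattern is immutable, so one instance can be shared by all threads.
    private static final Pattern PAIR = Pattern.compile("(\\w+)%(\\d+)");

    // Each call creates its own Matcher, so no matcher state is shared between threads.
    static int countPairs(CharSequence input) {
        Matcher m = PAIR.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(countPairs("ab%12-cd%34"));
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Keeping the Matcher as a local variable is the simplest way to make sure it is never visible to a second thread.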

Fields:

    /**
     * The Pattern object that created this Matcher.
     */
    Pattern parentPattern;

    /**
     * The storage used by groups. They may contain invalid values if
     * a group was skipped during the matching.
     */
    int[] groups;

    /**
     * The range within the sequence that is to be matched. Anchors
     * will match at these "hard" boundaries. Changing the region
     * changes these values.
     */
    int from, to;

    /**
     * Lookbehind uses this value to ensure that the subexpression
     * match ends at the point where the lookbehind was encountered.
     */
    int lookbehindTo;

    /**
     * The original string being matched.
     */
    CharSequence text;

    /**
     * Matcher state used by the last node. NOANCHOR is used when a
     * match does not have to consume all of the input. ENDANCHOR is
     * the mode used for matching all the input.
     */
    static final int ENDANCHOR = 1;
    static final int NOANCHOR = 0;
    int acceptMode = NOANCHOR;

    /**
     * The range of string that last matched the pattern. If the last
     * match failed then first is -1; last initially holds 0 then it
     * holds the index of the end of the last match (which is where the
     * next search starts).
     */
    int first = -1, last = 0;

    /**
     * The end index of what matched in the last match operation.
     */
    int oldLast = -1;

    /**
     * The index of the last position appended in a substitution.
     */
    int lastAppendPosition = 0;

    /**
     * Storage used by nodes to tell what repetition they are on in
     * a pattern, and where groups begin. The nodes themselves are stateless,
     * so they rely on this field to hold state during a match.
     */
    int[] locals;

    /**
     * Boolean indicating whether or not more input could change
     * the results of the last match.
     *
     * If hitEnd is true, and a match was found, then more input
     * might cause a different match to be found.
     * If hitEnd is true and a match was not found, then more
     * input could cause a match to be found.
     * If hitEnd is false and a match was found, then more input
     * will not change the match.
     * If hitEnd is false and a match was not found, then more
     * input will not cause a match to be found.
     */
    boolean hitEnd;

    /**
     * Boolean indicating whether or not more input could change
     * a positive match into a negative one.
     *
     * If requireEnd is true, and a match was found, then more
     * input could cause the match to be lost.
     * If requireEnd is false and a match was found, then more
     * input might change the match but the match won't be lost.
     * If a match was not found, then requireEnd has no meaning.
     */
    boolean requireEnd;

    /**
     * If transparentBounds is true then the boundaries of this
     * matcher's region are transparent to lookahead, lookbehind,
     * and boundary matching constructs that try to see beyond them.
     */
    boolean transparentBounds = false;

    /**
     * If anchoringBounds is true then the boundaries of this
     * matcher's region match anchors such as ^ and $.
     */
    boolean anchoringBounds = true;
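
The hitEnd and requireEnd fields are exposed through the public methods hitEnd() and requireEnd(). The sketch below probes them after a single find(); the class name HitEndDemo and the helper probe are assumptions made for this illustration, and the values in the comments are what the field comments above lead one to expect.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HitEndDemo {
    // Returns { found, hitEnd, requireEnd } after one find() on the input.
    static boolean[] probe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean found = m.find();
        return new boolean[] { found, m.hitEnd(), m.requireEnd() };
    }

    public static void main(String[] args) {
        // "\d+" runs into the end of "123": more digits could extend the match.
        System.out.println(probe("\\d+", "123")[1]);   // true
        // In "123a" the match stops at 'a', before the end of the input.
        System.out.println(probe("\\d+", "123a")[1]);  // false
        // "abc$" matched only because the input ends here; more input could lose the match.
        System.out.println(probe("abc$", "abc")[2]);   // true
    }
}
```

This is mostly useful when input arrives in chunks and the caller has to decide whether to wait for more data before trusting a match.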

Constructors

    Matcher() {
    }

    Matcher(Pattern parent, CharSequence text) {
        this.parentPattern = parent;
        this.text = text;

        // Allocate state storage
        int parentGroupCount = Math.max(parent.capturingGroupCount, 10);
        groups = new int[parentGroupCount * 2];
        locals = new int[parent.localCount];

        // Put fields into initial states
        reset();
    }

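Both constructors have package-private access, so code outside the java.util.regex package cannot create a Matcher with new.

How do you obtain a Matcher instance from your own package?

The Matcher class itself offers no suitable method, but the Pattern class has this one:

    public Matcher matcher(CharSequence input) {
        if (!compiled) {
            synchronized(this) {
                if (!compiled)
                    compile();
            }
        }
        Matcher m = new Matcher(this, input);
        return m;
    }

So a Matcher instance is obtained by calling the matcher method on a Pattern object.

Comparing this with the Matcher constructor source shows that the constructor assigns the Pattern reference to the field parentPattern and the target string to the field text, and allocates the arrays groups and locals.

The groups array is the storage used by groups. It holds the first and last offsets of each capturing group for the current match.

groups[0] holds the first of group zero, groups[1] holds the last of group zero, groups[2] holds the first of group 1, groups[3] holds the last of group 1, and so on. For more on capturing groups, see the "Groups and capturing" section of the article "Java Pattern Class Explained".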
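
The groups field is not accessible from user code, but the same layout can be rebuilt through the public start(int) and end(int) methods. The following is a sketch under that assumption; GroupOffsetsDemo and its methods are hypothetical names, and the real internal array has at least 20 slots, so only its meaningful prefix is mirrored here.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupOffsetsDemo {
    // Builds { start(0), end(0), start(1), end(1), ... } for the current match.
    static int[] offsets(Matcher m) {
        int[] out = new int[(m.groupCount() + 1) * 2];
        for (int g = 0; g <= m.groupCount(); g++) {
            out[2 * g] = m.start(g);      // corresponds to groups[2 * g]
            out[2 * g + 1] = m.end(g);    // corresponds to groups[2 * g + 1]
        }
        return out;
    }

    // Offsets of the first match, or an empty array if nothing matches.
    static int[] firstMatchOffsets(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? offsets(m) : new int[0];
    }

    public static void main(String[] args) {
        int[] result = firstMatchOffsets("(\\w+)%(\\d+)", "ab%12-cd%34");
        System.out.println(Arrays.toString(result)); // [0, 5, 0, 2, 3, 5]
    }
}
```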

State after initialization (see the reset() method below for the analysis):

Variable	Type	Value
first	int	-1
last	int	0
oldLast	int	-1
lastAppendPosition	int	0
from	int	0
to	int	text.length()
groups	int[]	groups[i] = -1
locals	int[]	locals[i] = -1
parentPattern	Pattern	the Pattern object passed to the constructor
text	CharSequence	the target string passed to the constructor

Selected methods:

1. public String toString()

Returns the string representation of this matcher. The string contains information that may be useful for debugging; the exact format is unspecified.

Source:

    public String toString() {
        StringBuffer sb = new StringBuffer();
        sb.append("java.util.regex.Matcher");
        sb.append("[pattern=" + pattern());
        sb.append(" region=");
        sb.append(regionStart() + "," + regionEnd());
        sb.append(" lastmatch=");
        if ((first >= 0) && (group() != null)) {
            sb.append(group());
        }
        sb.append("]");
        return sb.toString();
    }

Test:

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        System.out.println(m);

Output:

java.util.regex.Matcher[pattern=(\w+)%(\d+) region=0,11 lastmatch=]

2. public Matcher reset()

Resets this matcher.

    public Matcher reset() {
        first = -1;
        last = 0;
        oldLast = -1;
        for(int i=0; i<groups.length; i++)
            groups[i] = -1;
        for(int i=0; i<locals.length; i++)
            locals[i] = -1;
        lastAppendPosition = 0;
        from = 0;
        to = getTextLength();
        return this;
    }

    int getTextLength() {
        return text.length();
    }

So reset() sets first, last, oldLast, lastAppendPosition, from, and to back to their initial values and fills the groups and locals arrays with -1.

State changes:

Variable	Type	New value
first	int	-1
last	int	0
oldLast	int	-1
lastAppendPosition	int	0
from	int	0
to	int	text.length()
groups	int[]	groups[i] = -1
locals	int[]	locals[i] = -1

Test 1:

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        if (m.find()) {
            System.out.println("start index: " + m.start()); // start index: 0
            System.out.println("group():" + m.group());      // group():ab%12
        }
        if (m.find()) {
            System.out.println("start index: " + m.start()); // start index: 6
            System.out.println("group():" + m.group());      // group():cd%34
        }

Test 2:

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        if (m.find()) {
            System.out.println("start index: " + m.start()); // start index: 0
            System.out.println("group():" + m.group());      // group():ab%12
        }
        m.reset();
        if (m.find()) {
            System.out.println("start index: " + m.start()); // start index: 0
            System.out.println("group():" + m.group());      // group():ab%12
        }

Tests 1 and 2 show that reset() returns the Matcher object to its initial state: without the reset the second find() continues at index 6, with it the search starts over at index 0.

3. public Matcher reset(CharSequence input)

Resets this matcher with a new input sequence.

    public Matcher reset(CharSequence input) {
        text = input;
        return reset();
    }

So this method does everything reset() does and additionally replaces the target string.

Test:

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        m.reset("ef%56-gh%78");
        while (m.find()) {
            System.out.println("group():" + m.group());
        }

Output:

group():ef%56
group():gh%78
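
One common use of reset(CharSequence) is to reuse a single Matcher across many inputs instead of allocating a new one per input. A minimal sketch, assuming a single-threaded caller; ResetReuseDemo and firstNumbers are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ResetReuseDemo {
    // Collects the first run of digits from each line, reusing one Matcher.
    static List<String> firstNumbers(List<String> lines) {
        Matcher m = Pattern.compile("\\d+").matcher(""); // placeholder input
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            m.reset(line);           // swap in the new input and clear all state
            if (m.find()) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(firstNumbers(Arrays.asList("a1", "b", "c22d3"))); // [1, 22]
    }
}
```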

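4. public Pattern pattern()

Returns the pattern that is interpreted by this matcher.

Source:

    public Pattern pattern() {
        return parentPattern;
    }

pattern() returns parentPattern, that is, the Pattern object passed to the constructor.

5. public int groupCount()

Returns the number of capturing groups in this matcher's pattern. By convention, group zero denotes the entire pattern and is not included in this count.

Test:

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        System.out.println(m.groupCount()); // 2

6. public String group()

Returns the input subsequence matched by the previous match operation.

The source of group():

    public String group() {
        return group(0);
    }

So group() simply calls group(int group) with the argument 0. Group zero denotes the entire pattern.

7. public String group(int group)

Returns the input subsequence captured by the given group during the previous match operation.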
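
A short sketch of groupCount() and group(int) used together: it lists group 0 through groupCount() for the first match. A group that did not take part in the match yields null, which the second call in main shows. GroupsDemo and groups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupsDemo {
    // Returns group(0..groupCount()) of the first match, or an empty list.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int g = 0; g <= m.groupCount(); g++) {
                out.add(m.group(g)); // null if group g did not participate
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groups("(\\w+)%(\\d+)", "ab%12-cd%34")); // [ab%12, ab, 12]
        System.out.println(groups("(a)|(b)", "b"));                 // [b, null, b]
    }
}
```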

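8. public int start()

Returns the index in the target string of the first character of the current match.

Source:

    public int start() {
        if (first < 0)
            throw new IllegalStateException("No match available");
        return first;
    }

So start() returns the matcher's first field.

9. public int start(int group)

Returns the index in the target string of the first character of the subsequence captured by the given group in the current match.

10. public int end()

Returns the index in the target string just past the last character of the current match.

Source:

    public int end() {
        if (first < 0)
            throw new IllegalStateException("No match available");
        return last;
    }

So end() returns the matcher's last field.

11. public int end(int group)

Returns the index in the target string just past the last character of the subsequence captured by the given group in the current match.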

12. public boolean find()

Searches the target string for the next substring that matches the pattern. If the match succeeds, more information is available through the start, end, and group methods.

Source:

    public boolean find() {
        int nextSearchIndex = last;
        if (nextSearchIndex == first)
            nextSearchIndex++;

        // If next search starts before region, start it at region
        if (nextSearchIndex < from)
            nextSearchIndex = from;

        // If next search starts beyond region then it fails
        if (nextSearchIndex > to) {
            for (int i = 0; i < groups.length; i++)
                groups[i] = -1;
            return false;
        }
        return search(nextSearchIndex);
    }

The source shows that nextSearchIndex is where the next search starts, and that its value goes through three checks:

1. if last == first (the previous match was empty), nextSearchIndex++;

2. if nextSearchIndex < from, nextSearchIndex = from;

3. if nextSearchIndex > to, return false;

Calling region(int start, int end) changes from and to, and so affects where the next search starts.

Note: this method changes the matcher's state: first, last, and oldLast.
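
The first of the three checks matters for patterns that can match the empty string: without the increment, find() would return the same empty match forever. The sketch below records every match span of a* in "baab"; EmptyMatchDemo and spans are names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmptyMatchDemo {
    // Returns every match as "start-end"; an empty match has start == end.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // Empty match at 0, "aa" at 1..3, then empty matches at 3 and 4.
        System.out.println(spans("a*", "baab")); // [0-0, 1-3, 3-3, 4-4]
    }
}
```

After the empty match at index 4, nextSearchIndex becomes 5, which is greater than to, so the third check ends the loop.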

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        while (m.find()) {
            System.out.println("group():" + m.group());
            System.out.println("start():" + m.start());
            System.out.println("end():" + m.end());
            System.out.println("group(1):" + m.group(1));
            System.out.println("start(1):" + m.start(1));
            System.out.println("end(1):" + m.end(1));
            System.out.println("group(2):" + m.group(2));
            System.out.println("start(2):" + m.start(2));
            System.out.println("end(2):" + m.end(2));
            System.out.println();
        }

Output:

group():ab%12
start():0
end():5
group(1):ab
start(1):0
end(1):2
group(2):12
start(2):3
end(2):5

group():cd%34
start():6
end():11
group(1):cd
start(1):6
end(1):8
group(2):34
start(2):9
end(2):11

So find() matched two substrings, ab%12 and cd%34, and each match has 2 capturing groups.

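13. public boolean find(int start)

Resets this matcher and then tries to find the next substring that matches the pattern, starting the search at the specified index. If the match succeeds, more information is available through the start, end, and group methods.

Note: this method changes the matcher's state.

Source:

    public boolean find(int start) {
        int limit = getTextLength();
        if ((start < 0) || (start > limit))
            throw new IndexOutOfBoundsException("Illegal start index");
        reset();
        return search(start);
    }

The source shows that this method first resets the matcher and then searches for a match, with the search starting at the given start argument.

Test:

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        if (m.find(1)) {
            System.out.println("start index: " + m.start()); // start index: 1
            System.out.println("group():" + m.group());      // group():b%12
        }
        if (m.find(0)) {
            System.out.println("start index: " + m.start()); // start index: 0
            System.out.println("group():" + m.group());      // group():ab%12
        }
        if (m.find()) {
            System.out.println("start index: " + m.start()); // start index: 6
            System.out.println("group():" + m.group());      // group():cd%34
        }

m.find(1) resets the matcher and searches from index 1; the matched substring is "b%12".

m.find(0) resets the matcher and searches from index 0; the matched substring is "ab%12".

m.find() does not reset the matcher; it continues after the previous match and finds "cd%34" at index 6.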
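
Because find(int) calls reset(), it also discards a region set earlier. A small sketch of that side effect, assuming the same pattern and input as the tests above; FindFromDemo and findAfterRegion are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindFromDemo {
    // Sets a narrow region, then calls find(int) and reports the resulting region and match.
    static String findAfterRegion(int start) {
        Matcher m = Pattern.compile("(\\w+)%(\\d+)").matcher("ab%12-cd%34");
        m.region(0, 4);                  // region is now [0, 4)
        boolean found = m.find(start);   // reset() restores the region to the whole input
        String match = found ? m.group() : "<none>";
        return m.regionStart() + "," + m.regionEnd() + " " + match;
    }

    public static void main(String[] args) {
        System.out.println(findAfterRegion(6)); // 0,11 cd%34
    }
}
```

If the region has to survive, plain find() after region(start, end) is the call to use instead.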

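14. public int regionStart()

Reports the start index of this matcher's region.

Source:

    public int regionStart() {
        return from;
    }

So regionStart() returns the matcher's from field.

15. public int regionEnd()

Reports the end index (exclusive) of this matcher's region.

Source:

    public int regionEnd() {
        return to;
    }

So regionEnd() returns the matcher's to field.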

16. public Matcher region(int start, int end)

Sets the limits of this matcher's region. The matcher is reset, and the region is then set to start at the index given by start and end at the index given by end (the character at index end is not included).

    public Matcher region(int start, int end) {
        if ((start < 0) || (start > getTextLength()))
            throw new IndexOutOfBoundsException("start");
        if ((end < 0) || (end > getTextLength()))
            throw new IndexOutOfBoundsException("end");
        if (start > end)
            throw new IndexOutOfBoundsException("start > end");
        reset();
        from = start;
        to = end;
        return this;
    }

The source shows that region first calls reset() and then assigns from and to, which sets the range of the target string that will be searched.

Test:

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        m.region(0, 4);
        while (m.find()) {
            System.out.println("group():" + m.group());
            System.out.println("regionStart():" + m.regionStart());
            System.out.println("regionEnd():" + m.regionEnd());
        }

Output:

group():ab%1
regionStart():0
regionEnd():4
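
The region also interacts with the anchoringBounds and transparentBounds fields listed earlier, which are set through useAnchoringBounds(boolean) and useTransparentBounds(boolean). The sketch below runs a pattern against the region [3, 5) of "ab%12", which is the text "12"; RegionBoundsDemo and findInRegion are names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegionBoundsDemo {
    // Searches the region [3, 5) of "ab%12" with the given bounds settings.
    static boolean findInRegion(String regex, boolean anchoring, boolean transparent) {
        Matcher m = Pattern.compile(regex).matcher("ab%12");
        m.region(3, 5);                       // region covers "12"
        m.useAnchoringBounds(anchoring);      // do ^ and $ match at the region edges?
        m.useTransparentBounds(transparent);  // can lookbehind/lookahead see past the edges?
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(findInRegion("^\\d+$", true, false));     // true: ^ and $ match at 3 and 5
        System.out.println(findInRegion("^\\d+$", false, false));    // false: ^ only matches at index 0
        System.out.println(findInRegion("(?<=%)\\d+", true, false)); // false: '%' lies outside the region
        System.out.println(findInRegion("(?<=%)\\d+", true, true));  // true: lookbehind may see '%'
    }
}
```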

17. public boolean lookingAt()

Attempts to match starting at the beginning of the target string (more precisely, at the start of the region). Returns true only if a match is found that begins at that first character; the match does not have to cover the whole region.

Source:

    public boolean lookingAt() {
        return match(from, NOANCHOR);
    }

The source shows that matching starts at from, which can be changed with region(int start, int end).

Test:

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        System.out.println(m.lookingAt()); // true
        m = p.matcher("%ab%12-cd%34");
        System.out.println(m.lookingAt()); // false

18. public boolean matches()

Returns true only if the entire region matches the pattern.

Source:

    public boolean matches() {
        return match(from, ENDANCHOR);
    }

Compared with lookingAt(), the code differs very little: only one argument of the match call is different. lookingAt uses NOANCHOR, while matches uses ENDANCHOR.

NOANCHOR means the match does not have to consume all of the input; ENDANCHOR means it must consume all of the input.

Test:

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("%ab%12");
        System.out.println(m.matches()); // false
        m = p.matcher("ab%12%");
        System.out.println(m.matches()); // false
        m = p.matcher("ab%12");
        System.out.println(m.matches()); // true
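
To see the three entry points side by side, the sketch below runs matches(), lookingAt(), and find() on the same input, resetting in between so that each call starts from a clean state. MatchModesDemo and describe are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchModesDemo {
    private static final Pattern PAIR = Pattern.compile("(\\w+)%(\\d+)");

    // Reports the result of each matching mode for one input.
    static String describe(String input) {
        Matcher m = PAIR.matcher(input);
        boolean whole = m.matches();     // must consume the entire input
        m.reset();
        boolean prefix = m.lookingAt();  // must start at index 0, may stop early
        m.reset();
        boolean anywhere = m.find();     // may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(describe("ab%12"));   // matches=true lookingAt=true find=true
        System.out.println(describe("ab%12-x")); // matches=false lookingAt=true find=true
        System.out.println(describe("-ab%12"));  // matches=false lookingAt=false find=true
    }
}
```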

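19. public Matcher appendReplacement(StringBuffer sb, String replacement)

Appends to the StringBuffer the text between the end of the previous match and the start of the current match, then appends the replacement string in place of the current match, and returns this matcher.

Note: the text after the last match is not added to the StringBuffer. If that part is needed, call appendTail.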
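
The replacement string may refer to captured groups as $1, $2, and so on. A minimal sketch that swaps the two groups of every match; SwapDemo and swapPairs are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SwapDemo {
    // Rewrites every "word%number" as "number%word".
    static String swapPairs(String input) {
        Matcher m = Pattern.compile("(\\w+)%(\\d+)").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "$2%$1"); // $1 and $2 refer to the captured groups
        }
        m.appendTail(sb);                     // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(swapPairs("ab%12-cd%34")); // 12%ab-34%cd
    }
}
```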

20. public StringBuffer appendTail(StringBuffer sb)

Appends to the StringBuffer the text that remains after the last match.

Source:

    public StringBuffer appendTail(StringBuffer sb) {
        sb.append(getSubSequence(lastAppendPosition, getTextLength()).toString());
        return sb;
    }

The source contains getSubSequence(lastAppendPosition, getTextLength()), which takes the substring from index lastAppendPosition to the end of the target string.

What is the value of lastAppendPosition?

The Matcher source shows that appendReplacement contains:

    lastAppendPosition = last;

Here last is the end index of the most recent match.

Test:

        // 前 / 中 / 后 mean "before" / "middle" / "after"; \w does not match them
        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("前ab%12中cd%34后");
        StringBuffer s = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(s, "app");
        }
        System.out.println(s); // 前app中app
        m.appendTail(s);
        System.out.println(s); // 前app中app后
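
The find / appendReplacement / appendTail loop is most useful when the replacement has to be computed from each match, which replaceAll with a fixed string cannot do. A sketch under that assumption, doubling every number in the input; DoubleNumbersDemo and doubleNumbers are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DoubleNumbersDemo {
    // Replaces every run of digits with twice its value.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            long doubled = Long.parseLong(m.group()) * 2;
            // The replacement contains only digits, so no '$' or '\' needs escaping.
            m.appendReplacement(sb, Long.toString(doubled));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("a1b20c")); // a2b40c
    }
}
```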

21. public String replaceAll(String replacement)

Replaces every matched substring with the given string.

    public String replaceAll(String replacement) {
        reset();
        boolean result = find();
        if (result) {
            StringBuffer sb = new StringBuffer();
            do {
                appendReplacement(sb, replacement);
                result = find();
            } while (result);
            appendTail(sb);
            return sb.toString();
        }
        return text.toString();
    }

The source shows that this method first resets the matcher and then checks whether there is a match. If there is, it creates a StringBuffer, calls appendReplacement in a loop to perform the replacements, finally calls appendTail, and returns the string form of the StringBuffer. If there is no match, it returns the target string unchanged.

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        System.out.println(m.replaceAll("app")); // app-app
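
Since replaceAll hands the replacement to appendReplacement, a $ followed by a digit in it is read as a group reference. When the replacement should be taken literally, Matcher.quoteReplacement escapes it. A small sketch; QuoteReplacementDemo and its method are names made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteReplacementDemo {
    // Replaces every "word%number" with the replacement, read literally or as a template.
    static String replacePairs(String input, String replacement, boolean literal) {
        Matcher m = Pattern.compile("(\\w+)%(\\d+)").matcher(input);
        String r = literal ? Matcher.quoteReplacement(replacement) : replacement;
        return m.replaceAll(r);
    }

    public static void main(String[] args) {
        System.out.println(replacePairs("ab%12-cd%34", "$1", false)); // ab-cd
        System.out.println(replacePairs("ab%12-cd%34", "$1", true));  // $1-$1
    }
}
```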

22. public String replaceFirst(String replacement)

Replaces the first matched substring with the given string.

    public String replaceFirst(String replacement) {
        if (replacement == null)
            throw new NullPointerException("replacement");
        StringBuffer sb = new StringBuffer();
        reset();
        if (find())
            appendReplacement(sb, replacement);
        appendTail(sb);
        return sb.toString();
    }

The source shows that this method is a cut-down version of replaceAll that replaces only the first match.

Test:

        Pattern p = Pattern.compile("(\\w+)%(\\d+)");
        Matcher m = p.matcher("ab%12-cd%34");
        System.out.println(m.replaceFirst("app")); // app-cd%34

23. public Matcher usePattern(Pattern newPattern)

Changes the pattern that this matcher uses to find matches.

Test:

    public static void main(String[] args) {
        Pattern p = Pattern.compile("[a-z]+");
        Matcher m = p.matcher("111aaa222");
        System.out.println(piPei(m)); // (pattern [a-z]+): match:aaa;start:3;end:6;
        m.usePattern(Pattern.compile("\\d+"));
        System.out.println(piPei(m)); // (pattern \d+): match:222;start:6;end:9;
    }

    // Collects every remaining match of the matcher into one line of text.
    public static String piPei(Matcher m) {
        StringBuffer s = new StringBuffer();
        while (m.find()) {
            s.append("match:" + m.group() + ";");
            s.append("start:" + m.start() + ";");
            s.append("end:" + m.end() + ";");
        }
        if (s.length() == 0) {
            s.append("no match found!");
        }
        s.insert(0, "(pattern " + m.pattern().pattern() + "): ");
        return s.toString();
    }

After the pattern is changed to "\\d+", only "222" is matched, because usePattern keeps the matcher's position in the input (index 6, the end of "aaa"). To match all the digit runs, reset the matcher first.

        Pattern p = Pattern.compile("[a-z]+");
        Matcher m = p.matcher("111aaa222");
        System.out.println(piPei(m)); // (pattern [a-z]+): match:aaa;start:3;end:6;
        m.usePattern(Pattern.compile("\\d+"));
        m.reset();
        System.out.println(piPei(m)); // (pattern \d+): match:111;start:0;end:3;match:222;start:6;end:9;
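
Because usePattern swaps the pattern without creating a new Matcher, it lends itself to simple tokenizers that try several patterns at the current position. The following is a sketch of that idea, not something taken from the JDK; UsePatternDemo, tokens, and the "L:" / "D:" prefixes are all invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UsePatternDemo {
    private static final Pattern LETTERS = Pattern.compile("[a-z]+");
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Splits the input into runs of lowercase letters ("L:") and digits ("D:").
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LETTERS.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());          // lookingAt() starts at the region start
            if (m.usePattern(LETTERS).lookingAt()) {
                out.add("L:" + m.group());
            } else if (m.usePattern(DIGITS).lookingAt()) {
                out.add("D:" + m.group());
            } else {
                throw new IllegalArgumentException("unexpected character at index " + pos);
            }
            pos = m.end();                          // both patterns consume at least one character
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("111aaa222")); // [D:111, L:aaa, D:222]
    }
}
```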

More on regular expressions:

Java regex rules table

Java regular expressions: Greedy, Reluctant, and Possessive

Java Pattern Class Explained

Java Matcher Class Explained
