regex
This regex implementation is backwards-compatible with the standard ‘re’ module, but offers additional functionality.
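Case 1: Extract the longest string in a document that matches a glossary term
The regex package extends the basic syntax of re and supports POSIX matching (leftmost longest) through the (?p) flag, which returns the longest of the alternatives that match at the leftmost position.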
>>> # Normal matching.
>>> regex.search(r'Mr|Mrs', 'Mrs')
<regex.Match object; span=(0, 2), match='Mr'>
>>> regex.search(r'one(self)?(selfsufficient)?', 'oneselfsufficient')
<regex.Match object; span=(0, 7), match='oneself'>
>>> # POSIX matching.
>>> regex.search(r'(?p)Mr|Mrs', 'Mrs')
<regex.Match object; span=(0, 3), match='Mrs'>
>>> regex.search(r'(?p)one(self)?(selfsufficient)?', 'oneselfsufficient')
<regex.Match object; span=(0, 17), match='oneselfsufficient'>
An example with Chinese text: the glossary contains the two terms 精华 and 精华露, and the sentence below is expected to return 精华露:
>>> import regex as re
>>> mat = re.compile(r'(?p)(精华|精华露)')
>>> mat.findall("倩碧焕妍活力精华露")
['精华露']
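When the glossary holds only literal terms, the order dependence of alternation shown in the Mr|Mrs example can also be sidestepped without (?p). The sketch below is an alternative that uses only the standard re module, not the POSIX matching of regex: it escapes each term and lists longer terms first, so ordinary leftmost-first alternation already reaches the longest one. The helper name build_longest_first_pattern is made up for illustration.

```python
import re


def build_longest_first_pattern(terms):
    """Compile an alternation of literal terms, longest term first."""
    # Longer terms must precede their own prefixes, because standard
    # alternation takes the first alternative that matches.
    ordered = sorted(set(terms), key=len, reverse=True)
    return re.compile("|".join(re.escape(term) for term in ordered))


glossary = ["精华", "精华露"]
pattern = build_longest_first_pattern(glossary)
longest_hits = pattern.findall("倩碧焕妍活力精华露")
print(longest_hits)  # ['精华露']
```

This only works for literal terms. Once the alternatives contain quantifiers or optional parts, as in the one(self)?(selfsufficient)? example, sorting by length no longer helps and (?p) is the more direct tool.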
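Case 2: Extract every glossary term that matches in the document, including terms nested at the same position
Using the same glossary, the expected result is both 精华 and 精华露. A lookbehind with alternatives of different lengths (supported by regex, but not by re, which requires fixed-width lookbehinds) is tested at every position and captures a term that ends there: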
>>> import regex as re
>>> mat = re.compile(r'(?<=(精华|精华露))')
>>> mat.findall("倩碧焕妍活力精华露")
['精华', '精华露']
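The lookbehind yields at most one captured term per position, so two glossary terms that end at the same character (for instance a hypothetical pair 华露 and 精华露) would presumably not both be reported. Where that matters, a plain scan with str.find is one fallback. This is a sketch under that assumption and is not part of regex; find_all_terms is a hypothetical helper name.

```python
def find_all_terms(text, terms):
    """Return (start, term) for every occurrence of every term,
    including nested and overlapping occurrences."""
    hits = []
    for term in terms:
        if not term:
            continue  # an empty term would match at every position
        start = text.find(term)
        while start != -1:
            hits.append((start, term))
            # Step by one character so overlapping occurrences are kept.
            start = text.find(term, start + 1)
    # Order by position, then shortest term first.
    return sorted(hits, key=lambda hit: (hit[0], len(hit[1])))


nested_hits = find_all_terms("倩碧焕妍活力精华露", ["精华", "精华露"])
print(nested_hits)  # [(6, '精华'), (6, '精华露')]
```

The scan costs one pass over the text per term, which is usually acceptable for a small glossary; a very large glossary would call for a different approach.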
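Case 3: Extract text that is preceded or followed by a trigger word
Method 1: the trigger word comes first, and a group captures what follows it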
>>> m = re.search(r'电话号码:(\d+)', '电话号码:14412234111')
>>> m.group(1)
'14412234111'
Method 2: the trigger word comes after the target, so a lookahead checks for it without consuming it
>>> m = re.search(r'(\d+)(?=\s86\+)', '14412234111 86+')
>>> m.group(1)
'14412234111'
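Method 1 can also be written with a lookbehind, mirroring the lookahead in Method 2, so that the match itself contains only the digits and no group index is needed. A minimal sketch using the standard re module (the trigger here has a fixed length, which re requires for lookbehinds); the variable names are illustrative.

```python
import re

# Trigger word before the target: the lookbehind asserts it, the match excludes it.
before = re.search(r'(?<=电话号码:)\d+', '电话号码:14412234111')

# Trigger word after the target: the lookahead asserts it, the match excludes it.
# The plus sign is escaped so that it is matched literally.
after = re.search(r'\d+(?=\s86\+)', '14412234111 86+')

print(before.group())  # 14412234111
print(after.group())   # 14412234111
```

Both searches return None when the trigger word is absent, so real code should check the result before calling group().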