Compiles and matches regular-expression patterns.
Standard C Library (libc.a)
Programmers Workbench Library (libPW.a)
#include <libgen.h>
char *regcmp (String [, String, . . . ], (char *) 0)
const char *String, . . . ;
const char *regex (Pattern, Subject [, ret, . . . ])
char *Pattern, *Subject, *ret, . . . ;
extern char *__loc1;
The regcmp subroutine compiles a regular expression (or Pattern) and returns a pointer to the compiled form. The regcmp subroutine allows multiple String parameters. If more than one String parameter is given, then the regcmp subroutine treats them as if they were concatenated together. It returns a null pointer if it encounters an incorrect parameter.
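The calling convention described above (any number of String parameters, closed by a (char *) 0 terminator and treated as one concatenated pattern) can be sketched in portable C. This is only an illustration of the convention, not the regcmp implementation; join_strings is a hypothetical helper name that does not exist in libgen.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libgen: joins a list of strings that is
 * terminated by (char *)0, mirroring the regcmp argument convention.
 * Returns a malloc'd string that the caller must free, or NULL on failure. */
char *join_strings(const char *first, ...)
{
    va_list ap;
    size_t total = 0;
    const char *s;
    char *out;

    /* First pass: add up the lengths of all pieces. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        total += strlen(s);
    va_end(ap);

    out = malloc(total + 1);
    if (out == NULL)
        return NULL;
    out[0] = '\0';

    /* Second pass: append each piece in the order given. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        strcat(out, s);
    va_end(ap);
    return out;
}
```

For example, join_strings("^[a-z]", "+", "$", (char *)0) yields the single string ^[a-z]+$, which is how regcmp would see the same three String parameters. Omitting the terminator leads to undefined behavior, both in this sketch and in any variadic call of this kind.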
You can use the regcmp command to compile regular expressions into your C program, frequently eliminating the need to call the regcmp subroutine at run time.
The regex subroutine compares a compiled Pattern to the Subject string. The additional ret parameters receive the substrings matched by parenthesized subexpressions that are marked with $n. Upon successful completion, the regex subroutine returns a pointer to the next unmatched character. If the regex subroutine fails, a null pointer is returned. A global character pointer, __loc1, points to where the match began.
The regcmp and regex subroutines are borrowed from the ed command; however, the syntax and semantics have been changed slightly. You can use the following symbols with the regcmp and regex subroutines:
Item | Description |
---|---|
[ ] * . ^ | These symbols have the same meaning as they do in the ed command. |
- | The minus sign (or hyphen) within brackets used with the regex subroutine means "through," according to the current collating sequence. For example, [a-z] can be equivalent to [abcd . . . xyz] or [aBbCc . . . xYyZz]. You can use the - by itself if the - is the last or first character. For example, the character class expression []-] matches the ] (right bracket) and - (minus) characters. The regcmp subroutine does not use the current collating sequence, and the minus sign in brackets controls only a direct ASCII sequence. For example, [a-z] always means [abc . . . xyz] and [A-Z] always means [ABC . . . XYZ]. If you need to control the specific characters in a range using the regcmp subroutine, you must list them explicitly rather than using the minus sign in the character class expression. |
$ | Matches the end of the string. Use the \n character to match a new-line character. |
+ | A regular expression followed by + (plus sign) means one or more times. For example, [0-9]+ is equivalent to [0-9][0-9]*. |
{m} {m,} {m,u} | Integer values enclosed in {} (braces) indicate the number of times to apply the preceding regular expression. The m character is the minimum number and the u character is the maximum number. The u character must be less than 256. If you specify only m, it indicates the exact number of times to apply the regular expression. {m,} has no upper limit and matches m or more occurrences of the expression. The + (plus sign) and * (asterisk) operations are equivalent to {1,} and {0,}, respectively. |
( . . . )$n | Stores the value matched by the enclosed regular expression in the (n+1)th ret parameter. At most ten enclosed regular expressions are allowed. The regex subroutine makes the assignments unconditionally. |
( . . . ) | Parentheses group subexpressions. An operator, such as *, +, or { }, works on a single character or on a regular expression enclosed in parentheses. For example, (a*(cb+)*)$0. |
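The equivalences stated in the table ([0-9]+ with [0-9][0-9]*, and + with {1,}) can be checked on systems that do not ship regcmp and regex by using POSIX extended regular expressions as a stand-in. Note the assumption: regcomp and regexec from <regex.h> are a different API from the subroutines documented here, and first_match is a hypothetical helper name; only the meaning of these particular operators is taken to be the same.

```c
#include <regex.h>
#include <stddef.h>

/* Illustrative helper (not part of libgen): compiles `pattern` as a POSIX
 * extended regular expression and reports the span of the first match in
 * `subject`. Returns 1 on a match, 0 on no match, -1 if compilation fails. */
int first_match(const char *pattern, const char *subject, int *start, int *end)
{
    regex_t re;
    regmatch_t m;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, subject, 1, &m, 0);
    regfree(&re);   /* the compiled form is no longer needed */
    if (rc != 0)
        return 0;
    *start = (int)m.rm_so;   /* offset where the match begins */
    *end = (int)m.rm_eo;     /* offset of the next unmatched character */
    return 1;
}
```

Called with the subject "ab123c", the patterns [0-9]+, [0-9][0-9]*, and [0-9]{1,} should all report the same span, starting at offset 2 and ending at offset 5. The end offset plays the same role as the pointer returned by regex, and the start offset the same role as __loc1.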
All of the preceding defined symbols are special. You must precede them with a \ (backslash) if you want to match the special symbol itself. For example, \$ matches a dollar sign.
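When a pattern is built from text that should be matched literally, the escaping rule above can be applied mechanically. The following is a minimal sketch under stated assumptions: escape_literal is a hypothetical helper, not a libgen routine, and treating the backslash itself as a character that needs escaping is an assumption that the text above does not state.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dst`, putting a backslash in front
 * of every symbol listed as special in the table above (plus the backslash
 * itself, which is an assumption). Returns 0 on success, -1 if `dst` is too
 * small to hold the escaped string and its terminating null byte. */
int escape_literal(const char *src, char *dst, size_t dstsize)
{
    static const char special[] = "[]*.^-$+{}()\\";
    size_t used = 0;

    for (; *src != '\0'; src++) {
        /* A special symbol needs two bytes: the backslash and itself. */
        size_t need = (strchr(special, *src) != NULL) ? 2 : 1;
        if (used + need + 1 > dstsize)
            return -1;
        if (need == 2)
            dst[used++] = '\\';
        dst[used++] = *src;
    }
    if (used + 1 > dstsize)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

Given the input $5.00, the helper produces \$5\.00. Remember that inside a C string literal each backslash must itself be doubled, so the pattern \$ is written "\\$" in source code.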
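/* . . . Your Program . . . */
#include <stddef.h>

/* User-supplied malloc: returns the same static buffer on every call,
   or NULL if the request does not fit in the buffer. */
void *malloc(size_t n)
{
    static int rebuf[256];
    return (n <= sizeof(rebuf)) ? (void *)rebuf : NULL;
}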
The regcmp subroutine produces code values that the regex subroutine can interpret as the regular expression. For instance, [a-z] indicates a range expression which the regcmp subroutine compiles into a string containing the two end points (a and z).
The regex subroutine interprets the range expression according to the current collating sequence. The expression [a-z] can be equivalent either to [abcd . . . xyz], or to [aBbCcDd . . . xXyYzZ], as long as the character preceding the minus sign has a lower collating value than the character following the minus sign.
The behavior of a range expression is dependent on the collation sequence. If you want to match a specific set of characters, you should list each one. For example, to select letters a, b, or c, use [abc] rather than [a-c].
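To see why a range depends on the collation sequence, it can help to test single characters with the standard strcoll function. The sketch below only illustrates the idea of a collation-based range test; it is not how the regex subroutine is implemented, and in_collating_range is a hypothetical name.

```c
#include <string.h>

/* Illustrative sketch, not the regex implementation: reports whether the
 * character `c` falls within the range lo-hi under the collating sequence
 * of the current LC_COLLATE locale. Returns nonzero if it does. */
int in_collating_range(char lo, char c, char hi)
{
    /* strcoll compares strings, so wrap each character in a tiny string. */
    char l[2] = { lo, '\0' };
    char m[2] = { c, '\0' };
    char h[2] = { hi, '\0' };

    return strcoll(l, m) <= 0 && strcoll(m, h) <= 0;
}
```

A program starts in the "C" locale, where strcoll orders characters by their code values, so in_collating_range('a', 'B', 'z') is false there. After a call such as setlocale(LC_COLLATE, "") selects a locale that interleaves uppercase and lowercase letters, the same call may return true, which is the [aBbCc . . . xYyZz] case described above.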
Item | Description |
---|---|
Subject | Specifies a comparison string. |
String | Specifies the Pattern to be compiled. |
Pattern | Specifies the expression to be compared. |
ret | Points to an address at which to store comparison data. The regex subroutine allows multiple ret parameters. |