Class RegEx() Foundation

Class for regular expression processing with PCRE compatibility.

Description

The RegEx class implements PCRE (Perl Compatible Regular Expressions) for text processing tasks. It supports pattern matching, text searching and replacement, string tokenization, and data extraction using standard PCRE syntax.

Pattern matching is provided by :match(), which returns detailed results including capture groups, :test(), which only checks whether a pattern matches, and :matchAll(), which extracts all occurrences from a text. Matching behavior is controlled through configuration methods such as :setIgnoreCase(), :setMultiline(), and :setUnicode(), which adjust case sensitivity, anchor handling, and character encoding, respectively. The class methods :getMatchText() and :getMatchPos() simplify extracting the matched text and position data from match results.

Text manipulation is accomplished through :replace() for literal substitutions, :replaceCallback() for dynamic replacements using callback functions, and :split() for dividing strings at match locations.

The methods :batchMatch(), :batchTest(), and :batchReplace() support batch processing: each compiles the pattern once and applies it to multiple subject (input) strings.

By default, the RegEx class compiles patterns automatically on first use. If this lazy compilation is not desired, either because the validity of the pattern must be confirmed at a specific point in the program or because the compilation cost should be moved out of time-critical code, the method :precompile() can be used to compile the pattern immediately.

Execution speed can be improved with the :optimize() method. Depending on the optimization method selected, the pattern is then executed either as bytecode or as native machine code.

Dedicated constructor methods provide pre-configured RegEx instances for common validation scenarios including email addresses, URLs, and IP addresses. For single-use operations, convenience class methods like :quickMatch() and :quickTest() perform pattern operations without requiring object creation.

Error information from compilation and matching operations is accessible through :getLastError() and related methods.

When done using a RegEx instance, the method :destroy() must be called to clear the pattern and free compiled data. Otherwise, this memory will remain allocated and cause a memory leak.

Constructors
:create()
Creates a new RegEx instance.
:createEmail()
Creates a RegEx instance for email address validation.
:createIPAddress()
Creates a RegEx instance for IP address validation.
:createURL()
Creates a RegEx instance for URL validation.
Class Methods
:getGroupCount()
Gets the number of capture groups in a match result.
:getMatchPos()
Gets the match position from a match result.
:getMatchRange()
Gets the match range from a match result.
:getMatchText()
Gets the matched text from a match result.
:getPCREVersion()
Gets the PCRE library version.
:quickMatch()
Performs a one-time pattern match.
:quickReplace()
Performs a one-time search and replace.
:quickTest()
Performs a one-time pattern test.
Life Cycle
:destroy()
Frees resources.
Pattern Management
:getPattern()
Gets the current regular expression pattern.
:isCompiled()
Checks if the pattern has been compiled.
:optimize()
Optimizes the pattern for faster execution.
:precompile()
Forces immediate pattern compilation.
Pattern Matching
:count()
Counts all matches.
:find()
Finds the first match position.
:findAll()
Finds all match positions.
:match()
Finds the first match.
:matchAll()
Finds all matches.
:test()
Tests for a match.
Text Manipulation
:replace()
Replaces matches with replacement text.
:replaceCallback()
Replaces matches using a callback code block.
:split()
Splits a string using the pattern as delimiter.
Configuration
:getOptions()
Returns current PCRE compilation options.
:setDotAll()
Sets dot-all mode.
:setExtended()
Sets extended mode for readable patterns.
:setIgnoreCase()
Sets case-insensitive matching mode.
:setMaxCaptureGroups()
Sets the maximum number of capture groups.
:setMultiline()
Sets multiline mode for anchors.
:setOptions()
Sets PCRE compilation options directly.
:setUngreedy()
Sets ungreedy (lazy) quantifier mode.
:setUnicode()
Sets Unicode/UTF-8 mode.
Error Handling
:getErrorPos()
Returns the error position in the pattern.
:getLastError()
Returns the last error code.
:getLastMessage()
Returns the last error message.
Batch Processing
:batchMatch()
Matches a pattern against multiple subjects.
:batchReplace()
Performs replacement in multiple subject strings.
:batchTest()
Tests a pattern against multiple subjects.
Examples
Simple matching
// 
// Use the method :match() to find a number in a string 
// 
PROCEDURE Main() 
   LOCAL oRegEx, aMatch 

   // Match numbers 
   oRegEx := RegEx():create("[0-9]+") 
   aMatch := oRegEx:match("The answer is 42") 

   // Output: 
   // 42 
   ? RegEx():getMatchText(aMatch) 

   // Free the compiled pattern 
   oRegEx:destroy() 

RETURN 
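

Capture groups
// 
// A minimal sketch of reading individual capture groups 
// from a match result. It assumes, as the other examples 
// on this page suggest, that :getMatchText() returns the 
// whole match when no group number is passed and the text 
// of group n when n is passed. The date pattern and the 
// subject string are illustrative only. 
// 
PROCEDURE Main() 
   LOCAL oRegEx, aMatch 

   // Group 1: year, group 2: month, group 3: day 
   oRegEx := RegEx():create("(\d{4})-(\d{2})-(\d{2})") 
   aMatch := oRegEx:match("Released on 2024-03-15") 

   // Output: 
   // 2024-03-15 
   // 2024 03 15 
   ? RegEx():getMatchText(aMatch) 
   ? RegEx():getMatchText(aMatch, 1), ; 
     RegEx():getMatchText(aMatch, 2), ; 
     RegEx():getMatchText(aMatch, 3) 

   // Free the compiled pattern 
   oRegEx:destroy() 

RETURN 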


Validate and extract email
// 
// Use the class method :getMatchText() to extract the 
// text from a match result 
// 
PROCEDURE Main() 
   LOCAL oRegEx, cSubject, cText 

   cSubject := "Contact: john.doe@example.com" 

   oRegEx := RegEx():createEmail() 

   // Output: 
   // Valid email: john.doe@example.com 

   IF oRegEx:test( cSubject ) 
      cText := RegEx():getMatchText( oRegEx:match( cSubject ) ) 
      ? "Valid email:", cText 
   ELSE 
      ? "No valid email" 
   ENDIF 

   // Free the compiled pattern 
   oRegEx:destroy() 

RETURN 
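

Reusing one instance for several subjects
// 
// A sketch of applying one RegEx() object to several 
// subject strings using :test() only. According to the 
// description, the pattern is compiled on first use; the 
// same instance is then reused for each subject. The 
// methods :batchTest() and :batchMatch() serve a similar 
// purpose, but their parameters are not shown on this page 
// and they are therefore not used here. The article number 
// format is made up for illustration. 
// 
PROCEDURE Main() 
   LOCAL oRegEx, aSubjects, i 

   aSubjects := { "AB-1234", "XY-99", "CD-0001", "1234-AB" } 

   // Two upper case letters, a dash, four digits 
   oRegEx := RegEx():create("^[A-Z]{2}-[0-9]{4}$") 

   // Output: 
   // AB-1234 
   // CD-0001 
   FOR i := 1 TO Len(aSubjects) 
      IF oRegEx:test( aSubjects[i] ) 
         ? aSubjects[i] 
      ENDIF 
   NEXT 

   // Free the compiled pattern 
   oRegEx:destroy() 

RETURN 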


Data Import with Validation
// 
// Use two RegEx() objects to extract information 
// from a string 
// 
PROCEDURE Main() 
   LOCAL i, aRaw, aCustomers 

   aRaw := { ; 
      "John (555) 123-4567 john@example.com", ; 
      "Jane 001 555 987-6543 jane@example.com", ; 
      "Bob invalid-phone bob@test.com", ; 
      "Scott +1 (555) 223-6543 scott@example.com" ; 
   } 

   aCustomers := ImportCustomerData(aRaw) 

   // Output: 
   // Imported 3 of 4 records 
   // John 5551234567 john@example.com 
   // Jane +15559876543 jane@example.com 
   // Scott +15552236543 scott@example.com 
   FOR i := 1 TO Len(aCustomers) 
      ? aCustomers[i][1], aCustomers[i][2], aCustomers[i][3] 
   NEXT 

RETURN 

FUNCTION ImportCustomerData(aRawData) 
   LOCAL oRXPhone, oRXEmail, aMatch 
   LOCAL nRaw, cLine, cName, cEmail 
   LOCAL cNormPhone, cPhonePattern 
   LOCAL aCustomers 

   // Simple US and Canada phone number expression 

   cPhonePattern := "(?<!\d)"      + ; // # No digit comes before 
                    "(\+1|001)?"   + ; // # Group 1: +1 or 001 
                    "[\s.-]?"      + ; // #    Optional separator 
                    "\(?"          + ; // #    Optional opening parenthesis 
                    "(\d{3})"      + ; // # Group 2: Three digits 
                    "\)?"          + ; // #    Optional closing parenthesis 
                    "[\s.-]?"      + ; // #    Optional separator 
                    "(\d{3})"      + ; // # Group 3: Three digits 
                    "[\s.-]?"      + ; // #    Optional separator 
                    "(\d{4})"      + ; // # Group 4: Four digits 
                    "(?!\d)"           // # No digit follows 

   // Factory methods 
   oRXEmail := RegEx():createEmail() 
   oRXPhone := RegEx():create(cPhonePattern) 

   aCustomers := {} 

   FOR nRaw := 1 TO Len(aRawData) 
      cLine := aRawData[nRaw] 

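      // Reset per-record results so that values from the 
      // previous line are not carried over 
      cNormPhone := "" 
      cEmail     := "" 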

      // Extract name (first word) 
      cName := Left(cLine, At(" ", cLine) - 1) 

      // Extract phone 
      IF oRXPhone:test( cLine ) 
         aMatch := oRXPhone:match( cLine ) 
         cNormPhone := RegEx():getMatchText( aMatch, 1 ) 
         cNormPhone := StrTran( cNormPhone, "001", "+1"  ) 
         cNormPhone += RegEx():getMatchText( aMatch, 2 ) 
         cNormPhone += RegEx():getMatchText( aMatch, 3 ) 
         cNormPhone += RegEx():getMatchText( aMatch, 4 ) 
      ENDIF 

      // Extract email 
      IF oRXEmail:test( cLine ) 
         cEmail := RegEx():getMatchText( oRXEmail:match(cLine) ) 
      ENDIF 

      // Add customer if valid 
      IF !Empty(cNormPhone) .AND. !Empty(cEmail) 
         AAdd(aCustomers, {cName, cNormPhone, cEmail}) 
      ENDIF 
   NEXT 

   // Free the compiled patterns of both instances 
   oRXPhone:destroy() 
   oRXEmail:destroy() 

   ? "Imported", Var2Char(Len(aCustomers)), "of", ; 
               Var2Char(Len(aRawData)), "records" 

RETURN aCustomers 
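

Checking a pattern before use
// 
// A hedged sketch of validating a pattern up front with 
// :precompile(). The method names are taken from this 
// page, but their return values are assumptions: 
// :isCompiled() is assumed to return a logical value, 
// :getLastMessage() a character string and :getErrorPos() 
// a numeric position. It is also assumed that an invalid 
// pattern is reported through these methods rather than 
// by a runtime error. 
// 
PROCEDURE Main() 
   LOCAL oRegEx 

   // The closing parenthesis is missing on purpose 
   oRegEx := RegEx():create("(\d{3}") 

   // Compile now instead of on first use 
   oRegEx:precompile() 

   IF oRegEx:isCompiled() 
      ? "Pattern is valid" 
   ELSE 
      ? "Invalid pattern:", oRegEx:getLastMessage() 
      ? "Error position:", oRegEx:getErrorPos() 
   ENDIF 

   // Free resources held by the instance 
   oRegEx:destroy() 

RETURN 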

