Document Type Master's Dissertation Author Coetser, Rayner Johannes Lodewikus URN etd-08252010-133710 Document Title Finite state automaton construction through regular expression hashing Degree MEng Department Computer Science Supervisor
Advisor Name Title Prof D G Kourie Co-Supervisor Prof B W Watson Supervisor Keywords
- regular expression hashing
- finite state
- approximate automaton
Date 2010-04-12 Availability unrestricted Abstract
In this study, the regular expressions forming abstract states in Brzozowski’s algorithm are not remapped to sequential state transition table addresses as would be the case in the classical approach, but are hashed to integers. Two regular expressions that are hashed to the same hash code are assigned the same integer address in the state transition table, reducing the number of states in the automaton.
This reduction does not necessarily lead to the construction of a minimal automaton: no restrictions are placed on the hash function hashing two regular expressions to the same code. Depending on the quality of the hash function, a super-automaton, previously referred to as an approximate automaton, or an exact automaton can be constructed. When two regular expressions are hashed to the same state, and they do not represent the same regular language, a super-automaton is constructed.
A super-automaton accepts the regular language of the input regular expression, in addition to some extra strings. If the hash function is bad, many regular expressions that do not represent the same regular language will be hashed together, resulting in a smaller automaton that accepts extra strings. In the ideal case, two regular expressions will only be hashed together when they represent the same regular language. In this case, an exact minimal automaton will be constructed. It is shown that, using the hashing approach, an exact or super-automaton is always constructed.
Another outcome of the hashing approach is that a non-deterministic automaton may be constructed. A new version of the hashing version of Brzozowski’s algorithm is put forward which constructs a deterministic automaton.
A method is also put forward for measuring the difference between an exact and a super-automaton: this takes the form of the k-equivalence measure: the k-equivalence measure measures the number of characters up to which the strings of two regular expressions are equal. The better the hash function, the higher the value of k, up to the point where the hash function results in regular expressions being hashed together if and only if they have the same regular language.
Using the k-equivalence measure, eight generated hash functions and one hand coded hash function are evaluated for a large number of short regular expressions, which are generated using G¨odel numbers. The k-equivalence concept is extended to the average k-equivalence value in order to evaluate the hash functions for longer regular expressions. The hand coded hash function is found to produce good results.
Copyright © 2009, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
Please cite as follows:
Coetser, RJL 2009, Finite state automaton construction through regular expression hashing, MSc dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://upetd.up.ac.za/thesis/available/etd-08252010-133710/ >
Filename Size Approximate Download Time (Hours:Minutes:Seconds)
28.8 Modem 56K Modem ISDN (64 Kb) ISDN (128 Kb) Higher-speed Access dissertation.pdf 653.13 Kb 00:03:01 00:01:33 00:01:21 00:00:40 00:00:03