- We compile a regular expression into a recognizer by constructing a generalized transition diagram called a finite automaton.
- A finite automaton, or finite state machine, is a 5-tuple (S, Σ, S0, F, δ) where
S is the finite set of states
Σ is the finite alphabet of input symbols
S0 ∈ S is the initial state
F ⊆ S is the set of accepting states
δ is the transition function
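The 5-tuple above can be written out directly. Below is a minimal sketch in Python, using a hypothetical DFA that accepts binary strings containing an even number of 0s; the state names and the language are illustrative choices, not from the text.

```python
# Sketch: a finite automaton as an explicit 5-tuple (S, Sigma, s0, F, delta).
# Hypothetical example: accept binary strings with an even number of 0s.
S = {"even", "odd"}            # finite set of states
Sigma = {"0", "1"}             # input alphabet
s0 = "even"                    # initial state
F = {"even"}                   # set of accepting states
delta = {                      # transition function: (state, symbol) -> state
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
}

def accepts(w):
    state = s0
    for ch in w:
        if ch not in Sigma:
            return False       # reject strings outside the alphabet
        state = delta[(state, ch)]
    return state in F

print(accepts("1001"))  # True: two 0s
print(accepts("10"))    # False: one 0
```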
There are two types of finite automata:
- Deterministic finite automata (DFA): each state (circle in the diagram) has exactly one outgoing edge for each input symbol.
- Nondeterministic finite automata (NFA): there are no restrictions on the edges leaving a state; several edges may carry the same symbol as a label, and some edges can be labeled with ε.
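Because an NFA may have several edges with the same label (and ε-edges), a direct simulation tracks the *set* of states that could be reached. The sketch below assumes a small hypothetical NFA over {a, b} that accepts strings ending in "ab"; the transition table is invented for illustration.

```python
# Sketch: simulating an NFA with epsilon-moves by tracking the set of
# currently reachable states (the on-the-fly subset idea).
EPS = ""  # label used for epsilon edges
nfa = {   # state -> list of (label, next_state); duplicate labels allowed
    0: [("a", 0), ("b", 0), ("a", 1)],
    1: [("b", 2)],
    2: [],
}
start, accepting = 0, {2}

def eps_closure(states):
    # Add every state reachable through epsilon edges alone.
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, t in nfa.get(s, []):
            if label == EPS and t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def accepts(w):
    current = eps_closure({start})
    for ch in w:
        current = eps_closure(
            {t for s in current for label, t in nfa.get(s, []) if label == ch}
        )
    return bool(current & accepting)

print(accepts("aab"))  # True: ends in "ab"
print(accepts("aba"))  # False
```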
Recognition of Tokens
Here we address how to recognize tokens.
We use the language generated by the following grammar:
stmt → if expr then stmt | if expr then stmt else stmt | ε
expr → term relop term | term
term → id | num
- where the terminals if, then, else, relop, id, and num generate the sets of strings given by the following regular definitions:
if → if
then → then
relop → < | <= | = | <> | > | >=
letter → A | B | . . . | Z | a | b | . . . | z
digit → 0 | 1 | 2 | . . . | 9
id → letter (letter | digit)*
num → digit+ (. digit+)? (E (+ | -)? digit+)?
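The regular definitions above translate directly into regular-expression patterns. A minimal sketch using Python's `re` module (the pattern names and the keyword-before-id ordering are choices made here for illustration):

```python
# Sketch: the regular definitions for if/then/else, relop, num, and id
# expressed as Python regexes; keywords are tried before id.
import re

token_spec = [
    ("IF",    r"if\b"),
    ("THEN",  r"then\b"),
    ("ELSE",  r"else\b"),
    ("RELOP", r"<=|<>|>=|<|=|>"),                  # longest alternatives first
    ("NUM",   r"\d+(\.\d+)?(E[+-]?\d+)?"),         # digit+ (. digit+)? (E (+|-)? digit+)?
    ("ID",    r"[A-Za-z][A-Za-z0-9]*"),            # letter (letter | digit)*
    ("WS",    r"[ \t\n]+"),                        # skipped whitespace
]
master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in token_spec))

def tokenize(text):
    tokens = []
    for m in master.finditer(text):
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("if x1 <= 42E+2 then"))
# [('IF', 'if'), ('ID', 'x1'), ('RELOP', '<='), ('NUM', '42E+2'), ('THEN', 'then')]
```

Ordering matters twice: keyword patterns precede ID so that `if` is not tokenized as an identifier, and `<=` precedes `<` inside relop so the longer lexeme wins.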
- For this language, the lexical analyzer will recognize the keywords if, then, and else, as well as the lexemes denoted by relop, id, and num.
- num represents the unsigned integer and real numbers of Pascal.
- The lexical analyzer isolates the next token in the input buffer and produces the token and its attribute value as output.
Token, Pattern, Lexeme
Token: A sequence of characters having a collective meaning is known as a token.
Typical tokens are keywords, identifiers, constants, operators, and special symbols.
Pattern: The set of rules associated with a token is called a pattern.
Lexeme: A sequence of characters in the source program that matches the pattern for a token is called a lexeme.
Example:
total = sum + 12.5
Tokens are: total (id), = (assignment symbol), sum (id), + (operator), 12.5 (num)
Lexemes are: total, =, sum, +, 12.5
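The token/attribute output for this statement can be sketched as follows. This is an illustrative toy, not a full lexer: it splits on whitespace and classifies each lexeme, with identifiers carrying a symbol-table index as their attribute (the `install_id` helper and the table layout are assumptions made here).

```python
# Sketch: emit <token, attribute> pairs for "total = sum + 12.5".
# Identifiers get a symbol-table index; numbers carry their value.
symbol_table = []

def install_id(lexeme):
    # Return the symbol-table index for lexeme, inserting it if new.
    if lexeme not in symbol_table:
        symbol_table.append(lexeme)
    return symbol_table.index(lexeme)

def lex(statement):
    pairs = []
    for lexeme in statement.split():       # toy: assumes space-separated lexemes
        if lexeme[0].isalpha():
            pairs.append(("id", install_id(lexeme)))
        elif lexeme[0].isdigit():
            pairs.append(("num", float(lexeme)))
        else:
            pairs.append((lexeme, None))   # operators / special symbols
    return pairs

print(lex("total = sum + 12.5"))
# [('id', 0), ('=', None), ('id', 1), ('+', None), ('num', 12.5)]
print(symbol_table)  # ['total', 'sum']
```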