2023-10-10

Lexer

[!question]- What is the input and output of a Lexer?

Input: Stream of Characters

Output: Stream of tokens

[!question]- Lexer is a $\underline{\hspace{3cm}}$ and $\underline{\hspace{3cm}}$ process Aggregation and Classification

[!question]- What is a microsyntax It is the lexical structure of a language It specifies how to group characters into words and how to split up words that run together

[!question]- Which is the only part of the compiler that manipulates every character of the input program? Lexer

[!question]- Differentiate token, lexeme and pattern

Token is the smallest meaningful part of a program

The can’t be broken down further

The can be thought of as categories

Lexeme is a sequence of characters that we have matched from the stream of characters

For a Lexeme to be considered a valid token it has to match the token’s pattern

[!example] main is a lexeme For it to be considered a KEYWORD it has to match [a-zA-Z_]{a-zA-Z0-9_}*

[!question]- What are the three operations supported by Regular expressions

Alternation

$R | S = {x | x \in R\ or\ y \in S}$

Concatenation

$RS = {xy |x \in R\ and\ y \in S}$

Self concatenation can be denoted as $R^i$ where $i$ is the number of times $R$ is concatenated with itself

Kleene Star Closure

$R^* = \bigcup_{i=0}^{\infty}R^i$

[!question]- What is finite closure $R^3 = {R | RR |RRR}$

[!question]- What is positive closure $R^+ = RR^*$

[!question]- Draw the cycle of constructions

[!question]- Formally define a NFA $M = <Q, \Sigma, q_o, F, \delta>$ $\delta: Q \times (\Sigma \cup \varepsilon) \rightarrow 2^Q$

[!question]- Formally define a DFA $M = <Q, \Sigma, q_o, F, \delta>$ $\delta: Q \times \Sigma \rightarrow Q$

[!question]- Differentiate between tokenizer and recognizer

Recognizer takes an input string and produces a binary result of Accept/Reject

Tokenizer takes an input string, splits them into tokens and produces an output of the format <Token Name, Token Value>

[!question]- What are the two rules used for disambiguation in a lexer?

Longest Match Rule

Rule Priority

[!question]- What is the Longest Match Rule and give an example? Suppose we have a number of rules that match a prefix of a given string, pick the rule that matches the most number of characters.

[!example] for8 will match both an IDENTIFIER and FOR FOR matches for(3 char) IDENTIFIER matches for8(4 char) Hence we go with IDENTIFIER

[!question]- What is Rule Priority and give an example? Lexer rules are given in order of precedence. If there are multiple matching rules, pick the rule that is given first

[!example] for will match both INDENTIFIER and FOR FOR is given first So we go with FOR