..

Lexer

[!question]- What is the input and output of a Lexer?

  • Input: Stream of Characters
  • Output: Stream of tokens

[!question]- Lexer is a $\underline{\hspace{3cm}}$ and $\underline{\hspace{3cm}}$ process Aggregation and Classification

[!question]- What is a microsyntax It is the lexical structure of a language It specifies how to group characters into words and how to split up words that run together

[!question]- Which is the only part of the compiler that manipulates every character of the input program? Lexer

[!question]- Differentiate token, lexeme and pattern

  • Token is the smallest meaningful part of a program
  • The can’t be broken down further
  • The can be thought of as categories
  • Lexeme is a sequence of characters that we have matched from the stream of characters
  • For a Lexeme to be considered a valid token it has to match the token’s pattern

[!example] main is a lexeme For it to be considered a KEYWORD it has to match [a-zA-Z_]{a-zA-Z0-9_}*

[!question]- What are the three operations supported by Regular expressions

  • Alternation
    • $R | S = {x | x \in R\ or\ y \in S}$
  • Concatenation
    • $RS = {xy |x \in R\ and\ y \in S}$
    • Self concatenation can be denoted as $R^i$ where $i$ is the number of times $R$ is concatenated with itself
  • Kleene Star Closure
    • $R^* = \bigcup_{i=0}^{\infty}R^i$

[!question]- What is finite closure $R^3 = {R | RR |RRR}$

[!question]- What is positive closure $R^+ = RR^*$

[!question]- Draw the cycle of constructions ss 2023-10-10 at 10.17.11 AM.png

[!question]- Formally define a NFA $M = <Q, \Sigma, q_o, F, \delta>$ $\delta: Q \times (\Sigma \cup \varepsilon) \rightarrow 2^Q$

[!question]- Formally define a DFA $M = <Q, \Sigma, q_o, F, \delta>$ $\delta: Q \times \Sigma \rightarrow Q$

[!question]- Differentiate between tokenizer and recognizer

  • Recognizer takes an input string and produces a binary result of Accept/Reject
  • Tokenizer takes an input string, splits them into tokens and produces an output of the format <Token Name, Token Value>

[!question]- What are the two rules used for disambiguation in a lexer?

  • Longest Match Rule
  • Rule Priority

[!question]- What is the Longest Match Rule and give an example? Suppose we have a number of rules that match a prefix of a given string, pick the rule that matches the most number of characters.

[!example] for8 will match both an IDENTIFIER and FOR FOR matches for(3 char) IDENTIFIER matches for8(4 char) Hence we go with IDENTIFIER

[!question]- What is Rule Priority and give an example? Lexer rules are given in order of precedence. If there are multiple matching rules, pick the rule that is given first

[!example] for will match both INDENTIFIER and FOR FOR is given first So we go with FOR