module Sedlexing : sig .. end
Runtime support for lexers generated by sedlex.
This module is roughly equivalent to the module Lexing from the OCaml
standard library, except that its lexbuffers handle Unicode code points
(OCaml type: Uchar.t in the range 0..0x10ffff) instead of bytes (OCaml
type: char).
It is possible to have sedlex-generated lexers work on a custom
implementation of lexer buffers. To do this, define a module L that
implements the start, next, mark and backtrack functions (see the
Internal Interface section below for a specification). They need not work on
a type named lexbuf: you can use whatever type name you want. Then, in
your sedlex-processed source, bind this module to the name Sedlexing (for
instance, with a local module definition: let module Sedlexing = L in ...).
Of course, you'll probably want to define functions like lexeme to be used
in the lexers' semantic actions.
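The shape such a module L must have can be sketched as a module type. This is an illustration based on the Internal Interface specified below; the module-type name is not part of sedlex:

```ocaml
(* Minimal signature a replacement for Sedlexing must satisfy.
   CUSTOM_LEXBUF is an illustrative name, not something sedlex defines. *)
module type CUSTOM_LEXBUF = sig
  type lexbuf  (* your implementation may use any concrete type name *)
  val start : lexbuf -> unit
  val next : lexbuf -> Uchar.t option
  val mark : lexbuf -> int -> unit
  val backtrack : lexbuf -> int
end
```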
type lexbuf
The type of lexer buffers. A lexer buffer is the argument passed to the scanning functions defined by the generated lexers. The lexer buffer holds the internal information for the scanners, including the code points of the token currently scanned, its position from the beginning of the input stream, and the current position of the lexer.
exception InvalidCodepoint of int
Raised by some functions to signal that some code point is not compatible with a specified encoding.
exception MalFormed
Raised by functions in the Utf8 and Utf16 modules to report strings
that do not comply with the encoding.
val create : ?bytes_per_char:(Stdlib.Uchar.t -> int) ->
(Stdlib.Uchar.t array -> int -> int -> int) -> lexbuf
Create a generic lexer buffer. When the lexer needs more characters, it
calls the given function, passing it an array of Uchars a, a position pos
and a code point count n. The function should put at most n code points
in a, starting at position pos, and return the number of code points
provided. A return value of 0 means end of input. The bytes_per_char
argument is optional; if unspecified, byte positions are the same as code
point positions.
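As a sketch of the refill contract described above (the queue-based source is an assumption chosen for illustration):

```ocaml
(* Sketch: feed a lexbuf from a Uchar.t Queue.t. The refill function
   writes at most n code points into a starting at pos and returns how
   many it wrote; returning 0 signals end of input. *)
let lexbuf_of_queue (q : Uchar.t Queue.t) : Sedlexing.lexbuf =
  Sedlexing.create (fun a pos n ->
      let i = ref 0 in
      while !i < n && not (Queue.is_empty q) do
        a.(pos + !i) <- Queue.pop q;
        incr i
      done;
      !i)
```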
val set_position : ?bytes_position:Stdlib.Lexing.position ->
lexbuf -> Stdlib.Lexing.position -> unit
Set the initial tracked input position, in code points, for lexbuf. If
bytes_position is unspecified, the byte position is set to the same value
as the code point position.
val set_filename : lexbuf -> string -> unit
set_filename lexbuf file sets the filename to file in lexbuf. It also
sets the Lexing.pos_fname field in returned Lexing.position records.
val from_gen : ?bytes_per_char:(Stdlib.Uchar.t -> int) ->
Stdlib.Uchar.t Gen.t -> lexbuf
Create a lexbuf from a stream of Unicode code points. bytes_per_char is
optional; if unspecified, byte positions are the same as code point
positions.
val from_int_array : ?bytes_per_char:(Stdlib.Uchar.t -> int) -> int array -> lexbuf
Create a lexbuf from an array of Unicode code points. bytes_per_char is
optional; if unspecified, byte positions are the same as code point
positions.
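For example (the code points below are arbitrary):

```ocaml
(* Sketch: a lexbuf over three code points: 'h', 'i', and U+1F600. *)
let lb = Sedlexing.from_int_array [| 0x68; 0x69; 0x1F600 |]
```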
val from_uchar_array : ?bytes_per_char:(Stdlib.Uchar.t -> int) ->
Stdlib.Uchar.t array -> lexbuf
Create a lexbuf from an array of Unicode code points. bytes_per_char is
optional; if unspecified, byte positions are the same as code point
positions.
The following functions can be called from the semantic actions of lexer definitions. They give access to the character string matched by the regular expression associated with the semantic action.
val lexeme_start : lexbuf -> int
Sedlexing.lexeme_start lexbuf returns the offset in the input stream of
the first code point of the matched string. The first code point of the
stream has offset 0.
val lexeme_bytes_start : lexbuf -> int
Sedlexing.lexeme_bytes_start lexbuf returns the offset in the input stream
of the first byte of the matched string. The first byte of the stream has
offset 0.
val lexeme_end : lexbuf -> int
Sedlexing.lexeme_end lexbuf returns the offset in the input stream of the
code point following the last code point of the matched string. The first
code point of the stream has offset 0.
val lexeme_bytes_end : lexbuf -> int
Sedlexing.lexeme_bytes_end lexbuf returns the offset in the input stream of
the byte following the last byte of the matched string. The first byte of
the stream has offset 0.
val loc : lexbuf -> int * int
Sedlexing.loc lexbuf returns the pair
(Sedlexing.lexeme_start lexbuf, Sedlexing.lexeme_end lexbuf).
val bytes_loc : lexbuf -> int * int
Sedlexing.bytes_loc lexbuf returns the pair
(Sedlexing.lexeme_bytes_start lexbuf, Sedlexing.lexeme_bytes_end lexbuf).
val lexeme_length : lexbuf -> int
Sedlexing.lexeme_length lexbuf returns the difference
(Sedlexing.lexeme_end lexbuf) - (Sedlexing.lexeme_start lexbuf), that is,
the length (in code points) of the matched string.
val lexeme_bytes_length : lexbuf -> int
Sedlexing.lexeme_bytes_length lexbuf returns the difference
(Sedlexing.lexeme_bytes_end lexbuf) - (Sedlexing.lexeme_bytes_start
lexbuf), that is, the length (in bytes) of the matched string.
val lexing_positions : lexbuf -> Stdlib.Lexing.position * Stdlib.Lexing.position
Sedlexing.lexing_positions lexbuf returns the start and end positions, in
code points, of the current token, using a record of type Lexing.position.
This is intended for consumption by parsers like those generated by
Menhir.
val lexing_position_start : lexbuf -> Stdlib.Lexing.position
Sedlexing.lexing_position_start lexbuf returns the start position, in code
points, of the current token.
val lexing_position_curr : lexbuf -> Stdlib.Lexing.position
Sedlexing.lexing_position_curr lexbuf returns the end position, in code
points, of the current token.
val lexing_bytes_positions : lexbuf -> Stdlib.Lexing.position * Stdlib.Lexing.position
Sedlexing.lexing_bytes_positions lexbuf returns the start and end
positions, in bytes, of the current token, using a record of type
Lexing.position. This is intended for consumption by parsers like those
generated by Menhir.
val lexing_bytes_position_start : lexbuf -> Stdlib.Lexing.position
Sedlexing.lexing_bytes_position_start lexbuf returns the start position,
in bytes, of the current token.
val lexing_bytes_position_curr : lexbuf -> Stdlib.Lexing.position
Sedlexing.lexing_bytes_position_curr lexbuf returns the end position, in
bytes, of the current token.
val new_line : lexbuf -> unit
Sedlexing.new_line lexbuf increments the line count and sets the beginning
of line to the current position, as though a newline character had been
encountered in the input.
val lexeme : lexbuf -> Stdlib.Uchar.t array
Sedlexing.lexeme lexbuf returns the string matched by the regular
expression as an array of Unicode code points.
val lexeme_char : lexbuf -> int -> Stdlib.Uchar.t
Sedlexing.lexeme_char lexbuf pos returns code point number pos in the
matched string.
val sub_lexeme : lexbuf -> int -> int -> Stdlib.Uchar.t array
Sedlexing.sub_lexeme lexbuf pos len returns the substring of length len
starting at code point index pos of the string matched by the regular
expression, as an array of Unicode code points.
val rollback : lexbuf -> unit
Sedlexing.rollback lexbuf puts lexbuf back in its configuration before
the last lexeme was matched. It is then possible to use another lexer to
parse the same characters again. The other functions above in this section
should not be used in the semantic action after a call to
Sedlexing.rollback.
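A typical use is trying a second lexer on the same input. In this sketch, lex_keyword and lex_ident are hypothetical sedlex-generated lexers:

```ocaml
(* Sketch: try one lexer, and on a sentinel result roll back and rescan
   the same code points with another. lex_keyword returns a token option;
   lex_ident returns a token. Both names are hypothetical. *)
let token buf =
  match lex_keyword buf with
  | Some tok -> tok
  | None ->
      Sedlexing.rollback buf;  (* restore buf to before the last match *)
      lex_ident buf
```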
These functions are used internally by the lexers. They could be used to
write lexers by hand, or with a lexer generator different from sedlex. The
lexer buffers have a unique internal slot that can store an integer. They
also store a "backtrack" position.
val start : lexbuf -> unit
start lexbuf informs the lexer buffer that any code points until the
current position can be discarded. The current position becomes the "start"
position as returned by Sedlexing.lexeme_start. Moreover, the internal slot
is set to -1 and the backtrack position is set to the current position.
val next : lexbuf -> Stdlib.Uchar.t option
next lexbuf extracts the next code point from the lexer buffer and
increments the current position. If the input stream is exhausted, the
function returns None. If a '\n' is encountered, the tracked line number
is incremented.
val __private__next_int : lexbuf -> int
__private__next_int lexbuf extracts the next code point from the lexer
buffer and increments the current position. If the input stream is
exhausted, the function returns -1. If a '\n' is encountered, the tracked
line number is incremented.
This is a private API, it should not be used by code using this module's API and can be removed at any time.
val mark : lexbuf -> int -> unit
mark lexbuf i stores the integer i in the internal slot. The backtrack
position is set to the current position.
val backtrack : lexbuf -> int
backtrack lexbuf returns the value stored in the internal slot of the
buffer, and performs backtracking (the current position is set to the value
of the backtrack position).
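Putting the four functions together, a hand-written scanner for a run of ASCII digits might look like this sketch (the action numbering is illustrative):

```ocaml
(* Sketch: recognize [0-9]+ by hand. start fixes the lexeme start and
   resets the slot to -1; mark records the action to take if scanning
   stops at the current position; backtrack rewinds to the last marked
   position and returns the stored action (-1 means no match). *)
let scan_digits lb =
  Sedlexing.start lb;
  let is_digit u =
    let c = Uchar.to_int u in
    c >= Char.code '0' && c <= Char.code '9'
  in
  let rec loop () =
    match Sedlexing.next lb with
    | Some u when is_digit u ->
        Sedlexing.mark lb 0;  (* action 0: at least one digit seen *)
        loop ()
    | _ -> Sedlexing.backtrack lb
  in
  loop ()  (* 0 if at least one digit matched, -1 otherwise *)
```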
val with_tokenizer : (lexbuf -> 'token) ->
lexbuf ->
unit -> 'token * Stdlib.Lexing.position * Stdlib.Lexing.position
with_tokenizer tokenizer lexbuf, given a lexer and a lexbuf, returns a
generator of tokens annotated with positions. This generator can be used
with the Menhir parser generator's incremental API.
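A common way to connect this to a Menhir-generated parser is through MenhirLib's revised-API converter. In this sketch, Parser.main and token are assumed to come from your grammar and lexer:

```ocaml
(* Sketch: adapt a traditional Menhir entry point to a sedlex lexer.
   Parser.main and token are hypothetical names from your project. *)
let parse (lexbuf : Sedlexing.lexbuf) =
  let supplier = Sedlexing.with_tokenizer token lexbuf in
  MenhirLib.Convert.Simplified.traditional2revised Parser.main supplier
```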
module Latin1 : sig .. end
module Utf8 : sig .. end
module Utf16 : sig .. end