Owl Lua API
Latest revision as of 11:18, 10 June 2014
The Owl Lua API is a platform for building parsers in Owl. It consists of instantiable classes and static objects.
regexp Class
The regexp class exposes a POSIX implementation of regular expressions. The class's API is modeled after Qt 4.x's QRegExp implementation.
regexp.new()
Initializes a new instance of the regexp class.
sgml Class
The sgml class parses HTML markup and builds an SGML DOM.
Note: Parsing even well-formed documents can be slow for very large files.
utils Class
The utils class contains static methods to make common routines in parsers more accessible.
utils.md5()
Returns the MD5 hash of the given string as an ASCII hexadecimal string.
Signature
encodedString utils.md5(rawstring)
Parameters
- rawstring (string) - The string to be encoded.
Return Value
- encodedString (string) - The MD5 hash as an ASCII hexadecimal string.
Example
<syntaxhighlight lang="lua">
local md5 = utils.md5("password")
print(md5)
-- output: 5f4dcc3b5aa765d61d8327deb882cf99
</syntaxhighlight>
webclient Class
The webclient class is used to make HTTP requests. To reduce load and improve loading time, the class uses an internal cache; cached responses expire after 30 seconds by default. Each response is processed through HTML Tidy (http://tidy.sourceforge.net/) so that all HTML returned is valid XHTML.
webclient.new()
Initializes a new instance of the webclient class.
webclient.get()
Requests the web page at the specified URL using the GET method.
Signature
html webclient.get(url [, skipCache])
Parameters
- url (string) - The URL of the web page to retrieve.
- skipCache (optional) (boolean) - If true, the internal cache is bypassed.
Return Value
- html (string) - HTML source of the requested page
- status (numeric) - HTTP status code result of the request.
- isError (boolean) - TRUE if there was an error with the request, otherwise FALSE.
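The return values above might be consumed as in the following minimal sketch. It assumes that get returns html, status, and isError as three Lua return values in the order listed, which the signature line does not confirm. fetch_page is a hypothetical helper, not part of the Owl API; the client is passed in as an argument so the function can also be exercised with a stand-in object.

```lua
-- Hypothetical helper, not part of the Owl API.
-- Assumes client:get() returns html, status, isError in that order.
local function fetch_page(client, url, skipCache)
  local html, status, isError = client:get(url, skipCache)
  if isError then
    -- Report failure to the caller; Owl itself is not notified here.
    return nil, status
  end
  return html, status
end

-- Inside Owl the client would come from webclient.new(). The guard lets the
-- sketch load where the webclient global is absent.
if webclient then
  local html, status = fetch_page(webclient.new(), "http://www.google.com", true)
  if html then
    print(status, #html)
  end
end
```

Passing true for skipCache bypasses the internal cache, which may be useful when a page is expected to change within the default 30-second expiry.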
webclient.post()
Requests the web page at the specified URL using the POST method.
Signature
html webclient.post(url, payload [, skipCache])
Parameters
- url (string) - The URL of the web page to retrieve.
- payload (string) - The POST payload of the request.
- skipCache (optional) (boolean) - If true, the internal cache is bypassed.
Return Value
- html (string) - HTML source of the requested page
- status (numeric) - HTTP status code result of the request.
- isError (boolean) - TRUE if there was an error with the request, otherwise FALSE.
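The page does not say how the payload string should be encoded. As a sketch only, the code below assumes a server that expects a form-encoded body (key=value pairs joined by &) and builds such a string in plain Lua. urlencode and build_payload are hypothetical helpers, not part of the Owl API, and the URL and field names are placeholders.

```lua
-- Hypothetical helpers, not part of the Owl API.
-- Percent-encodes every character except letters, digits, '-', '_', '.', '~'.
local function urlencode(s)
  return (tostring(s):gsub("[^%w%-_%.~]", function(c)
    return string.format("%%%02X", string.byte(c))
  end))
end

-- fields is an array of {key, value} pairs so the output order is stable.
local function build_payload(fields)
  local parts = {}
  for _, pair in ipairs(fields) do
    parts[#parts + 1] = urlencode(pair[1]) .. "=" .. urlencode(pair[2])
  end
  return table.concat(parts, "&")
end

local payload = build_payload({ { "user", "a b" }, { "q", "x&y" } })

-- Only runs inside Owl, where the webclient global exists.
-- Assumes post() returns html, status, isError in the order listed above.
if webclient then
  local client = webclient.new()
  local html, status, isError = client:post("http://www.example.com/search", payload)
  if not isError then
    print(status)
  end
end
```

An array of pairs is used instead of a keyed table because Lua does not guarantee iteration order for keyed tables.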
Example
<syntaxhighlight lang="lua">
-- print the HTML source code of www.google.com
local client = webclient.new()
local source = client:get("http://www.google.com")
print(source)
</syntaxhighlight>
Error Handling
When a Lua parser encounters an error, there are a few different ways the error can be dispatched. **Note:** None of these methods explicitly aborts the Lua script; the script must stop itself.
error.warn(params)
The error-text will be displayed in the status bar and, if logging is turned on, the error will be written to the log file. **Note**: If Owl is in **Debug** mode then an error packet is created.
Parameters
- params (table) - A string-indexed dictionary of data.
error.throw(params)
Throws an exception in the Owl client, causing the request to stop. The error-text will be reported in the status bar and in a pop-up dialog.
Parameters
- params (table) - A string-indexed dictionary of data.
Params
Owl uses the following keys in the `params` table:
- params["error-text"] - (required) Displayed as the error's text.
- params["html"] - The HTML being parsed; can be partial HTML from the last request.
- params["url"] - The URL that resulted in the HTML.
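A minimal sketch of filling in the params table and reporting a warning might look like this. build_error_params and parse_title are hypothetical names used for illustration, and the title lookup is only a stand-in for real parsing. Because none of the reporting methods abort the script, the function returns explicitly after reporting.

```lua
-- Hypothetical helper, not part of the Owl API: collects the documented keys.
local function build_error_params(message, url, html)
  return {
    ["error-text"] = message, -- required
    ["html"] = html,          -- optional
    ["url"] = url,            -- optional
  }
end

-- Hypothetical parser step: extracts the page title or reports a warning.
local function parse_title(html, url)
  local title = html:match("<title>(.-)</title>")
  if not title then
    local params = build_error_params("No <title> element found", url, html)
    -- In Owl, `error` is assumed to be the object documented above. In stock
    -- Lua it is a function, so the call is guarded.
    if type(error) == "table" then
      error.warn(params)
    end
    -- error.warn does not abort the script, so return explicitly.
    return nil, params
  end
  return title
end
```

Swapping error.warn for error.throw would additionally stop the request in the Owl client, but the explicit return is still needed to end the Lua function.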