Owl Lua API

Latest revision as of 12:18, 10 June 2014

The Owl Lua API is a platform for building parsers in Owl. It consists of instantiable classes and static objects.

regexp Class

The regexp class exposes a POSIX implementation of regular expressions. The class's API is modeled after Qt 4.x's QRegExp implementation.

regexp.new()

Initializes a new instance of the regexp class.

sgml Class

The sgml class parses HTML markup and builds an SGML DOM.

Note: Parsing even well-formed documents can be slow for very large files.

utils Class

The utils class contains static methods to make common routines in parsers more accessible.

utils.md5()

Returns the MD5 hash of the given string as an ASCII hexadecimal string.

Signature

encodedString utils.md5(rawstring)

Parameters

  • rawstring (string) - The string to be encoded.

Return Value

  • encodedString (string) - The MD5 hash as an ASCII hexadecimal string.

Example

<syntaxhighlight lang="lua">
local md5 = utils.md5("password")
print(md5)
-- output: 5f4dcc3b5aa765d61d8327deb882cf99
</syntaxhighlight>

webclient Class

The webclient class is used to make HTTP requests. To reduce load and improve loading time, this class uses an internal cache. Cached responses expire after 30 seconds by default. Each response is processed through HTML Tidy so that all HTML returned is valid XHTML.

webclient.new()

Initializes a new instance of the webclient class.

webclient.get()

Requests the web page at the specified URL using the GET method.

Signature

html webclient.get(url [, skipCache])

Parameters

  • url (string) - The URL of the web page to retrieve.
  • skipCache (optional) (boolean) - If true, the internal cache is bypassed.

Return Value

  • html (string) - HTML source of the requested page.
  • status (numeric) - HTTP status code of the request.
  • isError (boolean) - true if there was an error with the request, otherwise false.
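
The return values above can be checked as in the following minimal sketch. It assumes that get returns html, status, and isError as multiple Lua return values in the listed order; the signature only names html, so treat that ordering as an assumption. The fallback stub and the fetchPage helper are hypothetical and exist only so the snippet runs outside Owl.

```lua
-- Use Owl's webclient when available; otherwise fall back to a
-- hypothetical stub (not part of Owl) so the sketch runs standalone.
local webclient = webclient or {
  new = function()
    return {
      get = function(self, url, skipCache)
        return "<html><body>stub</body></html>", 200, false
      end,
    }
  end,
}

-- fetchPage is a hypothetical helper name, not an Owl function.
local function fetchPage(url, skipCache)
  local client = webclient.new()
  -- Assumption: the three documented return values arrive in this order.
  local html, status, isError = client:get(url, skipCache)
  if isError or status ~= 200 then
    return nil, status
  end
  return html, status
end

-- Pass true as the second argument to bypass the internal cache.
local html, status = fetchPage("http://www.google.com", true)
print(status, html and #html or 0)
```

Returning nil plus the status code keeps the failure check in one place, so callers only need to test the first result.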

webclient.post()

Requests the web page at the specified URL using the POST method.

Signature

html webclient.post(url, payload [, skipCache])

Parameters

  • url (string) - The URL of the web page to retrieve.
  • payload (string) - The POST payload of the request.
  • skipCache (optional) (boolean) - If true, the internal cache is bypassed.

Return Value

  • html (string) - HTML source of the requested page.
  • status (numeric) - HTTP status code of the request.
  • isError (boolean) - true if there was an error with the request, otherwise false.

Example

<syntaxhighlight lang="lua">
-- print the HTML source code of www.google.com
local client = webclient.new()
local source = client:get("http://www.google.com")
print(source)
</syntaxhighlight>
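
A POST request might look like the sketch below. The documentation only says the payload is a string, so the form-encoded format, the example.com URL, the field names, and the escape and encodeForm helpers are all assumptions made for illustration. The stub is hypothetical and is only used when the snippet runs outside Owl.

```lua
-- Fall back to a hypothetical stub (not part of Owl) outside Owl.
local webclient = webclient or {
  new = function()
    return {
      post = function(self, url, payload, skipCache)
        return "<html><body>" .. payload .. "</body></html>", 200, false
      end,
    }
  end,
}

-- escape is a hypothetical helper: it percent-encodes every character
-- except letters, digits, and the unreserved characters - . _ ~
local function escape(text)
  return (tostring(text):gsub("[^%w%-%._~]", function(c)
    return string.format("%%%02X", string.byte(c))
  end))
end

-- encodeForm is a hypothetical helper: it builds a form-encoded
-- payload string from a table of fields, with keys in sorted order.
local function encodeForm(fields)
  local keys = {}
  for key in pairs(fields) do keys[#keys + 1] = key end
  table.sort(keys)
  local parts = {}
  for _, key in ipairs(keys) do
    parts[#parts + 1] = escape(key) .. "=" .. escape(fields[key])
  end
  return table.concat(parts, "&")
end

local client = webclient.new()
local payload = encodeForm({ user = "owl", query = "lua api" })
-- Assumption: post returns html, status, isError in the documented order.
local html, status, isError = client:post("http://www.example.com/search", payload)
if not isError then
  print(status, html)
end
```

Sorting the keys makes the payload deterministic, which matters if identical requests are expected to hit the internal cache.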

Error Handling

When a Lua parser encounters an error, there are a few ways to dispatch it. Note: None of these methods aborts the Lua script; aborting must be handled in the Lua code itself.

error.warn(params)

The error-text will be displayed in the status bar and, if logging is turned on, the error will be written to the log file. Note: If Owl is in Debug mode, an error packet is created.

Parameters

  • params (table) - A string-indexed dictionary of data.

error.throw(params)

Throws an exception in the Owl client, causing the request to stop. The error-text will be reported in the status bar and in a pop-up dialog.

Parameters

  • params (table) - A string-indexed dictionary of data.

Params

The following indexes in the params table are used in Owl.

  • params["error-text"] - (required) Displayed as the error's text.
  • params["html"] - The HTML being parsed; this can be partial HTML from the last request.
  • params["url"] - The URL that produced the HTML.
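
The sketch below shows one way the params table and the explicit abort could fit together. It assumes that inside Owl the global error is the table described above; in plain Lua error is a function, so a hypothetical stub takes its place. The parseTitle function, the title pattern, and the example.com URL are illustrative and not part of Owl.

```lua
-- In Owl, `error` is assumed to be the table documented above. In plain
-- Lua `error` is a function, so a hypothetical stub stands in for it.
local reporter = type(error) == "table" and error or {
  warn = function(params) print("warn: " .. params["error-text"]) end,
  throw = function(params) print("throw: " .. params["error-text"]) end,
}

-- parseTitle is a hypothetical parser step, not an Owl function.
local function parseTitle(html, url)
  local title = html:match("<title>(.-)</title>")
  if not title then
    reporter.throw({
      ["error-text"] = "No <title> element found",
      ["html"] = html,
      ["url"] = url,
    })
    -- Neither warn nor throw aborts the script, so return explicitly.
    return nil
  end
  if title == "" then
    -- A recoverable problem: report it and keep going.
    reporter.warn({ ["error-text"] = "Empty title", ["url"] = url })
  end
  return title
end

print(parseTitle("<html><title>Owl</title></html>", "http://www.example.com"))
print(parseTitle("<html></html>", "http://www.example.com"))
```

Only error-text is required, so the warn call omits html while the throw call passes all three indexes.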