Professional Documents
Culture Documents
String Manipulation
String Manipulation
Abstract
The “String Manipulation Library” adds a set of general
purpose functions to the pawn scripting language. The
functions support both packed and unpacked strings.
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Packed and unpacked strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
UU-encoding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2
Implementing the library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Native functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
ITB CompuPhase
ii
The information in this manual and the associated software are provided “as is”.
There are no guarantees, explicit or implied, that the software and the manual
are accurate.
Requests for corrections and additions to the manual and the software can be
directed to ITB CompuPhase at the above address.
Typeset with TEX in the “Computer Modern” and “Palatino” typefaces at a base size of 11 points.
1
Introduction
The “pawn” programming language depends on a host application to provide an
interface to the operating system and/or to the functionality of the application.
This interface takes the form of “native functions”, a means by which a pawn
script calls into the application. The pawn “core” toolkit mandates or defines no
native functions at all (the tutorial section in the manual uses only a minimal set
of native functions in its examples). In essence, pawn is a bare language to which
an application-specific library must be added.
That non-withstanding, the availability of general purpose native-function li-
braries is desirable. The “String Manipulation Library” discussed in this doc-
ument intends to be such a general-purpose module.
This application note assumes that the reader understands the pawn language.
For more information on pawn, please read the manual “The pawn booklet —
The Language” which is available from the company homepage.
UU-encoding
For transmitting binary data over communication lines/channels or protocols that
do not support 8-bit transfers, or that reserve some byte values for special “control
characters”, a 6-bit data encoding scheme was devised that uses only the standard
ascii range. This encoding is called “UU-encoding”.
This daemon can encode a stream of binary data into ascii strings that can be
transmitted over all networks that support ascii.
The basic scheme is to break groups of 3 eight bit bytes (24 bits) into 4 six bit
characters and then add 32 (a space) to each six bit character which maps it into
the readily transmittable character. As some transmission mechanisms compress
or remove spaces, spaces are changed into back-quote characters (ascii 96) —this
is a modification of the scheme that is not present in the original versions of the
UU-encode algorithm.
Another way of phrasing this is to say that the encoded 6 bit characters are
mapped into the set:
‘!"#$%&’()*+,-./012356789:;<=>?@ABC...XYZ[\]^_
for transmission over communications lines.
A small number of eight bit bytes are encoded into a single line and a count is
put at the start of the line. Most lines in an encoded file have 45 encoded bytes.
When you look at a UU-encoded file note that most lines start with the letter “M”.
“M” is decimal 77 which, minus the 32 bias, is 45. The purpose of this further
chopping of the byte stream is to allow for handshaking. Each chunk of 45 bytes
(61 encoded characters, plus optionally a newline) is transferred individually and
the remote host typically acknowledges the receipt of each chunk.
Some encode programs put a check character at the end of each line. The check
is the sum of all the encoded characters, before adding the mapping, modulo 64.
Some encode programs have bugs in this line check routine; some use alternative
UU-encoding 3
methods such as putting another line count character at the end of a line or
always ending a line with an “M”. The functions in this module encode byte
arrays without line check characters, and the decoder routine ignores any “check”
characters behind the data stream.
To determine the end of a stream of UU-encoded data, there are two common
conventions:
⋄ When receiving a line with less that 45 encoded bytes, it signals the last line. If
the last line contains 45 bytes exactly, another line with zero bytes must follow.
A line with zero encoded bytes is a line with only a back-quote.
⋄ A stream must always be ended with a line with 0 (zero) encoded bytes. Re-
ceiving a line with less than 45 encoded bytes does not signal the end of the
stream — it may indicate that further data is only delayed.
4
The “String Manipulation Library” consists of the two files amxstring.c and
string.inc. The C file may be “linked in” to a project that also includes the pawn
abstract machine (amx.c), or it may be compiled into a DLL (Microsoft Windows)
or a shared library (Linux). The .inc file contains the definitions for the pawn
compiler of the native functions in amxstring.c. In your pawn programs, you
may either include this file explicitly, using the #include preprocessor directive,
or add it to the “prefix file” for automatic inclusion into any pawn program that
is compiled.
The “Implementer’s Guide” for the pawn toolkit gives details for implementing
the extension module described in this application note into a host application.
The initialization function, for registering the native functions to an abstract
machine, is amx_StringInit and the “clean-up” function is amx_StringCleanup.
In the current implementation, calling the clean-up function is not required.
If the host application supports dynamically loadable extension modules, you may
alternatively compile the C source file as a DLL or shared library. No explicit
initialization or clean-up is then required. Again, see the Implementer’s Guide for
details.
5
Usage
Depending on the configuration of the pawn compiler, you may need to explicitly
include the string.inc definition file. To do so, insert the following line at the
top of each script:
#include <string>
The angle brackets “<...>” make sure that you include the definition file from
the system directory, in the case that a file called string.inc or string.p also
exists in the current directory.
From that point on, the native functions from the string manipulation library are
available.
Several functions have a parameter that specifies the maximum number of cells
that a destination buffer can hold. The purpose of this parameter is to avoid an
accidental buffer overrun. Note that this parameter always gives the buffer size in
cells, even for packed strings. The rationale behind this choice is that the sizeof
operator of pawn also returns the size of buffers in cells.
6
Native functions
Resources
The pawn toolkit can be obtained from www.compuphase.com in various for-
mats (binaries and source code archives). The manuals for usage of the language
and implementation guides are also available on the site in Adobe Acrobat format
(PDF files).
Documentation on Unicode and the Basic Multilingual Plane (BMP) appears on
http://www.unicode.org.
18 Resources
19
Index
! #include, 4 R Registering, 4