Text processing

Strings

Strings are vars that contain text.  Strings can also store binary data.

Creating strings

Strings are either "wide" or "narrow", which refers to the number of bytes used to represent each character.  Narrow strings are created using double-quotes, and wide strings are prefixed with an L as follows:

var s1 = "";       // Empty narrow string
var s2 = "Hello";  // Narrow string
var s3 = L"Hello"; // Wide string
var s4 = +s3;      // Clone s3

Using strings

The length of the string is returned by the var::size() method.  A string converts to true (using var::as_bool()) if it is non-empty, or false if it is empty.  Converting to an integer (using var::as_int()) parses the integer in the string, and if parsing fails, then as_int() returns 0.

Individual characters can be accessed using the square bracket notation, starting from index 0.  If the requested index is out of range, then the space character is returned.  Negative indexes are counted from the back of the string.  e.g.

var s2 = "Hello";
writeln( s2[0] );   // Output: H
writeln( s2[-1] );  // Output: o
writeln( s2[100] ); // Output: ' '
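
The indexing rule can be modelled in standard C++ as follows.  This is only a sketch of the behaviour described above, not C++Script's implementation: char_at is a hypothetical helper, and returning a space for a negative index that runs off the front of the string is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper modelling the indexing rule described above.
// Negative indexes count from the back of the string, and any index
// that is out of range yields the space character.
char char_at(const std::string& s, long index)
{
    const long size = static_cast<long>(s.size());
    if (index < 0)
        index += size;                  // -1 refers to the last character
    if (index < 0 || index >= size)
        return ' ';                     // out of range (assumed for both ends)
    return s[static_cast<std::size_t>(index)];
}
```

With this helper, char_at("Hello", -1) gives 'o' and char_at("Hello", 100) gives ' ', matching the output shown above.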

The characters of the string can be enumerated using foreach:

foreach( ch, "hello" ) write(ch); // Output: hello

The unary + operator clones a string, and the binary + operator concatenates two strings.  The multiplication operator repeats a string a given number of times.

var s2 = "Hello";
writeln( s2 + " world!"); // Output: Hello world!
writeln( s2 + 123 );      // Output: Hello123
writeln( s2 * 2 );        // Output: HelloHello
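
For readers more familiar with std::string, the binary + and * operators behave roughly like the helpers below.  This is a sketch in standard C++ rather than C++Script code; append_number and repeat are hypothetical names, and returning an empty string for a count of zero or less is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical equivalent of "Hello" + 123: the number is converted
// to text and then appended.
std::string append_number(const std::string& s, int number)
{
    return s + std::to_string(number);
}

// Hypothetical equivalent of s * count: the string repeated count times.
// A count of zero or less gives an empty string (an assumption).
std::string repeat(const std::string& s, int count)
{
    std::string result;
    if (count <= 0)
        return result;
    result.reserve(s.size() * static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i)
        result += s;
    return result;
}
```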

Modifying strings

The square brackets operator can be used to set individual characters in the string.  If the index is at or beyond the current size of the string, then the string is enlarged, padding with spaces if necessary.

var s = "abc";
s[3]='d';
writeln(s); // abcd
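
The enlarging behaviour can be sketched in standard C++ like this.  set_char is a hypothetical helper that models the rule above on a std::string; it is not how C++Script implements the operator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: assign to s[index], growing the string first if
// the index lies at or beyond the current end.  Any gap between the old
// end and the new character is filled with spaces.
void set_char(std::string& s, std::size_t index, char ch)
{
    if (index >= s.size())
        s.resize(index + 1, ' ');
    s[index] = ch;
}
```

Calling set_char on "abc" with index 3 and 'd' gives "abcd", as in the example above; with index 5 it would give "abc  d" under this model.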

The += operator appends text to the end of the string, and the *= operator repeats the string.  e.g.

var s = "abc";
s+="def";
writeln(s); // abcdef
s*=2;
writeln(s); // abcdefabcdef

The var::resize() method changes the size of the string.  If the string is extended, then the space character is used to pad the end of the string.

The var::clear() method erases the entire string (same as resize(0)).

The var::insert() method inserts one string into another.  If the var inserted isn't a string, it is converted to a string using var::as_string() first.  The first argument is the position, and the second argument is the string to be inserted.  The var::erase() method erases the given range of characters.  e.g.

var s="hello";
s.insert(2, "xx");
writeln(s); // hexxllo
s.erase( range(4,5) );
writeln(s); // hexxo

See the description of the range() function for an explanation of ranges.
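
In the example above, range(4,5) removes two characters, so both ends of the range are included.  The same steps can be sketched on a standard std::string as follows; erase_inclusive is a hypothetical helper, not part of C++Script, and ignoring a range that starts past the end of the string is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: erase the characters first..last, both inclusive.
// A range that starts past the end, or that is reversed, is ignored.
void erase_inclusive(std::string& s, std::size_t first, std::size_t last)
{
    if (first >= s.size() || last < first)
        return;
    // std::string::erase clamps the count to the end of the string.
    s.erase(first, last - first + 1);
}
```

Starting from "hello", s.insert(2, "xx") gives "hexxllo", and erase_inclusive(s, 4, 5) then gives "hexxo".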

Comparing strings

Strings are compared in lexicographic (dictionary) order.  It is not possible to compare narrow strings with wide strings - you'd need to convert the narrow string to wide using var::as_wstring().

Substrings

The substring() function returns a section of a string.  Its first argument is the string, and the second argument is a range.  e.g.

writeln( substring("hello", range(1,2) ) ); // Output: el
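
The nearest std::string operation, substr(), takes a position and a length rather than a first and last index.  A sketch of the conversion, where substring_inclusive is a hypothetical helper and the empty result for an out-of-range start is an assumption:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the characters first..last, both inclusive.
// std::string::substr takes (position, length), so the length is
// last - first + 1; substr clamps it to the end of the string.
std::string substring_inclusive(const std::string& s,
                                std::size_t first, std::size_t last)
{
    if (first >= s.size() || last < first)
        return std::string();
    return s.substr(first, last - first + 1);
}
```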

Searching strings

Example (split.cpp):

#include <cppscript>
 
var script_main(var)
{
    var text = "The cat sat on the mat";
    var is_not_space = bind( is_not_one_of, " \t\r\n" );
    foreach( word, split_chars( is_not_space, text ) )
        writeln(word);
    return 0;
}
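
For comparison, the same word splitting can be written with the standard library alone.  This sketch assumes that split_chars() yields each maximal run of characters accepted by the predicate; split_words is a hypothetical name, and the code is not taken from C++Script.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split text into the runs of characters that are
// not separators.  Empty words are never produced.
std::vector<std::string> split_words(const std::string& text,
                                     const std::string& separators = " \t\r\n")
{
    std::vector<std::string> words;
    std::size_t start = text.find_first_not_of(separators);
    while (start != std::string::npos)
    {
        // end is npos for the last word; substr then takes the rest.
        const std::size_t end = text.find_first_of(separators, start);
        words.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(separators, end);
    }
    return words;
}
```

For "The cat sat on the mat" this returns the six words The, cat, sat, on, the and mat.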

Files

There are various types of file, corresponding to files on disk, in-memory files and the console.  The functions used to create files are:

Files are objects with the following methods (not all files have all methods, e.g. the output files don't have input methods etc).

Random access to files is not supported: the platform, C or C++ file functions must be used instead (though support may come in a later version).

The "io_error" exception (a var with class_name "io_error") is thrown if there is an IO error (other than end-of-file which returns null).

Files can be enumerated using the lines() and characters() functions.

This example implements a simple "grep" program to list occurrences of a string in a set of files.  (grep.cpp)

#include <cppscript>
 
void search_for_string_in_file(var str, var filename)
{
    try
    {
        var file = read_file(filename);
        finally( file["close"] );
        var line_no = 1;
        foreach(line, lines(file))
        {
            if( string_find( line, str ) )
                writeln( filename + ":" + line_no + " " + line );
            ++line_no;
        }
    }
    catch( var )
    {
        err()["writeln"]("Error reading from file " + filename);
    }
}

void search_for_string_in_files(var str, var files)
{
    foreach( file, files ) search_for_string_in_file(str, file);
}

var script_main(var input)
{
    if(input.size()<2)
    {
        err()["writeln"]("Usage: grep search_term file1 file2 ...");
        return 1;
    }
    search_for_string_in_files(input[0], tail(input));
    return 0;
}

Pickling

Pickling is the process of converting a var into a string (where it can be saved in a file, or transmitted across the network).  Pickling makes it easy to save program state without fussing about file formats.

Pickling preserves the object structure.  For example, if you have a complex data structure containing loops (objects with closures also contain loops), then that structure, loops included, is preserved.
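
One common way to preserve loops when saving a structure is to give every object an identifier the first time it is encountered, and to store references as identifiers.  The sketch below shows the idea in standard C++ for a chain of nodes.  It is not C++Script's pickle format, and node, save_chain and load_chain are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type: each node refers to at most one successor.
struct node
{
    std::string label;          // assumed to contain no whitespace
    const node* next = nullptr;
};

// Write every node reachable from start as "label successor-id" pairs,
// preceded by the node count.  A node receives its id the first time it
// is seen, so a loop back to an earlier node is stored as that node's id
// instead of being followed forever.
std::string save_chain(const node* start)
{
    std::map<const node*, int> ids;
    std::vector<const node*> order;
    for (const node* n = start; n && !ids.count(n); n = n->next)
    {
        ids[n] = static_cast<int>(order.size());
        order.push_back(n);
    }
    std::ostringstream out;
    out << order.size();
    for (const node* n : order)
        out << ' ' << n->label << ' ' << (n->next ? ids[n->next] : -1);
    return out.str();
}

// Rebuild the nodes, wiring up successors by id.  The returned vector
// owns the nodes; copying it would leave the copies' next pointers
// referring to the original elements.
std::vector<node> load_chain(const std::string& text)
{
    std::istringstream in(text);
    std::size_t count = 0;
    in >> count;
    std::vector<node> nodes(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        int next_id = -1;
        in >> nodes[i].label >> next_id;
        if (next_id >= 0 && static_cast<std::size_t>(next_id) < count)
            nodes[i].next = &nodes[static_cast<std::size_t>(next_id)];
    }
    return nodes;
}
```

Saving two nodes that refer to each other and loading the result gives two new nodes that refer to each other, which is also why a save followed by a load acts as a deep copy.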

A slight complication arises when you pickle an object that contains methods.  The methods themselves cannot be pickled; the only thing that can be saved is a reference to the function.  However, the "identity" of a function in C++ is its address, which could easily change between versions and platforms.  Therefore, storing the function address in the pickled object is not practical for long-term storage.

For this reason, functions to be pickled must be declared (registered) to C++Script using the macro enable_pickle(function_name), written outside of a function definition.  Any attempt to pickle a function not declared in this way throws the exception "not_found".  Beware of function overloading (where the same function name is used to denote different functions with different arguments).
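
The idea of storing a registered name instead of an address can be illustrated in standard C++.  This sketch does not show how C++Script's registration macro is implemented; function_registry, handler and the two sample functions are hypothetical.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical signature shared by all registered functions.
using handler = int (*)(int);

// Maps functions to stable names and back again.
class function_registry
{
public:
    void add(const std::string& name, handler fn)
    {
        by_name_[name] = fn;
        by_address_[fn] = name;
    }

    // Used when saving: the name is stored instead of the address.
    const std::string& name_of(handler fn) const
    {
        auto it = by_address_.find(fn);
        if (it == by_address_.end())
            throw std::out_of_range("not_found");
        return it->second;
    }

    // Used when loading: the name is resolved to the current address.
    handler lookup(const std::string& name) const
    {
        auto it = by_name_.find(name);
        if (it == by_name_.end())
            throw std::out_of_range("not_found");
        return it->second;
    }

private:
    std::map<std::string, handler> by_name_;
    std::map<handler, std::string> by_address_;
};

int twice(int x) { return 2 * x; }          // registered in the usage below
int negate_value(int x) { return -x; }      // deliberately left unregistered
```

After registry.add("twice", twice), name_of(twice) returns "twice" and lookup("twice") returns a callable function, while name_of(negate_value) throws.  Because a name is all that is stored, two overloads sharing one name could not be told apart in this scheme, which is presumably one reason overloaded functions need care.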

The pickle() function converts any var (with a few exceptions, such as iterators and native C++ types) into a string, and the unpickle() function converts a pickled string back into a var.  If unpickle() encounters an error decoding the string, it throws the exception of type "invalid_string".

Example (history.cpp):

#include <cppscript>
 
void clear_history(var history) { history["list"].clear(); }
enable_pickle(clear_history);

void add_history(var history, var line) { history["list"].push_back(line); }
enable_pickle(add_history);

void print_history(var history)
{
    if(history["list"])
        foreach( line, history["list"] ) writeln(line);
    else
        writeln("(history is empty)");
}
enable_pickle(print_history);

var command_history()
{
    return object("command_history").extend
        ("clear", clear_history)
        ("add", add_history)
        ("print", print_history)
        ("list", array());
}

var script_main(var lines)
{
    var history;

    try
    {
        history=unpickle_file("history.dat");
    }
    catch(var)
    {
        history = command_history();
    }

    if(lines[0] == "clear")
        history["clear"]();
    else
        foreach(line, lines) history["add"](line);

    history["print"]();
    pickle_file("history.dat", history);
    return 0;
}

pickle() and unpickle() provide a way to deep-copy vars.