TestML

A Software Testing Meta Language


Status

Working Draft - 2 August, 2010

This document is the TestML Language Specification. Version 1.0 Pre-release.

This is not the final 1.0 release. It is very new, under review, under revision and undergoing initial implementation and testing.


Overview

TestML is a meta language for writing tests that define how a piece of software should behave, regardless of the programming language the software is written in.

It is primarily intended for generic libraries or modules that are relevant in many languages, or that have multiple implementations in the same language. In these scenarios the different implementations can all use the exact same tests. However, TestML's clarity and ease of use may make it desirable for testing all types of software.

TestML documents define a set of data points and a program written in an abstract transform/assertion language that invokes the application being tested. The transform functions alter given data points into new states. Then assertions can be made between the altered data points and expected result data points. Here is a simple, but complete, TestML document example:

    %TestML: 1.0            # This is a TestML 1.0 test document.
    # This statement asserts that the 'input' data points, transformed
    # to upper case, match the corresponding 'output' data points.
    *input.uppercase == *output;
    # The following lines define 2 data blocks, with 2 data points each.
    === Test mixed case string
    --- input: I Like Pie
    --- output: I LIKE PIE
    === Test lower case string
    --- input: i love lucy
    --- output: I LOVE LUCY

In this case, a programmer of any given language would do the following:


History

The concept of TestML was heavily inspired by Ward Cunningham's FIT test framework. The primary difference is that FIT's test documents are table/spreadsheet based, where TestML's are text file based. This reflects the premise that FIT caters to business application development (where spreadsheets are heavily used), while TestML caters to open source library authors (where everything is in text files).

The specifics of TestML (especially the data format), evolved directly from Ingy döt Net's data-driven Perl testing framework, Test::Base. Test::Base was written in 2004 and later ported to JavaScript. As Ingy ported various libraries to other languages, he realized the potential value of making the test suites reusable.


Design Goals

TestML has the following Design Goals:

Platform Agnosticism

TestML strives to make no assumptions about the programming language, environment, or testing framework it will be used in. In this way, the same corpus of test documents can be used against multiple implementations of equivalent software.

Readability

TestML lets you define tests that are easy for both you and others to write, read and maintain. The format is intended to let the data points and test conditions stand out, while keeping the implementation specific details in the bridge class.

Extensibility

TestML has been designed with acknowledgement that it will need to evolve to meet many various testing requirements. For this reason, every document requires a TestML version number. Also the original grammar has been made quite strict, leaving a lot of room for future extension.

Ease of Implementation

TestML is designed to be fairly easy to implement in various programming languages.

The TestML project has a set of tests (written in TestML) that a TestML implementation must pass to be compliant. See http://testml.org/tests/.


Terminology

TestML uses a number of specific terms. The following glossary lists the terms and defines their meanings.

Application

The software that is being tested.

Assertion

A test assertion is a statement in the TestML document that compares data expressions using an assertion operator.

Block

A data block is an object that contains a set of named data points. The block usually has a short label phrase. When the tests are run, the runner class will run each of the test assertions against each of the data blocks that contain the data points needed by the assertion.

Bridge

Every test setup defines a class that connects named transform functions to the software being tested. This is known as a bridge class, because it bridges the test to the application or library.

Document

A TestML document is a file containing meta information, test assertions and data blocks. This specification describes the format of such a document.

Expression

A data expression is a calling sequence of data point references, transform function calls and string constants.

A test assertion compares the result of one or two data expressions.

Meta

A TestML document can define a number of key/value pairs which are considered meta data and are used to instantiate a meta object that the test code can later access.

Operator

A testing operator is a method that compares the final state of two data points.

The currently defined operators are OK, EQ(...) (or == ...) and HAS(...) (or ~~ ...).

Point

A data point is a named piece of data that belongs to a given data block. All data points are assumed to start out as unicode strings, although a transform function may turn them into anything else.

Runner

The TestML class that is responsible for running all the appropriate tests. The runner class is subclassed to integrate into various existing test frameworks in different programming language environments.

Section

Every TestML document has three sections, a meta section, a test section and a data section. The first specifies meta information and the second defines the test assertions that the document is declaring. The last section defines the data blocks against which the test assertions are applied.

Standard Library

TestML defines a set of standard transform functions to perform typical data manipulations or provide well known constant values.

Statement

A test section consists of a sequence of test statements. Statements representing a line of executable code. Generally statements consist of data points, transform functions and test assertions, all chained together in execution order.

Transform

A transform function is a function that takes a context object as its first argument, and an optional list of test expressions, as its remaining arguments. The context object contains the result of the last transform. Thus transforms are chained together to form test expressions.

Transforms are supplied by the TestML standard libarary or a bridge class that usually changes a data point from one form into another. A transform can invoke functionality of the application it is testing. The whole idea of TestML is that by passing a data point through one or more transforms, you can make it equivalent to some other data point, causing the test to pass.


The Specification

The TestML Specification it is defined by a Pegex (http://www.pegex.org) grammar. The grammar can fully parse a TestML document unless the data section is in XML, YAML or JSON, in which case a separate parse for the data section would be needed.

NOTE: The grammar is slightly self-modifying. Certain meta settings in the document can change how the rest of the document is parsed.

Document Encoding

All TestML documents are composed of the printable unicode character set and must be encoded in UTF8. Any character that does not meet these requirements must raise an exception.

Pegex Syntax Review

This is a quick intro to the Pegex syntax used in the specification.

Pegex Grammar Primitives

The following primitives are defined by Pegex in terms of their PCRE regexp equivalents. They are listed here for completeness.

PRINTABLE: /[\x09\x0A\x0D\x20-\x7E\xA0-\x{D7FF}\x{E000}-\x{FFFD}]/

NOTE: Non-PRINTABLE characters are not allowed in a TestML stream. The following units do not allow them either, but they have been left out of the regexps for simplicity sake.

ALWAYS: // # Always match
ALL: /[\s\S]/ # Every unicode character
ANY: /./ # Any character except newline
BLANK: /[\ \t]/ # A space or tab character
BLANKS: /\ \t/ # For inclusion in character classes
BREAK: /\n/ # A newline character
EOL: /\r?\n/ # A Unix or DOS line ending
LOWER: /[a-z]/ # Lower case ASCII alphabetic character
UPPER: /[A-Z]/ # Upper case ASCII alphabetic character
WORD: /\w/ - ie '[A-Za-z0-9_]' # A ``word'' character
DIGIT: /[0-9]/ # A numeric digit
EQUAL: /=/ # An equals sign
TILDE: /~/ # A tilde character
STAR: /\*/ # An asterisk
DASH: /-/ # A dash character
DOT: /\./ # A period character
COLON: /:/ # A colon character
SEMI: /;/ # A semicolon character
HASH: /#/ # An octothorpe (or hash) character
BACK: /\\/ # A backslash character
SINGLE: /'/ # A single quote character
DOUBLE: /``/ # A double quote character
LPAREN: /\(/ # A left parenthesis
RPAREN: /\)/ # A right parenthesis
LSQUARE: /\[/ # A left square bracket
LANGLE: /</ # A left angle bracket

Grammar Tokens

This section defines common tokens used throughout the grammar.

escape: /[0nt]/

One of the escapable string characters

line: <ANY>* <EOL>

A line is a string of zero or more non break characters followed by a line ending.

blank_line: <BLANK>* <EOL>

Blank lines are useful to space things out visually.

comment: <HASH> <line>

Comments start with a '#' and go to the end of the line. Comments are only allowed where specifically permitted by the grammar. Comments are treated like insignificant whitespace and ignored by the parser.

ws: <BLANK> | <EOL> | <comment>

The ws token represents characters in the stream that can be thrown away.

quoted_string: <single_quoted_string> | <double_quoted_string>

A quoted string can use either single or double quotes.

single_quoted_string: /(?:<SINGLE>(([^<BREAK><BACK><SINGLE>]|<BACK><SINGLE>|<BACK><BACK>)*?)<SINGLE>)/

A single quoted string encodes the characters placed between the pair of single quotes. A backslash is used to encode a backslash or a single quote character.

    'Won\'t you scratch\\slash my back?'
double_quoted_string: /(?:<DOUBLE>(([^<BREAK><BACK><DOUBLE>]|<BACK><DOUBLE>|<BACK><BACK>|<BACK><escape>)*?)<DOUBLE>)/

A double quoted string encodes the characters placed between the pair of double quotes. A backslash is used to encode a backslash or a double quote character, and also a tab, a newline or a null using the characters 't', 'n' or '0' respectively.

    "\tline 1\n\"line\" 2\n"
unquoted_string: /([^<BLANKS><BREAK><HASH>](?:[^<BREAK><HASH>]*[^<BLANKS><BREAK><HASH>])?)/

AN unquoted string is the first to last non-space characters on a line before an optional comment.

Top Level Document Grammar

document: <meta_section> <test_section> <data_section>?

A TestML Document consists of 3 sections, a meta section followed by a test section followed by an optional inline data section.

Here is an example document:

    # TestML document includes:
    # Meta statements,
    %TestML: 1.0
    # And test statements.
    *foo.upper == *bar;
    *foo == *bar.lower;
    # And a data section that defines data blocks.
    === Test vowels
    --- foo: i ie ie
    --- bar: I IE IE
    === Test consonants
    --- foo
    lk p
    --- bar
    LK P

Meta Section of the Grammar

meta_section: [ <comment> | <blank_line> ]* <meta_testml_statement> [ <meta_statement> | <comment> | <blank_line> ]*

The meta section is the first section of the document. It must contain a testml version statement, and that statement must be the first meta statement declared.

The parsing of the meta section should instantiate a Meta object.

meta_testml_statement: /%TestML:/ <BLANK>+ <testml_version> [ <BLANK>+ <comment> | <EOL> ]

The meta testml statement is required, must be the first meta statement, and must specify a valid TestML release version number.

testml_version: <DIGIT> <DOT> <DIGIT>+

Currently the only valid version number is '1.0'.

meta_statement: /%/ <meta_keyword> /:/ <BLANK>+ <meta_value> [ <BLANK>+ <comment> | <EOL> ]

Meta statements are key/value pairs separated by a colon and a BLANK.

NOTE: A given meta setting may affect the document parsing/grammar from that point forward.

<meta_keyword>: <core_meta_keyword> | <user_meta_keyword>

Meta settings come in two forms. TestML defined core settings, and user defined settings.

core_meta_keyword: /(Title|Data|Plan|BlockMarker|PointMarker)/

The listed keywords represent the core meta settings supported in TestML 1.0. See Valid Meta Keywords for the definitions of them.

user_meta_keyword: <LOWER> <WORD>*

A user can make up their own meta settings, as long as they begin with a lower case letter. These values are then available to the runtime code.

meta_value: <quoted_string> | <unquoted_string>

A meta value is the value of the meta setting being defined. Some settings like data are stored as arrays. Multiple statements with they same keyword add an element to the array.

Test Section of the Grammar

This section of TestML is the mini, generic programming language. In a nutshell, the language consists of a list of chained expressions, some of which make assertions. The following statements are valid syntax:

    # Expressions:
    'string';
    function();
    Constant;
    *data_point;
    *data_point.transform;
    *data_point.transform1.transform2;
    *data.trans('string', "string");
    *data1.trans1(*data2, 'str').trans2(*data3.trans3);
    # Assertions:
    <expression> == 'string';
    <expression> == Constant;
    <expression> == <expression>;
    <expression>.EQ(<expression>);

Lines beginning with '#' are comments. The first eight statements are test expressions. The last four lines are test assertions which test the results of two expressions. The last two are equivalent operations using different syntax. (Note: Replace '<expression>' with any valid TestML expression. '<foo>' is not valid TestML syntax)

test_section: [ <ws> | <test_statement> ]*

The test section contains a small program written in the TestML programming language. This language is used to describe transform functions and the assertions that should hold. In other words, the testing program.

The parsing of the meta section should instantiate a Test object. The Test object will contain an abstract syntax tree that the TestML runtime can execute.

test_statement: <test_expression> <assertion_expression>? /<SEMI>/

A test statement is a processing directive for running tests. A statement can either be a single test expression, which causes code to run but doesn't execute tests, or it can be an assertion which runs a test operation by comparing the results of two data expressions.

test_expression: <sub_expression> [ <!assertion_call_test> <call_indicator> <sub_expression> ]*

A test expression is a series of one or more sub expressions which are evaluated in order, passing the result of each to the next one.

sub_expression: <point_call> | <string_call> | <transform_call>

A sub expression is a chained set of transform function calls. Points and strings are special cases of these calls. All sub expressions are turned into transform calls after compilation. For example:

   *foo.bar("baz") == True;

becomes (internally):

   Point('foo').bar(String("baz")).EQ(True());
point_call: <STAR> <LOWER> <WORD>*

A data point is a user defined data name, preceded by a '*' and beginning with a lower case letter.

A data point is really a short form for a transform call to 'Point' with the point name. These two statements are the same:

    *foo.bar;
    Point('foo').bar();

except that the asterisk point names are also used to select the data blocks that will apply to that expression.

Statements with points in them apply to all matching blocks. For example, this TestML statement:

    *foo.bar == *baz;

would behave like this TestMLesque pseudo-code:

    for block in SelectDataBlocks('foo', 'baz') {
        Assert(block.Point('foo').bar() == block.Point('baz'));
    }

NOTE: The above code is NOT valid TestML.

string_call: <quoted_string>

A quoted string can be used anywhere in a test expression, but it is usually used as an argument passed to a transform function.

    'O my word!'.Split.Join('-') == 'O-my-word';
transform_call: <transform_name> <transform_argument_list>?

A transform call consists of a transform name, possibly followed by parenthesized arguments.

transform_name: <user_transform> | <core_transform>

A transform is either a standard core one, or a user defined transform.

user_transform: <LOWER> <WORD>*

A user data transform name is an identifier that begins with a lower case letter. These transforms are user defined in the bridge class.

core_transform: <UPPER> <WORD>*

A core data transform is a name beginning with an upper case letter. It refers to a constant value defined in the TestML Standard Library, which contains things like True, False and Null.

call_indicator: [<DOT> <ws>* | <ws>* <DOT>]

A call indicator is a period between two sub expressions. The period must be adjecent to one of the sub expressions.

transform_argument_list: <LPAREN> <ws>* <transform_arguments>? <ws>* <RPAREN>

A parenthesized list of zero or more arguments, where each argument is a sub-expression.

transform_arguments: <transform_argument> [ <ws>* <COMMA> <ws>* <transform_argument> ]*

The above two rules can be combined when Pegex has better syntax for lists.

transform_argument: <sub_expression>

Any sub-expression can be a transform function argument.

assertion_call_test: /<call_indicator>(?:EQ|OK|HAS)<LPAREN>/

This rule is a quick test to see if an assertion follows.

assertion_call: <assertion_eq> | <assertion_ok> | <assertion_has>

There are three types of assertions: EQ, OK and HAS. An assertion call is a declaration of a testing operation. A typical assertion looks like:

    *input_point.transform1.transform2(arguments) == *expected_output_point;

An assertion expression can either be an inline operator expression:

    *foo == *bar;

or a function call:

    *foo.EQ(*bar);
assertion_eq: <assertion_operator_eq> | <assertion_function_eq>

The EQ assertion asserts that two expression result values are equal.

assertion_operator_eq: <ws>+ <EQUAL> <EQUAL> <ws>+ <test_expression>
    *foo == *bar;
assertion_function_eq: <call_indicator> /EQ/ <LPAREN> <test_expression> <RPAREN>
    *foo.EQ(*bar);
assertion_ok: <assertion_function_ok>

The OK assertion asserts that an expression result is true.

assertion_function_ok: <call_indicator> /OK/ <empty_parens>?
    *foo.OK;
assertion_has: <assertion_operator_has> | <assertion_function_has>

The HAS assertion asserts that the result value string on one expression contains the result value string of a second expression.

assertion_operator_has: <ws>+ <TILDE><TILDE> <ws>+ <test_expression>
    *foo ~~ *bar;
assertion_function_has: <call_indicator> /HAS/ <LPAREN> <test_expression> <RPAREN>
    *foo.HAS(*bar);
empty_parens: <LPAREN> <ws>* <RPAREN>

An optional empty set of parentheses is allowed on assertion calls that don't take a second argument.

Data Section of the Grammar

The data section defines a sequence of data blocks, each of which is mapping consisting of an optional short label phrase and a set of named data points.

data_section: <testml_data_section> | <yaml_data_section> | <json_data_section> | <xml_data_section>

TestML defines a default data section syntax, but this section can also be specified in YAML, JSON or XML.

The parsing of the data section should instantiate a Data object.

testml_data_section: <&block_marker> <data_block>*

The TestML data section starts when the data block marker is detected at the beginning of a line and continues to the end of the file. This section is parsed into 0-n data block objects.

Here is an example data section in TestML:

    === Test one
    --- input: abc
    --- output: 123
    === Test two
    --- input: xyz
    --- output: 321
yaml_data_section: ( <DASH> <DASH> <DASH> <BLANK>* <EOL> <rest> )

Here is an example data section in YAML:

    ---
    - -label: Test one
      input: abc
      output: 123
    - -label: Test two
      input: xyz
      output: 321
json_data_section: ( <LSQUARE> <rest> )

Here is an example data section in JSON:

    [
      {
        "-label": "Test one",
        "input": "abc",
        "output": "123"
      },
      {
        "-label": "Test two",
        "input": "xyz",
        "output": "321"
      }
    ]
xml_data_section: ( <LANGLE> <rest> )

Here is an example data section in XML:

    <testml>
      <block label="Test one">
        <input>abc</input>
        <output>123</output>
      </block>
      <block label="Test two">
        <input>xyz</input>
        <output>321</output>
      </block>
    </testml>
rest: <ALL>+

The rest token is simply the remainder of the text in the file. The data sections are all parsed by separate parsers/grammars.

data_block: <block_header> [ <blank_line> | <comment> ]* <block_point>*

A data block in TestML syntax looks like:

    === Block Label
    --- point1: phrase data
    --- point2
    line
    data
block_header: <block_marker> [ <BLANK>+ <block_label> ]? <BLANK>* <EOL>

A block header marks the start of a new block. The first one also marks the start of the data section. It contains an optional label.

    === The next big test
block_marker: <EQUAL> <EQUAL> <EQUAL> | Meta.BlockMarker

A block marker is usually ===, but it is configurable via a meta statement, like this:

    BlockMarker: ***
block_label: <unquoted_string>

A label is a short description of the block. It can be used to label tests at runtime.

block_point: <lines_point> | <phrase_point>

Data block points are the pieces of raw data that TestML transforms and compares. They come in two flavors, lines and phrases.

lines_point: <point_marker> <BLANK>+ <point_name> <BLANK>* <EOL> <point_lines>

A ``lines'' data point encodes a string containing zero or more lines. If it has one or more lines it always ends with a newline.

    --- lines
    line1
    line2 (3 is blank)
    
    line4
point_lines: [ [! <block_marker> | <point_marker> ] <line> ]*

All the lines before the next block or point header.

phrase_point: <point_marker> <BLANK>+ <point_name> <COLON> <BLANK> <point_phrase> <EOL> [ <comment> | <blank_line> ]*

A ``phrase'' data point encodes a string with no newlines.

    --- phrase: This string is one line with no newline at the end.
point_marker: <DASH> <DASH> <DASH> | Meta.PointMarker

A point marker is usually ---, but it is configurable via a meta statement, like this:

    %PointMarker: +++
point_name: <LOWER> <WORD>*

The name of this point is 'rudy':

    --- rudy: can't fail
point_phrase: <unquoted_string>

The point phrase above is ``can't fail''.


Core Meta Keywords

The following meta section keywords are supported by TestML 1.0.

TestML - required

The TestML meta statement is required, must be the first meta statement and must specify a valid TestML release version number.

Once the version number is known by the parser, it controls how the rest of the document is parsed. If the parser does not support the specified version number, it must raise an exception.

For example:

    %TestML: 1.0

As a guideline, the TestML specification authors will attempt to be backwards compatible with version numbers that have the same first number. So version 1.3 would be backwards compatible with 1.2, 1.1 and 1.0. Version 2.0 would probably break compatability.

Title - optional

The Title meta option specifies a short title phrase to describe the entire TestML document.

    %Title: These are a few of my Favorite Tests!

The TestML implementation is free to use this meta field as it sees fit.

Data - optional

The Data meta option can be specified zero, one or multiple times. It names a relative file path that should be read and parsed as a data section. The resulting sequences of all the data files are concatenated together into the Data object.

By default, the inline data section comes first when other data sources are specified. A special file path name '_' can be used to indicate the inline data section.

The file extension is used to determine the format of the data section encoded in the file.

Example:

    %Data: file1.tml
    %Data: file2.xml
    %Data: _
    %Data: file4.yaml

The Data attribute of the Meta object is always stored as an array.

Plan - optional

The Plan meta option specifies the expected number of tests that will be run by the runner. If the actual number differs, the implementation should complain or raise an exception.

    %Plan: 13
BlockMarker - optional

If you choose to use the TestML data markup to encode your data section, you can choose the character sequence to replace the default ('==='). You would wnat to do this primarily if you had '===' at the beginning of a line in your data.

Setting this option causes the parser to immediately start looking for the specified pattern to indicate the start of the data section.

    %BlockMarker: *=*
PointMarker - optional

This option is the same as the BlockMarker option above, except it is for the data point marker.

    %PointMarker: +++


TestML Standard Library

TestML defines a standard library specification. The standard library must be implemented and available in every TestML implementation. The library specification has a version number that is kept in sync with this specification's version number.

The Standard Library Specification is defined in a separate document. The latest version is available at http://testml.org/specification/standard-library/.


TestML Runtime Specification

In order to develop a TestML implementation, one must correctly implement the runtime processing. TestML defines a runtime specification. The runtime specification has a version number that is kept in sync with this specification's version number.

The TestML Runtime Specification is defined in a separate document. The latest version is available at http://testml.org/specification/runtime/.


Authors

TestML Version 1.0 was created by Ingy döt Net <ingy@ingy.net>

The spec was reviewed and/or contributed to by the following people:


License

This work is licensed under the Creative Commons Attribution Share Alike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/legalcode; or, (b) send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.


Copyright

Copyright (c) 2009, 2010. Ingy döt Net.