YLoader User Manual
Regular expression formatting

Introduction

Please note that although Regex Formatting (RXF) is fully functional, it is still an experimental feature and may be subject to change.

RXF is a very flexible way to generate output data in virtually any format. It is intended for users with special requirements that cannot be met by settings such as date format, date separator, or field separator alone. For example, with RXF, fields can be rearranged in any order, repeated, or omitted altogether; dates can be represented in any format; several different date or field separators can be used on the same line; and numbers can be truncated.

RXF and the usual settings-based formatting cannot be applied at the same time: the user must select one or the other.

RXF has two elements: a regex (or regular expression by its long name), and a format string. The regex defines what substrings to extract from the input data (before transformation), while the format string describes how to arrange these extracted substrings, and optionally other strings, to generate the desired output data format.

RXF can be enabled, and the regex and format string can be set, either on the command line or in the Settings dialog (GUI version only).

RXF is applied to each bar of the input data in turn, until all the data is transformed.

When RXF is used, the line of text passed to the regular expression as input has a fixed format, unaffected by other data formatting settings:

<symbol>,<date>,<open>,<low>,<high>,<close>,<volume>

with the date represented as mm/dd/yyyy. This allows consistent and repeatable results across various installations of YLoader, and makes sharing of RXF configurations possible. Here's an example of an input line when RXF is used:

xyz,10/25/2010,20.5,19.75,21.20,21,10000000

The regex

Since regular expressions are a rather broad subject, only information relevant in the context of YLoader will be presented here and functionality will be described mostly through examples. For detailed regex description and other related topics, please consult any of the many web sites or books on the subject. A good introductory article can be found here.

YLoader uses the regex language as defined by the Perl programming language. The complete description of the YLoader supported regex language can be found here.

In the absence of a user set regex, YLoader uses the following default regex:

(.*),(.*)/(.*)/(.*),(.*),(.*),(.*),(.*),(.*)

Each of the () groups (or marked sub-expressions, as they are called in regex jargon) corresponds to one of the fields in the input data, with the date split into month, day, and year, and the characters between the groups correspond to the various separators. Each () group in the regex extracts the corresponding substring and associates it with a number matching its position in the original input line. This information will be used by the format string to generate the output data.

Applying this default regex to the sample line above, we extract the following substrings, each with the associated index:

  1. symbol - xyz
  2. month - 10
  3. day - 25
  4. year - 2010
  5. open - 20.5
  6. low - 19.75
  7. high - 21.20
  8. close - 21
  9. volume - 10000000
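
The extraction above can be reproduced outside YLoader. The following is a minimal sketch, assuming Python's re module as a stand-in for YLoader's Perl-style regex engine (for a pattern this simple the two behave the same way). It uses the nine-group form of the default regex, one group per substring listed above; the field names are labels chosen for this illustration, not YLoader identifiers.

```python
import re

# Default regex with nine groups, one per listed substring.
# Python's re module is only a stand-in for YLoader's own engine.
DEFAULT_REGEX = r"(.*),(.*)/(.*)/(.*),(.*),(.*),(.*),(.*),(.*)"

# Labels for this illustration only.
FIELD_NAMES = ["symbol", "month", "day", "year",
               "open", "low", "high", "close", "volume"]

line = "xyz,10/25/2010,20.5,19.75,21.20,21,10000000"

# fullmatch requires the regex to cover the entire input line.
match = re.fullmatch(DEFAULT_REGEX, line)
fields = dict(zip(FIELD_NAMES, match.groups()))

# Print each substring with its 1-based index, as in the list above.
for index, (name, value) in enumerate(fields.items(), start=1):
    print(f"{index}. {name} - {value}")
```

The index printed for each substring is the number that a format string later refers to as $N.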

Here is a very brief description of how the default regex works (this paragraph can be skipped by those not interested in the technical details). Each marked sub-expression (MS) extracts a substring that matches the pattern between its (). In the above regex, every MS has the form ".*". In the Perl regex language used by YLoader, the dot matches any character in the input string and the * means zero or more repetitions of the preceding element, so .* means any character repeated any number of times. The characters between the MS match the various separators. The expression ",(.*)," will match and extract a substring containing all characters in the input string between two ",". When multiple such MS are put together, the regex can isolate and extract each component of the input string. More information can be found here.
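
The behaviour of ".*" and of the separators can be checked with a short experiment. This is a sketch only, again assuming Python's re module in place of YLoader's engine. It shows that ".*" on its own matches the whole line, that ",(.*)," used alone captures everything between the first and the last comma (the * is greedy), and that adding one separator per field boundary is what pins each ".*" to a single field.

```python
import re

line = "xyz,10/25/2010,20.5,19.75,21.20,21,10000000"

# ".*" alone: any character, zero or more times, so it matches the whole line.
whole = re.fullmatch(r".*", line).group(0)

# ",(.*)," on its own is greedy: the group runs from the first comma
# to the last comma, not just to the next one.
between = re.search(r",(.*),", line).group(1)

# With one separator per field boundary, every ".*" is confined to one field.
symbol, date = re.fullmatch(r"(.*),(.*/.*/.*),.*,.*,.*,.*,.*", line).groups()

print(between)       # 10/25/2010,20.5,19.75,21.20,21
print(symbol, date)  # xyz 10/25/2010
```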

While the default regex is sufficient for most cases, it can be modified if the desired output format requires it. However, if you have special formatting requirements but are not interested in learning regexes, you can post your desired format on the YLoader forums and other users will provide you with a regex and format string for your specific format.

The format string

For a complete description of the syntax and capabilities of the format string supported by YLoader, go here.

The format string takes the substrings extracted by the regular expression and combines them to generate the final output line. Each substring extracted by the regex is referenced as $N, where N is the substring index. So, in the case of the default regex, $1 specifies the symbol, while $9 specifies the volume.

The default format string, which is used by YLoader in the absence of a user set format string, is:

$1,$2/$3/$4,$5,$6,$7,$8,$9

which generates the output data in the exact same format as the input data, if the default regex is used.
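
The round trip can be verified with a small sketch. Python's re module does not use the $N notation, so the substitution below (replacing each $N with the text of group N) is an assumption made for illustration and covers only plain $N references, not the full format string syntax supported by YLoader. The regex is the nine-group form of the default, one group per substring referenced by $1 to $9.

```python
import re

DEFAULT_REGEX = r"(.*),(.*)/(.*)/(.*),(.*),(.*),(.*),(.*),(.*)"
DEFAULT_FORMAT = "$1,$2/$3/$4,$5,$6,$7,$8,$9"

line = "xyz,10/25/2010,20.5,19.75,21.20,21,10000000"

match = re.fullmatch(DEFAULT_REGEX, line)

# Replace every "$N" in the format string with the text captured by group N.
# This handles plain $N references only (an illustrative simplification).
output = re.sub(r"\$(\d+)",
                lambda ref: match.group(int(ref.group(1))),
                DEFAULT_FORMAT)

print(output == line)  # True: default regex plus default format reproduces the input
```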

This format string must be adjusted to achieve more useful results. For example, the following format string:

$1,$2/$3/$4,$8

will generate the output in the format:

<symbol>,<date>,<close>

with the date represented as mm/dd/yyyy. Using the same sample line as above, the output is:

xyz,10/25/2010,21
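
The same transformation, applied bar by bar, can be sketched as follows. Everything here is illustrative: format_bars is a hypothetical helper rather than part of YLoader, Python's re module stands in for YLoader's regex engine, only plain $N references are handled, the second input bar is made up, and the second format string is just one example of rearranging fields and mixing separators. The regex is the nine-group form of the default.

```python
import re

REGEX = r"(.*),(.*)/(.*)/(.*),(.*),(.*),(.*),(.*),(.*)"


def format_bars(regex, fmt, lines):
    """Apply one regex and one $N format string to every input line.

    Hypothetical helper for illustration; handles plain $N references only.
    """
    pattern = re.compile(regex)
    output = []
    for line in lines:
        match = pattern.fullmatch(line)
        if match is None:
            continue  # this sketch simply skips lines the regex does not match
        output.append(
            re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), fmt)
        )
    return output


bars = [
    "xyz,10/25/2010,20.5,19.75,21.20,21,10000000",  # sample line from the manual
    "xyz,10/26/2010,21,20.8,22.1,21.9,12500000",    # made-up second bar
]

# Symbol, date and close only, as in the example above.
close_only = format_bars(REGEX, "$1,$2/$3/$4,$8", bars)
print(close_only[0])  # xyz,10/25/2010,21

# Illustrative variant: yyyy-mm-dd date first, ";" as the field separator.
iso_dates = format_bars(REGEX, "$4-$2-$3;$1;$8", bars)
print(iso_dates[0])   # 2010-10-25;xyz;21
```

Because each output line is built only from numbered groups and literal text, changing the output layout means editing the format string alone; the regex stays the same.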