OptiDoc™ ReportServer

OptiDoc™ ReportServer Manual

Overview

The main purpose of this application program is to accept input from various feeds and to add them into the document imaging system. This procedure is sometimes recognized in the industry as COLD (computer output to laser disk) processing. Input types now include Mainframe reports, PDF documents, TIFF documents, OptiDoc™ IRF documents, XML documents, SAP IDOC documents, as well as other document types.

This application recognizes document types and selected work modes and processes the inbound metadata into document fields according to a set of rules. It can break up large print job files into variable length documents and intelligently locate the necessary field data for each database record and document. Finally, it inserts the resulting documents into the OptiDoc™ database. This application is specifically designed to be resilient in infrastructure environments where network connections and file systems are intermittently available.

Channels

A channel of processing is contained within the context of a child window within the standard multiple document interface framework. In this framework a main application window surrounds any number of internal child windows. In this case, the child windows represent workflow channels or independent threads. The channel is the main unit of processing, and one or many channels can be configured and controlled at the same time.

Multi-Threading & Processing

This program supports multiple simultaneous channels of processing, and can spread processing across multiprocessor and multi-core CPU server machines. Within a channel, processing activity is divided up into small processing intervals, so that each channel is responsible for each step that it must make, as well as for cleaning up any problems, before it can perform its next step. Using a robust threading model, each channel is responsible for its own state, actions, transactions, maintenance, logging, and cleanup.

Real World Networks

This application has the ability to function in less than optimal network environments where database connections and file system connections are not always reliable. For example, this application provides options to automatically connect back into a database if the database is disconnected from the network at short intervals. As another example, this inserter system will always attempt to find, mount, and remount network volumes if they have failed, or somehow become disconnected.

Real World Reports

Sometimes, when a mainframe file is picked up from the input folder and converted into a COLD file, or rendered into a TIFF image, the mainframe file is not well formed and has formatting errors. For example, as a typical rule, the end of each page should always be marked with the standard end of page character, 0x0C (form feed). In reality, the mainframe sometimes places this character in the correct location, sometimes places it at a random location, and sometimes omits it altogether. Sometimes mainframe files contain random zero characters. This is a violation of the usual definition of an ASCII text file, but it still happens. Problems such as these are common with COBOL language printer output.

To solve this kind of problem, the Parser tab provides options for finding page breaks by using the standard page break character, or by using a token search and replace algorithm. There are also options for fixing up all of the carriage returns and line feeds, so that they will work properly with older and newer Microsoft software. This option can also replace all zero characters with space characters, ensuring that the file will be a valid ASCII file. There is also an option for expanding column one for processing COBOL markers. For example, sometimes COBOL formatting markers show up in the output file as "-" characters, "0" characters, and "1" characters in the first column. If you turn the COBOL marker processing option on in the Parser tab, then these characters will be expanded into the expected visual ASCII format.
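The cleanup steps described above can be sketched roughly as follows. This is an illustration only, not the product's parser: the function names are hypothetical, and the meanings given to the column-one markers ("1" for a new page, "0" for double spacing, "-" for triple spacing) are the conventional mainframe carriage-control meanings, which this manual does not confirm.

```python
FORM_FEED = "\x0c"  # the standard 0x0C end of page character

# Assumed carriage-control meanings for column one (not taken from the product).
CARRIAGE_CONTROL = {
    " ": [],        # single spacing: no extra blank line
    "0": [""],      # double spacing: one blank line before the text
    "-": ["", ""],  # triple spacing: two blank lines before the text
}


def clean_report(raw):
    """Replace zero bytes with spaces and normalize line endings to LF."""
    text = raw.replace(b"\x00", b" ").decode("ascii", errors="replace")
    return text.replace("\r\n", "\n").replace("\r", "\n")


def expand_column_one(text):
    """Expand column-one markers into blank lines and form feeds."""
    out = []
    for line in text.split("\n"):
        marker, body = line[:1], line[1:]
        if marker == "1":
            # "1" starts a new page; skip the form feed before the very first page.
            if out:
                out.append(FORM_FEED)
            out.append(body)
        elif marker in CARRIAGE_CONTROL:
            out.extend(CARRIAGE_CONTROL[marker])
            out.append(body)
        else:
            out.append(line)  # unknown marker: leave the line untouched
    return "\n".join(out)


def split_pages(text):
    """Split the text into pages at each form feed character."""
    return [page.strip("\n") for page in text.split(FORM_FEED)]
```

A caller would typically chain the three steps: `split_pages(expand_column_one(clean_report(raw_bytes)))`.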

System Requirements

This application requires Windows XP or a later operating system and a minimum of 512MB of available RAM in order to operate. If you run this application on an operating system older than Windows XP, or with less than 512MB of available RAM, and you encounter problems, then upgrade your computer. When a channel has been configured with a work mode that converts COLD documents into TIFF documents with overlaid text, the RAM demands are significantly increased.

Data Formats

OptiDocXML

All TIFF files that are generated by this program contain an additional special TIFF tag number, 32933. This tag contains an XML description of the index data that was applied at the time that the TIFF file was inserted into the database. This tag number is currently under reservation by Adobe Corporation, the keepers of the TIFF specification. Advanced Technology Services has utilities and tools that work with this special TIFF tag. This TIFF tag is called the OptiDocXML tag. This tag allows partial database recovery in the event of a catastrophe. This tag is also used with a quick viewer that is burned onto self searchable media. With a rudimentary knowledge of XML, the contents of the OptiDocXML tag are self-explanatory.

<?xml version="1.0"?>
<OptiDocRecord>
     <Collection>UAB_HR_Records</Collection>
     <FileName>New Channel</FileName>
     <OptiDocField>
          <SQL_Name>FName</SQL_Name>
          <Display_Name>Fname</Display_Name>
          <Value>Khanolkar, Aaruni</Value>
          <Type>CHAR</Type>
     </OptiDocField>
     <OptiDocField>
          <SQL_Name>Idnumber</SQL_Name>
          <Display_Name>ID Number</Display_Name>
          <Value>1021062</Value>
          <Type>CHAR</Type>
     </OptiDocField>
</OptiDocRecord>
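
As an illustration of how a recovery tool might consume this record, the sketch below reads the XML text into a dictionary using Python's standard library. It assumes the XML has already been extracted from the TIFF tag as a string; the function name and the shape of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<OptiDocRecord>
  <Collection>UAB_HR_Records</Collection>
  <FileName>New Channel</FileName>
  <OptiDocField>
    <SQL_Name>Idnumber</SQL_Name>
    <Display_Name>ID Number</Display_Name>
    <Value>1021062</Value>
    <Type>CHAR</Type>
  </OptiDocField>
</OptiDocRecord>"""


def read_optidoc_record(xml_text):
    """Return the collection, file name, and fields of an OptiDocXML record."""
    root = ET.fromstring(xml_text)
    fields = {}
    for field in root.iter("OptiDocField"):
        # Key each field by its SQL name, as that is what the database uses.
        sql_name = (field.findtext("SQL_Name") or "").strip()
        fields[sql_name] = {
            "display_name": (field.findtext("Display_Name") or "").strip(),
            "value": (field.findtext("Value") or "").strip(),
            "type": (field.findtext("Type") or "").strip(),
        }
    return {
        "collection": (root.findtext("Collection") or "").strip(),
        "file_name": (root.findtext("FileName") or "").strip(),
        "fields": fields,
    }


record = read_optidoc_record(SAMPLE)
print(record["collection"], record["fields"]["Idnumber"]["value"])
```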

COLD File Format

This file format is used to quickly locate pages and offsets in a multiple page text file. It is a deprecated OptiDoc™ file format. An OptiDoc™ COLD file has a small header at the beginning. This mini header is composed of the two ASCII characters ‘CC’, which can be used as a quick way to identify the binary contents of a COLD file. After the mini header comes a 4-byte integer value in Intel (little-endian) format that gives the number of jump table entries. Let us call this value N.

After the value N comes the actual jump table. Each entry in the jump table is a 4-byte value in Intel format that gives the offset from the end of the overall COLD header to the start of a particular text page. There should be N entries in the jump table, but some programs that generated COLD files were buggy and wrote only N-1 entries, so be prepared to find either. The first jump table entry always has the value of zero.

After all of the jump table entries comes a zero byte terminated (C-style) string that is the name of the template that applies to this COLD file. The extension of the template name is ignored in all modern COLD file implementations. When writing a COLD file you should omit the template name extension. After the template name string comes a 4-byte value in Intel format that provides the size of the font that the text is to be displayed in. Let us refer to this value as F. This value is ignored in all modern COLD file implementations because this property has become an attribute of the modular overlay.

After the value F, the pages of textual information are written into the COLD file. The position in the file immediately after the value F is the zero relative position for the jump table entries. Since the first jump table entry is always zero, the body of the first page begins right at this position within the file.
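The layout described above can be sketched as a small writer and reader. This is a minimal sketch under stated assumptions, not the product's implementation: the 4-byte values are treated as unsigned, the text is treated as plain ASCII, and the function names are hypothetical. The reader cannot detect the N-1 case by itself, so the caller passes the real entry count in `table_entries` when a file is known to be short by one.

```python
import struct

MAGIC = b"CC"  # the two-character mini header


def build_cold(pages, template_name, font_size=0):
    """Serialize text pages into the COLD layout described above."""
    bodies = [page.encode("ascii") for page in pages]
    offsets, position = [], 0
    for body in bodies:
        offsets.append(position)  # offset relative to the end of the header
        position += len(body)
    header = MAGIC + struct.pack("<I", len(offsets))
    header += struct.pack("<%dI" % len(offsets), *offsets)
    header += template_name.encode("ascii") + b"\x00"
    header += struct.pack("<I", font_size)
    return header + b"".join(bodies)


def parse_cold(data, table_entries=None):
    """Return (template_name, font_size, pages) from COLD file bytes."""
    if data[:2] != MAGIC:
        raise ValueError("not a COLD file: missing 'CC' mini header")
    (count,) = struct.unpack_from("<I", data, 2)
    entries = count if table_entries is None else table_entries
    offsets = list(struct.unpack_from("<%dI" % entries, data, 6))
    cursor = 6 + 4 * entries
    end = data.index(b"\x00", cursor)  # terminator of the template name
    template_name = data[cursor:end].decode("ascii")
    (font_size,) = struct.unpack_from("<I", data, end + 1)
    base = end + 5  # zero relative position for the jump table entries
    bounds = offsets + [len(data) - base]
    pages = [
        data[base + bounds[i]:base + bounds[i + 1]].decode("ascii")
        for i in range(entries)
    ]
    return template_name, font_size, pages
```

For example, `parse_cold(build_cold(["PAGE ONE\n", "PAGE TWO\n"], "W2"))` returns the template name, the font size, and the two pages unchanged.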

Parse Template Files

All documents that require field information to be extracted and manipulated must have a parse template file. Parse files are stored in a folder that is created inside the same folder as the application. The name of this folder is "Parse Templates". Parse files have the extension .DAT and appear inside of this folder. Parse files are created and configured using a property pane of the channel. A parse file contains binary data that represent rules and other configuration information that describe how to handle the extraction of text information from a document. A parse file can be used to extract data from PDF documents, mainframe documents, COLD documents, or any document that can be organized into an ASCII file. Technically, the parse engine is an LL1 computer language parser that interprets command strings to produce field output. The parse engine contains many commands and new commands are easily incorporated into the architecture.

COLD Overlay Files

When COLD documents are created as the output from a channel, the channel provides an option to apply an overlay, or render a background image, for each document as a TIFF, before it is inserted into the database. As an alternative option and for backward compatibility there still remains the ability to insert the text as an OptiDoc™ COLD document, and have the workstations apply the overlay in a delayed manner at view time. There are advantages and disadvantages to each technique.

Overlay files are composite binary object files that contain the background image TIFF and all of the other parameters and information that is necessary to create the complete image of a parsed mainframe document. In this sense, overlay files are complete modules themselves, and do not need any other supporting files, or pointers to files, or names, or references to other files, in order to function.

The background image in an overlay can be a 1-bit binary black and white TIFF, or it can be an 8-bit grayscale TIFF, or it can be a 32-bit RGB TIFF. The grayscale and color TIFF images take up considerably more space than the 1-bit black and white TIFF background images. However, a background overlay that is color allows you to have color logos or colored text rendered into the overlay. Overlay files are intended to apply a background image to a single page, such as a watermark, or a single page form, so that the background is merged with the text. By changing an overlay file, the output of a channel is changed. The designer of the channel reserves the option to use overlay files or not.

Pre-Rendering At Insertion Time

When the overlay files are rendered into TIFF files before they are inserted into the database, the problem of updating all of the client applications with a new overlay is avoided, even if the input data changes. This kind of problem becomes relevant if, for example, you have an overlay named W2, that changes format from year to year and you have hundreds of users to update. The storage of documents in TIFF format is somewhat larger than the storage of documents in OptiDoc™ COLD format.

Post-Rendering At View Time

In the case where the overlay is applied on demand by the workstation, a new overlay will need to be built and deployed to all workstations every year when the new W2 data comes into use. Keep in mind that every different overlay must have a different overlay name in order to differentiate between the W2 forms from each year. This older methodology requires changes at each client workstation as well as changes to the server application. However, this method is backwards compatible with the older OptiDoc™ desktop client, and is therefore sometimes preferable: when older style templates are already deployed, no changes need to be made at the client workstations.

IRF file format

This is an OptiDoc™ indexing file format that links index data with a document. Older versions of the IRF file format contained many sets of index data with many links to outside documents, so a single IRF file could refer to many documents. In newer versions of the IRF file format, there is exactly one IRF file per document, and the document to which the IRF file refers is assumed to be located in the same folder as the IRF file. This change was made so that processing could be done in small chunks. Channels that are configured to process IRF files are backwards compatible with the old IRF format as well as the newer IRF formats. Here is a simple example of an IRF file that shows the different sections and the related document. This sample should be self-explanatory.

BEGINBATCH 
BEGINSETTINGS 
IMAGE_ONLY 
IMAGE_TYPE TIF 
ENDSETTINGS 
BEGINHEADER 
Last_Name URSULLA 
First_Name GREEN 
SSN 409494864 
ENDHEADER 
BEGINIMAGE 
d1j92kiq.TIF 
ENDIMAGE 
ENDBATCH
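
A reader for the single-document form shown above might look like the following sketch. It is based only on this sample: it assumes that each header line is a field name followed by a space and the value, and that the section keywords appear on lines by themselves. The function name and the returned structure are hypothetical.

```python
SECTIONS = ("BATCH", "SETTINGS", "HEADER", "IMAGE")
BEGIN_MARKERS = {"BEGIN" + name: name for name in SECTIONS}
END_MARKERS = {"END" + name for name in SECTIONS}


def parse_irf(text):
    """Parse a single-document IRF file into settings, header, and images."""
    record = {"settings": [], "header": {}, "images": []}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line in BEGIN_MARKERS:
            section = BEGIN_MARKERS[line]
        elif line in END_MARKERS:
            section = None
        elif section == "SETTINGS":
            record["settings"].append(line)
        elif section == "HEADER":
            # The first token is the field name; the rest is the value.
            name, _, value = line.partition(" ")
            record["header"][name] = value.strip()
        elif section == "IMAGE":
            record["images"].append(line)
    return record
```

Applied to the sample above, the header dictionary holds the three index fields and the image list holds the single file name d1j92kiq.TIF, which is looked up in the same folder as the IRF file.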

Terms & Definitions

Channel

A channel is contained and managed by a document window. To create a new document window, or channel, for processing click on the “Create New Channel” icon in the toolbar, or use the same functionality directly from the menu. A new channel appears as a new window inside of the application. Multiple channel windows can be created, resized, moved, tiled, and cascaded within the main application window. For example, one channel can be assigned to process PDF forms that appear in a drop folder, and another channel can be assigned to process COLD downloaded as text from a mainframe. A channel can look for different types of work to do. Each type of work that a channel can work on can be turned on or off from its properties. A single channel can look for multiple kinds of work and act accordingly. In the case where multiple work types are selected for a channel, the work types are selected in a round robin fashion, skipping over types of work when no work of that type exists.
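
The round robin selection described above can be illustrated with a short sketch. The function and parameter names are hypothetical, and the real channel logic may differ; the sketch only shows the idea of advancing past the last work type used and skipping types that have nothing to do.

```python
def next_work_type(work_types, has_work, last_index):
    """Pick the next work type after last_index that has work waiting.

    work_types: ordered list of the work types enabled for the channel.
    has_work:   callable returning True when work of that type exists.
    Returns (index, work_type), or None when no type has work.
    """
    count = len(work_types)
    for step in range(1, count + 1):
        index = (last_index + step) % count
        if has_work(work_types[index]):
            return index, work_types[index]
    return None


# Example: "cold" has no work, so the selection skips from "pdf" to "irf".
enabled = ["pdf", "cold", "irf"]
pending = {"pdf": True, "cold": False, "irf": True}
print(next_work_type(enabled, pending.get, last_index=0))
```

Passing `last_index=-1` on the first call starts the rotation at the first enabled work type.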

Channel Properties

A channel is associated with a large set of properties. Setting up the channel properties configures the channel. To access the properties of a channel, select a channel by clicking on the channel window in order to pop it to the foreground in the MDI interface and then press the "Properties" button in the toolbar, or choose "Properties" from the main menu. The channel properties dialog is used to configure all aspects of a particular channel. Each tab inside of the channel properties dialog box provides settings for each phase of a particular process from the first phase until the last phase. The tabs are arranged inside of the properties dialog box in linear time order of events. In other words, the first tabs represent decisions and properties that are needed at the start of channel processing, and the last tabs represent decisions and properties that are needed at the end of channel processing. Every property within every tab will be discussed later on in this document.

Log Files

When you look at a channel the window is filled with log events. The log for the channel contains date stamps and messages, and icons that indicate if a message is for information, or represents an error condition. For example, a light bulb indicates information, a green checkmark indicates a critical step, and a red triangle indicates an error. The log can be exported to a text file or cleared by choosing a menu item. The log is persistent. The log will reappear, preserved and in order on the screen, even if you launch and quit the program repeatedly. The log automatically limits itself to 1000 (the default) log entries. The oldest entries are removed after more than 1000 (the default) entries enter the log.

Input File Stability

File system folders are monitored in order to locate work for processing. A file must be stable in order for it to become a candidate for channel processing. A file is considered stable when its file size and its modification date and time remain unchanged, and an exclusive share lock can be held on it, for several seconds. If any of these parameters varies during the examination interval, then the file is considered to be in flux and is passed over as a candidate for channel processing. If a file passes the stability test, then it is passed into the channel for processing. Slow file systems, overlapping file systems, and Internet file systems all make the reliable selection of stable files for processing a critical application feature.
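
A stability test along these lines could be sketched as follows. This is only an approximation of the behavior described above: the function names, the default interval, and the number of checks are assumptions, and the exclusive share lock is approximated by simply opening the file, whereas a real implementation would use a platform-specific exclusive lock.

```python
import os
import time


def snapshot(path):
    """Return the properties that must not change while a file is watched."""
    info = os.stat(path)
    return info.st_size, info.st_mtime


def is_stable(path, interval=3.0, checks=3):
    """Return True if the file stays unchanged and openable for the interval."""
    try:
        before = snapshot(path)
        for _ in range(checks):
            time.sleep(interval / checks)
            # Stand-in for the exclusive share lock: the open fails if
            # another process currently denies access to the file.
            with open(path, "rb"):
                pass
            if snapshot(path) != before:
                return False  # size or modification time changed: in flux
    except OSError:
        return False  # vanished, locked, or unreachable: pass it over
    return True
```

A file that fails the test is simply skipped and examined again on a later pass.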

Timeouts

This application program uses several external processes in order to do its job. One external process (e.g. Ghostscript) converts PDF documents into TIFF documents. Another external process converts PDF documents into TXT documents. The maximum amount of time that a channel will wait for an external process to complete its work is called a timeout. This amount of time is processor and job dependent. You will have to test and set up your timeouts according to how fast the computer is, and also how large your jobs are going to be. The timeout tab in the preferences dialog provides an environment that you can use to experiment to obtain good values for your channel timeouts. For example, a channel that regularly processes 5000 page documents will need more time than a channel that only processes single page documents.
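
The general idea of waiting on an external process with a timeout can be illustrated in a few lines. This is a generic sketch, not the product's code: the function name is hypothetical, and the command is a placeholder for whatever converter the channel would launch.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run an external command; return its exit code, or None on timeout."""
    try:
        completed = subprocess.run(
            command, capture_output=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run stops the child process before raising this.
        return None
    return completed.returncode


# Placeholder command: a real channel would launch its converter here.
print(run_with_timeout([sys.executable, "-c", "print('converted')"], 30))
```

Timing representative jobs this way, with the largest documents you expect, is one way to arrive at a timeout value that is generous without letting a hung process block the channel indefinitely.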

Consoles

This application program uses several external processes in order to do its job. These external processes run as external consoles (e.g. DOS windows) in coordination with the channel that invokes the process. You can choose whether or not to view these console windows as the channel does its work. When shown, a console window displays the progress and output of the console application. This is sometimes useful for figuring out what is actually happening within a console application via its standard input and standard output messages. Typically the console windows are hidden.

Working Folders

This application automatically creates and manages working folders for each channel when the channel needs to use a working folder. Each channel manages its own independent set of working folders, so there is no chance of one channel reusing a temporary name or overwriting a file that was created by a different channel. There is a temporary input folder and a temporary output folder for each channel. These folders are dynamically created inside the same folder as the application. Stable files are copied into the temporary input folder before processing begins. When processing has completed, all temporary output files are held in the temporary output folder. When a channel completes a processing job, it cleans up its own input and output temporary working folders. When you delete or rename a channel, new temporary folders are automatically created, but old temporary folders are left alone, so you should remember to delete the temporary input and output folders associated with the old channel. The temporary input and output folders are named with a prefix that matches the channel name, and a suffix that indicates whether it is a temporary input folder or a temporary output folder.

Work Modes

Look for Structured PDF Files, Parse TEXT, Insert TIFF

This is a description of one of the modes of work that a channel can do. Check the box in the work modes checkbox list in the work tab of the channel preferences dialog to instruct a channel to perform this type of processing.

In order to handle this particular form of processing you must download and install Ghostscript onto the machine that is handling the PDF conversion. Ghostscript is free to use and distribute in its unmodified form. It can be downloaded from the following link:

gs850w32.exe

This particular sequence of tasks first monitors a specified folder for files with a PDF extension. When the channel finds a stable file of type PDF, it first copies the file to a temporary local location. Then it converts the entire PDF file into a multi-page TIFF. This work mode is intended to accept structured PDF files. A structured PDF file contains text and rendering information in a manner similar to the PostScript language. This implies that the text can be accurately extracted from the document without involving OCR. Beware that some PDF files are simply containers for bitmaps. For example, a PDF emitted by a FAX server is necessarily a raster PDF, because the FAX protocol can only transmit and receive pixel data.

Once the PDF to TIFF conversion is done, the channel converts the PDF into a memory based ASCII structure that is prepared for the parsing engine. Text parsing is the process where fields of index data are extracted from the textual information contained within the document. The rules engine is actually an LL1 language processor that can apply search operations, anchor operations, regular expression operations, tokenization operations, and compound operations to the text of the input document. Simple x,y location operations are also included; however, using anchor text and searching rules makes the collection of document fields resilient to subtle changes in the document.

Once field data has been extracted for a particular document, the rules in the text parse engine are used to determine how many pages in the overall document are supposed to be in an individual document. Pages are concatenated together until the overall document is ready. The number of pages in a specific document can be a consistent number, or it can vary according to a rule. This provides for the condition where a single massive input file results in many smaller individual documents.

Once an output document has been prepared, the OptiDocXML tag is updated, and the final product, a multi-page TIFF of the parsed input PDF document, is inserted into the database.

This mode of work is particularly useful for processing document reports delivered as PDF from Oracle, PeopleSoft, and other accounting programs into neat TIFF image files. The output TIFF images retain all of the visual layout consistency of the input PDF.

Look for Specific Files, Parse Text, Insert COLD

This is one of the modes of work that a channel can do. Check the box in the work modes checkbox list in the work tab of the preferences dialog to instruct a channel to perform this type of processing.

This particular sequence of tasks first monitors a folder for files with a specified extension. The specified extension edit field becomes available when this mode of work is selected. If you want the channel to only pick up files with a TXT extension (the default) then enter ".TXT" into the edit field. If you want the channel to only pick up files with an LPR extension, then enter ".LPR" into the edit field. To pick up all files regardless of the extension, enter the DOS wildcard "*" into the edit field. If you want the channel to pick up all files that have the letter Q as the first letter in the extension, then enter the DOS wildcard "Q*" into the edit field. It works just like a DOS shell "dir" command.
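
The matching rules in the examples above can be approximated with Python's standard wildcard matcher. This is an illustrative sketch, not the product's matching code: the function name is hypothetical, the comparison is assumed to be case insensitive as in DOS, and `fnmatchcase` accepts a few patterns (such as character sets in square brackets) that a DOS shell does not.

```python
import os
from fnmatch import fnmatchcase


def matches_extension(filename, pattern):
    """Return True if the file's extension matches the edit field pattern."""
    extension = os.path.splitext(filename)[1].lstrip(".")
    wanted = pattern.lstrip(".")  # accept ".TXT" as well as "TXT"
    # Compare in upper case so that "report.txt" matches ".TXT".
    return fnmatchcase(extension.upper(), wanted.upper())


for name in ("report.txt", "job.lpr", "data.QRY"):
    print(name, matches_extension(name, "Q*"))
```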

When the application finds a stable file of the specified type, it first copies the file to a temporary local location. Then it converts the entire mainframe file into a memory structure that is prepared for the parsing engine. Text parsing is the process where fields of index data are extracted from arbitrary, and inconsistent, units of textual information.

Once field data has been extracted for a particular document, then the rules in the text parse engine are used to determine how many pages in the overall document relate to an individual document. Pages are concatenated together until the overall document is made ready. The number of pages in a specific document can be a consistent number, or the number of pages in a specific document can vary according to a rule.

As an option for this mode of work a background overlay can be applied. If a background overlay is supplied then the text information will be superimposed onto a TIFF image of the background overlay. In this case a TIFF document is the output of the channel and the OptiDocXML tag is added, and the document is inserted into the database. If no background overlay has been chosen, then the text is gathered into an OptiDoc™ COLD file, and that is the completed document that is inserted into the database.

Look for IRF Files, Insert Documents

This is one of the modes of work that a channel can do. Check the box in the work modes checkbox list in the work tab of the preferences dialog to instruct a channel to perform this type of processing.

This particular sequence of tasks first monitors a folder for files with the extension ".IRF". IRF files are files that are generated by older OptiDoc™ products. The file format is a text file that contains a carriage return delimited list of field names and values, and a pointer to a document. The document can be of any type. When an IRF file is found, the index information is extracted from the text file, and the referenced document is inserted into the database. This mode of work has been provided so that existing products that produce IRF files can be integrated into this application program.

Look for XML Files, Parse XML, insert as overlay TIFF

This is one of the possible work modes for a channel. Check the box in the work modes checkbox list in the work tab of the preferences dialog to instruct a channel to perform XML document processing.

This type of processing has been designed to receive input documents that originate from general XML documents, and specifically from SAP systems as IDOCs. The configuration screens provide the ability to pick XML data out of complex unordered XML structures and draw them into TIFF overlay templates as forms. This type of channel allows for dynamic information to flow from the SAP system directly into the OptiDoc™ system. Template configuration is a point and click screen that lets the user map XML tags, fields, and attributes onto locations on a background TIFF image. These fields can optionally be mapped to searchable index values in the database.

Look for raster PDF, OCR text, insert as TIFFs

This is a description of one of the modes of work that a channel can do. Check the box in the work modes checkbox list in the work tab of the channel preferences dialog to instruct a channel to perform this type of processing.

In order to handle this particular form of processing you must download and install Ghostscript onto the machine that is handling the PDF conversion. Ghostscript is free to use and distribute in its unmodified form. It can be downloaded from the following link:

gs850w32.exe

This particular sequence of tasks first monitors a specified folder for files with a PDF extension. When this channel finds a stable file of type PDF, it first copies the file to a temporary local location. Then it runs an OCR engine on the raster data contained within the PDF document, and the text that OCR produces is forwarded into the text parser. This mode is intended to work with computer generated raster PDF files. In contrast, a structured PDF file contains text and rendering information in a manner similar to the PostScript language. Since OCR must be used to extract the text from the raster bitmap, the result may be slightly inaccurate. For example, these kinds of PDF files might come from a FAX server, since the FAX protocol only allows the transmission and reception of pixel data.

Text parsing is the process where fields of index data are extracted from the textual information contained within the document. The rules engine is an LL1 language processor that can perform search operations, anchor operations, regular expression operations, tokenization operations, and compound operations to find fields in the text of the input document. Simple x,y location operations are also included; however, using anchor text and searching rules makes the collection of document fields resilient to subtle changes in the document.

Once field data has been extracted for a particular document, the rules in the text parse engine are used to determine how many pages in the overall document are supposed to be in an individual document. Pages are concatenated together until the overall document is ready. The number of pages in a specific document can be a consistent number, or it can vary according to a rule. This provides for the condition where a single massive input file results in many smaller individual documents.

Once an output document has been prepared, the OptiDocXML tag is updated, and the final product, a multi-page TIFF of the parsed input PDF document, is inserted into the database.

Look for TIFF, OCR text, insert as TIFFs

This is a description of one of the modes of work that a channel can do. Check the box in the work modes checkbox list in the work tab of the channel preferences dialog to instruct a channel to perform this type of processing.

This particular sequence of tasks first monitors a specified folder for files with a TIFF extension. When it finds a stable file of type TIFF, it first copies the file to a temporary local location. Then it runs an OCR engine on the raster data contained within the TIFF document, and the text that OCR produces is sent into the text parser. Since OCR is used to extract the text from the bitmap, the results may be slightly inaccurate. This mode is intended to work with computer generated raster TIFF files.

Text parsing is the process where fields of index data are extracted from the textual information contained within the document. The rules engine is an LL1 language processor that can perform search operations, anchor operations, regular expression operations, tokenization operations, and compound operations to find fields in the text of the input document. Simple x,y location operations are also included; however, using anchor text and searching rules makes the collection of document fields resilient to subtle changes in the document.

Once field data has been extracted for a particular document, the rules in the text parse engine are used to determine how many pages in the overall document are supposed to be in an individual document. Pages are concatenated together until the overall document is ready. The number of pages in a specific document can be a consistent number, or it can vary according to a rule. This provides for the condition where a single massive input file results in many smaller individual documents.

Once an output document has been prepared, the OptiDocXML tag is updated, and the final product, a multi-page TIFF of the parsed input TIFF document, is inserted into the database.

Look for TIFF, XML headers, insert as TIFFs

This is a description of one of the modes of work that a channel can do. Check the box in the work modes checkbox list in the work tab of the channel preferences dialog to instruct a channel to perform this type of processing.

This is the disaster recovery mode for rebuilding a system when only TIFF data files are left. Whenever a TIFF document is inserted into the database, its metadata is stored within an XML block inside a TIFF tag. This mode will insert TIFF documents into a collection and use the bound XML metadata for the index fields. Please make sure that the embedded XML field data matches the collection data in name and type.

Look for TIFF, OptiCapture, insert documents

This is one of the modes of work that a channel can do. This channel captures documents that have OptiDoc PDF417 coversheets as separators. This channel type is only available if the optional "OptiCapture" DLL module has been purchased and installed. To install the OptiCapture module, simply place the DLL module into the application folder along with the application executable module.

Check this box in the work modes checkbox list in the work tab of the preferences dialog to instruct a channel to perform this type of processing. This particular sequence of tasks first monitors a folder for files with the extension ".TIF".

Opening the Report Server

The first time that the Report Server is opened, the following message will appear.

The reason for this is that the Report Server is looking for its preference file, which has yet to be created. To create the file, simply close the Report Server. The preference file will be written to the same directory where PDFINSERTER.exe has been installed.

The Menu Bar

File Menu

Below are listed the options available under the file menu.

Note: Until at least one channel has been created, the only options available are “New Channel” and “Exit”.

New Channel: Creates a new Channel in the Report Server

Delete Channel: Gives the administrator the option to delete the selected Channel.

Edit Channel: Opens the Channel properties for the selected Channel.

Start Channel: Gives the command to the selected channel to start monitoring the work folder.

Stop Channel: Gives the command to the selected channel to stop monitoring the work folder.

Clear Log: Clears the selected Channel’s log

Save Log: Saves the selected Channel’s log to a text file

View Menu

Toolbar: Toggles the Toolbar on and off. A check indicates that the toolbar is on.

Status bar: Toggles the Status Bar on and off. A check indicates that the Status Bar is on.

Window Menu

Cascade: Cascades Channel windows so that the title bar for each Channel is viewable

Tile: Tiles each Channel window within the Report Server

Arrange Icons: Arranges any minimized Channels at the bottom of the Report Server

Note: Any Channels created will also be listed in this menu. A check mark will be displayed beside the selected Channel.

Help Menu

Technical Documentation: Displays the Report Server help file.

About Report Server: Displays the product information.

The Tool Bar

There are five buttons available with the tool bar.

Note: Until at least one Channel is created, only “Create New Channel” and “Info” are available.

Create New Channel
Creates a new Channel in the Report Server

Edit Channel Properties
Opens the Channel properties for the selected Channel.

Start Active Channel
Gives the command to the selected channel to start monitoring the work folder.

Stop Active Channel
Gives the command to the selected channel to stop monitoring the work folder.

Info
Displays the Report Server help file.

Setup Tab

Enter the name of your channel:

Type the name of the channel into this edit field. The name of a channel can be anything that helps you to remember what the channel is used for.

Click below to choose a background color for your channel:

Click the color bar to choose a color for the channel. The background of the log view for the channel as well as the background color for the text that appears inside of the log view can be set. This is a simple and visual way to differentiate the various channels on the screen.

Set this value to limit the amount of log file entries that are stored:

Enter a number into this edit field. This is the maximum number of log file entries that will be stored by the log view for the channel. The oldest entries will get removed after this maximum number of entries is accumulated into the log view for a channel.
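The "oldest entries are removed first" behavior described above can be illustrated with a small Python sketch. The class name ChannelLog and its methods are hypothetical and exist only for this illustration; they are not part of the Report Server.

```python
from collections import deque


class ChannelLog:
    """Hypothetical log view that keeps only the newest max_entries lines."""

    def __init__(self, max_entries: int) -> None:
        # A deque with maxlen discards the oldest item when a new one
        # is appended to a full deque.
        self._entries = deque(maxlen=max_entries)

    def add(self, message: str) -> None:
        self._entries.append(message)

    def entries(self) -> list:
        return list(self._entries)


log = ChannelLog(max_entries=3)
for i in range(1, 6):
    log.add(f"entry {i}")
print(log.entries())  # ['entry 3', 'entry 4', 'entry 5']
```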

Work Tab

Specify the full drive path or UNC path of a folder or network folder to monitor:

This channel will look for work in the folder that is indicated here. You may enter a local drive path or a network UNC path into this edit field. This path must resolve in order for the channel to look for work in this location. A file must be stable inside of this input folder before it will be considered to be a candidate for work.
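This manual does not say how the application decides that a file is "stable". One common approach, shown here only as an assumed sketch in Python, is to check that the size and modification time of the file do not change over a short wait. The function name is_stable and the two second default are hypothetical.

```python
import os
import time


def is_stable(path: str, wait_seconds: float = 2.0) -> bool:
    """Return True if the file's size and timestamp are unchanged after a wait.

    Assumption: a file that is still being copied into the folder will
    change its size or modification time during the wait.
    """
    try:
        before = os.stat(path)
        time.sleep(wait_seconds)
        after = os.stat(path)
    except OSError:
        # The file vanished or the network path did not resolve.
        return False
    return (before.st_size, before.st_mtime) == (after.st_size, after.st_mtime)
```

A file that fails this check would simply be skipped and examined again on the next poll of the folder.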

Work Mode(s)

Each channel can process one or several work modes at the same time. You can check on or off a particular mode of work by checking or clearing the checkbox in the scrolling list of available work modes. Each mode of work is described in detail in the definitions section above.

specify (e.g. file extension)

When one of the chosen work modes is "Look for specific files, parse text, insert COLD" then you can specify the extension of the file(s) to look for. Standard DOS wild card characters are supported.
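As a rough illustration of wild card matching, the Python sketch below uses the standard fnmatch module. This is an approximation and not the matching code used by the application: true DOS matching has a few special cases (for example "*.*" also matching names with no extension) that fnmatch does not reproduce. The helper name matches is hypothetical.

```python
from fnmatch import fnmatchcase


def matches(filename: str, pattern: str) -> bool:
    """Case-insensitive wild card match; '*' is any run, '?' is one character."""
    return fnmatchcase(filename.lower(), pattern.lower())


names = ["REPORT01.TXT", "report02.txt", "invoice.pdf", "REPORT1.TXT"]
print([n for n in names if matches(n, "report??.txt")])
# ['REPORT01.TXT', 'report02.txt']
```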

How often should this channel look for work to do?

A channel can poll its input folder for work on one of three schedules: once per hour, at the top of each hour; once per day, at a specified time; or constantly. Constant polling requires more system resources because the channel is continuously scanning the file system for work to do.
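The three polling schedules can be pictured as a function that answers the question "when should this channel look for work next?". The Python sketch below is an illustration under assumptions: the mode names "hourly", "daily", and "constant" and the function next_poll are hypothetical, and the default daily time of 02:00 is an arbitrary example.

```python
from datetime import datetime, time, timedelta


def next_poll(now: datetime, mode: str, daily_at: time = time(2, 0)) -> datetime:
    """Return the next moment at which the input folder should be scanned."""
    if mode == "constant":
        # Constant polling: scan again right away.
        return now
    if mode == "hourly":
        # Top of the next hour.
        top_of_hour = now.replace(minute=0, second=0, microsecond=0)
        return top_of_hour + timedelta(hours=1)
    if mode == "daily":
        # Today at the specified time, or tomorrow if that time has passed.
        candidate = datetime.combine(now.date(), daily_at)
        return candidate if candidate > now else candidate + timedelta(days=1)
    raise ValueError(f"unknown polling mode: {mode}")


now = datetime(2005, 3, 1, 10, 15)
print(next_poll(now, "hourly"))  # 2005-03-01 11:00:00
print(next_poll(now, "daily"))   # 2005-03-02 02:00:00
```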

Specify path to GNU Ghostscript

In order to process PDF documents into TIFF documents, GNU Ghostscript must be installed on the workstation. The full path to the executable console application gswin32c.exe must be provided in this edit field. At the time of writing, the most recent version of GNU Ghostscript is version 8.14. This module is launched and managed by a channel that needs to perform operations on PDF files.

Show Console Windows for testing conversion engines

When PDF to TIFF conversion is taking place the channel manages an external console process. If the console window is turned on then you will see a black console window pop up to the top of the screen every time the channel invokes that particular external console process. If the console window is turned off, then you will not see any console window while the channel manages the process. Being able to see the console and the output to the console window can assist you in debugging problems that might arise. This section gives you the ability to turn on or off the console windows for the PDF to TIFF conversion as well as the PDF to TEXT conversion.

Timeouts Tab

PDF To TIFF Maximum Timeout Seconds

This is the maximum number of seconds that the channel will allow the external console process that converts PDF documents into TIFF documents to run before shutting it down. This timeout value is very specific to the size of the jobs being run on the channel and the speed of the workstation that this application program is being executed on.
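The idea of running an external console process under a maximum timeout can be sketched in Python as follows. This is not the code used by the channel, and the exact command line that the channel passes to Ghostscript is not documented here. The switches shown (-dBATCH, -dNOPAUSE, -sDEVICE=tiffg4, -r, -sOutputFile) are standard Ghostscript options chosen as an assumption, and the function names are hypothetical.

```python
import subprocess


def build_pdf_to_tiff_command(gs_path, pdf_path, tiff_path, dpi=300):
    """Build an assumed Ghostscript command line for PDF to G4 TIFF conversion."""
    return [
        gs_path,
        "-dBATCH",            # exit when the input has been processed
        "-dNOPAUSE",          # do not wait for a key press between pages
        "-sDEVICE=tiffg4",    # black and white TIFF with G4 compression
        f"-r{dpi}",           # output resolution in dots per inch
        f"-sOutputFile={tiff_path}",
        pdf_path,
    ]


def run_with_timeout(command, timeout_seconds):
    """Return True if the process exits with code 0 before the timeout.

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        completed = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    except OSError:
        # The executable could not be started (for example, a bad path).
        return False
    return completed.returncode == 0
```

With such a helper, a representative document that needs 40 seconds to convert would fail under a 30 second timeout and succeed under a 60 second timeout, which is the trade-off that this setting controls.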

Testing Timeouts – PDF To TIFF

If you place a valid file in the input folder for this channel and then press the "Test" button on this properties tab, the console application will attempt to perform the specified conversion within the specified timeout period. This dialog tab provides a way for you to determine a valid timeout setting by simply testing the time it takes to process a set of representative documents. You can temporarily enable or disable the console output of the external processes while you are testing to determine a good value for the process time out.

Show Console

This is a temporary way to view or not view the console window while you are testing timeouts. These settings obtain their initial values from the console settings in the work tab, but can be changed at will without affecting the real console setting values.

PDF To TEXT Maximum Timeout Seconds

This is the maximum number of seconds that the channel will allow the external console process that converts PDF documents into TEXT documents to run before shutting it down. This timeout value is very specific to the size of the jobs being run on the channel and the speed of the workstation that this application program is being executed on.

Testing Timeouts – PDF To TEXT

If you place a valid file in the input folder for this channel and then press the "Test" button on this properties tab, the console application will attempt to perform the specified conversion within the specified timeout period. This dialog tab provides a way for you to determine a valid timeout setting by simply testing the time it takes to process a set of representative documents. You can temporarily enable or disable the console output of the external processes while you are testing to determine a good value for the process time out.

Login Tab

DSN

This is the name of a system DSN, or Data Source Name, that the current channel will use to establish the connection to the database. Use the ODBC control panel in administrative tools to set up a Data Source Name.

User

This is the name of an OptiDoc™ user account that the current channel will use when it establishes a connection to the database. All OptiDoc™ user permissions are acquired from the user permissions and collection combination for the specified user account.

Pwd

This is the password for the specified OptiDoc™ user account.

Collection

Once the login has completed, the collection dropdown list is filled in, and the current collection is selected from the dropdown list. Choose a collection from this list to tell the channel the name of the collection that the channel will be working with.

Login and select collection

Click on this button to test the DSN, user name, and password that you have just entered into the login tab of the properties dialog. This action may take a few minutes to complete, depending on how many collections and permissions the chosen OptiDoc™ user account has.

ODBC Timeout

This is the maximum amount of time in seconds that any particular SQL transaction is allowed to take. If any SQL transaction takes more time than this specified amount, then an ODBC timeout error will occur and the connection will be rejected. The default time out value is 15 seconds.

Creator ID

This is a marker value that is inserted into the main database table that associates a particular application, or a particular workstation and application with an identification number. This ID is used with every record that is inserted with this application, and can be unique to each channel. This feature can be used to identify which application has created a particular record, or which workstation, or which channel. The default ID for any channel in this application is 2005.

Parser Tab

Template

The text parser interpreter has to do with obtaining text fields in a document. The parser interpreter has the ability to locate text document fields by absolute positions, or by highly flexible rules. All of these details are stored inside a single binary file. This binary file is referred to as a parser template file. The drop down menu shows a directory of all available template files that exist inside of the "Parse Templates" folder. This folder exists inside of the application folder. All licensed versions of this software can import parse templates. In "editors enabled" versions of this software, parse templates can be created and edited.

The parse template file contains the set of rules that are fed into the parse language interpreter to determine what database field shall be filled in with what values of text. The input text is usually derived from the input mainframe pages. Use this dropdown menu to choose the parse template that you wish to assign to the channel. Please note that none of the controls in this pane will be available for use unless a template has been chosen.

To create a new parse template, first choose the name "NEW TEMPLATE" from the dropdown list. This action will bring up a dialog box that will ask you for the name of the new parse template. This name can be any descriptive name that you desire. Once you have given your parse template a new name, the parse template editor will open up and expand to take over the entire screen. If you want to edit an existing parse template, first choose the name of the parse template from the drop down and then click on the "Edit…" button that is located immediately to the left of the "Import…" button. This action will bring up the parse template editor. If you do not have the "editors enabled" version of the software, then you can only import parse files into the system. You must have the "editors enabled" version in order to create or edit parse files. You can install parse files by simply copying parse files into the "Parse Templates" folder. You can remove parse files by simply deleting parse files from the "Parse Templates" folder.

Import Template

Click this button to import a parse file into the "Parse Templates" folder. This action simply copies the parse file into the parse templates folder. To export a parse file, simply copy the parse file out of the "Parse Templates" folder. Please keep in mind that a parse file is a binary file, and if you send the file using FTP to another user, the FTP transmission must be done in binary mode.

Edit Template

If you have the "editors enabled" version of this software then you can create and edit your own parse templates. If you do not have the "editors enabled" version of this product then the edit button will always be grey. When you click on the edit button the entire screen is taken over by the parser generator user interface. Please refer to appendix A at the end of this document for details regarding the use of the parse template editor and the parser language itself.

Page Break Detection

A stream of ASCII text may or may not have reliable and normal page break indicators. Typically, a page break is indicated in a stream of ASCII characters by introducing the character with the hex code 0x0C. The translation of this character according to the ASCII tables tells us that this character is intended to represent a page break. This is the generally accepted standard. Under most circumstances the page break character is the best and most useful way for this application program to understand when a break between pages has happened.

If the input text stream has come from the integrated PDF to TEXT conversion engine then standard page break characters will always be included at the proper locations in the text. However, many mainframe applications do not place page breaks at regular locations and sometimes place erroneous page breaks inside of the text stream where they do not belong. A common mistake made by mainframe report programmers is that they often include the zero 0x00 character into the output of a text stream. Obviously, this is an illegal character in an ASCII stream, but it happens anyway. The quality of mainframe reports is non-standard and subject to the abilities of the mainframe programmer.

This application program does its best to work with badly formed and poor input files. In the case where page breaks are not properly indicated by page break characters, then the user of this application can specify a string of characters that will be replaced with a page break character. These values are used internally by the text processing engine to preprocess the input text so that proper page breaks end up in the proper places.

Standard Page Break Detection

Typically an input text stream is divided up into pages by scanning for page break characters. Good input documents will have page break characters in proper locations. Use this option for processing input documents that have standard page break characters setup properly within the documents.
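Standard page break detection amounts to splitting the text stream on the form feed character 0x0C. The Python sketch below shows the idea; the function name split_pages is hypothetical, and dropping the empty page left by a trailing page break is an assumption made for this illustration.

```python
FORM_FEED = "\x0c"  # the ASCII page break character


def split_pages(text: str) -> list:
    """Split a text stream into pages at each form feed character."""
    pages = text.split(FORM_FEED)
    # A trailing form feed leaves an empty final page; drop it (assumption).
    if pages and pages[-1] == "":
        pages.pop()
    return pages


print(split_pages("page one\x0cpage two\x0c"))  # ['page one', 'page two']
```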

Scan For Characters

If the input stream has invalid, non-existent, or randomly distributed page break characters, then this option will provide a setup for proper page breaks. You must find a sequence of characters, or a hexadecimal sequence of characters that will denote the beginning of each page in the document.

Multiple marker sequences can be used by entering more than one marker sequence into the field with a comma separating each sequence. If the body of the sequence includes a comma, then enter two consecutive commas (as an escape sequence), and they will be treated as one single comma within the marker sequence. The page detection algorithm will look for the specified sequence of characters and will place a page break character in front of the specified marker sequence. If the specified marker sequence is detected at the start of the document, then a page break character will not be inserted.

Remember that you can have more than one marker sequence if you separate each marker sequence with a comma. This option removes spurious page break characters in addition to placing proper page break characters into the marked locations. Hexadecimal strings of characters can be used, so that new line characters, form feed characters, and other nonprintable characters can be used as the page break marker sequence. For example, the hexadecimal marker string

0x5041474520303031

represents the marker string 'PAGE 001'. Please note that hexadecimal marker strings must be an even number of characters in length and must start with the prefix '0x'. If you do not start the marker string with the prefix '0x' then the marker will be used in plain text form. For example, the marker string FReD simply represents the marker string 'FReD'. Marker strings are always case sensitive. As another example, the marker string

0x5041474520303031,FReD

will replace the words PAGE 001, or the words FReD, with a page break character.
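The marker field rules described above (comma separation, doubled commas as an escape, and the '0x' hexadecimal prefix) can be sketched in Python as shown below. This is an illustration and not the application's own code. The function names are hypothetical, and the sketch follows the description in which spurious page breaks are removed and a page break is placed in front of each marker, except at the very start of the document.

```python
import re

FORM_FEED = "\x0c"


def parse_markers(field: str) -> list:
    """Split the marker field on single commas; ',,' is one literal comma."""
    markers, current, i = [], "", 0
    while i < len(field):
        if field.startswith(",,", i):
            current += ","
            i += 2
        elif field[i] == ",":
            markers.append(current)
            current = ""
            i += 1
        else:
            current += field[i]
            i += 1
    markers.append(current)
    return [decode_marker(m) for m in markers if m]


def decode_marker(marker: str) -> str:
    """Decode a '0x' hexadecimal marker; other markers are plain text."""
    if marker.startswith("0x"):
        # bytes.fromhex raises ValueError for an odd number of hex digits.
        return bytes.fromhex(marker[2:]).decode("latin-1")
    return marker


def apply_markers(text: str, markers: list) -> str:
    """Remove existing page breaks, then insert one before each marker."""
    if not markers:
        return text
    text = text.replace(FORM_FEED, "")
    pattern = re.compile("|".join(re.escape(m) for m in markers))

    def insert_break(match):
        # No page break is inserted when the marker starts the document.
        if match.start() == 0:
            return match.group(0)
        return FORM_FEED + match.group(0)

    return pattern.sub(insert_break, text)


print(parse_markers("0x5041474520303031,FReD"))  # ['PAGE 001', 'FReD']
```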

Count Pages

When the "count pages" check box is checked in the parser tab of the properties dialog for a channel, the parse engine will count pages inside of the larger overall document in order to split the overall document up into smaller database documents.

It is often true that mainframe reports have exactly the same number of pages in the overall document for each of the database documents contained inside of the overall document. For example, a financial check application report may always contain exactly three pages per database document. The first page might be the image of a check, and the following two pages always contain some kind of a forms based supporting documentation. Therefore, if a report is being processed that always contains three pages per document, then the parser can figure out how many pages there are per document by simply counting up to three pages, extracting those three pages, and using those three pages as the document to insert into the database.
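Splitting by a fixed page count can be sketched in a few lines of Python. The function name group_by_count is hypothetical, and how the application treats a short final group is not stated in this manual; the sketch simply keeps it as a smaller document.

```python
def group_by_count(pages: list, pages_per_document: int) -> list:
    """Split a list of pages into consecutive documents of a fixed length."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        pages[i:i + pages_per_document]
        for i in range(0, len(pages), pages_per_document)
    ]


# Seven pages at three pages per document: the last group is short.
print(group_by_count(["p1", "p2", "p3", "p4", "p5", "p6", "p7"], 3))
# [['p1', 'p2', 'p3'], ['p4', 'p5', 'p6'], ['p7']]
```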

First Page Detection Rule

When the "first page detection rule" check box is checked in the parser tab of the properties dialog for a channel, the parse engine will apply the specified rule to determine how many pages are in one of the database documents.

It is often true that mainframe reports have a variable number of pages per document, and the number of pages per document changes dynamically throughout the overall report because the amount of information that is included in a specific document depends on the amount of information that is available for that document.

It is for this reason that Human Resources reports tend to be variable in the number of pages that they process. This is because Human Resources documents tend to have a variable number of attachments. This problem is overcome by specifying a special parse language rule that is used to detect the first page of the document. This application program then reads pages out of the overall input document and continues to accumulate those pages into a database record until the first page of the next document is reached.

Essentially, when the first page of the next document is recognized by the parse rule engine, the application backs up by one page, and inserts the remaining database record. Using a rule for detecting the number of pages per document will work with mainframe reports that have a variable number of pages per database record.

However, in this case, the mainframe report is required to possess some kind of a static title or header or marker that can be utilized in order to recognize the first page of a document. Often times the header will have a page count that can be compared against the number "1". The following is an example of a sample parse rule that will break input mainframe documents apart into database records of variable length based on the page number section in the header.

First Page Detection Rule: LOOKFOR 'Page#' GETNUMBER

Is Equal To: 1

This rule breaks up overall spool documents into documents when each document is composed of a variable number of pages. It will scan the input page for the case sensitive text "Page#" and then it will skip forward to absorb the following number and see if that number is one. In this way, documents with a variable number of pages can be recognized correctly within the overall spool document.

The above example shows one way to use a parse rule to determine the beginning of a document.
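The effect of the rule above can be approximated in Python for readers who want to reason about it outside the parse template editor. This is not the parse language itself: the regular expression stands in for LOOKFOR 'Page#' GETNUMBER, and the function names are hypothetical.

```python
import re

# Stand-in for: LOOKFOR 'Page#' GETNUMBER (case sensitive)
PAGE_NUMBER = re.compile(r"Page#\s*(\d+)")


def is_first_page(page: str) -> bool:
    """True when the page carries 'Page#' followed by the number 1."""
    match = PAGE_NUMBER.search(page)
    return match is not None and int(match.group(1)) == 1


def group_by_first_page(pages: list) -> list:
    """Accumulate pages into a document until the next first page is seen."""
    documents = []
    for page in pages:
        if is_first_page(page) or not documents:
            documents.append([page])
        else:
            documents[-1].append(page)
    return documents


pages = ["Page# 1 Smith", "Page# 2 Smith", "Page# 1 Jones",
         "Page# 1 Brown", "Page# 2 Brown", "Page# 3 Brown"]
print([len(doc) for doc in group_by_first_page(pages)])  # [2, 1, 3]
```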

Preprocessor

When mainframe documents are being processed, several optional preprocessors are available. These preprocessors take effect after the raw page data has been obtained from the overall mainframe file, and prepare or interpret the contents of the page.

These features provide several options for making mainframe files compatible with other expected file formats, such as COLD. When you are building a parse template, you must define rules for finding the data that will be used for the database fields. An important concept to keep in mind is that the format of the text that you see when you are using the parse template designer is the same as the format that is eventually inserted into the database. This has been done to make the parse template WYSIWYG, and more intuitive, and to allow for relative as well as absolute coordinate positioning.

For example, if your input mainframe file contains COBOL markers, and you choose to expand these markers, then the text that you will build your parse rule for will have COBOL markers expanded. As another example, consider the case where you have no anchor to attach a "find" type of parser rule. This might happen if the mainframe file contains only "filled in" data, with no other markers or tags, that in the past were printed on preprinted forms, such as UB92's. In this case, only absolute positioning can be used to locate field data. Relative anchors using the "find" type of parser rules are always preferred; however, the WYSIWYG approach allows for both relative and absolute positioning.

Fixup CRLFs

Sometimes the carriage return and line feed characters from a mainframe file are incompatible with COLD documents. If you choose to fix up the carriage returns and line feeds, then each single carriage return, single line feed, carriage return and line feed pair, or line feed and carriage return pair is converted into the normal form of a carriage return followed by a line feed.

This is necessary so that document viewers that use a particular "feature" of the Microsoft API CEdit control will work, and so that when the page is drawn another particular "feature" of the Microsoft API DrawText function will also work. This feature might be necessary for backwards compatibility with older applications that make certain assumptions about the exact functionality provided by the Microsoft APIs. This feature also fixes the problem where random zeros have been injected into the input mainframe file. All zero (0x00) characters are replaced by space characters.
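The line ending normalization and zero replacement described in this section can be sketched in Python as follows. The function name fixup_crlfs is hypothetical, and the order in which ambiguous runs such as LF CR LF are paired is an assumption of this sketch.

```python
import re

# Pairs are listed before single characters so that CR LF and LF CR
# are each treated as one line ending.
_LINE_ENDING = re.compile(r"\r\n|\n\r|\r|\n")


def fixup_crlfs(text: str) -> str:
    """Replace 0x00 with a space and normalize every line ending to CR LF."""
    text = text.replace("\x00", " ")
    return _LINE_ENDING.sub("\r\n", text)


print(repr(fixup_crlfs("a\nb\rc\r\nd\n\re\x00f")))
# 'a\r\nb\r\nc\r\nd\r\ne f'
```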

Interpret COBOL Markers

Sometimes mainframe data will come from a COBOL program. It is often the case that a single column on the left hand side of the document is a control column and is *NOT* used to provide text information, but instead, is used to provide formatting information for the text. This feature expands this COBOL control column into text data, so that the formatting of the COBOL file will become a visual part of the ASCII text, via standard print and view COBOL drivers. For example, a "-" in the first column is interpreted as a CR and a LF, another CR and LF, then a space, and does *NOT* mean to print an actual "-" in column one. A "1" in the first column is interpreted by the COBOL screen and print driver as a CR followed by a LF followed by a space. A "0" in the first column is interpreted by the COBOL screen and print driver as a CR followed by a LF followed by a space.

ICE COLD

To keep backward compatibility with an older OptiDoc format called the "ICE" format, the checkbox "ICE COLD" has been added. The effect of this checkbox is to keep the IRF data that appears at the top of an ICE file visible for recognition by the parser rules engine, but to make the IRF header disappear when making COLD files or placing overlays. An ICE file is one large report of IRF header(s) followed by COLD data. This checkbox should be turned on for processing the CSS and DMS feeds.

Skip First N Lines

Sometimes mainframe data will contain extra header data at the beginning of the overall document. This option strips off the specified number of lines from the first page of the overall document. Turn this check box on if you want to remove lines of data from the start of the overall mainframe document.

Overlay Tab

Apply The Specified Overlay: (COLD->TIFF) into database

If you are using a work mode that creates COLD documents as an end result, then you will have the option to turn on and apply an overlay file to the COLD document to insert as a TIFF, or to simply insert the COLD document as generalized ASCII.

The TIFF option gives you the ability to select and apply a background image where the text is drawn onto the image at the proper locations, much like filling in an empty form. The final output is a TIFF document with a background, possibly of a form, and text that is filled into the background. A TIFF background file can be black and white, grayscale, or even RGB color.

If you choose to work with color TIFF background files, then you can also control the color of the text that is written into the final document. Please note that the visual appearance of a black and white or grayscale form that is filled in with red or blue text is very nice for customers to look at and work with. Overlay output files are always compressed. Black and white overlay TIFF files are compressed using the standard G4 compression scheme. Grayscale and color overlay files are compressed using a Macintosh pack bits compression algorithm. A single page 8.5 by 11 inch, 300 dpi color overlay file is about 800K after pack bits compression. By comparison, a single page 8.5 by 11 inch, 300 dpi black and white overlay file is only about 70K after G4 compression. Therefore, the tradeoff for color is of some consequence.
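To put the compressed sizes quoted above into perspective, the uncompressed size of a page can be worked out from its dimensions. The short Python calculation below is only an approximation: it ignores row padding and TIFF header overhead, and the function names are hypothetical.

```python
def page_pixels(width_inches: float, height_inches: float, dpi: int) -> int:
    """Number of pixels on a page at the given resolution."""
    return round(width_inches * dpi) * round(height_inches * dpi)


def uncompressed_bytes(pixels: int, bits_per_pixel: int) -> int:
    """Raw image size in bytes, ignoring row padding and file headers."""
    return pixels * bits_per_pixel // 8


pixels = page_pixels(8.5, 11, 300)               # 2550 x 3300 pixels
black_and_white = uncompressed_bytes(pixels, 1)  # 1 bit per pixel
rgb_color = uncompressed_bytes(pixels, 24)       # 24 bits per pixel

print(pixels)           # 8415000
print(black_and_white)  # 1051875
print(rgb_color)        # 25245000
```

In other words, the raw color page is roughly 25 million bytes and the raw black and white page is roughly 1 million bytes before compression, so both compression schemes reduce the data considerably, while the color file remains more than ten times larger than the black and white file.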

Edit

If you have the "editors enabled" version of this software product then you can create and edit your own overlay templates. If you do not have the "editors enabled" version of this product then the edit button will always be gray. When you click on the edit button the entire screen is taken over by the overlay generator user interface. Please refer to appendix B at the end of this document for details regarding the use of the overlay template editor.

Overlay Name: COLD in database

If you have a work mode that creates COLD documents as an end result, and you want to insert actual old style OptiDoc COLD documents into the database, then use this option. When this option is in effect, each COLD document that is inserted into the OptiDoc database will have, imprinted inside of it, the specified overlay name. The specified overlay name is used by some client applications to convert a text based COLD document into a form, or an image with a background. This feature is residual and is employed for backwards compatibility.

XML Parser Tab

XML Templates

The XML parser tab provides the ability to create new XML overlay templates. An XML overlay template is a binary data file that ends with the .XDAT extension. These files are kept in the XML Templates folder. This folder is automatically created inside the same folder with the application. You can easily create an overlay template on one computer and send it over to a remote machine. Once the .XDAT file has been created on your local machine and then copied into the remote machine's XML Templates folder, set up the new channel by adjusting the channel input, output, and login parameters. Once you have the system specific parameters configured, you have a new XML channel up and running on a remote machine.

Importing XML Templates

The import button brings up a standard find file dialog and asks for the location of an XML template to import. Once an XML template has been chosen with this dialog, the XML template is simply copied to the XML Templates folder along with the application. The same thing can be achieved by just copying the XML template into the XML Templates folder. The drop down menu shows a list of XML templates that are found within the folder.

Editing XML Templates

Editing an XML template is very similar to editing any other overlay template. When you choose the Edit… button a screen opens up and takes over the entire monitor. A larger monitor is needed for working with these kinds of overlays, because there is so much information that must be presented. At the upper left hand corner of the dialog box is the Login button. You must login and have a valid database connection before you can begin to configure an XML template.

XML Mode vs TIFF Mode

There are two modes to keep in mind when laying out an XML template. In the TIFF mode the image on the right hand side of the screen shows the current TIFF background image with the data rendered onto it. In the XML mode the image on the right hand side of the screen shows an XML parser where all of the data nodes are displayed. Notice that nodes are enclosed inside of other nodes in a recursive fashion. On the left panel are two buttons TIFF MODE and XML MODE that shift you from one mode to the other. When you are in TIFF MODE you can manipulate text around by dragging to locate a position within the overlay. You can also move elements around by adjusting their absolute positions in pixels. The Load Tiff File button is used to add the overlay TIFF into the overlay. Understand that the actual TIFF image that is used in the overlay is contained inside of the overlay data file, along with everything else that is needed to create the overlay. When you are in XML MODE you can click on XML data nodes that appear inside of the parsed data screen. When you click on an XML data node, the ID and the KEY for the data node are populated into the fields on the left. You use the XML MODE to locate the XML data that you want to position on the TIFF image, and to examine the organization of the XML document.

Color vs Monochrome Overlays

A TIFF overlay can be a compressed black and white TIFF file, or it can be an uncompressed RGB TIFF file. When building an overlay it is sometimes convenient to use an RGB TIFF file for the background, so that you can set the drawing font to a bright color. Another advantage to using an RGB TIFF for working up the overlay is that the text is rendered much cleaner. This process makes it easy to distinguish the text that you are working on versus constant text that is on the overlay. Once you have finished the layout, then you can load a compressed black and white TIFF back into the overlay to reduce the size of the overlay file.

Loading An XML Sample

Once you have set the background TIFF for your XML overlay, the next step is to choose an XML document that will represent the data that will be delivered to the channel feed. This is done by clicking on the LOAD XML FILE button. This action brings up a standard get file dialog box. After you have chosen an XML file, the file will be parsed into logical XML nodes and the system will shift into XML mode. This lets you see all of the XML nodes and fields and attributes that you will be working with. When you click on a section in the XML overlay parser, that section's ID and KEY are automatically put into fields on the left hand bar.

Laying Out A Field

To lay out a field onto the overlay screen you must first choose the field from the drop down list of fields. Initially this drop down list is populated with SQL field names from the database and collection that you have chosen. If you use a field with one of these names the field that is captured will be sent to the database as index data. In order to place data onto the form that is not sent to the database as index data, you click on the ADD button and create a new field with a new name that is unique. Fields of this type are drawn onto the overlay, but are not sent into the database.

Field Types

Conceptually these are the different types of fields that can be configured.

  1. A configured field can be picked up from the XML and not drawn into the overlay, but still inserted into the database as an index field.
  2. A configured field can be picked up from the XML and drawn into the overlay and also inserted into the database as an XML field.
  3. A configured field can also be a rule type field. Rule field types are usually written onto the overlay and not inserted into the database. The primary use of this field type is to format line items onto an invoice overlay.

To associate a sample XML document with the configuration screen, click on the Load XML File button on the left side bar. When you associate a sample XML document with the configuration screen the XML is parsed into a tree of XML node data. You can click items in the tree to select data nodes in the XML tree. When you click on an XML tag structure from the XML MODE window, the ID and the KEY for that structure are automatically attached to the current field. The IDs and KEYs are created by parsing the XML document and indicate the path that the XML evaluator uses to pass through the XML document to reach the value.

The KEY values are important to understand. They are formed using a dot notation that is composed of tag names separated by periods. Each tag is traversed from left to right, and the corresponding XML data node is looked up according to the tag name. This specifies a pathway from the root of the XML document down to any XML node or XML leaf node. If the XML data node that you have selected resolves to a text value, then the KEY identifies the path for how to find that text inside the XML document. If the XML data node that you have selected resolves to another XML data node, which may contain many other XML data nodes, the resolution of that KEY will be the innerXML for that data node. In other words, a partially specified KEY will resolve to all of the XML data nodes that exist under that node. Most of the time, we want to work with a KEY that resolves down to a single XML data leaf. When you click on the XML parse window the KEY is automatically put into the fields on the left side bar, so that when that field is evaluated, the KEY is used to locate the XML leaf data node. To see how this works, load up an XML document, enter XML mode, click on different sections of the XML document, and see what KEY values are generated.

Leaf Node Field

A leaf node value is an XML data value that evaluates to some text.


ZINVOIC02.IDOC.E1EDP01.E1EDP19.KTEXT

This KEY says to look inside the ZINVOIC02 XML node, then look inside the IDOC node, then look inside the E1EDP01 node, then look inside the E1EDP19 node, and finally look inside the KTEXT node for the text. Based upon the standards based definition of XML, it is not possible for a fully specified XML path to be repeated inside of an XML document.

IDOC XML does not conform to the official standard XML data format because it has many node paths that have the same path qualifications but give back different leaf node data.

In order to make this kind of XML work, special array notation has been added to the KEY descriptors. This provides the XML parser with the ability to work with many different non-standard XML document formats.

For example, if there happen to be two ZINVOIC02 xml data nodes in the XML document, and we wanted the data from the second node, then a KEY that would evaluate to the desired text would be as follows.


ZINVOIC02[1].IDOC.E1EDP01.E1EDP19.KTEXT

Please note, that the array notation is zero based. Also notice that for the very first node, or the node at index zero, no array notation is used. This is because standard XML does not need the array notation.

In other words, do not use the special case for a zero based index such as ZINVOIC02[0].IDOC.E1EDP01, but instead use the proper notation ZINVOIC02.IDOC.E1EDP01.

As another example, suppose we wanted to find the third KTEXT in the second ZINVOIC02 node. The KEY that would evaluate to the desired text would be as follows.


ZINVOIC02[1].IDOC.E1EDP01.E1EDP19.KTEXT[2]

When you click on any XML leaf node from the XML MODE window the KEY for that node is automatically associated with the selected field from the left side bar. If the selected XML node is a repeat, then the array notation will be used to display with the proper index value.
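The KEY notation described above can be illustrated with a small evaluator. This is only a sketch of the idea, not the ReportServer parser: the function name `resolve_key`, the `DOC` wrapper element that holds the top-level nodes, and the sample data are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# One KEY segment: a tag name with an optional zero-based [n] index.
_SEGMENT = re.compile(r"^([^\[\]]+)(?:\[(\d+)\])?$")


def resolve_key(container, key):
    """Walk a dotted KEY below `container` and return what it resolves to.

    `container` is a hypothetical wrapper whose children are the top-level
    nodes of the document. Returns None when the path does not exist.
    """
    node = container
    for segment in key.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"bad KEY segment: {segment!r}")
        tag = match.group(1)
        index = int(match.group(2) or 0)  # no brackets means index zero
        repeats = node.findall(tag)       # direct children with this tag
        if index >= len(repeats):
            return None
        node = repeats[index]
    if len(node) == 0:
        # Leaf node: the KEY resolves to its text.
        return (node.text or "").strip()
    # Partially specified KEY: return the inner XML of the node.
    return "".join(ET.tostring(child, encoding="unicode") for child in node)


# Hypothetical sample with two ZINVOIC02 nodes and repeated KTEXT leaves.
SAMPLE = ET.fromstring(
    "<DOC>"
    "<ZINVOIC02><IDOC><E1EDP01><E1EDP19><KTEXT>first</KTEXT>"
    "</E1EDP19></E1EDP01></IDOC></ZINVOIC02>"
    "<ZINVOIC02><IDOC><E1EDP01><E1EDP19>"
    "<KTEXT>a</KTEXT><KTEXT>b</KTEXT><KTEXT>c</KTEXT>"
    "</E1EDP19></E1EDP01></IDOC></ZINVOIC02>"
    "</DOC>"
)

print(resolve_key(SAMPLE, "ZINVOIC02.IDOC.E1EDP01.E1EDP19.KTEXT"))
print(resolve_key(SAMPLE, "ZINVOIC02[1].IDOC.E1EDP01.E1EDP19.KTEXT[2]"))
```

With this sample data, the first call reaches the leaf in the first ZINVOIC02 node and the second call reaches the third KTEXT in the second ZINVOIC02 node.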

Constant IDOC Return Values

IDOC-CONSTANT-VALUE:

This command returns a constant value for an assigned database field. The following example would always return the constant value 'INV'.

IDOC-CONSTANT-VALUE:INV

Creating IDOC Line Items

IDOC-LINE-ITEM:

This is an example command that works for a certain military SAP IDOC document.

IDOC-LINE-ITEM:(8,5,16,44,11,16,13,13,10,92,11)

This rule collects invoice line item columns into a single row, or into multiple rows, as needed, and positions the columns within the rows according to the padding values passed in to the rule. The type and order of the columns are fixed by the rule, but the spacing between the items is variable and is controlled by the numeric values in the parameter. This rule allows a hard coded set of complex items to be found within the IDOC and accurately positioned within the inconsistent overlay forms. For this special rule the number of parameters determines the specific pattern of data that is generated. Only certain numbers of parameters are allowed, and the number of parameters selects the data pattern according to the layout of the generated document.

This rule has been specifically created to handle a particular class of SAP IDOC documents. This rule applies for each line item at the array [] notation, and is further scanned at each line item by the array [i] notation.

For example, the following algorithm describes the line item rule for the pattern with 11 parameters, and how the individual items are discovered and positioned within the line.

  • ZINVOIC02.IDOC.E1EDP01[].MENGE, is aligned on the left and right padded with spaces to the 1st parameter
  • ZINVOIC02.IDOC.E1EDP01[].MENEE, is aligned on the left and right padded with spaces to the 2nd parameter
  • ZINVOIC02.IDOC.E1EDP01[].ZE1EDP01.TDUMATN_EAN, is aligned on the left and right padded with 0's to 14 characters
  • Two spaces are always inserted into the line here. This is because of the alignment change from the previous column.
  • ZINVOIC02.IDOC.E1EDP01[].ZE1EDP01.TDUMATN_EAN, is then aligned on the right and padded with spaces to the 3rd parameter. Please note that the third parameter must be greater than 14.
  • Find the ZINVOIC02.IDOC.E1EDP01[].E1EDP19[i].QUALF whose value is '002', then get that node's ZINVOIC02.IDOC.E1EDP01[].E1EDP19[i].KTEXT. This value is aligned on the left and right padded with spaces to the 4th parameter.
  • Find the ZINVOIC02.IDOC.E1EDP01[].E1EDP26[i].QUALF whose value is '001', then get that node's ZINVOIC02.IDOC.E1EDP01[].E1EDP26[i].BETRG. This value is converted into money format, then right aligned and padded with spaces to the 5th parameter.
  • One space is always inserted into the line here.

Here is where the initial promotion codes and allowances enter into the first logical line item.

  • Look for ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KSCHL whose value starts with 'Z', then get that node's ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].ZE1EDP05.KNUMA_AG. If this value is empty, use that node's ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KSCHL instead. This value is left aligned and padded with spaces to the 6th parameter.
  • Look for ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KRATE. If KRATE has a value, that value is left aligned and padded with spaces to the 7th parameter. If KRATE is empty, look for ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KPERC and take the negative sign from the front of this value. Then look for the ZINVOIC02.IDOC.E1EDP01[].E1EDP26[i].QUALF whose value is '001' and get ZINVOIC02.IDOC.E1EDP01[].E1EDP26[i].BETRG. Return the value of KPERC times the value of BETRG as a percentage, put the negative sign back onto the front, and left align and pad the result with spaces to the 7th parameter.

Now we continue on with the final two values in the first logical line item.

  • Look for ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KOTXT whose value is 'NET VALUE', then look inside that node for ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KRATE. Format this value as money, then left align it and pad it with spaces to the 8th parameter.
  • Look for ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KOTXT whose value is 'NET VALUE', then look inside that node for ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].BETRG. Format this value as money, then left align it and pad it with spaces to the 9th parameter.

At the end of the first logical line item a carriage return is inserted.

Now we continue on with the remaining logical line items. Each additional logical line item is followed with a carriage return. Logical line items are appended onto the initial line item in a loop.

  • Look for ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KSCHL whose value starts with 'Z', then get that node's ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].ZE1EDP05.KNUMA_AG. If this value is empty, use that node's ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KSCHL instead. This value is left aligned and padded with spaces to the 10th parameter.
  • Look for ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KRATE. If KRATE has a value, that value is left aligned and padded with spaces to the 11th parameter. If KRATE is empty, look for ZINVOIC02.IDOC.E1EDP01[].E1EDP05[i].KPERC and take the negative sign from the front of this value. Then look for the ZINVOIC02.IDOC.E1EDP01[].E1EDP26[i].QUALF whose value is '001' and get ZINVOIC02.IDOC.E1EDP01[].E1EDP26[i].BETRG. Return the value of KPERC times the value of BETRG as a percentage, put the negative sign back onto the front, and left align and pad the result with spaces to the 11th parameter.
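As a rough illustration of how the padding parameters position columns, the sketch below assembles only the first part of the first logical line (quantity through the line amount). It assumes the values have already been pulled out of the IDOC into a dictionary keyed by leaf tag name; the function names and the dictionary are hypothetical. The manual does not say whether over-long values are truncated, so this sketch leaves them as they are.

```python
def left_col(value, width):
    """Left align and pad on the right with spaces to `width`."""
    return value.ljust(width)


def right_col(value, width):
    """Right align and pad on the left with spaces to `width`."""
    return value.rjust(width)


def money(value):
    """Format a numeric string with thousands separators and two decimals."""
    return f"{float(value):,.2f}"


def first_columns(item, params):
    """Build the leading columns of the first logical line.

    `item` is a hypothetical dict of already extracted leaf values and
    `params` is the padding tuple passed to the rule.
    """
    ean = item["TDUMATN_EAN"]
    return "".join([
        left_col(item["MENGE"], params[0]),           # 1st parameter
        left_col(item["MENEE"], params[1]),           # 2nd parameter
        ean.ljust(14, "0"),                           # padded with 0's to 14
        "  ",                                         # two fixed spaces
        right_col(ean, params[2]),                    # 3rd parameter
        left_col(item["KTEXT"], params[3]),           # 4th parameter
        right_col(money(item["BETRG"]), params[4]),   # 5th parameter
        " ",                                          # one fixed space
    ])


PARAMS = (8, 5, 16, 44, 11, 16, 13, 13, 10, 92, 11)
ITEM = {
    "MENGE": "10.000",
    "MENEE": "CS",
    "TDUMATN_EAN": "12345678901234",
    "KTEXT": "WIDGET",
    "BETRG": "1234.5",
}
line = first_columns(ITEM, PARAMS)
print(repr(line))
```

Because every column is padded to a fixed width, the same columns start at the same character position on every line, which is what lets the line items line up on the overlay when a non-proportional font is used.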

Multi Page Documents

IDOC-PAGE-NUMBER:

This rule is related to the IDOC-LINE-ITEM rule. When a field is created with this rule, it evaluates to an indicator showing the current page and the total number of pages. The current page value only increments during document creation, and always shows as "PAGE 1 of N" from within the editor. In order to calculate the total number of pages, the number of line items per page must be specified. The default number of line items per page is 20 if this rule is not specified.

IDOC-PAGE-NUMBER:(40)

This rule value tells the IDOC parser what is the maximum number of lines that can be used per page. This rule is needed if the IDOC-LINE-ITEM rule is also being used. This value will depend on the chosen font and font size and report layout. Since each line item can be made from a varying number of lines, this variable is needed in order to format the output line items. With this value the IDOC parser can know how many lines fit on each page, and how many lines are in each line item. This information is needed in order for the IDOC parser to split line items between report pages, so that each line item is complete, and all of the lines that make up that line item, are printed on the same report page.
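The page splitting described above can be sketched as a simple greedy fill. This is an illustration under assumptions, not the IDOC parser's code: it takes the rule value to be the maximum number of text lines per page, and the function names are hypothetical.

```python
def paginate(item_line_counts, max_lines_per_page=20):
    """Assign line items to pages without splitting an item across pages.

    `item_line_counts[i]` is the number of text lines that line item i
    needs. Returns a list of pages, each a list of item indexes.
    """
    pages, current, used = [], [], 0
    for index, lines in enumerate(item_line_counts):
        # Start a new page when this item would not fit on the current one.
        if current and used + lines > max_lines_per_page:
            pages.append(current)
            current, used = [], 0
        current.append(index)
        used += lines
    if current:
        pages.append(current)
    return pages


def page_labels(pages):
    """Build the page indicator text for each page."""
    total = len(pages)
    return [f"PAGE {number} of {total}" for number in range(1, total + 1)]


pages = paginate([3, 3, 3], max_lines_per_page=7)
print(pages)
print(page_labels(pages))
```

Here the third item needs three lines but only one line is left on the first page, so the whole item moves to the second page.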

Calculating Sums

IDOC-SUM-VALUE:

The IDOC-SUM-VALUE: rule is used for finding all XML nodes with a given KEY using the array notation. All of the values that are located using this KEY and array notation are summed together for a grand total. The values that each KEY resolves to must be an integer number, and not a word, or a word representation of a number. The final token value is the grand total of all the XML data elements that match the array TAG.
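A minimal sketch of the summing idea follows. The exact KEY syntax accepted by IDOC-SUM-VALUE: is not shown in this manual, so the sketch assumes that empty brackets `[]` on a segment mean "every repeat of this node"; the `DOC` wrapper, the `QTY` tag, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def collect(container, key):
    """Return every node reached by `key`; `TAG[]` follows all repeats."""
    nodes = [container]
    for segment in key.split("."):
        every = segment.endswith("[]")
        tag = segment[:-2] if every else segment
        found = []
        for node in nodes:
            matches = node.findall(tag)
            # Without brackets only the first repeat is followed.
            found.extend(matches if every else matches[:1])
        nodes = found
    return nodes


def sum_key(container, key):
    """Sum the integer text of every node matched by `key`.

    int() raises ValueError for text that is not an integer.
    """
    return sum(int((node.text or "").strip()) for node in collect(container, key))


DOC = ET.fromstring(
    "<DOC><ZINVOIC02><IDOC>"
    "<E1EDP01><QTY>2</QTY></E1EDP01>"
    "<E1EDP01><QTY>5</QTY></E1EDP01>"
    "</IDOC></ZINVOIC02></DOC>"
)
print(sum_key(DOC, "ZINVOIC02.IDOC.E1EDP01[].QTY"))
```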

SAP IDOCs

IDOC-RESOLVE:

The IDOC-RESOLVE: rule is used for dealing with SAP's variation of an XML document. Rules of this type scan a main XML node in order to determine the numeric index that is associated with a special qualification value found inside a child node. This type of rule causes searching action to take place every time a node is accessed and is therefore both inefficient and cumbersome. This type of rule allows one to resolve an array index that is qualified by a match to a child node scanning operation into the actual decimal index. The syntax of this rule puts the qualification criteria inside the brackets of the array notation. When the rule runs it scans inside the XML document to locate and resolve the actual numeric index. If an actual decimal number is found within the brackets of an array notation, then that index is used and no resolution takes place.

The following sample rule is an example of the evaluation rule for finding the invoice number in a certain SAP XML IDOC invoice. This rule scans over all repeating nodes at location ZINVOIC02.IDOC.E1EDK02 until a child node called QUALF can be found that contains the value "009". It then returns the value of the BELNR child node at the determined index. Optional quotes or ticks can be used when the value contains a space.

IDOC-RESOLVE:ZINVOIC02.IDOC.E1EDK02[QUALF="009"].BELNR

The IDOC-RESOLVE: rule can be used to formulate a left to right cascade of evaluations that stops when something is found. This is accomplished by putting evaluation terms together, and separating them with commas. The following rule will first determine if ZINVOIC02.IDOC.E1EDK02[QUALF="009"].BELNR evaluates to some value. If the value exists then this value is the return value for the rule. If this value does not exist, then IDOC-RESOLVE:ZINVOIC02.IDOC[XJ="12"].CODE is evaluated, etc.

IDOC-RESOLVE:ZINVOIC02.IDOC.E1EDK02[QUALF="009"].BELNR, IDOC-RESOLVE:ZINVOIC02.IDOC[XJ="12"].CODE
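The qualified lookup and the left to right cascade can be sketched as follows. This is an illustrative approximation, not the actual rule engine: the function names, the `DOC` wrapper element, and the sample values are assumptions, and the sketch does not handle a qualification value that itself contains a period or a comma.

```python
import re
import xml.etree.ElementTree as ET

PREFIX = "IDOC-RESOLVE:"
# A segment is a tag name with optional bracket contents.
_SEGMENT = re.compile(r"^([^\[\]]+)(?:\[(.*)\])?$")


def _pick(candidates, selector):
    """Choose one repeat by numeric index or by CHILD=value qualification."""
    if selector is None:
        return candidates[0] if candidates else None
    if selector.isdigit():
        index = int(selector)
        return candidates[index] if index < len(candidates) else None
    child_tag, _, wanted = selector.partition("=")
    wanted = wanted.strip().strip("\"'")  # optional quotes or ticks
    for node in candidates:
        if (node.findtext(child_tag.strip()) or "").strip() == wanted:
            return node
    return None


def resolve_term(container, term):
    """Resolve one dotted term; return its text, or None if not found."""
    node = container
    for segment in term.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            return None
        node = _pick(node.findall(match.group(1)), match.group(2))
        if node is None:
            return None
    return (node.text or "").strip()


def resolve_rule(container, rule):
    """Evaluate comma separated terms left to right; first value wins."""
    for term in rule.split(","):
        term = term.strip()
        if term.startswith(PREFIX):
            term = term[len(PREFIX):]
        value = resolve_term(container, term)
        if value:
            return value
    return None


DOC = ET.fromstring(
    "<DOC><ZINVOIC02><IDOC>"
    "<E1EDK02><QUALF>001</QUALF><BELNR>PO-77</BELNR></E1EDK02>"
    "<E1EDK02><QUALF>009</QUALF><BELNR>INV-1234</BELNR></E1EDK02>"
    "</IDOC></ZINVOIC02></DOC>"
)
print(resolve_rule(DOC, 'IDOC-RESOLVE:ZINVOIC02.IDOC.E1EDK02[QUALF="009"].BELNR'))
```

With the hypothetical sample above, the rule skips the first E1EDK02 node (QUALF 001) and returns the BELNR from the second one.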

Please note that formatting attributes can be applied to each term in an IDOC-RESOLVE: rule.

Formatting Attributes

The double colons :: located at the end of a KEY indicate special formatting options. If additional options are needed they are added to the end of each XML data KEY. Special options also function properly when the XML data key is part of a rule.

Carriage Return

For example, the special option CR inserts a carriage return into the evaluated xml node value.

Pad & Chop Left Align

For example, the special option PCL(50) indicates a pad and chop value of 50. A pad and chop value ensures that an element is left justified within a certain number of characters. If the actual value is longer than the specified value it will be truncated. If the actual value is shorter than the specified value it will be padded with spaces on the right. This functionality is used to make line items line up in columns.

Pad & Chop Right Align

For example, the special option PCR(50) indicates a pad and chop value of 50. A pad and chop value ensures that an element is right justified within a certain number of characters. If the actual value is longer than the specified value it will be truncated. If the actual value is shorter than the specified value it will be padded with spaces on the left. This functionality is used to make line items line up in columns.
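The two pad and chop options can be pictured with the following sketch. The function names are hypothetical, and the choice to keep the leading characters when a value is too long is an assumption, since the manual only says that the value is truncated.

```python
def pad_chop_left(value, width):
    """Left justify `value` in exactly `width` characters (like PCL)."""
    return value[:width].ljust(width)


def pad_chop_right(value, width):
    """Right justify `value` in exactly `width` characters (like PCR)."""
    return value[:width].rjust(width)


print(repr(pad_chop_left("WIDGET", 10)))
print(repr(pad_chop_right("12.50", 10)))
print(repr(pad_chop_left("A VERY LONG DESCRIPTION", 10)))
```

Whatever the input length, the result is always exactly `width` characters wide, which is what keeps columns aligned from one line item to the next.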

Search & Replace

For example, the special option SR(STEXT)(TTEXT) indicates a search and replace operation. When a token is found with this special option attached to it, the source text is replaced with the target text in the token before it is rendered into the overlay file. This rule is intended for changing values that are wrong in the XML and need to be corrected before they are displayed. Please notice that parentheses are used to delimit the source text and the target text.

Format Zip Code

FZ: this rule will insert a dash into a string of numbers conforming to the standard 5-4 USA zip code rules. If there are more than five characters in the number string, then the dash is added. If there are five or fewer characters then the string is left as is.

Format Money

FM: this rule will take a floating point number and format it with commas between each section of three digits, and ensures there are two decimal places at the end.

Format Number Commas

FNC: this rule will take a decimal number and format it with commas between each section of three digits.

Prefix Negative Sign

PNS: this rule will take any value and place a negative sign in front of it.
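The four formatting options above behave roughly like the helpers below. These are illustrative stand-ins with hypothetical names, not the product's implementation. The sketch assumes the dash goes after the fifth character of the zip string, and that the FNC input is a whole number.

```python
def format_zip(digits):
    """FZ: insert a dash after the fifth character of a longer zip string."""
    return digits if len(digits) <= 5 else f"{digits[:5]}-{digits[5:]}"


def format_money(value):
    """FM: thousands separators and exactly two decimal places."""
    return f"{float(value):,.2f}"


def format_number_commas(value):
    """FNC: thousands separators for a whole number."""
    return f"{int(value):,}"


def prefix_negative_sign(value):
    """PNS: place a negative sign in front of any value."""
    return f"-{value}"


print(format_zip("123456789"))
print(format_money("1234567.5"))
print(format_number_commas("1234567"))
print(prefix_negative_sign("12.50"))
```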

Format Date

For example, this rule will format the date from SAP format into standard month, day, year format.

ZINVOIC02.IDOC.E1EDK02.DATUM::FD(%m/%d/%Y)

The special option FD(FORMAT) is used to take in dates in the format YYYYMMDD, and translate them into any output format that you want. The FORMAT string describes the position and format of the final date string.

  • %a: Abbreviated weekday name
  • %A: Full weekday name
  • %b: Abbreviated month name
  • %B: Full month name
  • %c: Date and time representation appropriate for locale
  • %d: Day of month as decimal number (01 – 31)
  • %H: Hour in 24-hour format (00 – 23)
  • %I: Hour in 12-hour format (01 – 12)
  • %j: Day of year as decimal number (001 – 366)
  • %m: Month as decimal number (01 – 12)
  • %M: Minute as decimal number (00 – 59)
  • %p: Current locale's A.M./P.M. indicator for 12-hour clock
  • %S: Second as decimal number (00 – 59)
  • %U: Week of year as decimal number, with Sunday as first day of week (00 – 53)
  • %w: Weekday as decimal number (0 – 6; Sunday is 0)
  • %W: Week of year as decimal number, with Monday as first day of week (00 – 53)
  • %x: Date representation for current locale
  • %X: Time representation for current locale
  • %y: Year without century, as decimal number (00 – 99)
  • %Y: Year with century, as decimal number
  • %z, %Z: Either the time-zone name or time zone abbreviation, depending on registry settings; no characters if time zone is unknown
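The effect of the FD option can be reproduced with Python's standard library, whose format codes follow the same strftime conventions as the list above. The function name `format_sap_date` is hypothetical; this only shows the YYYYMMDD to FORMAT translation and is not the code used by the product.

```python
from datetime import datetime


def format_sap_date(sap_date, fmt):
    """Parse a YYYYMMDD date string and render it using `fmt`."""
    return datetime.strptime(sap_date, "%Y%m%d").strftime(fmt)


print(format_sap_date("20240305", "%m/%d/%Y"))
print(format_sap_date("20240305", "%Y-%m-%d"))
```

A value that is not a valid YYYYMMDD date makes `strptime` raise `ValueError`, so a caller would need to decide how such input should be handled.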

SAP IDOCs And Why Special Techniques Are Necessary

This paragraph provides insight into the structure of the particular XML document that SAP generates for its internal workflow. These documents are called IDOCs. The notable insight into the structure of the non-standard SAP IDOC format is this: you must scan over an unspecified number of parent nodes looking for a child node with a particular identifier. This identifier might include wildcard characters. Once the parent node has been located by scanning over the set of child nodes, the value is extracted from the identified parent node using a different child node. The brackets can appear at the end of any XML data node, and sometimes the data nodes are broken up at a higher level than is immediately apparent. You must investigate and understand the structure of your XML document in order to properly make use of iterative rules. Please note that SAP also sometimes sends dates in the format YYYYMMDD. See the FD special formatting option, which is used to convert these dates.

OptiCapture Tab

This tab controls the setup of the optionally available OptiCapture PDF417 coversheet processing module. This tab is only available if the separate OptiCapture DLL has been purchased and installed within the ReportServer application program. An OptiCapture channel is a special kind of channel that monitors one drop folder and is able to insert documents into different collections based on an OptiDoc PDF417 barcode coversheet. Essentially, there is a required PDF417 barcode coversheet field called "collection" that aims the OptiCapture module at a chosen collection. The OptiCapture module also automatically supports "bursting" of multipage TIFF documents that have multiple PDF417 coversheets embedded into the same document. This "bursting" function is necessary for discovery of coversheets that have been delivered via fax, and the users have piled many documents together within a single fax transmission.

Collection Map

The collection map is a drop down list of available field mapping templates for processing PDF417 barcode forms. All templates must be entered and updated manually. The name of a template must be the name of the collection that the map is associated with. The contents of the map show how incoming fields in the PDF417 barcode are mapped onto SQL fields in the database collection. Please note, there should always be exactly one map per collection.

To add a new collection map template, enter the new template name into the "Collection Map" drop down menu and click the "Add" button. To change a template name, select the desired template, change the name, and click the "Change" button. To delete an existing template, select the desired template and click the "Remove" button.

Note: The name of the Map Template must be identical to the related Collection Name.

Fields List

The "Fields" list displays the mapped fields of the selected Map Template. The left hand column displays the fields coming from the PDF417 barcode and the right hand column displays the related SQL fields in the target collection. A PDF417 barcode coversheet is required to have the "invisible" field named "collection" that gives the name of the template to use.

The From and To fields

When creating a new template, these fields are used to define which PDF417 barcode field relates to which collection field. Once the fields are entered, click the "Add" button and the fields will appear in the Fields window. To make changes to the field names, highlight the desired row, make the appropriate changes, and then click the "Change" button. To remove a field name mapping, highlight the appropriate row and click the "Remove" button.
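Conceptually, a collection map is a lookup from barcode field names to SQL field names, selected by the coversheet's "collection" field. The sketch below models that with plain dictionaries. The map contents, the field names, and the function name are invented for illustration and do not come from a real configuration.

```python
# Hypothetical maps: template name (same as the collection name) mapped to
# {barcode field: SQL field}.
COLLECTION_MAPS = {
    "Invoices": {"invnum": "INVOICE_NUMBER", "vendor": "VENDOR_NAME"},
    "Claims": {"claim": "CLAIM_ID"},
}


def map_coversheet(barcode_fields, collection_maps):
    """Pick the map named by the 'collection' field and apply it.

    Returns the target collection and the SQL field values. A missing
    'collection' field or an unknown collection raises KeyError.
    """
    collection = barcode_fields["collection"]
    field_map = collection_maps[collection]
    record = {
        sql_field: barcode_fields[barcode_field]
        for barcode_field, sql_field in field_map.items()
        if barcode_field in barcode_fields
    }
    return collection, record


sheet = {"collection": "Invoices", "invnum": "10045", "vendor": "ACME"}
print(map_coversheet(sheet, COLLECTION_MAPS))
```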

Finish Tab

Any error will cause the input file to be copied into the error folder

This is the location where input files are moved when errors processing those files are encountered. Any file which times out an external process, or has unexpected field data, or for any reason cannot be processed ends up in this folder.

Limit folder size to most recent

This is the maximum number of files that will be stored in the error folder before the oldest files start getting deleted. If this check box is off, then files that cause errors will continue to get copied into the error folder regardless of how many error files are already in that folder.
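The "most recent" limit amounts to deleting the oldest files once the folder holds more than the allowed number. A minimal sketch follows, assuming file age is judged by modification time; the manual does not say which timestamp is used, and `prune_oldest` is a hypothetical name.

```python
from pathlib import Path


def prune_oldest(folder, keep):
    """Delete the oldest files so that at most `keep` files remain.

    Age is taken from the modification time (an assumption). Returns the
    names of the deleted files, oldest first.
    """
    files = sorted(
        (path for path in Path(folder).iterdir() if path.is_file()),
        key=lambda path: path.stat().st_mtime,
    )
    excess = files[: max(0, len(files) - keep)]
    for path in excess:
        path.unlink()
    return [path.name for path in excess]
```

Calling `prune_oldest(error_folder, 100)` after each copy into the folder would keep only the 100 most recent files, where `error_folder` stands for whatever path the channel is configured with.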

Successfully processed files will be copied into the done folder

As files are processed through the application program with success, they are removed from the input folder and copied into the done folder. This feature might be used as a method for keeping a certain sized set of processed output files for diagnostic analysis or backup.

Limit folder size to most recent

If this checkbox is off, then files that are processed with success will continue to get copied into the done folder regardless of how many files are already in that folder. If this checkbox is on, then the oldest files will start getting deleted out of the done folder after the size limitation has been met.

Delete the monitored file from the input folder when done

The default for this setting is true. If you are testing the processing of an individual file over and over again during a diagnostic analysis, or a setup procedure, then this setting can be turned off so that the input file remains inside of the input folder, even if it is processed successfully.

Halt processing on database error (otherwise keep trying to reconnect)

If a database connection disappears during the operation of this program and this feature is not enabled, the application program will lightly sleep for ten seconds and then try to log in and retry the database operation. If this feature is enabled, then when a database connection gets disturbed, an error message is posted into the channel log and the processing for that channel stops. The default setting is to attempt to retry database connections when database errors crop up.
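The two behaviours, halting or sleeping ten seconds and retrying, can be sketched as a small loop. Everything here is hypothetical scaffolding: `connect` and `operation` stand in for the real login and database work, `halt_on_error` is an invented flag that selects between the two behaviours, and `ConnectionError` stands in for whatever error the database layer reports.

```python
import time


def run_with_reconnect(operation, connect, halt_on_error=False,
                       delay=10.0, sleep=time.sleep, log=print):
    """Run `operation(connection)`, reconnecting after connection errors.

    When `halt_on_error` is true the error is logged and re-raised, which
    stops processing. Otherwise the loop sleeps `delay` seconds, logs in
    again, and retries the operation.
    """
    connection = None
    while True:
        try:
            if connection is None:
                connection = connect()
            return operation(connection)
        except ConnectionError as error:
            log(f"database error: {error}")
            if halt_on_error:
                raise
            connection = None  # force a fresh login on the next attempt
            sleep(delay)
```

Passing `sleep` and `log` as parameters keeps the loop easy to exercise without actually waiting ten seconds between attempts.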

Using The Text Rule Parser

The text rule parser is the mechanism that accepts report input and locates the database fields within that document. The text rule parser is a very flexible and powerful way to automatically find and evaluate document index fields. Using anchors and search operations provides a position independent way to find database fields. Although simple x,y character position operations are still supported, it is strongly recommended that search anchor and regular expression strategies are used. Using these advanced features provides a significant degree of protection from variations in the input document.
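To see why anchors are more robust than fixed positions, compare the two lookups below on two pages where the same field has shifted. This uses Python's `re` module purely as an illustration; it is not the "Zone" command syntax, and the function names and sample pages are made up.

```python
import re


def find_after_anchor(page_text, anchor, value_pattern=r"\S+"):
    """Return the first value matching `value_pattern` after `anchor`."""
    pattern = re.escape(anchor) + r"\s*(" + value_pattern + ")"
    match = re.search(pattern, page_text)
    return match.group(1) if match else None


def find_at_position(page_text, row, col, length):
    """Return the text at a fixed zero-based row and column."""
    lines = page_text.splitlines()
    if row >= len(lines):
        return ""
    return lines[row][col:col + length].strip()


PAGE_A = "ACME CORP\nINVOICE NO: 10045   DATE: 01/02/2024\n"
PAGE_B = "ACME CORP\n\n   INVOICE NO:   10046 DATE: 01/03/2024\n"

print(find_after_anchor(PAGE_A, "INVOICE NO:", r"\d+"))
print(find_after_anchor(PAGE_B, "INVOICE NO:", r"\d+"))
print(repr(find_at_position(PAGE_B, 1, 12, 5)))
```

The anchored search finds the invoice number on both pages, while the fixed position that works for the first page lands on a blank line in the second.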

When the text parser dialog window comes up, it expands to take up the entire window space of the current monitor. The reason for this is to provide you with the most space for working with the text that will appear on the screen. If there is not enough monitor space to see and work with all of the text that you need, then use the horizontal or the vertical scroll bars to scroll the text around. Another alternative is to use a computer with a larger monitor. If you intend to lay out a significant number of templates, the ability to see all the text on the screen at once can save time.

Technically, the text parser is an LL(1) computer language parser, with a matrix of look ahead token handlers. The set of commands currently maintained by the parser has been adequate for a great many forms and mainframe reports and can be easily extended as needed.

The user interface for the text parser provides a way to create the most common types of parser commands, and you can use simple drag and click operations to create these token processing rules. These processing rules are put into the "Zone" field. When you press the "Add Rule" button the parser associates the selected database field with the token command that is currently in the "Zone" field. As an advanced feature, the parse engine also allows you to type powerful commands directly into the "Zone" field.

Initialize Field Rule Editor

The field rule editor must be initialized before you can work with it. There are some upfront requirements that you need to take care of before you can use the field rule editor.

First you must make certain that you have a valid sample document in the input folder for the channel. You must also make certain that the channel has been assigned at least one work mode that expects to find that kind of sample document.

Once you are certain about these two requirements press the "Initialize Field Rule Editor" button at the top left of the properties tab. If you have not met all the requirements then a warning dialog will inform you of this. When you initialize the field rule editor, it logs into the database and locates all of the field information that is related to the collection that has been assigned to the channel. It also transforms and preprocesses the sample text into a normalized format.

When you press the "Initialize Field Rule Editor" button, the system must log into the database and also transform any text data that it finds into a parser normalized format. This can be a lot of work, so it may take a few minutes to get ready, depending on how big the sample file is and how many collections the channel OptiDoc™ user has access to. Please be patient when initializing the field rule editor. Once the field rule editor has been initialized, the right part of the screen will contain a large scrolling edit field populated with text from the first page.

Page Navigation

If the text contains multiple pages then you will be able to navigate from page to page by using the arrow buttons to page to the next or to page to the previous page. You can type in a page number directly into the navigation section and press the go to page button to make a jump to a specific page without using the page next or the page previous buttons. As a suggestion, use these tools to determine if the document that you are parsing has a specific or a variable number of pages per document.

A customer does not often know the true answer to this question. For example, if there are supposed to be six pages per document, then try going to page 312 (a multiple of six) and see if this page looks like the end of a document, and if page 313 looks like the beginning of the next one. If some error in the page count has happened in the previous 312 pages, then these pages will not appear as you might expect them to.

At this point you can use the page forward and page backward buttons to narrow down the position where the error has happened. Using this technique, you can prove or disprove that the input data has a dependable number of pages per document. If there are a dependable number of pages per document then you can use a simple page count as a way to break the master document up into database documents, otherwise you must create a "first page rule" to break the master document up into database documents.
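The two ways of breaking a master document into database documents can be sketched as below. The function names are hypothetical, and `is_first_page` stands in for whatever "first page rule" has been configured.

```python
def split_by_page_count(pages, pages_per_document):
    """Cut the master document into fixed-size groups of pages."""
    return [
        pages[start:start + pages_per_document]
        for start in range(0, len(pages), pages_per_document)
    ]


def split_by_first_page_rule(pages, is_first_page):
    """Start a new document whenever `is_first_page(page)` is true."""
    documents = []
    for page in pages:
        if is_first_page(page) or not documents:
            documents.append([])
        documents[-1].append(page)
    return documents


# Hypothetical pages where each document begins with an "INVOICE" header.
PAGES = ["INVOICE 1", "detail", "INVOICE 2", "detail", "detail", "INVOICE 3"]
print(split_by_page_count(PAGES, 2))
print(split_by_first_page_rule(PAGES, lambda page: page.startswith("INVOICE")))
```

In this sample the second document has three pages, so the fixed page count puts "INVOICE 3" at the end of the third group instead of at the start of its own document, while the first page rule splits correctly.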

Rich Edit Text

The text that shows up in the rich edit field is either obtained from a PDF document, or from a mainframe file, or a COLD file. However, all of the sources of textual information have been normalized into a standard format. The text is normalized so that it can be easily viewed, searched, and made ready for display, rule generation, and parsing using the rich text edit control.

You can control the color, font, and size of the text in the rich edit text control by clicking on the "Font" button at the bottom of the controls section. It is recommended that you use a non proportional font such as courier or machine in order to work with the parse file generator. This makes columns of text line up properly, like on a teletype, and makes counting the number of character spaces easier.

When text is normalized, the lines are all padded with spaces to equal lengths and tab characters are expanded according to the specified parameters. The result is a nice rectangular array of consistent text for the parser to work with.
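A minimal sketch of this kind of normalization is shown below. The tab size of 8 is an assumed default, and `normalize_page` is a hypothetical name; the product's own transformation may do more than this.

```python
def normalize_page(text, tab_size=8):
    """Expand tabs and pad every line to the width of the longest line."""
    lines = [line.expandtabs(tab_size) for line in text.splitlines()]
    width = max((len(line) for line in lines), default=0)
    return [line.ljust(width) for line in lines]


for row in normalize_page("ab\tc\nxyz"):
    print(repr(row))
```

After normalization every row has the same length, so a character column means the same thing on every line of the page.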

Pages Per Document

The number of pages per document is an important setting for documents where index data must be obtained from other than the first page. When parse rules are applied to a document, a particular rule is only applied if the page number of the database document matches the page number recorded into the parser rule.

Therefore, if there are normally six pages per document, and you want to extract database information from the second page of the document, then you should enter the number six into the pages per document edit field before you attempt to add the parse rule from the second page. If you do this in the wrong order the number recorded into the parse rule defaults to 1, and this means that this rule will only apply to the first page of a database document.

In order to make parse rules fire on pages other than the first one, the parse rule generator must have some idea about how many pages to expect in a normal representative document. A common mistake is to leave the pages per document field set to one, and not the number of expected pages per document. This makes the parse rule generator consider every page that is brought into the text view to be the first page of a database document. You must set the pages per document edit field to a value larger than one, so that parse rules are applied to the specified pages.
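The relationship between the overall page number, the pages per document setting, and the page a rule is attached to can be expressed with a little arithmetic. This sketch assumes pages are numbered from 1 and that every document has the same number of pages; the function names are hypothetical.

```python
def page_within_document(overall_page, pages_per_document):
    """Convert a 1-based overall page number to a page within its document."""
    return (overall_page - 1) % pages_per_document + 1


def rule_applies(rule_page, overall_page, pages_per_document):
    """A rule fires only on the document page it was recorded for."""
    return rule_page == page_within_document(overall_page, pages_per_document)


# With six pages per document, overall page 8 is page 2 of the second document.
print(page_within_document(8, 6))
print(rule_applies(2, 8, 6))
# With the setting left at one, every page counts as a first page.
print(page_within_document(8, 1))
```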

Collection

The collection edit field is a read only field that is the current database collection assigned to the channel. The collection can only be changed by choosing a new collection from the login tab of the preferences dialog for the channel. The collection that is selected changes the available field name values in the drop down menu in the parser generator dialog.

Current Page

This edit field shows the current page of the document that is currently loaded into the parse rule generator screen. If you enter a page number into this field and then press the "go to page" button, the parse rule editor will take you over to the specified page. By default when you add a rule, the rule is connected to the current page. A rule will not fire unless it has been added while viewing the proper document page. For example, on a two page document, rules assigned on page one, will only fire when the text parser examines the first page and rules assigned on page two, will only fire when the text parser is on page two. To force a rule to fire on the last page of a document, when the document has a variable number of pages, you must prefix the rule with the LASTPAGE command.

Visual Document Consistency Checking

You can use the next and previous buttons to move from page to page from within the text parser user interface. This technique can be used to preview a document to see that the input document actually has the same number of pages per document, and that the information and layout are consistent on each document page. It is easy to see if the type of document or the field positions are changing while you are flipping rapidly through the document pages.

Next Button

Clicking on this button takes you to the next page in the overall document. It does not take into account whether the internal documents all have the same page count. You can see if the page numbers are skipping around by moving forward a page at a time and looking for inconsistencies. Once you hit the end of the document, you can no longer move forward.

Previous Button

Clicking on this button takes you to the previous page in the overall document. It does not take into account whether the internal documents all have the same page count. You can see if the page numbers are skipping around by moving backward a page at a time and looking for inconsistencies. Once you hit the beginning of the document, you can no longer move backward.

Field Name

Choose the database name of the field that you want to populate with data that is derived from the parser engine using the attached parse rule. When you choose the field name, the SQL name for the field and the type of the field are also determined and loaded into description fields below the field name drop down menu. This can help you to determine if your database has been setup correctly. For example, date types should have a date database type. Once you have selected a field then you can click on the add rule button to associate the contents of the "Zone" field with the selected database field.

Font

Click on this button to bring up a font picker dialog. You can control the font size, the font type, the font color, and all other font attributes with this dialog. You will want to use a font that is non-proportional, such as courier or courier new or machine, so that columns of text line up properly. A large font size makes it easier to count spaces, and a well chosen font color makes the text easier to read. Any changes made in the font dialog are shown on the Rich Edit control as soon as the font dialog is dismissed. Use this feature to make your work space functional, and also pleasurable.

How To Test Parse Rules

Click on the run rules button to execute all of the assigned rules and show the result in a dialog box. The rules that fire are restricted to those rules that apply to the current page. The current page depends on the number of pages per document. Use this feature to observe if the rules that you have in place are actually creating the results that you want. You can navigate around within a document and run the rules at any time. A restricted number of rules will run because every added rule is connected to the page that was in view when the rule was added. Using this knowledge you can test your rules on different pages within a multi page document. Shift from page to page within your document and run the rules. The results for that page will be displayed. By following this test procedure you may discover that you need to strengthen a rule to defend against formatting and layout exceptions in the document.

How To Add A Parse Rule

Click on this button to associate the contents of the "Zone" edit field with the database field that is currently selected in the drop down menu. If a rule has already been assigned to this field, then the old rule is overwritten with the new rule. The contents of the "Zone" edit field must be a sequence of parser language tokens. Parser language commands can either be typed directly into the "Zone" edit field, or some simple commands can be manufactured by the user interface by using the mouse to click and drag.

How To Remove A Parse Rule

Click on this button to delete a rule from the currently selected database field. If you have assigned a rule to the wrong field and you need to remove the rule from the currently selected database field then click on this button.

How To Create Easy Parse Rules

There is a powerful but simple method for constructing parse rules. This method allows you to construct simple rules without having to understand the parse rule language grammar in detail.

We will now list the steps for creating a simple LOOKFOR rule using the user interface.

  • Choose a database field to assign from the drop down menu.
  • Drag to select text to use as an anchor from the large text control on the right hand side.
  • Click on the “Look for what?” button.
  • This will put the text that you want to look for into the "look for what" edit field.
  • The next step is to select the text that you want to capture as it appears after the anchor.
  • Click on the "Get Zone" button.
  • This will create a rule that looks for the anchor, and then gets the text located after the anchor as the value for the database field.
  • Click on the "Add Rule" button.
  • Click on the "Run Rules" button to evaluate the new rule.

It is best to select some padding room before and after the captured text, so that the captured text can wander around a little bit between the selection points without affecting how the parser operates. However you must be careful that your extra padding does not interfere with another field.

LOOKFOR 'SSN' SKIP 02 GET 10

The above simple rule requires a description. This rule means that when the first page of the document is available for parsing, the text "SSN" will be found by scanning from the upper left to the lower right in a case sensitive fashion. Once the text is found, two characters will be skipped forward. Then a ten character unit of text will be extracted from the document. The left and right edges of this text unit are trimmed to produce the value that is associated with the selected field. A text unit is not allowed to wrap a line. Again, this is the simplest kind of parser rule, and much more versatile, compound, and complex rules can be made.
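The behavior described above can be approximated in a few lines of Python. This is only a sketch of the documented semantics, not the ReportServer implementation: the function names `lookfor`, `skip`, and `get` and the sample page are made up for illustration, and it assumes that SKIP simply advances the cursor by a character count.

```python
def lookfor(unit, text):
    """Case sensitive scan; return the text after the first match, or None."""
    pos = unit.find(text)
    return None if pos < 0 else unit[pos + len(text):]

def skip(unit, count):
    """Move the cursor forward by count characters (assumed behavior)."""
    return unit[count:]

def get(unit, width):
    """Take up to width characters from the current line, then trim both edges."""
    line = unit.split("\n", 1)[0]  # a text unit is not allowed to wrap a line
    return line[:width].strip()

# Hypothetical sample page, equivalent to running LOOKFOR 'SSN' SKIP 02 GET 10
page = "NAME: JOHN SMITH\nSSN:  123456789   DEPT: 42\n"
rest = lookfor(page, "SSN")
value = get(skip(rest, 2), 10) if rest is not None else None
print(value)
```

Because the ten character window is trimmed after extraction, the value can shift by a character or two between documents and still be captured cleanly.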

Once you have a rule in the "Get Zone" edit control, click on the "Add Rule" button to associate this rule to the chosen database field. If the field already has a rule assigned to it, then this new rule will overwrite the old one.

When you choose fields in the field name drop down menu, any previously attached rule will show up in the "Get Zone" edit control. You can look over all of the assigned rules by choosing each field name from the drop down one at a time.

Pressing the "Run Rules" button will evaluate all assigned rules against the current page.

If the number of pages per document is set to a value greater than one, then rules can be set to apply to different pages of a document. A rule only gets applied if the page number for the rule matches the page number of the document. The test parse rule evaluator that gets run with the "Run Rules" button only runs rules that are programmed to fire on the current page.
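The page matching described above can be pictured with a small Python sketch. It assumes a fixed number of pages per document and a hypothetical rule record with a `page` entry; neither the record layout nor the function name `rules_for_page` comes from the product.

```python
def rules_for_page(rules, overall_page, pages_per_document):
    """Return the rules that would fire on a 1-based page of the input file."""
    # Position of this page inside its own database document (1-based).
    page_in_document = (overall_page - 1) % pages_per_document + 1
    return [rule for rule in rules if rule["page"] == page_in_document]

# Hypothetical rules: one added while viewing page 1, one while viewing page 2.
rules = [
    {"field": "SSN", "page": 1, "rule": "LOOKFOR 'SSN' SKIP 02 GET 10"},
    {"field": "TOTAL", "page": 2, "rule": "LOOKFOR 'TOTAL' GETNUMBER"},
]

for pages_per_document in (2, 1):
    fired = [r["field"] for r in rules_for_page(rules, 4, pages_per_document)]
    print(pages_per_document, fired)
```

With two pages per document, the fourth page of the input is treated as page two and only the TOTAL rule fires. With the value left at one, every page counts as page one, so the page two rule never fires.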

If you are already familiar with the parse rule grammar then you can type command language syntax directly into the “Get Zone” edit control and click on the "Add Rule" button to setup rules.

How To Create Strongly Matching Or Compound Parse Rules

LOOKFOR 'SSN' LOOKFOR ':' LOOKFOR 'BIRTHDATE' LOOKFOR ':' GETNUMBER

In the above example, the text 'SSN' is found first, and then a variable number of spaces are skipped over until a colon character is found. Then the text 'BIRTHDATE' is found, and again a variable number of spaces are skipped over until a colon character is found. Then a number is read starting at this location and moving forward, skipping over whitespace until a digit is found, and then accepting all following digits, commas, and dots, until the first invalid character is found. This final number is the return value for this parse rule.

This cascading command technique is used to make rules stronger and better able to locate the needed field, even if the field value occurs in a difficult location. The second lookfor that finds the colons is often used for mainframe files that do not consistently delimit their output. The colon might come right after the delimiter word, or it may come some characters later on. Mainframe output can be very inconsistent.
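To see why cascading anchors tolerate inconsistent spacing, here is a hedged Python sketch of a chain of LOOKFOR commands followed by GETNUMBER. The helper names and the sample line are hypothetical, and the number scan is approximated with a regular expression from Python's standard `re` module.

```python
import re

def lookfor(unit, text):
    """Case sensitive scan; return the text after the first match, or None."""
    pos = unit.find(text)
    return None if pos < 0 else unit[pos + len(text):]

def getnumber(unit):
    """Find the first digit, then accept following digits, commas, and dots."""
    match = re.search(r"\d[\d,.]*", unit)
    return match.group(0) if match else None

def run_chain(page, anchors):
    """Apply each LOOKFOR in order; give up as soon as one anchor is missing."""
    unit = page
    for anchor in anchors:
        unit = lookfor(unit, anchor)
        if unit is None:
            return None
    return getnumber(unit)

# Hypothetical mainframe line with uneven spacing around the colons.
page = "SSN   : 123-45-6789    BIRTHDATE  :   19700131\n"
print(run_chain(page, ["SSN", ":", "BIRTHDATE", ":"]))
```

Each anchor narrows the remaining text, so the final number is read from the right place even when the amount of padding changes from one report to the next.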

How To Create Coordinate Parse Rules

LINESDOWN 3 SKIP 25 GET 30

In this example, the text of interest is on the third line, starting 25 characters over and running for 30 characters. This rule will extract that text and trim any leading and trailing white space from the result. Text extraction for a text unit is not allowed to pass over a line.
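A coordinate rule can be sketched in Python as a line index plus a column slice. This sketch assumes, following the description above, that LINESDOWN 3 lands on the third line of the page; the function name `coordinate_get` and the sample page are illustrative only.

```python
def coordinate_get(page, lines_down, skip, width):
    """Extract width characters starting skip columns into the given line."""
    lines = page.split("\n")
    if lines_down < 1 or lines_down > len(lines):
        return None  # the page is too short for this rule
    line = lines[lines_down - 1]
    return line[skip:skip + width].strip()

# Hypothetical page with the value starting in column 25 of the third line.
page = "\n".join([
    "ACME CORP",
    "MONTHLY STATEMENT",
    "CUSTOMER:".ljust(25) + "JANE DOE",
    "BALANCE: 10.00",
])
print(coordinate_get(page, 3, 25, 30))
```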

The following is a synopsis of the text language parser. The text language parser is based upon common language parsers. This simple rule language consists of a sequence of commands. One or more commands make up a rule. Parse rules are entered into the zone edit field and associated with a database field.

How Parse Rules Work

Parse rules are used to parse document text into field values. Rules also have access to internal variables that provide additional information about the document. For example, the total number of pages, the current page, the name of the input file, and Windows summary information are also available. When a rule is executed, it is passed the document page as the initial input element. As a rule executes a command, that command consumes data from the input element and produces a new element. This new element then becomes the input to the next command. This is also how DOS or UNIX command processors chain commands together. A parse rule starts at the left and executes to the right, and each command consumes input from the previous command until the final value is resolved.

It is the final resolved value that is inserted with the document as a database field. For example, the LOOKFOR command scans the input element using a case sensitive scan for the first occurrence of specified text. When this command finds the text inside the current text unit, the text unit is reduced. Remember the specified text must appear within single quotation marks. If a command fails then the rest of the commands in the command chain are not executed. For example, if the LOOKFOR command cannot find a match the rule is aborted.
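The left to right evaluation and the abort on failure can be modeled as a simple loop over commands. The following Python sketch is an illustration under assumptions: commands are represented as plain functions that return the next unit or `None` on failure, which is a modeling choice and not the product's internal design.

```python
from functools import partial

def lookfor(text, unit):
    """Return the text after the first case sensitive match, or None."""
    pos = unit.find(text)
    return None if pos < 0 else unit[pos + len(text):]

def get(width, unit):
    """Return up to width characters of the current line, trimmed."""
    return unit.split("\n", 1)[0][:width].strip()

def run_rule(commands, page):
    """Feed each command the output of the previous one; abort on failure."""
    unit = page
    for command in commands:
        unit = command(unit)
        if unit is None:
            return None  # the rest of the command chain is not executed
    return unit

# Equivalent in spirit to the rule: LOOKFOR 'TOTAL' GET 12
rule = [partial(lookfor, "TOTAL"), partial(get, 12)]
print(run_rule(rule, "ITEMS 3\nTOTAL   41.50\n"))
print(run_rule(rule, "no total here"))
```

The second call prints `None` because the case sensitive scan does not find "TOTAL", so the GET command never runs.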

Parse Commands - Chains Of Parse Commands Make Up Parse Rules

FIRSTPAGE

The syntax of this command is FIRSTPAGE. Prefix any parse rule with FIRSTPAGE to cause the rule to only fire on the first page of the document.

LASTPAGE

The syntax of this command is LASTPAGE. Prefix any parse rule with LASTPAGE to cause the rule to only fire on the last page of the document.

PRETTY

The syntax of this command is PRETTY. The function of this command is to remove extra white space from a group of words that have been fetched as the previous unit. For example, if you are parsing a form where the first name, middle initial, and last name are spread across the form because of a column alignment restriction, this feature will allow you to contract all of the words in the unit together neatly with only one single space separating each word in the resulting unit.
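The effect of PRETTY can be approximated in Python with a split and join. This is a sketch of the described result only; the function name `pretty` is hypothetical.

```python
def pretty(unit):
    """Collapse every run of white space into a single space and trim the ends."""
    return " ".join(unit.split())

print(pretty("JOHN        Q       PUBLIC   "))
```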

MASK 'NNNNN'

The syntax of this command is MASK 'NNNNN' where each N is a '1', a '0', or an 'X', depending on whether you want the character to be kept in the result, replaced with a space in the result, or removed from the result entirely. The mask size must match the number of characters in the previous text unit. For example, if you wanted to extract and append together the first and the last characters in a five character text unit, the command would be MASK '1XXX1'. The final result would be two characters long. If you wanted to keep the first original character, set the second character of a four character unit to a space, remove the third character, and keep the last character, the command would be MASK '10X1'. The result will be three characters long, and the character in the middle will be a space. Leading and trailing spaces are always removed from the text unit before that text unit is passed forward to the parser.
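The two MASK examples above can be reproduced with a short Python sketch. The function name `mask` is hypothetical, and raising an error on a size mismatch is an assumption; the manual only states that the sizes must match.

```python
def mask(unit, pattern):
    """Apply a MASK pattern: '1' keeps, '0' becomes a space, 'X' removes."""
    if len(pattern) != len(unit):
        raise ValueError("mask size must match the text unit length")
    result = []
    for character, flag in zip(unit, pattern):
        if flag == "1":
            result.append(character)
        elif flag == "0":
            result.append(" ")
        elif flag != "X":
            raise ValueError("mask may only contain 1, 0, or X")
    return "".join(result)

print(repr(mask("ABCDE", "1XXX1")))
print(repr(mask("ABCD", "10X1")))
```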

NOW

The syntax of this command is NOW. The function of this command is to return the current date and time.

GETYEAR

The syntax of this command is GETYEAR where the previous unit is a date. If the previous unit is in some recognizable date format, then this command will return the four digit year associated with the date. For example, to obtain the current year, you would enter the command NOW GETYEAR.

GETMONTH

The syntax of this command is GETMONTH where the previous unit is a date. If the previous unit is in some recognizable date format, then this command will return the month fully spelled out, such as January, February, March, etc. For example, to obtain the current month, you would enter the command NOW GETMONTH.

GETDAY

The syntax of this command is GETDAY where the previous unit is a date. If the previous unit is in some recognizable date format, then this command will return the day of the month for that date as a number between 1 and 31. For example, to obtain the current day, you would enter the command NOW GETDAY.

GETDAYOFWEEK

The syntax of this command is GETDAYOFWEEK where the previous unit is a date. If the previous unit is in some recognizable date format, then this command will return the day of the week, such as Monday, Tuesday, etc. For example, to obtain the current day of the week, you would enter the command NOW GETDAYOFWEEK.

GETHOUR

The syntax of this command is GETHOUR where the previous unit is a date. If the previous unit is in some recognizable date format, then this command will return the hour of the day in military time as a number between 0 and 23. For example, to obtain the current hour, you would enter the command NOW GETHOUR.

GETMINUTE

The syntax of this command is GETMINUTE where the previous unit is a date. If the previous unit is in some recognizable date format, then this command will return the minute of the hour as a number between 0 and 59. For example, to obtain the current minute, you would enter the command NOW GETMINUTE.

GETSECOND

The syntax of this command is GETSECOND where the previous unit is a date. If the previous unit is in some recognizable date format, then this command will return the second of the minute as a number between 0 and 59. For example, to obtain the current second, you would enter the command NOW GETSECOND.
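As a rough illustration of what the date commands from GETYEAR through GETSECOND return, the following Python sketch maps each command name to the matching part of a `datetime` value. It starts from an already parsed date, so the recognition of date formats is outside its scope, and the dictionary layout is only a convenience for the example.

```python
from datetime import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
        "Sunday")

def date_parts(moment):
    """Return the value each date command would yield for one date and time."""
    return {
        "GETYEAR": moment.year,                    # four digit year
        "GETMONTH": MONTHS[moment.month - 1],      # month fully spelled out
        "GETDAY": moment.day,                      # 1 to 31
        "GETDAYOFWEEK": DAYS[moment.weekday()],    # weekday() counts Monday as 0
        "GETHOUR": moment.hour,                    # 0 to 23
        "GETMINUTE": moment.minute,                # 0 to 59
        "GETSECOND": moment.second,                # 0 to 59
    }

parts = date_parts(datetime(2024, 2, 29, 17, 5, 9))
for command, value in parts.items():
    print(command, value)
```

Passing `datetime.now()` instead of a fixed value corresponds to prefixing each command with NOW.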

GETNUMBER

The syntax of this command is GETNUMBER. No parameters are passed. The input unit is scanned for the first occurrence of a digit. Once a digit has been found, characters are consumed as long as they are digits, commas, or dots. If a CR, a LF, a SPACE, or any other incompatible character is found, the number scanning terminates. The returned output unit is the extracted number.

LOOKFOR 'string'

The syntax of this command is LOOKFOR 'string'. The first parameter is a case sensitive, tick delimited search string. From the current cursor the LOOKFOR command scans the text parser data for a match and then returns the remainder of the document as the unit. When the number of spaces between fields of different types varies between documents, it is useful to use the LOOKFOR command to locate the proper position within a particular document. If a LOOKFOR command does not find any matching data then the rule is terminated and unit processing stops. When several instances of particular words occur in a scanned document, LOOKFOR commands can be chained together in order to find the correct offsets into the document. For example, if you wanted to find the second occurrence of "SSN:" in a document and then obtain a subsequent 10 character string, a rule like this might work:

LOOKFOR 'SSN:' LOOKFOR 'SSN:' SKIP 02 GET 10

LOOKFOR2 'string'

The syntax of this command is LOOKFOR2 'string'. The first parameter is a case sensitive, tick delimited search string. From the current cursor the LOOKFOR2 command scans the text parser data for a match and then returns the match along with the remainder of the unit. The only difference between the LOOKFOR command and the LOOKFOR2 command is that the LOOKFOR2 command also returns the expression for which you are looking. The LOOKFOR command is designed to locate anchors or flags in the parser data, while the LOOKFOR2 command is designed to find actual data.

LOOKFOR2 '77' GET 10
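The difference between the two commands comes down to where the returned unit starts. The Python sketch below uses hypothetical helper names and a made up sample line to show both behaviors side by side.

```python
def lookfor(unit, text):
    """Return the text that follows the first match (anchor is dropped)."""
    pos = unit.find(text)
    return None if pos < 0 else unit[pos + len(text):]

def lookfor2(unit, text):
    """Return the first match together with the text that follows it."""
    pos = unit.find(text)
    return None if pos < 0 else unit[pos:]

# Hypothetical line where account numbers always begin with 77.
unit = "ACCOUNT 7712345678 OPEN"
print(lookfor2(unit, "77")[:10])  # the match is part of the captured data
print(lookfor(unit, "77")[:10])   # the match is consumed as an anchor
```

Here LOOKFOR2 followed by a ten character GET captures the whole account number, while LOOKFOR would lose its first two digits.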

LOOKFORGETBELOW 'string' width

The syntax of this command is LOOKFORGETBELOW 'string' width. The first parameter is a case sensitive, tick delimited search string. The second parameter is a number designating the width of the field value that will be extracted. When this command executes, the first line that contains 'string' is found, starting from the current cursor position. If the string is found, then the value is extracted from the subsequent line, centered upon the position of the located string and given the specified width. The returned unit is the input unit stripped down to the line after the line containing the located string. This command is useful for obtaining data from headers where the data is centered below the header on the following line.

Here is an example of a rule that matches the case sensitive string "Invoice Number" and extracts the value that is centered below the matched header on the following line.

LOOKFORGETBELOW 'Invoice Number' 15
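The centered extraction can be sketched in Python as follows. The exact rounding used to center the window is not documented, so the integer division here is an assumption, as are the function name `lookfor_get_below` and the sample page.

```python
def lookfor_get_below(unit, text, width):
    """Find a header line, then read a centered window from the next line."""
    lines = unit.split("\n")
    for index, line in enumerate(lines[:-1]):
        column = line.find(text)
        if column >= 0:
            center = column + len(text) // 2      # middle of the header text
            start = max(0, center - width // 2)   # left edge of the window
            return lines[index + 1][start:start + width].strip()
    return None

# Hypothetical two line header block with values under each header.
header = "Invoice Number".ljust(20) + "Invoice Date"
values = "   INV-00042".ljust(21) + "2024-02-29"
page = header + "\n" + values + "\n"

print(lookfor_get_below(page, "Invoice Number", 15))
print(lookfor_get_below(page, "Invoice Date", 15))
```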

REGEXPGETBELOW 'string' width

The syntax of this command is REGEXPGETBELOW 'string' width. The first parameter is a regular expression given as a ticked string. The second parameter is the width of the field value to extract, which will be centered below the match on the following line. From the current cursor, the first line matching the regular expression 'string' is found. The value is extracted from the subsequent line, centered upon the position of the located string and given the specified width. The returned unit is the input unit stripped down to the line that follows the line containing the matching regular expression string. This command is useful for obtaining data from headers, where the data is centered below the header on the next line.

Here is an example of a rule that matches the first of three strings "Invoice Date" or "Debit Memo Date" or "Credit Memo Date" and extracts the value that is centered below the matched header on the following line. Please notice that strings in regular expressions require that spaces be escaped with the backslash character.

REGEXPGETBELOW 'Invoice\ Date | Debit\ Memo\ Date | Credit\ Memo\ Date' 15

GET N

The syntax of this command is GET N where N is the number of characters to grab out of the parser file. The left and right sides of any get operation are always trimmed off. It makes good sense to begin a get command a little before a field starts and to end a get command a little after the field ends. This way the field can change position to some degree within the input parse file and still be correctly obtained.

GETUNTIL 'string'

The syntax of this command is GETUNTIL 'string' where string is any termination string that indicates the end of the normal get operation. Characters will be gathered into the get command until the termination string is reached. The left and right sides of any get operation are always trimmed off.
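A minimal Python sketch of GETUNTIL follows. The manual does not say what happens when the termination string is never found; this sketch assumes the whole remaining unit is returned in that case, and the function name `get_until` is hypothetical.

```python
def get_until(unit, terminator):
    """Gather characters up to the terminator, then trim both sides."""
    pos = unit.find(terminator)
    taken = unit if pos < 0 else unit[:pos]  # assumed fallback: take everything
    return taken.strip()

print(get_until("  JANE DOE   | ACCOUNT 42", "|"))
```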

LINESDOWN N

The syntax of this command is LINESDOWN N where N is the number of lines to descend into the normalized parse text before the next command is invoked. It is through the use of the LINESDOWN command and the SKIP command that X,Y data can be extracted from mainframe or COLD files that do not have any tag or header information to use as an anchor position. The LINESDOWN command is used to locate the Y coordinate of a parse rule and is measured in lines down from the top. When parsing text data that has no headers or tags that can be located with the LOOKFOR command, a combination of LINESDOWN and SKIP can be used to locate absolute positions within the file.

SKIP N

The syntax of the command is SKIP N where N is the number of characters that are skipped over. The SKIP N command is used to locate the X coordinate of a parse command that is measured in characters from the left side of the normalized parser document. When parsing text data that has no headers or tags that can be located with the LOOKFOR command, then a combination of LINESDOWN and SKIP can be used to locate absolute positions within the file.

VALUE 'string'

The syntax of the command is VALUE 'string' where 'string' is the actual value that is assigned to the field. The supplied 'string' can be any characters. Using this rule is a way to specify that a constant or known data value be entered into a database field.

LOOKFORVALUES 'string1, string2, string3 …' 'value1, value2, value3 …'

The syntax of this command is LOOKFORVALUES 'LOOKFORLIST' 'VALUELIST'. The look for list is a comma delimited list of values to find. The value list is a comma delimited list of constant values to return. When this command executes it progresses from left to right within the look for list. If a match from the look for list hits then the corresponding value is returned.

LOOKFORVALUES 'Debit Memo Date, Credit Memo Date, Invoice Date' 'DBM, CRM, INV'

The above sample rule performs the following steps.

  • Looks for the string "Debit Memo Date" and if there is a match then the value "DBM" is returned.
  • Looks for the string "Credit Memo Date" and if there is a match then the value "CRM" is returned.
  • Looks for the string "Invoice Date" and if there is a match then the value "INV" is returned.
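The steps above can be sketched in Python as a lookup over two parallel lists. Trimming the spaces around each comma is an assumption based on the sample rule, and the function name `lookfor_values` is hypothetical.

```python
def lookfor_values(unit, look_for_list, value_list):
    """Return the constant paired with the first search string that matches."""
    targets = [item.strip() for item in look_for_list.split(",")]
    values = [item.strip() for item in value_list.split(",")]
    for target, value in zip(targets, values):
        if target in unit:  # case sensitive containment test
            return value
    return None

look_for = "Debit Memo Date, Credit Memo Date, Invoice Date"
constants = "DBM, CRM, INV"
print(lookfor_values("Credit Memo Date   2024-01-15", look_for, constants))
```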

GETFILENAME

The syntax of the command is GETFILENAME and the returned value is the input file name. The input file name does not include any part of the input path. This command provides the input file name as a variable that can be used to populate fields.

TOKENIZE 'string' N

The syntax of the command is TOKENIZE 'string' N. The returned value is the tokenization of the previous input. 'string' is a quoted string of characters that are used to delimit tokens. N is the number of the token element that you want to extract. The first token is number zero. For example, if you wanted to extract the extension part from a file name and use that as the value of a database field, the command would be:

GETFILENAME TOKENIZE '.' 1

This command will split the file name into two tokens separated by the dot character. The first token, or token zero, will be the file name stripped of its extension. The second token, or token one, will be the file extension.
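The token numbering can be checked with a small Python sketch. It assumes that every character in the delimiter string separates tokens and that empty tokens are dropped; the manual does not spell out either detail, and the function name `tokenize` is hypothetical.

```python
def tokenize(unit, delimiters, index):
    """Split on any delimiter character and return token number index."""
    tokens, current = [], ""
    for character in unit:
        if character in delimiters:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += character
    if current:
        tokens.append(current)
    return tokens[index] if 0 <= index < len(tokens) else None

# Equivalent in spirit to: GETFILENAME TOKENIZE '.' 1
print(tokenize("report.pdf", ".", 0))
print(tokenize("report.pdf", ".", 1))
```

Note that a file name containing more than one dot would produce more than two tokens, so token one would no longer be the extension.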

FILESUMMARY 'name'

This command will obtain values from the advanced and simple summary caption fields that are used as fields under the Windows operating system. If you view the properties of a document, there will be a summary tab that contains certain values. This command provides a mechanism to obtain these values and use them as index values from within the text parser engine.

[Image: file-summary.gif]

The list of advanced and simple caption fields that can contain values are as follows:

  1. Title
  2. Subject
  3. Author
  4. Keywords
  5. Comments
  6. Template
  7. LastAuthor
  8. Revision Number
  9. Edit Time
  10. Last printed
  11. Created
  12. Last Saved
  13. Word Count
  14. Char Count
  15. AppName
  16. Doc Security

FILESUMMARY AUTHOR

This command will return the value of the author field in the file summary properties tab.

REGEXP 'regexp' AND REGEXP2 'regexp'

Constructing regular expressions

For background, see the documentation on regular expressions in Wikipedia.

Regular expressions are masks that match strings and are specified using certain symbols. Python users should understand this concept immediately. The power of regular expressions is used to make intelligent parse rules that can find and match any possible kind of string. Some examples follow.

If you want to find the exact string "boogyman" then the regular expression for that match is 'boogyman'. Please note that all regular expressions must be delimited by tick marks, and that no tick mark should exist within the regular expression. If you need to use a tick mark within a regular expression then use its ASCII equivalent (39 decimal, 27 hex, 47 octal, &#39; html).

If the phrase you are looking for contains a space then you must use the backslash character. If you want to find the exact string "boogy man" then the regular expression of that match is 'boogy\ man'.

If you want to find the next word that begins with an "a" and ends with a "b" then the regular expression would be 'a.*b', where the dot accepts any character and the * allows it to repeat zero or more times. Possible matches for this regular expression would be "a-biglongword-b", "acb", or just "ab". If you require that something exist between the characters "a" and "b" then the regular expression would be 'a.+b'.

If you want to find the first occurrence of the word "parcel" or the first occurrence of the word "post" then the regular expression would be 'parcel | post'.

Regular expressions are nested by using parens ( ).

Individual characters. e.g. "h" is a regular expression. In the string "this home" it matches the beginning of 'home'. For non printable characters, one has to use either the notation \xhh where h means a hexadecimal digit or one of the escape sequences \n \r \t \v known from "C".

Because the following characters have a special meaning in regular expressions

* + ? . | [ ] ( ) - $ ^

Escape sequences must also be used to specify these characters literally:

\* \+ \? \. \| \[ \] \( \) \- \$ \^

Furthermore, use '\ ' to indicate a space, because this implementation skips spaces in order to support a more readable style.

Character sets enclosed in square brackets [ ]. For example, '[A-Za-z_$]' matches any alphabetic character, the underscore, and the dollar sign (the dash (-) indicates a range). It matches "B", "b", "_", "$" and so on.

A ^ immediately following the [ of a character set means 'form the inverse character set'. For example, '[^0-9A-Za-z]' matches non-alphanumeric characters.

Expressions enclosed in round parens ( ). Any regular expression can be used on the lowest level by enclosing it in round brackets.

Operators indicating the multiplicity of the preceding element

Any of the basic regular expressions above can be followed by one of the special operators * + ? \i

  • * meaning repetition (possibly zero times); e.g. '[0-9]*' not only matches "8" but also "87576" and even the empty string "".
  • + meaning at least one occurrence; e.g. '[0-9]+' matches "8", "9185278", but not the empty string.
  • ? meaning at most one occurrence; e.g. '[$_A-Z]?' matches "_", "U", "$", .. and ""
  • \i meaning ignore case

CATENATION

The regular expressions described above can be catenated to form longer regular expressions. E.g. "[_A-Za-z][_A-Za-z0-9]*" is a regular expression which matches any identifier of the programming language "C", namely the first character must be alphabetic or an underscore and the following characters must be alphanumeric or an underscore. "[0-9]*\.[0-9]+" describes a floating point number with an arbitrary number of digits before the decimal point and at least one digit following the decimal point. (The decimal point must be preceded by a backslash; otherwise the dot would mean 'accept any character at this place'). "(Hello (,how are you\?)?)\i" matches "Hello" as well as "Hello, how are you?" in a case insensitive way.

SHORT HAND REGULAR EXPRESSIONS

Finally - on the top level - regular expressions can be separated by the | character. The two regular expressions on the left and right side of the | are alternatives, meaning that either the left expression or the right expression should match the source text.

'[0-9]+ | [A-Za-z_][A-Za-z_0-9]*' matches either an integer or a "C"-identifier.
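The sample expressions above can be tried out with Python's standard `re` module, with two caveats. Python does not skip unescaped spaces in a pattern by default and has no \i suffix, so in this sketch the spaces around | are removed, literal spaces are written directly, and case insensitivity is requested with the `re.IGNORECASE` flag. It illustrates the patterns only and says nothing about how ReportServer evaluates them.

```python
import re

# "C" identifier: a letter or underscore, then letters, digits, or underscores.
identifier = re.compile(r"[_A-Za-z][_A-Za-z0-9]*")
# Floating point number: optional digits, a literal dot, at least one digit.
floating = re.compile(r"[0-9]*\.[0-9]+")
# Alternatives: an integer or a "C" identifier.
integer_or_identifier = re.compile(r"[0-9]+|[A-Za-z_][A-Za-z_0-9]*")
# Optional group matched without regard to case.
greeting = re.compile(r"Hello(, how are you\?)?", re.IGNORECASE)

for pattern, text in [
    (identifier, "_tmp1"),
    (identifier, "1abc"),
    (floating, ".5"),
    (floating, "3."),
    (integer_or_identifier, "42"),
    (greeting, "hello, HOW are you?"),
]:
    print(repr(text), bool(pattern.fullmatch(text)))
```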

Using The Overlay Generator

The overlay generator is used to generate overlay files. An overlay file contains a TIFF image that serves as the background for the final overlay. Often this background represents an empty form that is to be filled in with text. Overlay files are binary data files that are stored inside of the "Overlay Templates" folder. The "Overlay Templates" folder is in the same folder as the application program.

Overlay files can be moved from workstation to workstation by copying and pasting these files from one "Overlay Templates" folder into a different "Overlay Templates" folder. For example, one master application may be licensed to have all of the generators enabled, allowing that version of the application to create and edit overlay templates. These overlay templates can then be copied and pasted into the regular version of this application program in order to make use of the overlay templates.

An overlay template contains a background TIFF image, either in black and white, grayscale, or RGB color format. The overlay template also contains all of the information for scaling and transforming text data onto the background form and etching the characters into the background form making a final TIFF image of the filled in form that is inserted into the database. When RGB color overlay templates are used, the text that is etched onto the background TIFF can be made also in color so that it can stand out. A G4 compressed overlay file is around 70K in size whereas a pack bits compressed RGB color overlay file is about 700K in size. When a RGB color overlay template is used significantly more disk space is used up, and an informed decision needs to be made regarding the storage requirements, and visual appearance tradeoffs in the final installation.

When an overlay template is first opened in the overlay generator, it will appear in a window that fills up the entire screen. If the view is not large enough for you to work with, then try to open up the overlay file on a workstation that has a larger monitor, or purchase a larger monitor for this purpose. It is very helpful to see the overlay on the screen as big as possible in order to configure an overlay template. Also, the last sample text that was run against the overlay editor will appear on the screen, possibly mapped onto the desired color. Every template remembers its last sample text and this text is used to draw the default preview mode for a newly opened overlay template file.

The overlay generator creates a device context for all of the pixels in a very large off-screen memory buffer. The text is then rendered onto the overlay using a 1 to 1, or unit, coordinate transformation. Only when the entire result is ready is it squeezed into the proper location in the screen or device context window. Since the actual data is being squeezed onto the screen, what you see on the screen does not always look perfect. However, the data that gets written out to disk during real operations is not squeezed, and therefore this image data is always as accurate as possible. This methodology has been used so that the transformations are exactly the same when you view an overlay file as when you create an overlay file for insertion into the database. This technique uses more RAM from the workstation but ensures that the final outcome looks like what you see inside of the overlay generator.

Get Background Image

Click this button to choose a new TIFF image to use as the background image in the selected overlay file. The file that you choose must be a single page TIFF file. Very old overlay templates had the extension ".OVL" but they were simply TIFF files that had been renamed. Also be aware that old overlay templates were compressed using JBIG compression. The get background image button allows you to pick files of type ".OVL" and it treats them as though they were regular TIFF documents. If a JBIG compressed TIFF file is found, it will be automatically recompressed into a G4 TIFF file before it is imported into the overlay template. If a company has only changed the background image for a type of form, then this is an easy way to simply switch out the background image.

Get Text Source

Click this button to choose a new source of text to feed into the overlay template. A source file may be a PDF, a mainframe report, or even a COLD file. Use this feature to acquire the text that will be fed into the overlay template so that you can instruct the template how best to fit the text onto the overlay. Once a new text source has been specified, the text should appear somewhere on the overlay. It may be the wrong size and in the wrong position, but it should definitely be there.

Go To Page

If the source document has many pages of input text, this button can be used to navigate directly to any page. Once the layout of the overlay is in reasonably good shape, it is a good idea to try the overlay with different samples of text. You can pick different samples by using the go to page button to jump to arbitrary pages.

Font

The font setting selects the font and the point size that will be used to draw the text onto the original document. For example, a point size of 35 is reasonable for drawing on an 8.5 by 11 inch image. It is probably best to always choose a non-proportional font. A warping algorithm is used to draw individual characters onto the background overlay, and this algorithm causes each character to be drawn within its own specific grid position. However, proportional fonts will push some characters toward the left, right, top, or bottom of the box, even when each individual character grid position is specified. So far it seems best to always choose a non-proportional font, but you are allowed to choose any font that you want. The color of the font is rendered into the overlay only if the background overlay file is an RGB color file. Otherwise, the color of the font is ignored, and the text is always rendered as black on black and white or grayscale background TIFF overlays.
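
The font color rule can be stated compactly in code. The Python sketch below is an illustration, not the product's implementation. The function name and the mode strings ("RGB" for color, "L" for grayscale, "1" for black and white) are assumptions chosen for the example.

```python
BLACK = (0, 0, 0)


def effective_text_color(background_mode, chosen_color):
    """Return the color the text would be drawn in for a given background.

    background_mode: "RGB", "L" (grayscale) or "1" (black and white);
    these mode names are assumptions made for this sketch.
    """
    if background_mode == "RGB":
        return chosen_color   # color backgrounds honor the chosen font color
    return BLACK              # all other backgrounds always get black text


red = (255, 0, 0)
on_color = effective_text_color("RGB", red)
on_gray = effective_text_color("L", red)
```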

Tabs

The number of spaces per tab can make a difference when expanding mainframe text into its normalized format. This value tells the overlay how many characters each tab should be expanded into. Because each character must fit into an individual cell in the warped text grid, a tab character is a real dilemma: it can expand into any number of character cells.
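
The effect of the tabs setting can be seen with a short Python sketch. It uses the built-in str.expandtabs method, which pads each tab out to the next tab stop. Whether the overlay generator uses tab stops or a fixed number of spaces per tab is not stated here, so treat that behavior, the function name, and the sample line as assumptions.

```python
def normalize_line(line, spaces_per_tab=8):
    """Strip the line ending and expand tabs so each character fills one cell."""
    return line.rstrip("\r\n").expandtabs(spaces_per_tab)


sample = "ID\tNAME\tBAL\n"          # hypothetical mainframe report line
wide = normalize_line(sample, 8)    # "ID      NAME    BAL"
narrow = normalize_line(sample, 4)  # "ID  NAME    BAL"
```

The same input occupies 19 character cells with a tab size of 8 but only 15 with a tab size of 4, so a wrong tabs value shifts every column that follows a tab.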

Height & Width

When the text is first drawn onto the background TIFF overlay, the size of the entire grid is calculated from a set of font metrics, and the text is analyzed to estimate the proper height and width of the grid. These estimates are then used as default starting values for the height and width of each character cell. The values are expressed as 100 times the point size, so that each character of text can be warped on the screen into boxes that are controllable down to the second decimal place.

The approximate values are often too small to make the character grid fit onto the background TIFF overlay properly. You can warp the character grid by entering new values for the overlay generator to use. Remember that in order to use a character box that is 16 points tall and 20 points wide, you must enter 1600 into the height field and 2000 into the width field. Since the height and the width are independent variables, the grid can be warped by any combination of values. The width is often several pixels too small, and the height almost always seems to be about half as tall as it really needs to be. Enter new values into the height and width fields and then press the update button to see how they affect the spread of the characters over the background TIFF overlay. You can make tiny adjustments to the height and width of the character boxes by using the last two digits. For example, 1625 means 16 and 1/4 points for the height or width of the character box.

You can also grab the text with your mouse and drag it into position if a change to the height or width has made the text shift over too much. In practice you adjust the height and width using the full hundredths-of-a-point precision of the character boxes, and then drag the text into the right spot manually, repeating until the settings are just right.
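
The conversion between a size in points and the number typed into the height or width field is a multiplication by 100. The Python sketch below shows the arithmetic. The function names are hypothetical and exist only for this example.

```python
def points_to_field(points):
    """Convert a size in points to the integer entered in the height or width field."""
    return round(points * 100)


def field_to_points(value):
    """Convert a height or width field value back to a size in points."""
    return value / 100


height_field = points_to_field(16)      # 16 point tall box -> 1600
width_field = points_to_field(20)       # 20 point wide box -> 2000
fine_tuned = points_to_field(16.25)     # 16 and 1/4 points -> 1625
```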

Scoot

When the text is first drawn onto the background TIFF overlay, it is drawn at a zero offset. In the real world, this value is often incorrect, and the text really needs to start at some point shifted to the left and down from the zero point. You can enter values into the over and down fields and then press the update button to move the text over the background TIFF overlay and view the results. As an alternative, you can click and drag the text on top of the background TIFF overlay image until it is in the correct position. You have to adjust all of the available parameters (font size, font type, height, width, over, and down) until the mainframe text matches the background TIFF overlay perfectly.
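
How the height, width, over, and down settings combine to place each character can be sketched in Python as follows. This is an illustrative model rather than the overlay generator's actual warping code. It assumes zero-based rows and columns, and it assumes that the over and down offsets use the same unit as the cell size after the height and width fields are divided by 100.

```python
def cell_origin(row, col, height, width, over=0, down=0):
    """Top-left corner of the character cell at (row, col), zero-based.

    height and width are field values in hundredths of a point;
    over and down are offsets assumed to be in whole points.
    """
    x = over + col * width / 100
    y = down + row * height / 100
    return x, y


def layout(lines, height, width, over=0, down=0):
    """Yield (character, x, y) for every non-space character in the text."""
    for row, line in enumerate(lines):
        for col, ch in enumerate(line):
            if ch != " ":
                x, y = cell_origin(row, col, height, width, over, down)
                yield ch, x, y


# Third row, fourth column, with a 16.25 by 20 point cell and a small offset.
corner = cell_origin(2, 3, 1625, 2000, over=10, down=5)   # (70.0, 37.5)
placed = list(layout(["A B"], 1600, 2000))
```

Because every character is positioned from its row and column alone, changing the height or width moves characters farther from the origin by larger amounts, which is why the text often needs to be dragged back into place after a size change.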

Using The XML Overlay Generator

The XML overlay generator is used to create overlay files using standard TIFF images as the background and data parsed from XML as the overlay text.

Logging in

To start creating an XML overlay, you must first log into the collection where these documents will be stored. Clicking the login button will log in the XML overlay generator using the selected database and collection from the login tab in the channel properties. Once logged in, the collection's index fields will be available in the field drop down menu.

Loading images and XML data

To load the desired TIFF image for the document background, click the Load TIFF file button. This allows you to browse to the location where the image file is stored and load it into the XML overlay. Similarly, to use a sample XML file to create the overlay, click the Load XML file button. This allows you to browse to the location where the XML file is stored and load it into the XML overlay.

Selecting fields for parsed XML data

It is important to note a difference from the COLD Overlay Generator, where the overlay text being mapped onto the image is provided as one single string of text. Here, each individual XML field in the XML file must be mapped separately onto the image file, and each piece of data being mapped onto the image must correspond to a field from the field drop down menu. To accomplish this, you use not only the actual collection fields but also rule fields that you create. Rule fields have XML data attached to them so that the data can be mapped onto the image; however, that data is not inserted into the database as indexed data.

To create a rule field, click the add button next to the fields drop down menu. Type in the name of the field and click OK. The new rule field then appears in the fields drop down menu. Again, in order to map any piece of the XML data onto the image, the data must be associated with either a collection field or a rule field. To remove a rule field, or to remove data from a collection field, select the field from the drop down menu and click the delete button.
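
The difference between collection fields and rule fields can be modeled with a small Python sketch. The class name, function name, and sample field names below are hypothetical and are not taken from the product. The sketch only captures the rule described above: both kinds of field can place text on the image, but only collection fields contribute indexed data to the database.

```python
from dataclasses import dataclass


@dataclass
class MappedField:
    name: str
    is_rule_field: bool   # True for a rule field, False for a collection field
    value: str = ""       # XML data currently attached to the field


def split_fields(fields):
    """Return (overlay_text, index_record) for a list of mapped fields."""
    # Every field with data is drawn onto the image.
    overlay_text = {f.name: f.value for f in fields if f.value}
    # Only collection fields are stored as indexed data.
    index_record = {
        f.name: f.value for f in fields if f.value and not f.is_rule_field
    }
    return overlay_text, index_record


fields = [
    MappedField("InvoiceNumber", is_rule_field=False, value="10042"),
    MappedField("ShipToLine1", is_rule_field=True, value="12 Main Street"),
]
overlay_text, index_record = split_fields(fields)
```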

Tiff Mode and XML Mode

Tiff Mode

Clicking the Tiff Mode button brings up the TIFF view. Any data that has been set up using either a collection field or a rule field, and that has been defined in XML Mode, is displayed on the TIFF as an overlay. You can manipulate the position and appearance of the data in several ways.

To move the data to the desired location on the form, simply click and drag it there. You can also use the left, right, up and down arrows, or type a value into the up and down text boxes.

The font can also be changed by clicking the Font button. From here you can change the font type, style and size. If you are using a color TIFF, you can also change the font color.

XML Mode

Clicking the XML Mode button brings up the XML view, which displays the XML being used for the template. This mode is where you tell the XML template what data to use for the collection and rule fields.

To assign data to a field, select the field from the fields drop down box, then click on the appropriate data in the XML. With the data selected, the Key, Name and Value boxes are populated from the selection. If a special rule is needed, it must be entered manually into the Rule text box.
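
As a rough picture of what selecting a piece of XML data yields, the Python sketch below looks up one element in a sample document and reports three values for it. What the Key, Name and Value boxes actually contain is not defined here. The sketch assumes, purely for illustration, that the key is the path to the element, the name is its tag, and the value is its text. The sample XML and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; real input files will differ.
SAMPLE = (
    "<invoice>"
    "<customer><name>Acme Corp</name></customer>"
    "<total>125.00</total>"
    "</invoice>"
)


def describe(root, path):
    """Return assumed key, name and value entries for the element at path."""
    element = root.find(path)
    if element is None:
        raise KeyError(path)
    return {
        "key": path,                            # assumption: path to the element
        "name": element.tag,                    # assumption: the element's tag
        "value": (element.text or "").strip(),  # the element's text content
    }


root = ET.fromstring(SAMPLE)
selection = describe(root, "customer/name")
```

Here the selection holds the path "customer/name", the tag "name", and the text "Acme Corp", which is the text that would be drawn onto the TIFF for whichever field it is assigned to.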

page_revision: 185, last_edited: 16 Dec 2008, 22:41 UTC
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License