OptiDoc™ Report Server
User Manual
Overview:
This application accepts input in the form of ASCII mainframe report documents, PDF report documents, or OptiDoc IRF documents, processes them according to a set of programmable rules, and inserts the resulting documents into an OptiDoc database. It supports multiple simultaneous channels of processing and can take full advantage of multiprocessor environments. A channel of processing is contained within a document window in the MDI framework. The channel is the main worker unit of processing, and the application controls one or many channels. The application uses the GNU Ghostscript RIP (Raster Image Processor) to image PDF forms into TIFF format.
Within a channel, processing activity is divided from top to bottom: each channel is responsible for every step that it performs, as well as the cleanup after any failure, before it decides what to do next. Multiple channels run their own threads of activity in parallel. Using a robust threading model, each channel is responsible for its own state, actions, transactions, maintenance, and cleanup.
The application can function in poor network environments where database connections and file system connections are not reliable. For example, it provides an option for a channel to automatically attempt, at intervals, to log back into a database that has become disconnected from the network. As another example, it always attempts to find, mount, and remount network volumes that have failed or become disconnected.
System Requirements:
This application requires a Windows operating system and a minimum of 128MB of available RAM in order to operate properly. When a channel has been configured with a work mode that converts COLD documents into TIFF documents with overlaid text, the memory demands increase significantly.
Definitions:
Channel: A channel is contained and managed by a document window. To create a new document window, or channel, for processing click on the “Create New Channel” icon in the toolbar, or use the same functionality directly from the menu. A new channel appears as a new window inside of the application. Multiple channel windows can be resized, moved, tiled, and cascaded within the main application window. For example, one channel can be assigned to process PDF forms that appear in a drop folder, and another channel can be assigned to process COLD downloaded as text from a mainframe. A channel can look for different types of work to do. Each type of work that a channel can look for can be checked on or off from its properties. A single channel can look for multiple kinds of work to do and act accordingly. In the case where multiple work types are selected for a channel, the work types are selected in a round robin fashion, skipping over types of work when no work of that type exists.
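The round robin selection of work types described above can be sketched as follows. This is only an illustration, not the application's actual code: the function name, the mode names, and the has_work callback are all hypothetical.

```python
def pick_next_mode(enabled_modes, has_work, last_index):
    """Return the index of the next enabled work mode that has work.

    enabled_modes: list of mode names checked on for the channel (hypothetical names).
    has_work:      callable that reports whether work of a given mode exists.
    last_index:    index of the mode that was serviced most recently.
    Returns None when no enabled mode currently has work.
    """
    count = len(enabled_modes)
    # Start with the mode after the last one serviced and wrap around,
    # skipping over any mode that has no work waiting.
    for step in range(1, count + 1):
        index = (last_index + step) % count
        if has_work(enabled_modes[index]):
            return index
    return None


if __name__ == "__main__":
    modes = ["pdf_to_tiff", "text_to_cold", "irf"]
    waiting = {"pdf_to_tiff": True, "text_to_cold": False, "irf": True}
    print(pick_next_mode(modes, waiting.get, 0))
```

Starting the search one position after the last serviced mode is what keeps a busy work type from starving the others.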
Channel Properties: A channel is associated with a large set of properties. Setting up the channel properties configures the channel. To access the properties of a channel, select a channel from the MDI interface and then press the “Properties” button in the toolbar, or choose the same operation from the main menu. The channel properties dialog is used to configure all aspects of a particular channel. Each tab inside of the channel properties dialog box provides the settings for one phase of processing. The tabs are arranged in the order in which events occur: the first tabs represent decisions and properties that are needed at the start of channel processing, and the last tabs represent decisions and properties that are needed at the end of channel processing. Every property within every tab is discussed later in this document.
OptiDocXML: All TIFF files that are generated by this program contain an additional special TIFF tag number, 32933. This tag contains an XML description of the index data that was applied at the time that the TIFF file was inserted into the database. This tag number is currently under reservation by Adobe Corporation, the keepers of the TIFF specification. Advanced Technology Services has utilities and tools that work with this special TIFF tag. This TIFF tag is called the OptiDocXML tag. One possible use of this tag would be to enable the recreation of an entire database in the event of a catastrophe. With a rudimentary knowledge of XML, the contents of the OptiDocXML tag are self-explanatory.
<?xml version="1.0"?>
<OptiDocRecord>
<Collection>UAB_HR_Records</Collection>
<FileName>New Channel</FileName>
<OptiDocField>
<SQL_Name>FName</SQL_Name>
<Display_Name>Fname</Display_Name>
<Value>Khanolkar, Aaruni</Value>
<Type>CHAR</Type>
</OptiDocField>
<OptiDocField>
<SQL_Name>Idnumber</SQL_Name>
<Display_Name>ID Number</Display_Name>
<Value>1021062</Value>
<Type>CHAR</Type>
</OptiDocField>
</OptiDocRecord>
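As an illustration of how the tag contents might be consumed, here is a minimal sketch that reads a record like the one above into a dictionary. It assumes the XML text has already been extracted from TIFF tag 32933 by some other tool; the function name and the shape of the returned dictionary are hypothetical, and only the element names come from the sample.

```python
import xml.etree.ElementTree as ET

# The sample record from this manual, as it might be read from the tag.
SAMPLE = """<?xml version="1.0"?>
<OptiDocRecord>
<Collection>UAB_HR_Records</Collection>
<FileName>New Channel</FileName>
<OptiDocField>
<SQL_Name>FName</SQL_Name>
<Display_Name>Fname</Display_Name>
<Value>Khanolkar, Aaruni</Value>
<Type>CHAR</Type>
</OptiDocField>
<OptiDocField>
<SQL_Name>Idnumber</SQL_Name>
<Display_Name>ID Number</Display_Name>
<Value>1021062</Value>
<Type>CHAR</Type>
</OptiDocField>
</OptiDocRecord>"""


def read_optidoc_record(xml_text):
    """Parse OptiDocXML text into a plain dictionary keyed by SQL field name."""
    root = ET.fromstring(xml_text)
    fields = {}
    for field in root.findall("OptiDocField"):
        fields[field.findtext("SQL_Name")] = {
            "display_name": field.findtext("Display_Name"),
            "value": field.findtext("Value"),
            "type": field.findtext("Type"),
        }
    return {
        "collection": root.findtext("Collection"),
        "file_name": root.findtext("FileName"),
        "fields": fields,
    }


if __name__ == "__main__":
    record = read_optidoc_record(SAMPLE)
    print(record["collection"], record["fields"]["Idnumber"]["value"])
```

A database rebuild, as mentioned above, would amount to running a reader like this over every stored TIFF and reinserting the recovered field values.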
Parse File: All documents that require field information to be extracted and manipulated must have a parse file. Parse files are stored in a folder along with the application. The name of this folder is “Parse Templates”. Parse files have the extension .DAT. A parse file contains binary data that represents the rules and configuration parameters that describe how to extract text information from a document. A parse file can be used to extract data from PDF documents, mainframe documents, COLD documents, or any document that can be organized into an ASCII file. Technically, the parse engine is an LL(1) language parser that interprets command strings to produce field output. The parse engine contains many commands, and new commands are easily incorporated into the architecture. Parse files can be created by using the parse file generator and parse file editor that are built into the high level version of this application. You can tell which version of the application you have by looking in the about box. If the about box tells you that “generators are active”, then you have the high level version, and you can create your own parse files. Parse files can be imported into either the high level or the simple version of the application.
Overlay File: When COLD documents are created as the output from a channel, the channel provides an option to apply an overlay, or render the background image, for each document as a TIFF, at the same time that it is inserted into the database. As an alternative option for backward compatibility there still remains the ability to insert the text as a plain COLD document, and have the workstations apply the overlay at a later time. There are advantages and disadvantages to each technique.
When the overlay files are rendered into TIFF files before they are inserted into the database, the problem of updating all of the client applications with a new overlay is avoided, even if the input data changes. This can become a problem if, for example, you have an overlay named W2 that changes format from year to year. In the case where the overlay is applied at the workstation, a new overlay would need to be built and deployed to all workstations every year when the new data comes into use. Keep in mind that every different overlay must have a different overlay name in order to differentiate between the W2 forms from each year. This older methodology requires changes at each client workstation as well as changes to the server application. However, this method is backwards compatible with the older OptiDoc fat client, and it is therefore sometimes preferable because, when older style templates are already deployed, no changes need to be made at the client workstations. If the preferred method of rendering before insertion is chosen, then when the W2 form data changes, the system administrator simply needs to change the overlay at the insertion application on the server, and all of the newer and web based OptiDoc clients will automatically update and function properly. This is because, with the preferred option, the client application does not need to match a template with a COLD file, or anything else: the newer style of template is object oriented in nature, and will therefore always work without changes at the client workstations.
Overlay files are found in a folder along with the application. This folder is called “Overlay Templates”. Overlay files are composite binary data files that contain the background image TIFF as well as all of the other parameters that are necessary to create the overlaid image of a parsed mainframe document. In this sense, overlay files are complete objects themselves, and do not need any other files, or pointers to files, or names, or references of other files, in order to function. The background image in an overlay can be a 1-bit binary black and white TIFF, or it can be an 8-bit greyscale TIFF, or it can be a 32-bit RGB TIFF. The greyscale and color TIFF images take up considerably more space than the 1-bit black and white TIFF background images. However, a background overlay that is color allows you to have color logos or colored text rendered directly into the overlay. Overlay files are intended to apply a background image to a single page, or a single page form, so that the background is permanently merged with the text and stored in the database as a TIFF at the time of insertion. By changing an overlay file, the output of a channel is also changed. The designer of the channel reserves the option to use overlay files or not.
When the overlay is not applied, an OptiDoc formatted COLD file is inserted into the database. An overlay background can be a G4 compressed black and white TIFF, a grayscale TIFF, or a color TIFF. An overlay image must be a TIFF file, and can only be a single page in length. When the background TIFF image for an overlay is in color, the color of the overlaid text can be configured as an additional parameter. The resulting TIFF overlays sometimes look very nice when the form data is rendered in black and the form content is rendered in red, or in some other bold, distinctive color.
Sometimes, when a mainframe file is being picked up from the input folder and converted into a COLD file, or rendered into a TIFF image, the mainframe file is not well formed. For example, the end of each page should always be marked with a standard 0x0C (end of page) character. Sometimes the mainframe has placed this character in the correct locations, sometimes it has placed it at random, and sometimes it has omitted it altogether. Sometimes mainframe files contain random zero characters. This is a violation of the definition of an ASCII file, but it still happens. Sometimes the files that come from the mainframe were not processed properly to begin with, as is common with COBOL language output. Inside of the Parser tab, there are options for finding page breaks by using the standard page break character, or by using a token search and replace algorithm. There is also an option for fixing up all of the carriage returns and line feeds, so that they will work properly with older and newer Microsoft software. This option also replaces all zero characters with space characters, ensuring that the file will be a valid ASCII file. There is also an option for expanding unprocessed COBOL markers in column one. Unprocessed COBOL markers show up in the output file as stray "-", "0", and "1" characters in the first column. If you turn the COBOL marker processing option on in the Parser tab, these characters will be expanded into the expected visual ASCII format.
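The cleanup steps described above can be sketched as two small functions. This is an illustration under stated assumptions rather than the application's own algorithm: the function names are hypothetical, and the meaning given to the column one markers ("1" starts a new page, "0" double spaces, "-" triple spaces, a space means single spacing) follows the conventional ASA carriage control interpretation, which the manual does not spell out.

```python
def normalize_report(raw):
    """Clean a raw mainframe report and split it into pages.

    - zero bytes become spaces, so the result is valid ASCII text
    - every line ending variant is rewritten as CR LF
    - pages are split on the 0x0C (form feed) character
    """
    text = raw.replace(b"\x00", b" ").decode("ascii", errors="replace")
    # Collapse CR LF and bare CR to LF first, then expand uniformly.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    pages = text.split("\f")
    # A form feed at the very end of the file yields one empty trailing page.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return [page.replace("\n", "\r\n") for page in pages]


def expand_column_one(lines):
    """Expand carriage control markers found in the first column.

    Assumed (ASA convention): "1" = new page, "0" = one blank line first,
    "-" = two blank lines first, " " = normal single spacing.
    Lines with any other first character are passed through unchanged.
    """
    out = []
    for line in lines:
        marker, body = line[:1], line[1:]
        if marker == "1":
            # Emit a form feed before every page except the first.
            out.append(("\f" if out else "") + body)
        elif marker == "0":
            out.extend(["", body])
        elif marker == "-":
            out.extend(["", "", body])
        elif marker == " ":
            out.append(body)
        else:
            out.append(line)
    return out
```

The token search option for finding page breaks is not covered here, because it depends on tokens that are specific to each report.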
Log: The only view that is associated with a channel is the log for the channel. The log for the channel contains date stamps and messages, and icons that indicate if a message is for information, or represents an error condition. For example, a light bulb indicates information, a green checkmark indicates a critical step, and a red triangle indicates an error. The log can be exported to a text file or cleared by choosing a menu item. The log is persistent. The log will reappear, preserved and in order on the screen, even if you launch and quit the program repeatedly. The log automatically limits itself to 1000 (the default) log entries. The oldest entries are removed after more than 1000 (the default) entries enter the log.
Stability: File system folders are monitored in order to locate work for processing. A file must be stable in order for it to become a candidate for channel processing. A file is stable when its size and its modification date and time remain unchanged, and an exclusive share lock on it can be held, for several seconds. If any of these parameters varies during the examination interval, then the file is considered to be in flux and is passed over as a candidate for channel processing. If a file passes the stability test, then it is passed into the channel for processing. Slow file systems, overlapping file systems, and Internet file systems all make the reliable selection of stable files a critical application feature.
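A stability test of this kind can be approximated as follows. This is a hedged sketch, not the application's implementation: the function names, the sampling interval, and the number of samples are assumptions, and opening the file for update is used only as a portable stand-in for the exclusive share lock (on Windows the open fails while another process denies write access; on other systems it is a much weaker check).

```python
import os
import time


def snapshot(path):
    """Return the attributes that must not change while a file is examined."""
    info = os.stat(path)
    return (info.st_size, info.st_mtime)


def is_stable(path, interval_seconds=3.0, samples=3):
    """Return True if size and modification time hold steady for the interval.

    interval_seconds and samples are illustrative defaults, not values
    taken from the application.
    """
    samples = max(samples, 2)
    pause = interval_seconds / (samples - 1)
    try:
        first = snapshot(path)
        for _ in range(samples - 1):
            time.sleep(pause)
            if snapshot(path) != first:
                return False  # the file is still in flux
        # Stand-in for the exclusive share lock: the open fails if the
        # file is missing or cannot be opened for writing.
        with open(path, "r+b"):
            pass
    except OSError:
        return False
    return True
```

A file that fails the test is simply passed over and examined again the next time the folder is polled.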
Timeouts: This application uses several external processes in order to do its job. One external process converts PDF documents into TIFF documents. Another converts PDF documents into TXT documents. The maximum amount of time that a channel will wait for an external process to complete its work is called a timeout. This amount of time is processor and job dependent. You will have to test and set up your timeouts according to how fast the workstation computer is, and how large your average job is going to be. The timeouts tab in the preferences dialog provides an environment in which you can experiment to obtain good values for your channel timeouts. For example, a channel that regularly processes 5000 page documents will need more time than a channel that only processes single page documents.
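The general pattern of waiting on an external process with an upper time limit looks like this. It is a minimal sketch using the Python standard library, not the mechanism the application itself uses; the function name and the return convention are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run an external console process, giving up after timeout_seconds.

    Returns the process exit code, or None if the process had to be
    shut down because it exceeded the timeout.
    """
    try:
        completed = subprocess.run(
            command, timeout=timeout_seconds, capture_output=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return None
    return completed.returncode


if __name__ == "__main__":
    # A stand-in job that sleeps for five seconds, given a one second limit.
    slow_job = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow_job, 1))
```

Timing a set of representative documents and then adding a comfortable margin is a reasonable way to choose the limit.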
Consoles: This application program uses several external processes in order to do its job. These external processes run as console applications in coordination with the channel that invokes the process. You can choose to view these console windows or not as the channel attempts to do its work. These additional console windows can be used to view the progress and output of the console application in a window. Sometimes this is a useful thing to do in order to figure out what is actually happening within a console application. The user can choose not to view the console windows also. When all of the console windows are hidden, the application program gives a more integrated appearance.
Working Folders: This application automatically creates and manages working folders for each channel at the moment when the channel needs to use them. Each channel manages its own independent set of working folders, so there is no chance of one channel reusing a temporary name or overwriting a file that was created by a different channel. There is a temporary input folder and a temporary output folder for each named channel. These folders are created inside the folder that contains the application. Stable files are copied into the temporary input folder before processing begins. When processing has completed, all temporary output files are held in the temporary output folder. When a channel completes a processing job, it cleans up its own temporary input and output folders. When you delete or rename a channel, new temporary folders are automatically created, but the old temporary folders are left alone, so you must remember to delete the temporary input and output folders associated with the old channel name. The temporary folders are named with a prefix that matches the channel name and a suffix that indicates whether the folder is a temporary input folder or a temporary output folder.
Look for PDF Files, parse TEXT, and insert TIFF: This is one of the modes of work that a channel can perform. Check the box in the work modes checkbox list in the work tab of the preferences dialog to instruct a channel to perform this type of processing. This sequence of tasks first monitors a folder for files with a PDF extension. When it finds a stable file of this type, it copies the file to a temporary local location and then converts the entire PDF file into a multi-page TIFF. Once the PDF to TIFF conversion is done, the channel converts the PDF into a memory based ASCII array format that is specifically prepared for the parsing engine. Text parsing is the process by which fields of index data are extracted from arbitrary and inconsistent units of textual information. Once field data has been extracted for a particular document, the rules in the text parse engine are used to determine how many pages of the input file belong to that individual document. Pages are concatenated together until the document is complete. The number of pages in a document can be fixed, or it can vary according to a rule. Once an output document has been prepared, the OptiDocXML tag is updated, and the final product, a multi-page TIFF segmented from the input PDF document, is inserted into the database. This mode of work is particularly useful for processing reports delivered as PDF from Oracle and other large accounting programs into overlaid TIFF image files. The output TIFF images retain all of the visual layout consistency of the input PDF, the output TIFF images are entirely in the public domain, and the SQL data that is associated with each output TIFF image is obtained using the parse rules engine.
COLD file format: This is a file format that is used to quickly locate pages and offsets in a multiple page text file. It is a deprecated OptiDoc file format. An OptiDoc COLD file has a small mini header at the beginning. This mini header is composed of the two ASCII characters ‘CC’, which can be used as a quick way to identify a COLD file. After the mini header comes a 4-byte integer value in Intel (little-endian) format that gives the number of jump table entries. Let us call this value N. After the value N comes the actual jump table. Each entry in the jump table is a 4-byte value in Intel format that gives the offset from the end of the overall COLD header to the start of a particular text page. There should be N entries in the jump table, but be prepared to find N-1, depending on how buggy the program was that generated the particular COLD file that you are looking at. The first jump table entry always has the value zero. After all of the jump table entries comes a zero byte terminated (C-style) string that is the name of the template that applies to this COLD file. The extension of the template name is ignored in all modern COLD file implementations, so when writing a COLD file you should omit the template name extension. After the template name string is a 4-byte value in Intel format that provides the size of the font in which the text is to be displayed. Let us refer to this value as F. This value is ignored in all modern COLD file implementations because this property has become a part of the overlay. After the value F come the pages of textual information. The position in the file immediately after the value F is the zero relative position for the jump table entries. Since the first jump table entry is always zero, the body of the first page begins at exactly this position.
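The layout described above can be expressed as a small writer and reader. This is a sketch based only on the description in this manual: the function names are hypothetical, the 4-byte values are assumed to be unsigned, the template name is assumed to be plain ASCII, and the reader assumes by default that the jump table holds the full N entries (a flag is provided for files that hold N-1).

```python
import struct

MAGIC = b"CC"  # two-character mini header


def write_cold(pages, template_name, font_size=0):
    """Build the bytes of a COLD file from a list of page bodies (bytes)."""
    offsets, position = [], 0
    for page in pages:
        offsets.append(position)  # offset is relative to the end of the header
        position += len(page)
    header = MAGIC + struct.pack("<I", len(offsets))           # N
    header += struct.pack("<%dI" % len(offsets), *offsets)     # jump table
    header += template_name.encode("ascii") + b"\x00"          # C-style string
    header += struct.pack("<I", font_size)                     # F (ignored today)
    return header + b"".join(pages)


def read_cold(data, table_has_n_entries=True):
    """Return (template_name, font_size, pages) from COLD file bytes."""
    if data[:2] != MAGIC:
        raise ValueError("not a COLD file: missing 'CC' mini header")
    (n,) = struct.unpack_from("<I", data, 2)
    entries = n if table_has_n_entries else n - 1
    offsets = struct.unpack_from("<%dI" % entries, data, 6)
    position = 6 + 4 * entries
    end_of_name = data.index(b"\x00", position)
    template_name = data[position:end_of_name].decode("ascii")
    (font_size,) = struct.unpack_from("<I", data, end_of_name + 1)
    body = data[end_of_name + 5:]  # zero relative position for the offsets
    bounds = list(offsets) + [len(body)]
    pages = [body[bounds[i]:bounds[i + 1]] for i in range(entries)]
    return template_name, font_size, pages


if __name__ == "__main__":
    blob = write_cold([b"PAGE 1\r\n", b"PAGE 2\r\n"], "W2")
    print(read_cold(blob))
```

Because each page boundary comes from the jump table rather than from a scan of the text, a viewer can seek directly to any page.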
Look for specific files, parse text, insert COLD: This is one of the modes of work that a channel can perform. Check the box in the work modes checkbox list in the work tab of the preferences dialog to instruct a channel to perform this type of processing. This sequence of tasks first monitors a folder for files with a specified extension. The edit field for the extension becomes available when this mode of work is selected. If you want the channel to pick up files with a TXT extension (the default), enter ‘.TXT’ into the edit field. If you want the channel to pick up files with an LPR extension, enter ‘.LPR’ into the edit field. Enter a ‘*’ in the edit field if you want the channel to pick up all files regardless of the extension.
When it finds a stable file of the specified type, it first copies the file to a temporary local location. Then it converts the entire mainframe file into a memory based ASCII array format that is specifically prepared for the parsing engine. Text parsing is the process where fields of index data are extracted from arbitrary, and inconsistent, units of textual information. Once field data has been extracted for a particular document, then the rules in the text parse engine are used to determine how many pages in the overall document relate to an individual document. Pages are concatenated together until the overall document is made ready. The number of pages in a specific document can be a consistent number, or the number of pages in a specific document can vary according to a rule. As an option for this mode of work a background overlay can be applied. If a background overlay is supplied then the text information will be superimposed onto a TIFF image of the background overlay. In this case a TIFF document is the output of the channel and the OptiDocXML tag is added, and the document is inserted into the database. If no background overlay has been chosen, then the text is gathered into an OptiDoc COLD file, and that is the final document that is inserted into the database.
IRF file format: This is an indexing file format that links index data with a document. Older versions of the IRF file format contained many sets of index data with many links to outside documents, so a single IRF file could correspond to many documents. Newer versions of the IRF file format use exactly one IRF file per document, and the document to which the IRF file refers is assumed to be located in the same folder as the IRF file. This change was made so that processing could be more granular. Channels that are configured to process IRF files are backwards compatible with the old IRF format as well as the newer IRF formats. Here is a simple example of an IRF file that shows the different sections and the related document. This sample should be self-explanatory.
BEGINBATCH
BEGINSETTINGS
IMAGE_ONLY
IMAGE_TYPE TIF
ENDSETTINGS
BEGINHEADER
Last_Name URSULLA
First_Name GREEN
SSN 409494864
ENDHEADER
BEGINIMAGE
d1j92kiq.TIF
ENDIMAGE
ENDBATCH
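A reader for the sample above can be sketched in a few lines. The parser below is an illustration only: the function name and the returned structure are hypothetical, it assumes that a field name is separated from its value by whitespace (as in the sample), and it handles a single batch, so the older multi-document IRF files would need more work.

```python
SECTIONS = ("SETTINGS", "HEADER", "IMAGE")


def parse_irf(text):
    """Parse the text of a one-document IRF file into a dictionary."""
    batch = {"settings": [], "header": {}, "images": []}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line in ("BEGINBATCH", "ENDBATCH"):
            continue
        # Only exact section markers switch state, so a field whose name
        # merely starts with BEGIN or END is still treated as data.
        if line.startswith("BEGIN") and line[5:] in SECTIONS:
            section = line[5:]
        elif line.startswith("END") and line[3:] in SECTIONS:
            section = None
        elif section == "SETTINGS":
            batch["settings"].append(line)
        elif section == "HEADER":
            parts = line.split(None, 1)
            batch["header"][parts[0]] = parts[1] if len(parts) > 1 else ""
        elif section == "IMAGE":
            batch["images"].append(line)
    return batch


SAMPLE_IRF = """BEGINBATCH
BEGINSETTINGS
IMAGE_ONLY
IMAGE_TYPE TIF
ENDSETTINGS
BEGINHEADER
Last_Name URSULLA
First_Name GREEN
SSN 409494864
ENDHEADER
BEGINIMAGE
d1j92kiq.TIF
ENDIMAGE
ENDBATCH"""

if __name__ == "__main__":
    print(parse_irf(SAMPLE_IRF))
```

The file name in the image section would then be resolved against the folder that contains the IRF file.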
Look for IRF files, insert documents: This is one of the modes of work that a channel can perform. Check the box in the work modes checkbox list in the work tab of the preferences dialog to instruct a channel to perform this type of processing. This sequence of tasks first monitors a folder for files with the extension ‘.IRF’. IRF files are generated by older OptiDoc products. The file format is basically a text file that contains a carriage return delimited list of field names and values, and a pointer to a document. The document can be of any type. When an IRF file is found, the index information is extracted from the file, and the referenced document is inserted into the database. This mode of work has been provided so that existing products that produce IRF files can be integrated with this application.
Opening the Report Server
The first time that the Report Server is opened, the following message will appear.
The reason for this is that the Report Server is looking for its preference file, which has yet to be created. To create the file, simply close the Report Server. The preference file will be written to the same directory where PDFINSERTER.exe has been installed.
The Menu Bar
File Menu
Below are listed the options available under the file menu.
Note: Until at least one channel has been created, the only options available are “New Channel” and “Exit”.
New Channel: Creates a new Channel in the Report Server.
Delete Channel: Gives the administrator the option to delete the selected Channel.
Edit Channel: Opens the Channel properties for the selected Channel.
Start Channel: Gives the command to the selected channel to start monitoring the work folder.
Stop Channel: Gives the command to the selected channel to stop monitoring the work folder.
Clear Log: Clears the selected Channel’s log.
Save Log: Saves the selected Channel’s log to a text file.
View Menu
Toolbar: Toggles the Toolbar on and off. A check indicates that the toolbar is on.
Status bar: Toggles the Status Bar on and off. A check indicates that the Status Bar is on.
Window Menu
Cascade: Cascades Channel windows so that the title bar for each Channel is viewable.
Tile: Tiles each Channel window within the Report Server.
Arrange Icons: Arranges any minimized Channels at the bottom of the Report Server
Note: Any Channels created will also be listed in this menu. A check mark will be displayed beside the selected Channel.
Help Menu
Technical Documentation: Displays the Report Server help file.
About Report Server: Displays the product information.
The Tool Bar
There are five buttons available with the tool bar.
Note: Until at least one Channel is created, only “Create New Channel” and “Info” are available.
Create New Channel: Creates a new Channel in the Report Server.
Edit Channel Properties: Opens the Channel properties for the selected Channel.
Start Active Channel: Gives the command to the selected channel to start monitoring the work folder.
Stop Active Channel: Gives the command to the selected channel to stop monitoring the work folder.
Info: Displays the Report Server help file.
Editing the Channel Properties:
Setup Tab
Enter the name of your channel:
Type the name of the channel into this edit field. The name of a channel can be anything that helps you to remember what the channel is used for.
Choose a background color for your channel:
Click the color bar to choose a color for the channel. The background of the log view for the channel as well as the background color for the message text that appears inside of the log view will become this color. This is just a nice way to differentiate the different channels on the screen using a visual technique.
Set this value to limit the amount of log file entries that are stored:
Enter a number into this edit field. This is the maximum number of log file entries that will be stored by the log view for the channel. The oldest entries will get removed after this maximum number of entries is accumulated into the log view for a channel.
Work Tab
Specify the full drive path or UNC path of a folder or network folder to monitor:
This channel will look for work in the folder that is indicated here. You may enter a local drive path or a network UNC path into this edit field. This path must resolve in order for the channel to look for work in this location. A file must be stable inside of this input folder before it will be considered to be a candidate for work.
Mode(s):
Each channel can process several modes of work at one time. You can check on or off a particular mode of work by checking or clearing the checkbox in the scrolling list of available work modes. Each mode of work is described in detail in the definitions section above.
(Specify)
When one of the chosen work modes is “Look for specific files, parse text, insert COLD” then you can specify the extension of the file(s) to look for. Standard DOS wild card characters are supported.
How often should this channel look for work to do?
When a channel runs it has a choice of settings for how often the input folder should be polled for work. It can be polled once per hour, at the top of each hour. Or it can be polled once per day, at a specified time. Or it can be polled in a constant manner. Constant polling requires more system resources because the channel is constantly scanning the file system for work to do.
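The three polling choices can be illustrated with a function that computes when the next poll is due. This is a sketch only: the mode names, the function name, and the short delay used for constant polling are hypothetical and are not taken from the application.

```python
from datetime import datetime, timedelta, time as dtime


def next_poll(now, mode, daily_at=dtime(0, 0), constant_interval=timedelta(seconds=5)):
    """Return the datetime of the next poll of the input folder.

    mode is one of "hourly", "daily", or "constant" (illustrative names).
    """
    if mode == "hourly":
        # Top of the next hour.
        top_of_hour = now.replace(minute=0, second=0, microsecond=0)
        return top_of_hour + timedelta(hours=1)
    if mode == "daily":
        # Today at the specified time, or tomorrow if that has passed.
        candidate = datetime.combine(now.date(), daily_at)
        return candidate if candidate > now else candidate + timedelta(days=1)
    if mode == "constant":
        # Poll again almost immediately; this is what costs more resources.
        return now + constant_interval
    raise ValueError("unknown polling mode: %r" % mode)


if __name__ == "__main__":
    print(next_poll(datetime.now(), "hourly"))
```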
Specify path to GNU Ghostscript :
In order to process PDF documents into TIFF documents, GNU Ghostscript must be installed on the workstation. The full path to the executable console application gswin32c.exe must be provided in this edit field. At the time of writing, the most recent version of GNU Ghostscript is version 8.14. This module is launched and managed by any channel that needs to perform operations on PDF files.
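For reference, a Ghostscript command line for PDF to TIFF conversion typically looks like the one built below. The switches are standard Ghostscript options, but the particular device (tiffg4), the resolution, and the paths are assumptions made for illustration; the manual does not state which arguments the application actually passes.

```python
def ghostscript_tiff_command(gs_path, pdf_path, tiff_path, dpi=300, device="tiffg4"):
    """Build an argument list for converting a PDF to a multi-page TIFF.

    gs_path is the full path to gswin32c.exe, as entered in the edit field.
    device and dpi are illustrative defaults, not the application's settings.
    """
    return [
        gs_path,
        "-dSAFER",      # restrict file access from within the PDF
        "-dBATCH",      # exit when the job is finished
        "-dNOPAUSE",    # do not prompt between pages
        "-q",           # suppress routine console messages
        "-sDEVICE=" + device,
        "-r%d" % dpi,
        "-sOutputFile=" + tiff_path,
        pdf_path,
    ]


if __name__ == "__main__":
    # Hypothetical installation and file paths.
    print(ghostscript_tiff_command(
        r"C:\gs\gs8.14\bin\gswin32c.exe", r"C:\work\in.pdf", r"C:\work\out.tif"
    ))
```

Passing the arguments as a list, rather than as one command string, avoids quoting problems when a path contains spaces.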
Show Console Windows for testing conversion engines:
When PDF to TIFF conversion is taking place the channel manages an external console process. If the console window is turned on then you will see a black console window pop up to the top of the screen every time the channel invokes that particular external console process. If the console window is turned off, then you will not see any console window while the channel manages the process. Being able to see the console and the output to the console window can assist you in debugging problems that might arise. This section gives you the ability to turn on or off the console windows for the PDF to TIFF conversion as well as the PDF to TEXT conversion.
Timeouts Tab
PDF to TIFF Maximum Timeout Seconds:
This is the maximum number of seconds that the channel will allow the external console process that converts PDF documents into TIFF documents to run before shutting it down. This timeout value is very specific to the size of the jobs being run on the channel and the speed of the workstation that this application program is being executed on.
Test:
If you place a valid file in the input folder for this channel and then press the “Test” button on this properties tab, the console application will attempt to perform the specified conversion within the specified timeout period. This dialog tab provides a way for you to determine a valid timeout setting by simply testing the time it takes to process a set of representative documents. You can temporarily enable or disable the console output of the external processes while you are testing to determine a good value for the process time out.
Show Console:
This is a temporary way to show or hide the console window while you are testing timeouts. These settings obtain their initial values from the console settings in the work tab, but can be changed at will without affecting the real console setting values.
PDF to TEXT Maximum Timeout Seconds:
This is the maximum number of seconds that the channel will allow the external console process that converts PDF documents into TEXT documents to run before shutting it down. This timeout value is very specific to the size of the jobs being run on the channel and the speed of the workstation that this application program is being executed on.
Test:
If you place a valid file in the input folder for this channel and then press the “Test” button on this properties tab, the console application will attempt to perform the specified conversion within the specified timeout period. This dialog tab provides a way for you to determine a valid timeout setting by simply testing the time it takes to process a set of representative documents. You can temporarily enable or disable the console output of the external processes while you are testing to determine a good value for the process time out.
Login Tab
DSN:
This is the name of a system DSN, or Data Source Name, that the current channel will use to establish the connection to the database.
User:
This is the name of an OptiDoc user account that the current channel will use when it establishes a connection to the database and downloads all of the permissions and collection information for that user.
Pwd:
This is the password for the specified OptiDoc user account.
Collection:
Once the login has completed, the collection dropdown list is filled in, and the current collection is selected from the dropdown list. Choose a collection from this list to tell the channel the name of the collection that the channel will be working with.
Login and select collection:
Click on this button to test the DSN, user name, and password that you have just entered into the login tab of the properties dialog. This action may take a few minutes to complete, depending on how many collections and permissions the chosen OptiDoc database has.
ODBC Timeout:
This is the maximum amount of time, in seconds, that any particular SQL transaction is allowed to take. If any SQL transaction takes longer than this, an ODBC timeout error will occur and the connection will be rejected. The default timeout value is 15 seconds.
Creator ID:
This is a marker value that is inserted into the main database table that associates a particular application, or a particular workstation and application with an identification number. This ID is used with every record that is inserted with this application, and can be unique to each channel. This feature can be used to identify which application has created a particular record, or which workstation, or which channel. The default ID for any channel in this application is 2005.
Update IMPORT_LOGS Table:
This checkbox activates a powerful and detailed debug trace function for the insert transaction. When this feature is active, every critical section of code that is executed while entering a record into the database is recorded, along with every retry attempt and every failure and reason code. The byte sizes of files are compared for equality as they leave the system and enter the storage subsystem. This entire record of events is saved into an SQL database table named "IMPORT_LOGS". If the "IMPORT_LOGS" table does not exist, it is created automatically.
Parser Tab
Template:
This drop down menu shows all available template files in the "Parse Templates" folder inside the application folder. All licensed versions of this software can import a parse template. The parse template file contains the set of rules that are fed into the parse language interpreter to determine which database fields are filled in with which text values derived from the input mainframe pages. Use this dropdown menu to choose the parse template that you wish to assign to this channel.
Note: None of the controls in this pane will be available for use, unless a template has been chosen.
To create a new parse template, first choose the name "NEW TEMPLATE" from the dropdown list. This action will bring up a dialog box that will ask you for the name of the new parse template. This name can be any descriptive name that you desire. Once you have given your parse template a new name, the parse template editor will open up and expand to take over the entire screen. If you want to edit an existing parse template, first choose the name of the parse template from the drop down and then click on the "Edit…" button that is located immediately to the left of the "Import…" button. This action will bring up the parse template editor. If you do not have the “editors enabled” version of the software, then you can only import parse files into the system. You must have the “editors enabled” version in order to create or edit parse files. You can install parse files by simply copying parse files into the “Parse Templates” folder. You can remove parse files by simply deleting parse files from the “Parse Templates” folder.
Import Template:
Click this button to import a parse file into the “Parse Templates” folder. This action simply copies the parse file into the parse templates folder. To export a parse file, simply copy the parse file out of the “Parse Templates” folder. Please keep in mind that a parse file is a binary file, and if you send the file using FTP to another user, the FTP transmission must be done in binary mode.
Edit Template:
When you click on the edit button the entire screen is taken over by the parser generator user interface. Please refer to appendix A at the end of this document for details regarding the use of the parse template editor and the parser language itself.
Page Break Detection:
A stream of ASCII text may or may not have reliable, well-formed page break indicators. Typically, a page break is indicated in a stream of ASCII characters by the character with the hex code 0x0C, which the ASCII tables define as the form feed (page break) character. This is the generally accepted standard, and under most circumstances the page break character is the best way for this application program to know where one page ends and the next begins. If the input text stream has come from the integrated PDF to TEXT conversion engine, then standard page break characters will always be included at the proper locations in the text. However, many mainframe applications do not place page breaks at regular locations, and some place erroneous page breaks inside the text stream where they do not belong. Another common mistake made by mainframe report programmers is to include the zero character (0x00) in the output text stream. This character does not belong in an ASCII text stream, but it happens anyway. The quality of mainframe reports is subject to the abilities of the mainframe programmer: sometimes the result is a proper ASCII text file, and sometimes it is not. This application program does its best to work with badly formed input files. Where page breaks are not properly indicated by page break characters, the user of this application can specify a string of characters that will be replaced with a page break character. These values are used internally by the text processing engine to preprocess the input text so that proper page breaks end up in the proper places.
Standard Page Break Detection:
Typically an input text stream is divided up into pages by scanning for page break characters. Good input documents will have page break characters in proper locations. Use this option for processing input documents that have standard page break characters setup properly within the documents.
Scan For Characters:
If the input stream has invalid, non-existent, or randomly distributed page break characters, then this option may allow you to set up proper page breaks. You must find a sequence of characters, or a hexadecimal sequence of characters, that appears at the end of each document, and use that sequence as a marker. This application program will look for the specified marker sequence and replace each occurrence with a page break character. This option also removes spurious page break characters, so that page breaks remain only at the marked locations.
Multiple marker sequences can be used by entering more than one marker sequence into this edit field with a comma separating each sequence. If the body of a sequence includes a comma, then enter two consecutive commas, and they will be treated as one single comma within the marker sequence.
Hexadecimal strings can be used, so that new line characters, form feed characters, and other nonprintable characters can serve as page break marker sequences. For example, the hexadecimal marker string 0x5041474520303031 represents the marker string 'PAGE 001'. Please note that hexadecimal marker strings must start with the prefix ‘0x’ and must contain an even number of hexadecimal digits. If you do not start the marker string with the prefix ‘0x’, then the marker will be used in plain text form. For example, the marker string FReD simply represents the marker string ‘FReD’. Marker strings are always case sensitive. As another example, the marker string 0x5041474520303031,FReD will replace either the text PAGE 001 or the text FReD with a page break character.
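The marker rules above can be sketched in Python as follows. This is an illustration of the documented syntax only, not the application's actual parser: the function names are hypothetical, and the order of operations (removing existing page break characters first, then substituting markers) is an assumption.

```python
FORM_FEED = b"\x0c"  # the ASCII page break character


def split_markers(field):
    """Split the edit-field text on single commas; ',,' is a literal comma."""
    markers, current, i = [], "", 0
    while i < len(field):
        if field[i] == ",":
            if i + 1 < len(field) and field[i + 1] == ",":
                current += ","  # doubled comma stays inside the marker
                i += 2
                continue
            markers.append(current)
            current = ""
        else:
            current += field[i]
        i += 1
    markers.append(current)
    return [m for m in markers if m]


def decode_marker(marker):
    """Turn one marker string into bytes; a '0x' prefix means hexadecimal."""
    if marker.startswith("0x"):
        digits = marker[2:]
        if not digits or len(digits) % 2 != 0:
            raise ValueError(f"hex marker needs an even number of digits: {marker}")
        return bytes.fromhex(digits)
    return marker.encode("ascii")  # plain text, case sensitive


def apply_markers(data, field):
    """Drop existing page breaks, then put one where each marker occurs."""
    data = data.replace(FORM_FEED, b"")
    for marker in split_markers(field):
        data = data.replace(decode_marker(marker), FORM_FEED)
    return data


if __name__ == "__main__":
    text = b"first page PAGE 001 second page FReD third page"
    print(apply_markers(text, "0x5041474520303031,FReD"))
```

With the field value 0x5041474520303031,FReD, both 'PAGE 001' and 'FReD' are replaced by the form feed character, matching the example in the text.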
Count Pages:
When the ‘count pages’ check box is checked in the parser tab of the properties dialog for a channel, the parse engine will count pages inside of the larger overall document in order to split the overall document up into smaller database documents. It is often true that mainframe reports have exactly the same number of pages in the overall document for each of the database documents contained inside of the overall document. For example, a financial check application report may always contain exactly three pages per database document. The first page might be the image of a check, and the following two pages always contain some kind of a forms based supporting documentation. Therefore, if a report is being processed that always contains three pages per document, then the parser can figure out how many pages there are per document by simply counting up to three pages, extracting those three pages, and using those three pages as the document to insert into the database.
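As a minimal sketch of the page counting idea, the following Python function splits a list of pages into fixed-size documents. The function name is hypothetical, and the handling of a trailing group with fewer pages than expected (it is kept as a short final document) is an assumption; the manual does not say what the application does in that case.

```python
def split_by_page_count(pages, pages_per_document):
    """Group consecutive pages into documents of a fixed page count."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        pages[i:i + pages_per_document]
        for i in range(0, len(pages), pages_per_document)
    ]


if __name__ == "__main__":
    # Two three-page documents: a check image page plus two supporting pages.
    report = ["check A", "form A1", "form A2", "check B", "form B1", "form B2"]
    for document in split_by_page_count(report, 3):
        print(document)
```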
First Page Detection Rule:
When the “first page detection rule” check box is checked in the parser tab of the properties dialog for a channel, the parse engine will apply the specified rule to determine how many pages are in each database document. It is often true that mainframe reports have a variable number of pages per document, and that the number of pages per document changes throughout the overall report, because the amount of information included in a specific document depends on the amount of information available for that document. Human Resources reports, for example, tend to vary in page count because Human Resources documents tend to have a variable number of attachments. This problem is overcome by specifying a special parse language rule that is used to detect the first page of a document. This application program then reads pages out of the overall input document and continues to accumulate those pages into a database record until the first page of the next document is reached.
Essentially, when the first page of the next document is recognized by the parse rule engine, the application backs up by one page and inserts the accumulated pages as a database record. Using a rule for detecting the first page will work with mainframe reports that have a variable number of pages per database record. However, in this case, the mainframe report must contain some kind of static title, header, or marker that can be used to recognize the first page of a document. Often the header will have a page count that can be compared against the number “1”.
The following is an example of a sample parse rule that will break input mainframe documents apart into database records of variable length based on the page number section in the header.
First Page Detection Rule: LOOKFOR 'Page#' SKIP 1 GET 3
Is Equal To: 1
This rule will scan the input page for the case sensitive text “Page#”, skip forward by one character, grab the next three characters, and compare them to the character “1”. Leading and trailing spaces are not a concern for this comparison. Pages are appended together until the rule becomes true. When the first page of the next document is located, the application program will back off by one page and insert the previously collected set as a single document, with index information, into the database. It is often easiest to create the first page rule using the parser rules generator user interface and then to cut and paste the rule back into these fields. The parser rule language is very flexible, powerful, and extendable, and the user interface exposes only a small piece of the power of the text parsing engine.
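The behaviour of the example rule can be imitated in a few lines of Python. This is only a sketch of the logic as described in the text; it does not implement the parse language itself, and the function names and default arguments are hypothetical.

```python
def is_first_page(page_text, anchor="Page#", skip=1, get=3, expected="1"):
    """Mimic: LOOKFOR 'Page#' SKIP 1 GET 3, Is Equal To: 1."""
    position = page_text.find(anchor)  # case sensitive search
    if position < 0:
        return False
    start = position + len(anchor) + skip
    # Leading and trailing spaces are ignored in the comparison.
    return page_text[start:start + get].strip() == expected


def group_documents(pages):
    """Accumulate pages until the next first page is seen, then start anew."""
    documents, current = [], []
    for page in pages:
        if is_first_page(page) and current:
            documents.append(current)  # emit the pages collected so far
            current = []
        current.append(page)
    if current:
        documents.append(current)
    return documents


if __name__ == "__main__":
    report = [
        "HR REPORT Page#   1\nAlice",
        "HR REPORT Page#   2\nattachment",
        "HR REPORT Page#   1\nBob",
    ]
    for document in group_documents(report):
        print(len(document), "page(s)")
```

Note that the rule depends on the fixed layout of the header: a header such as "Page# 1 of 3" would yield the three characters "1 o" and would not compare equal to "1", so the SKIP and GET values must match the actual report layout.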
Preprocessor:
When mainframe documents are being processed, several optional preprocessors are available. These preprocessors take effect after the raw page data has been obtained from the overall mainframe file, and prepare or interpret the contents of the page. These features provide several options for making mainframe files compatible with other expected file formats, such as COLD. When you are building a parse template, you must define rules for finding the data that will be used for the database fields. An important concept to remember is that the format of the text that you see when you are using the parse template designer is the same as the format that is eventually inserted into the database. This has been done to make the parse template WYSIWYG, and more intuitive, and to allow for relative as well as absolute coordinate positioning. For example, if your input mainframe file contains COBOL markers, and you choose to expand these markers, then the text that you will build your parse rule for will have COBOL markers expanded. As another example, consider the case where you have no anchor to attach a "find" type of parser rule. This might happen if the mainframe file contains only "filled in" data, with no other markers or tags, that in the past were printed on preprinted forms, such as UB92's. In this case, only absolute positioning can be used to locate field data. Relative anchors using the "find" type of parser rules are always preferred; however, the WYSIWYG approach allows for both relative and absolute positioning.
Fixup CRLFs:
Sometimes the carriage return and line feed characters from a mainframe file are incompatible with COLD documents. If you choose to fix up the carriage returns and line feeds, then each single carriage return, single line feed, carriage return and line feed pair, or line feed and carriage return pair is converted into the normal form of a carriage return followed by a line feed. This is necessary so that document viewers that use a particular "feature" of the Microsoft API CEdit control will work, and so that when the page is drawn another particular "feature" of the Microsoft API DrawText function will also work. This feature might be necessary for backwards compatibility with older applications that make certain assumptions about the exact functionality provided by the Microsoft APIs. This feature also fixes the problem where random zero (0x00) characters have been injected into the input mainframe file. All such zeros are replaced by space characters.
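The line ending normalization described above can be sketched with a single regular expression in Python. The function name is hypothetical, and the left-to-right pairing of ambiguous runs (for example, a line feed followed by a carriage return and another line feed) is an assumption about behaviour the manual does not spell out.

```python
import re

# Pairs are listed before single characters so that CR LF and LF CR
# are each treated as one line ending rather than two.
_LINE_END = re.compile(rb"\r\n|\n\r|\r|\n")


def fixup_crlfs(data):
    """Normalize all line endings to CR LF and replace 0x00 with a space."""
    data = data.replace(b"\x00", b" ")
    return _LINE_END.sub(b"\r\n", data)


if __name__ == "__main__":
    raw = b"line1\rline2\nline3\r\nline4\n\rline5\x00end"
    print(fixup_crlfs(raw))
```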
Interpret COBOL Markers:
Sometimes mainframe data will come from a COBOL program. It is often the case that a column on the left hand side of the document is a control column and is not used to provide text information, but instead, is used to provide formatting information for the text. This feature expands this COBOL control column into text data, so that the formatting of the COBOL file will become a visual part of the ASCII text, even if the COBOL programmers have failed to interpret this control column themselves. For example, a "-" in the first column means two carriage returns and line feeds are to precede the following data, and does not mean to print an actual "-" in column one. Zeros and ones also have different meanings and interpretations when they appear in the first data column in a mainframe COBOL dump file.
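A rough Python sketch of control column expansion is shown below. Only the meaning of "-" is taken from the text above. The meanings used for a space, "0", and "1" follow the common ASA carriage control convention (single spacing, one extra blank line, and new page) and are assumptions, as are the function name and the choice to drop unrecognized control characters without adding anything.

```python
CRLF = "\r\n"

# Text inserted before the line for each control character.
# "-" comes from the manual; " ", "0" and "1" are assumed from the
# common ASA carriage control convention.
CONTROL_PREFIX = {
    " ": "",          # normal single spacing
    "0": CRLF,        # one extra line break before the text (assumed)
    "-": CRLF * 2,    # two extra line breaks before the text
    "1": "\x0c",      # start a new page (assumed)
}


def expand_control_column(lines):
    """Strip column one from each line and expand it into real formatting."""
    expanded = []
    for line in lines:
        control, text = line[:1], line[1:]
        expanded.append(CONTROL_PREFIX.get(control, "") + text)
    return CRLF.join(expanded)


if __name__ == "__main__":
    dump = ["1MONTHLY TOTALS", " region east", "-grand total"]
    print(repr(expand_control_column(dump)))
```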
Skip First N Lines:
Sometimes mainframe data will contain extra header data at the beginning of the overall document. This option strips off the specified number of lines from the first page of the overall document. Turn this check box on if you want to remove lines of data from the start of the overall mainframe document.
Overlay Tab
Apply the Specified Overlay: (COLD->TIFF) into database:
If you have a work mode that creates COLD documents as an end result, then you will have the option to apply an overlay file to the COLD document, or to simply insert the COLD document. These options give you the ability to select and apply a background image into which text is drawn at the proper locations, much like filling in an empty form. The final output is a TIFF document with a background, possibly of a form, with the text filled in as though the document had been filled out and then scanned into the system. A TIFF background file can be black and white, grayscale, or RGB color. If you choose to work with color TIFF background files, then you can also control the color of the text that is written into the final document. A black and white or grayscale form that is filled in with red or blue text is particularly easy for customers to read and work with. Overlay output files are always compressed. Black and white overlay TIFF files are compressed using the standard G4 compression scheme. Grayscale and color overlay files are compressed using the well known Macintosh PackBits compression algorithm. A single page 8.5 x 11 inch, 300 dpi color overlay file is about 800K after PackBits compression, whereas a single page 8.5 x 11 inch, 300 dpi black and white overlay file is only about 70K after G4 compression. Therefore, the tradeoff for color is of some consequence.
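To put the compressed sizes quoted above in context, the uncompressed size of a page can be worked out from its dimensions and resolution. The short Python sketch below does this arithmetic; the function name is hypothetical, and padding each pixel row to a whole number of bytes is an assumption about how the raster is stored.

```python
def raw_page_bytes(width_inches, height_inches, dpi, bits_per_pixel):
    """Uncompressed raster size in bytes for one page."""
    width_px = round(width_inches * dpi)
    height_px = round(height_inches * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # round rows up to bytes
    return row_bytes * height_px


if __name__ == "__main__":
    mono = raw_page_bytes(8.5, 11, 300, 1)     # black and white, 1 bit per pixel
    color = raw_page_bytes(8.5, 11, 300, 24)   # RGB color, 24 bits per pixel
    print(f"black and white: {mono:,} bytes")
    print(f"RGB color:       {color:,} bytes")
```

For an 8.5 x 11 inch page at 300 dpi (2550 x 3300 pixels), this gives roughly 1.05 million bytes for black and white and roughly 25.2 million bytes for RGB color before compression.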
Import:
Use this button to import an overlay file into the “Overlay Templates” folder. You can simply copy the overlay file into the “Overlay Templates” folder in order to import the overlay file. An overlay file is a binary file, and if it is transmitted over an FTP channel, the channel must be set to binary mode in order for the transmission to be successful.
Edit:
When you click on the edit button the entire screen is taken over by the overlay generator user interface. Please refer to appendix B at the end of this document for details regarding the use of the overlay template editor.
Overlay Name: COLD in database:
If you have a work mode that creates COLD documents as an end result, and you want to insert actual OptiDoc COLD documents into the database, then use this option. When this option is in effect, each COLD document that is inserted into the OptiDoc database will contain the specified overlay name. The specified overlay name is used by some client applications to convert a text based COLD document into a form, or an image with a background.
OptiCapture Tab
This tab controls the setup of the optionally available OptiCapture PDF417 coversheet processing module. This tab is only available if the separate OptiCapture DLL has been purchased and installed within the Report Server application program. An OptiCapture channel is a special kind of channel that monitors a single drop folder and is able to insert documents into different collections. Essentially, the required PDF417 barcode coversheet field "collection" aims the OptiCapture module at a chosen collection. The OptiCapture module also supports "bursting" of multi-page TIFF documents that have multiple PDF417 coversheets into separate individual documents.
Collection Map:
The collection map is a drop down list of available field mapping templates for processing PDF417 barcode forms. All templates must be entered and updated manually. The name of a template must be the name of the collection that the map is associated with. The contents of the map show how incoming fields in the PDF417 barcode are mapped onto SQL fields in the database collection. Please note that there should always be exactly one map per collection.
To add a new template, enter the new template name into the “Collection Map” drop down menu and click the “Add” button. To change a template name, select the desired template, change the name, and click the “Change” button. To delete an existing template, select the desired template and click the “Remove” button.
Note: The name of the Map Template must be identical to the corresponding Collection Name.
Fields List:
The “Fields” list displays the mapped fields of the selected Map Template. The left hand column displays the fields coming from the PDF417 barcode and the right hand column displays the related SQL fields in the target collection. A PDF417 barcode coversheet is required to have the "invisible" field named "collection" that gives the name of the template to use.
The “From” and “To” fields:
When creating a new template, these fields are used to define which PDF417 barcode field relates to which collection field. Once the fields are entered, click the “Add” button and the fields will appear in the Fields window. To make changes to the field names, highlight the desired row, make the appropriate changes, and then click the “Change” button. To remove a field name mapping, highlight the appropriate row and click the “Remove” button.
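Conceptually, a collection map behaves like a lookup table keyed by collection name, with each entry translating barcode field names into SQL field names. The Python sketch below illustrates that idea only; the collection name, the field names, and the function name are all hypothetical, and the sketch says nothing about how the application stores its maps.

```python
# One map per collection; the map name equals the collection name.
# All names below are made-up examples.
COLLECTION_MAPS = {
    "Invoices": {
        "inv_no": "INVOICE_NUMBER",   # "From" field -> "To" field
        "cust": "CUSTOMER_NAME",
    },
}


def route_coversheet(barcode_fields, maps=COLLECTION_MAPS):
    """Pick the map named by the 'collection' field and rename the fields."""
    collection = barcode_fields.get("collection")
    if collection is None:
        raise KeyError("coversheet is missing the required 'collection' field")
    if collection not in maps:
        raise KeyError(f"no collection map named {collection!r}")
    field_map = maps[collection]
    record = {
        field_map[name]: value
        for name, value in barcode_fields.items()
        if name in field_map
    }
    return collection, record


if __name__ == "__main__":
    coversheet = {"collection": "Invoices", "inv_no": "10042", "cust": "Acme"}
    print(route_coversheet(coversheet))
```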





