OptiDoc™ Architecture
Network Layers:
The following diagram shows how the network layers in a typical OptiDoc™ installation fit together. A typical OptiDoc™ workstation must have two connections inside the network. The first is a database connection, usually ODBC, although any other SQL database communication technology, such as ADO, can be used. This connection carries embedded Transact-SQL between the workstation and the database server. The second is a file transport mechanism used to physically store and retrieve files, or binary large objects (BLOBs), in the mass store, such as a jukebox or a magnetic RAID. When an OptiDoc™ workstation supports full text searching, a third connection must also be maintained. This connection provides advanced query functions within the full text indexes.
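
As a rough illustration of the connection requirements described above, the following Python sketch models the connections a workstation needs. The class name, field names, and example values are hypothetical and are not part of OptiDoc™; the sketch only encodes the rule that a full text client needs a third connection.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkstationConnections:
    """Hypothetical description of one workstation's network connections."""

    database_dsn: str                            # e.g. an ODBC data source name
    file_store_path: str                         # where BLOBs are stored and retrieved
    full_text_index_path: Optional[str] = None   # set only for full text clients

    def required_connections(self) -> List[str]:
        """List the connection types this workstation must maintain."""
        names = ["database", "file transport"]
        if self.full_text_index_path is not None:
            names.append("full text index")
        return names


# Illustrative values only; the server and share names are made up.
basic = WorkstationConnections("DSN=optidoc", r"\\fileserver\store")
full_text = WorkstationConnections(
    "DSN=optidoc", r"\\fileserver\store", r"\\fileserver\ftindex\invoices"
)
print(basic.required_connections())
print(full_text.required_connections())
```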

Modular Search Engines:
When an OptiDoc™ workstation issues a search command, the result is a list of globally unique identifiers, or GUIDs. Typically, with SQL Server, a query is formed that joins the document base table with the collection table that matches the search criteria, and adjusts for tokens and matching document securities. Finally, the result set is expanded into index fields and physical image data as needed. When an OptiDoc™ workstation issues a full text search command, the initial query is made against the full text indexes instead of the SQL table indexes. Once the list of GUIDs is obtained from the full text engine, the same functions within the schema of the workstation expand those GUIDs into index fields and physical image data as needed. Because the software layer that expands GUIDs into fields and documents is independent of how the GUIDs were found, OptiDoc™ can support any search engine or future technology in a modular way.
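
The modular design described above can be sketched in Python as a small interface: every engine returns GUIDs, and one shared layer expands them. This is a minimal sketch, not OptiDoc™ code; all class names, function names, and sample records are assumptions made for illustration, and token and security handling are omitted.

```python
from typing import Dict, List, Protocol


class SearchEngine(Protocol):
    """Anything that can turn a query into a list of document GUIDs."""

    def search(self, query: str) -> List[str]: ...


class SqlIndexEngine:
    """Stand-in for a search against the SQL table indexes."""

    def __init__(self, field_index: Dict[str, List[str]]) -> None:
        self._field_index = field_index  # index field value -> GUIDs

    def search(self, query: str) -> List[str]:
        return list(self._field_index.get(query, []))


class FullTextEngine:
    """Stand-in for a search against the full text indexes."""

    def __init__(self, text_by_guid: Dict[str, str]) -> None:
        self._text_by_guid = text_by_guid

    def search(self, query: str) -> List[str]:
        needle = query.lower()
        return [g for g, text in self._text_by_guid.items() if needle in text.lower()]


def expand_guids(guids: List[str], documents: Dict[str, dict]) -> List[dict]:
    """Shared expansion layer: GUIDs -> index fields and image location."""
    return [documents[g] for g in guids if g in documents]


def run_search(engine: SearchEngine, query: str, documents: Dict[str, dict]) -> List[dict]:
    """Run any engine, then expand its GUIDs with the same code path."""
    return expand_guids(engine.search(query), documents)


# Hypothetical sample data.
DOCUMENTS = {
    "guid-1": {"guid": "guid-1", "fields": {"customer": "Acme"}, "path": "store/0001.tif"},
    "guid-2": {"guid": "guid-2", "fields": {"customer": "Globex"}, "path": "store/0002.tif"},
}
sql_engine = SqlIndexEngine({"Acme": ["guid-1"]})
text_engine = FullTextEngine({"guid-1": "Invoice for pumps", "guid-2": "Invoice for valves"})

print(run_search(sql_engine, "Acme", DOCUMENTS))
print(run_search(text_engine, "valves", DOCUMENTS))
```

Because `run_search` depends only on the `search` method, a new engine could be added without touching the expansion layer.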
Full Text Indexes:
Full text indexes are stored on the file system in a special compiled format. Multi-user full text data insertion, updating, and searching are supported through a system of interlocking files. Each workstation with full text search capabilities must have a file system connection to the full text data files. This is the third connection that a full text client must maintain. The data file set, or full text index, is specified using a fully qualified UNC path. This path is configured on the server and is transparent to the client. However, the client must have access to this network path. There is always one full text data set per collection. A management utility application is used to initialize, compact, and rebuild the full text indexes on a collection-by-collection basis.
When an OptiDoc™ workstation issues a full text query, the query is first submitted to the full text index. The index then responds with a list of GUIDs. The workstation then passes these GUIDs to the SQL server, which expands them into documents and SQL index fields. A full text search therefore involves three steps:
- Submit full text query to index
- Ask SQL server for document paths and data
- Pull documents, BLOBs, from the storage media
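
The three steps above can be sketched in Python using an in-memory SQLite database in place of the SQL server and plain dictionaries in place of the full text index and the mass store. The table layout, column names, and function names are assumptions for illustration only and do not reflect the actual OptiDoc™ schema.

```python
import sqlite3
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]


def query_full_text_index(index: Dict[str, str], term: str) -> List[str]:
    """Step 1: the full text index answers with a list of GUIDs."""
    needle = term.lower()
    return [guid for guid, text in index.items() if needle in text.lower()]


def expand_guids(conn: sqlite3.Connection, guids: List[str]) -> List[Row]:
    """Step 2: ask the SQL database for document paths and index fields."""
    if not guids:
        return []
    placeholders = ",".join("?" for _ in guids)
    sql = (
        "SELECT guid, path, customer FROM documents "
        f"WHERE guid IN ({placeholders}) ORDER BY guid"
    )
    return conn.execute(sql, guids).fetchall()


def fetch_blobs(store: Dict[str, bytes], rows: List[Row]) -> Dict[str, bytes]:
    """Step 3: pull each document (BLOB) from the storage media."""
    return {guid: store[path] for guid, path, _customer in rows}


def full_text_search(index, conn, store, term):
    """Chain the three steps and return the SQL rows plus the BLOBs."""
    rows = expand_guids(conn, query_full_text_index(index, term))
    return rows, fetch_blobs(store, rows)


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (guid TEXT PRIMARY KEY, path TEXT, customer TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [("guid-1", "vol1/0001.tif", "Acme"), ("guid-2", "vol1/0002.tif", "Globex")],
)
index = {"guid-1": "invoice for pumps", "guid-2": "invoice for valves"}
store = {"vol1/0001.tif": b"tiff-1", "vol1/0002.tif": b"tiff-2"}

rows, blobs = full_text_search(index, conn, store, "valves")
print(rows, blobs)
```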
When TIFF documents are inserted into the full text index, an OCR process is typically used to capture the ASCII text on all of the pages. In addition, the SQL index field names and values are appended to the body of the full text data. When a document is submitted from a full text workstation, two submissions are actually performed. The first sends the full ASCII text obtained from the document to the full text index. The second sends the document to the SQL database and storage location.
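
A minimal Python sketch of this two-part submission follows. It assumes the OCR step has already produced one text string per page, and it uses a dictionary for the full text index, an in-memory SQLite table for the SQL database, and a dictionary for the storage location. All names and the "name: value" format for appended fields are hypothetical.

```python
import sqlite3
import uuid
from typing import Dict, List


def build_full_text_body(page_texts: List[str], fields: Dict[str, str]) -> str:
    """Join the per-page OCR text, then append index field names and values."""
    field_lines = [f"{name}: {value}" for name, value in fields.items()]
    return "\n".join(page_texts + field_lines)


def submit_document(full_text_index, conn, store, image, page_texts, fields, path):
    """Perform both submissions for one document and return its new GUID."""
    guid = str(uuid.uuid4())
    # Submission 1: full text plus index fields go to the full text index.
    full_text_index[guid] = build_full_text_body(page_texts, fields)
    # Submission 2: the image goes to storage and its record to the SQL database.
    store[path] = image
    conn.execute(
        "INSERT INTO documents (guid, path, customer) VALUES (?, ?, ?)",
        (guid, path, fields.get("customer", "")),
    )
    return guid


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (guid TEXT PRIMARY KEY, path TEXT, customer TEXT)")
full_text_index: Dict[str, str] = {}
store: Dict[str, bytes] = {}

new_guid = submit_document(
    full_text_index, conn, store,
    image=b"tiff-bytes",
    page_texts=["Invoice for pumps", "Total due: 100"],
    fields={"customer": "Acme"},
    path="vol1/0001.tif",
)
print(full_text_index[new_guid])
```

Appending the field names and values to the text body means a full text query can also match on index values such as the customer name.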





