Electronic Records Research 1997: Resource Materials


Compilation Copyright, Archives & Museum Informatics 1998

Article Copyright, Author

Frequently Asked Questions: IRIS Features and Limitations

by Diane Hopkins
This FAQ is primarily intended for new IRIS staff who are training to serve as Information Assistants or Imaging Coordinators. It is not yet cleared for publication on the Intranet.
Questions are organized loosely by category. Some questions could fit into more than one category, but each appears only once:

  • Overview

  • Advantages

  • Searching, Retrieval, Display

  • Profiling

  • Technology

  • Records

The on-line version of this FAQ at N:\IRIS\DOC\FAQ.DOC will be updated on an ad hoc basis as new questions and answers are identified. The best way to use this document is to search for a word or topic using the EDIT-FIND commands in WORD.

A brief description of IRIS appears in the ITS NEWS! on the Intranet. Parts of this FAQ ***bracketed by asterisks*** are extracted from that web page.

Overview

What is IRIS?

***IRIS is an electronic filing, retrieval and storage system. The system is designed to replace the manual filing and storage of papers and documents. It introduces staff to scanning and optical character recognition (OCR), and the concepts of document profiling. Processes and procedures are developed to capture finished paper documents in electronic image format. Existing document flow procedures to and from the division are analyzed and modified to ensure the capture and completeness of the collection. The images generated (as well as the Bank Reports from 1990) are made available to staff through IRIS on the Enterprise Network (EN).***
***IRIS uses the metaphor of a "Fileroom" for organizing documents. Documents are stored in an electronic Folder, inside a Drawer, within a File Cabinet, located in a Fileroom. Besides this hierarchical model, descriptive document control fields are maintained in a relational database. Documents can be found by (1) browsing the Fileroom, (2) performing a search using an index (Control Search), (3) searching the text of documents (Content Search), or (4) searching for the title of a document (Label Search).***
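The Fileroom model is essentially a tree. A minimal sketch in Python, assuming a simple in-memory structure, shows how a Label Search over cabinet, drawer, folder and document names could walk that tree and report the path to each match. The `Node` class, the `label_search` function and the sample names are hypothetical illustrations, not part of IRIS or EFS:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hypothetical Fileroom tree: fileroom, cabinet, drawer, folder or document."""
    name: str
    kind: str
    children: list = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it so levels can be chained."""
        self.children.append(child)
        return child


def label_search(node: Node, clue: str, path: tuple = ()) -> list:
    """Return the path of every node whose name contains `clue`, ignoring case."""
    here = path + (node.name,)
    hits = [" / ".join(here)] if clue.lower() in node.name.lower() else []
    for child in node.children:
        hits.extend(label_search(child, clue, here))
    return hits


# Hypothetical sample Fileroom: Fileroom > Cabinet > Drawer > Folder > Document.
room = Node("Division Fileroom", "fileroom")
drawer = room.add(Node("Country Files", "cabinet")).add(Node("Brazil", "drawer"))
folder = drawer.add(Node("Loan Agreements", "folder"))
folder.add(Node("Loan Agreement 1995", "document"))

for hit in label_search(room, "loan"):
    print(hit)
```

Because every hit carries its full path, a label search of this kind also tells the user where in the hierarchy a matching folder or document is filed.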

What is the implementation strategy?

***The purpose of IRIS is to provide an electronic information storage and retrieval mechanism that improves staff access to "finished" documents at their desktops. As part of the implementation process, the File Improvement Program (FIP) institutes a common records management system across units. To ensure a smooth transition to electronic document management, business unit staff are assisted in organizing existing paper files to mirror the future electronic filing system. To implement IRIS, ITS analyzes the business unit's requirements and functions, analyzes the document flow processes, and designs a customized scan station to ensure the capture of relevant documents and their attributes.***
As of August 1996, IRIS was ***operational in LASLG, EA1IN and AF2PE. Other IRIS projects that are under development include MIGA, GSDPP, CSH, PEN, OPR, SEC, LASHC, CAPOC and CAPPF. ***


Advantages

Is IRIS better than using the paper files?

***In transitioning to an electronic environment, IRIS offers significant advantages over traditional manual systems:

  • reduction in staff time spent filing and retrieving documents;

  • reduction in the loss of key supporting documents;

  • reduction in duplicate files;

  • streamlined document flow procedures;

  • ability to copy text files to disk for work while outside HQ;

  • ability to access information at the desktop and print locally***;

  • more than one staff member can access the same document at the same time; if the Task Manager is not available, other staff can still retrieve the document;

  • instead of ordering a complete copy of the entire report, staff can select only the portion of a document actually needed, reuse it or forward it to someone else.

For more information on this topic, see ‘imageBank’, an interview with Flavia Fonseca published in the Change Bulletin, 1:4, 17 June 1996.

How does IRIS differ from other text retrieval systems?

***A unique strength of IRIS is its ability to perform text searches, even when the quality of the text is poor, such as uncorrected OCR-ed text. IRIS uses a pattern recognition technique to search and locate similar "patterns." The results can then be ranked according to how closely they match the search word or words.***
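The pattern-recognition method used by EFS is proprietary and is not described in this FAQ. As a rough illustration only, the Python sketch below swaps in the standard library's `difflib.SequenceMatcher` similarity ratio, a different and much simpler technique, to show the general idea: compare the search word against every word on a page, keep pages whose best match clears a threshold, and rank them by closeness. The `fuzzy_rank` function, the 0.75 threshold and the sample pages are assumptions:

```python
from difflib import SequenceMatcher


def fuzzy_rank(clue: str, pages: dict, threshold: float = 0.75) -> list:
    """Rank pages by the best similarity between `clue` and any single word on the page."""
    clue = clue.lower()
    ranked = []
    for page_id, text in pages.items():
        # Best score over all words on the page; 0.0 for an empty page.
        best = max(
            (SequenceMatcher(None, clue, word.lower()).ratio() for word in text.split()),
            default=0.0,
        )
        if best >= threshold:
            ranked.append((page_id, round(best, 2)))
    # Closest matches first, as in a ranked hit list.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical pages: p2 contains an OCR error ("0" read for "o").
pages = {
    "p1": "Annual report on irrigation",
    "p2": "Annual rep0rt with OCR errors",
    "p3": "Budget summary",
}
print(fuzzy_rank("report", pages))
```

Here the exact match on p1 ranks first, the misrecognized "rep0rt" on p2 is still found with a lower score, and p3 falls below the threshold.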

Searching, Retrieval, Display

Why can’t I see a new document at my desk as soon as the Information Assistant has scanned it?

Documents are available to everyone with access to the Fileroom within half a day after scanning and profiling are completed. New documents are loaded into the Fileroom twice daily. Because users cannot access the Fileroom while a load is in progress, loading is usually scheduled over the lunch hour and overnight. A scan station (where 25-50 documents may be scanned and profiled daily, in addition to direct downloading and profiling of electronic mail) cannot operate efficiently if it depends on continuous direct access to the server and central database. Access is also delayed by a series of automated tasks that run after profiling to enable the various retrieval methods and ensure data integrity.

Why does the text have typos in it?

The text file is generated from a scanned image using optical character recognition (OCR), which attempts to match the pattern of black bits on a page to characters in the standard ASCII set used on most computers. The quality of the original document influences the quality of the OCR-ed text. No OCR system is perfect. Some companies invest in various quality control methods to reduce the error rate. However, the EFS search software uses adaptive pattern recognition (also called “fuzzy searching”) to overcome retrieval problems due to spelling mistakes, typos or alternative languages.

For more information about OCR quality, see “How accurate is the OCR process?”

What methods can I use to find documents in IRIS?

You can search for documents by:

1. performing a control search on one or more fields in the document profile. This is a database-style search against indexes;

2. performing a label search on the names (or parts of names) of cabinets, drawers, folders and documents;

3. performing a content search to obtain a hit list of document pages containing text that approximately matches your “clue”. The fuzzy search feature allows you to locate documents with incorrectly spelled words; the search clue need not exactly match the text you wish to find;

4. browsing through the Fileroom and reading the lists of documents presented.

The IRIS User’s Guide explains when and how to apply each search method.


Why can’t we reorganize the Fileroom?

The Fileroom presents the documents in a hierarchical arrangement, using the Cabinet, Drawer and Folder metaphor. If you are familiar with this arrangement and the Fileroom is not too large, browsing the hierarchy is a useful search method. In a shared filing system, however, some aspects of the arrangement may not suit everyone’s preferences. In a paper filing system, the hierarchical arrangement is the only method for locating items in the collection. Fortunately, with IRIS, content, label and control searches provide three more powerful ways to locate documents within a Fileroom. Reorganizing the filing hierarchy is very resource intensive, because in the current software the hierarchy also controls the physical storage of the image and text files.

Can I sort a list of items before I view them?

No. The EFS software currently has no sorting mechanism. To compensate, IRIS data entry rules are defined to influence the display sequence and/or make certain parts of a name easy to spot in a hit list.

Why is the in basket empty?

Earlier versions of the EFS software featured an in basket, where copies of new incoming items were displayed for a limited period of time. Experience with various business units has shown that this feature has limited value and is confusing when it includes outgoing items. This purpose is better served by a correspondence tracking system. The in basket will be eliminated in future versions.

Profiling

Why can’t I invent a new Document Type?

Document Types are used in IRIS to predetermine certain profile attribute values. These defaults make profiling faster and more accurate. In addition, IRIS will eventually be used by all parts of the World Bank Group, so ITS needs to ensure that new Document Types are unique. For specialized types, we also need to identify an authoritative source for the name, and rules governing the use of the Document Type after its initial creation.
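A minimal sketch of how Document Type defaults and a uniqueness check might work, assuming profiles are simple key/value records. The type names, field names, values and the `register_type` and `new_profile` functions are hypothetical illustrations, not the actual IRIS profile schema:

```python
# Hypothetical Document Types and the profile values each one predetermines.
DOCUMENT_TYPES = {
    "Memo": {"security_class": "Official Use", "retention_class": "Correspondence"},
    "Back-to-Office Report": {"security_class": "Official Use", "retention_class": "Reports"},
}


def register_type(name: str, defaults: dict) -> None:
    """Add a new Document Type, refusing any name that already exists (ignoring case)."""
    if any(name.casefold() == existing.casefold() for existing in DOCUMENT_TYPES):
        raise ValueError(f"Document Type {name!r} already exists")
    DOCUMENT_TYPES[name] = dict(defaults)


def new_profile(doc_type: str, **entered: str) -> dict:
    """Start from the type's defaults; values typed in by the profiler override them."""
    return {"doc_type": doc_type, **DOCUMENT_TYPES[doc_type], **entered}


profile = new_profile("Memo", title="Mission schedule")
print(profile)
```

The profiler only keys in what differs from the defaults, which is where the speed and accuracy gain comes from; the uniqueness check is the kind of control a central registry of Document Types would need.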

Why is it necessary to profile a document?

Irrelevant hits are more likely to occur in a content search than in a control search. Profiling describes each document with specific values for fields commonly used in control searches. Documents in IRIS are indexed according to control field data. To search a large body of documents efficiently in IRIS, use a control search to narrow down a subset of documents, then perform a content search on the full text of the documents in that subset, using the Limit-Hit List option.
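A minimal sketch of this narrowing strategy, assuming documents are held in memory with a profile dictionary and OCR text. `Document`, `control_search`, `content_search` and the sample profiles are hypothetical; the content step uses plain substring matching rather than EFS fuzzy search, and restricting it to the control-search results stands in for what the Limit-Hit List option is described as doing:

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    profile: dict  # control fields entered during profiling (hypothetical field names)
    text: str      # OCR-ed full text


def control_search(docs: list, **criteria: str) -> list:
    """Keep documents whose profile fields exactly match every criterion (an index-style query)."""
    return [d for d in docs if all(d.profile.get(k) == v for k, v in criteria.items())]


def content_search(docs: list, clue: str) -> list:
    """Case-insensitive substring match over full text; IRIS itself matches fuzzily."""
    return [d.doc_id for d in docs if clue.lower() in d.text.lower()]


docs = [
    Document("D1", {"doc_type": "Memo", "country": "Kenya"}, "Draft water sector strategy"),
    Document("D2", {"doc_type": "Letter", "country": "Kenya"}, "Water tariff agreement"),
    Document("D3", {"doc_type": "Memo", "country": "Peru"}, "Water supply review"),
]

print(content_search(docs, "water"))    # unrestricted: ['D1', 'D2', 'D3']
subset = control_search(docs, doc_type="Memo", country="Kenya")
print(content_search(subset, "water"))  # limited to the control-search subset: ['D1']
```

The content search alone returns every page that mentions the word; running it only over the profiled subset removes the irrelevant hits.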

How does email get into IRIS?

An ALL-IN-1 account is set up by the division. Electronic mail messages (a.k.a. email or EMs) copied or forwarded to this account are directly loaded into IRIS by the Information Assistant on a daily basis.

Technology

What kind of software does IRIS use?

The IRIS system is a collection of client/server software components that perform different functions. Client software is the part that is installed on your desktop; a corresponding part that works in tandem with the client is installed on one or more servers on the network. Search functions are currently performed using EFS. New profiles are entered using a Visual Basic program on a stand-alone scan station. Image Basic software controls the scanner. Calera is the OCR engine that converts images to text. A Perl program staples all the component files together and loads them into the ORACLE repository and the EFS database. Some of these components are currently being re-engineered with new software to make wide-scale deployment possible.

Is IRIS an imaging system?

Like most electronic imaging systems, IRIS captures and controls the final version of documents. Also, like other imaging systems, IRIS enables documents to be displayed on a computer screen with margin notes, signatures, layout and typography exactly as they appeared on the original documents. However, an image cannot usually be modified and is not readily searchable. To retrieve an image, a corresponding profile must first be created and indexed. To retrieve information inside the content of a document, its image is converted to a text file. Not all imaging systems also provide full text searching of document contents. IRIS provides both methods of retrieval: control searches (based on indexed profiles) and content searches on full text.

Is IRIS a workflow system?

No. IRIS is a storage and retrieval system. ITS is currently investigating the best way to integrate IRIS with Lotus Notes. Appropriate documents resulting from workflow applications should be systematically captured for longer term storage, reuse, and sharing across the Bank.

Is IRIS a groupware system?

No. IRIS is a storage and retrieval system. It is not used to facilitate active group collaboration processes, but it does support shared use and reuse of finished documents by making them accessible to all staff in a Business Unit. IRIS also provides enhanced full text search capability across all Bank reports in the imageBank.


How accurate is the OCR process?

* 99% accuracy is actually quite poor: it is generally not acceptable to a normal reader and is well below the best OCR rates.

* OCR accuracies for original clean documents (e.g., right out of the printer, reasonable size simple fonts with no peculiar ligatures) are a lot better than 99% and are acceptable to the normal reader (figure one or two typos per page).

* OCR accuracies for multi-generation copies or poorly printed originals can be very bad and render the page unreadable online, even though a person could have managed to read the original. Many pages in Bank documents are compromised in this way, for example Xerox copies of letters or faxes appearing in appendices, as well as poorly printed originals.

* Many word processing practices reduce the accuracy of OCR. For example, putting text in shaded boxes or on top of gray-scale pictures invariably causes OCR errors. Using a very small font to squeeze a spreadsheet onto a single page or to set off a lengthy quote also causes OCR problems. Mixing font orientations on a single page will cause one orientation to be completely lost, e.g., when the page header is in portrait orientation and the text is in landscape.

* OCR of numerical tables is inherently inaccurate without manual processing of the table (i.e., pointing out to the OCR software where the tables are). At the volumes we process, manual processing is not feasible. Once the technology can automatically recognize numerical tables, these documents could be rescanned.

* Mixing pages of different orientations (portrait and landscape) will cause OCR errors if and when the OCR software fails to detect the changes.

* All OCR is currently performed on the assumption that the text is in English. In mixed-language documents, the non-English portion will not be recognized as accurately as the English. Dual-language, two-direction documents (e.g., English pages that read front-to-back, alternating with Spanish pages printed upside-down to be read back-to-front) cause complete OCR failure on all of the upside-down pages.
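The FAQ does not say how accuracy is measured. Assuming it is the share of characters recognized correctly, and assuming a typical text page of about 2,000 characters (both are assumptions for illustration), the expected number of errors per page is (1 - accuracy) × characters per page. A short Python calculation shows why 99% is poor:

```python
def errors_per_page(accuracy: float, chars_per_page: int = 2000) -> float:
    """Expected misread characters per page, treating accuracy as a per-character rate."""
    return (1.0 - accuracy) * chars_per_page


for accuracy in (0.99, 0.999, 0.9995):
    print(f"{accuracy:.2%} accurate: about {errors_per_page(accuracy):.0f} errors per page")
```

On these assumptions, 99% accuracy leaves about 20 errors on every page, while the "one or two typos per page" mentioned above corresponds to roughly 99.9% or better.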


What form of backup and disaster recovery does IRIS provide?

To ensure a high level of system availability, ITS has taken steps to provide several levels of redundancy in the hardware and software comprising the imaging service within the IRIS system.
The base hardware includes two AlphaServer 2100s from Digital Equipment Corporation. The servers mirror one another: each has two CPUs running at 200 MHz and 512 MB of random access memory, and they share 600 GB of magnetic disks. Layered on top of the base operating system (Digital UNIX) is software called Available Server Environment (ASE), which allows the two servers to act as standbys for one another.
In the ASE cluster, the software supporting the imaging service (Oracle RDBMS and EFS from Excalibur Technologies) runs on one of the two servers. If that server becomes unavailable at any point (e.g., after a system crash), the other server detects the failure, takes control of the shared disks, and starts the appropriate software. The imaging service thus fails over to the second server. The failover is not transparent, since users need to re-establish their connections to the server, but downtime is limited to a matter of minutes.
In addition to server-level redundancy, ITS has employed RAID-5 technology to ensure both data integrity and data availability. RAID (Redundant Array of Independent Disks) is a way of configuring multiple disk drives to appear as one large drive. Under RAID Level 5, as data is written to disk, enough parity information is stored that if one disk fails, it can be replaced and its data reconstructed on the fly without affecting user access.
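The parity idea behind RAID Level 5 can be illustrated with byte-wise XOR, the parity function RAID-5 uses. The Python sketch below uses toy four-byte blocks and shows only the arithmetic for a single stripe; real RAID-5 also rotates the parity block across all the disks, which is omitted here. The block contents and the `xor_blocks` helper are hypothetical:

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: four data blocks on four disks plus a parity block on a fifth.
data = [b"IRIS", b"EFS!", b"ORCL", b"SCAN"]
parity = xor_blocks(data)

# Simulate losing the third disk: XOR of the surviving blocks and the parity rebuilds it.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt)  # prints b'ORCL'
```

Because XOR is its own inverse, any one missing block in a stripe can be recomputed from the others, which is why a single failed disk can be replaced and rebuilt while the array stays in use.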
ITS intends to continue tracking developments in high-availability systems, and incorporate those technologies that allow for a tighter and more seamless environment for users.


Records

How secure are the documents that our Division stores in IRIS?

(a) IRIS uses several levels of redundancy to ensure that records stored in the system are not lost. Paper records are more vulnerable to misfiling or loss due to physical removal from the file folder or file cabinet.

(b) An IRIS Fileroom can only be accessed by staff who know the password.

(c) All IRIS images and text are accessible via the EN; they are not subject to physical media limitations such as locating the right file on a floppy diskette.

(d) In the event of a network shutdown or hardware problems at the desktop, a parallel paper collection is being maintained.

(e) At present, document-level security is not implemented; however, the Security Classification is being entered into the profile by default in anticipation of this feature in future releases.


Why can’t they just download all my Word (or WordPerfect) documents into IRIS from the shared Divisional drive?

Reports or other documents that don’t require signatures may be loaded directly from a word processor file, provided that variable elements such as dates have been “frozen” and the file is saved in read-only mode. Eventually, the system will prove reliable enough to substitute fully as the copy of record for certain types of documents. Most memos or letters created using a word processor may require a signature to make them valid records and authorize action to be taken. In such cases, it is the signed version that must be captured as a scanned image in IRIS.

What’s the difference between the Retention Schedule and the Recordkeeping Responsibilities?

The period of time that certain classes of documents must be kept for legal, evidential and historical reasons is determined by a Records Retention and Disposition Schedule (RRDS). The RRDS also shows which classes of documents may actually be destroyed after the retention period has expired. Retention classes do not reflect document usage patterns during their active period. Recordkeeping Responsibilities outline which classes of documents need to be scanned into IRIS, and which can be handled using less costly methods. A document that is frequently used by more than one person may be scanned for easy reference, even though the RRDS states that it has limited value and should be destroyed after 10 years.

FAQ: IRIS Features and Limitations
