A PDF version of this document is also available (PDF version, 440KB).
Timothy Arnold-Moore, Legislation Management Consultant
InQuirion Pty Ltd
The TeraText® trademark and logo are owned by Scientific Applications International Corporation.
Table of Contents
1. Executive summary
1.1 Purpose
1.2 The PAL project
1.3 Stakeholders
1.4 Process
1.5 Summary of recommendations
1.5.1 Reassess rendering engine choice
1.5.2 Consider updating to Documentum 5
1.5.3 Upgrade screens
1.5.4 Use native Unicode for macrons throughout the system
1.5.5 Consolidation of the Authoring Tool documentation
1.5.6 Make user acceptance testing team available throughout development
1.5.7 Consider more redundancy for the systems critical for publication
1.5.8 Consider upgrading the web engine to include an XML repository capability
1.6 Summary
2. Introduction
2.1 Background
2.2 A construction project analogy
2.3 Why legislation is different
3. Stakeholders
3.1 People of New Zealand
3.2 The Government of New Zealand
3.3 Parliament
3.4 Parliamentary Counsel Office (PCO)
3.4.1 PCO Drafters
3.4.2 Secretaries and support staff
3.4.3 Proofreaders
3.4.4 PPU
3.4.5 Compilers
3.4.6 IT Team
3.5 Inland Revenue Department (IRD) Drafters
3.6 Office of the Clerk (OC)
3.7 SecuraCopy
3.8 Datacom
3.9 Parliamentary Service
3.10 Legal publishers
3.11 Unisys
3.12 Summary
4. Process
4.1 Initial review of system documentation
4.2 On-site visit
4.2.1 Focus groups and user interviews
4.2.2 System demonstration
4.2.3 System architecture briefing
4.2.4 Code reviews and walk-throughs
4.2.5 Summary of the on-site schedule
4.3 Vendor directions
4.4 Preparation of draft report
4.5 Presentation of draft report
5. Architecture
5.1 Typical architecture for legislative drafting
5.1.1 Authoring Tool
5.1.2 Repository
5.1.3 Workflow
5.1.4 Consolidation tool
5.1.5 Print delivery
5.1.6 Electronic delivery
5.2 The PAL architecture
5.2.1 Authoring/Consolidation tool
5.2.2 Repository
5.2.3 Workflow
5.2.4 Print delivery
5.2.5 Electronic delivery
5.3 PAL development environments
5.3.1 FOSI
5.3.2 ACL (ArborText)
5.3.3 DocBasic (Documentum)
5.3.4 Java
5.3.5 XSLT
5.3.6 CSS
5.3.7 C# and .NET
5.3.8 Microsoft Word
5.4 Conclusions about the PAL architecture
6. Specific questions
6.1 Is the application architecture implemented in a consistent, logical and understandable manner?
6.2 Does the Epic software (Epic Editor, Print Composer, and E3) have the capability to meet PCO's requirements?
6.2.1 Epic Editor—the Authoring Tool
6.2.2 Epic Print Composer—the Print Rendering Tool
6.3 Does Documentum 4i have the capability to meet PCO's requirements?
6.4 Does the mix of package customisation and bespoke development support future development and package upgrade without major rewrites or design changes?
6.4.1 Authoring Tool
6.4.2 Rendering Engine
6.4.3 CMS
6.4.4 Website
6.5 Are recognisable standards applied and consistently implemented?
6.5.1 XML
6.5.2 FOSI
6.5.3 XSLT
6.5.4 Java
6.5.5 .NET
6.6 Is the use of a coding standard apparent, and has it been universally and consistently applied?
6.7 Are APIs well defined and documented?
6.8 Is sufficient information available in terms of documentation and coding structure for a developer or designer to be able to make modifications without the need to rewrite components?
6.9 Given the manner in which XML technology has been implemented, are there any implications in terms of technology development, maintenance and upgrade of components, and data portability?
6.10 Are there any demonstrable gaps or deficiencies in the system documentation and system test plans?
6.10.1 System documentation
6.10.2 System test plans
7. Other issues
7.1 Ongoing support, maintenance and development
7.1.1 Business requirements
7.1.2 Context
7.1.3 Skill sets
7.1.4 Options
7.1.5 Risks
7.2 Print rendering test set
7.3 Legacy data
7.4 Security issues
7.5 Uptime issues
7.5.1 Website
7.5.2 VPN
7.6 Compilation and reprints
7.7 Cross referencing tool
7.8 Annual volumes
7.9 Naming of transform output files
1. Executive summary
1.1 Purpose
InQuirion has undertaken a technical review of the Public Access to Legislation (PAL) project for the New Zealand Parliamentary Counsel Office (PCO) to enable PCO to reassure the New Zealand Government, as sponsors of the PAL project, that the PAL system, when implemented, will be operationally stable, maintainable, and capable of supporting future enhancement and development. This document captures the results of that review.
1.2 The PAL project
The PAL project is a large information technology project designed to facilitate the drafting, management, and delivery of legislation by the Parliamentary Counsel Office (PCO), Office of the Clerk (OC), Tax Drafting Unit of Inland Revenue Department (IRD), and external contractors Brookers Ltd, SecuraCopy, and Datacom. PCO engaged Unisys as its implementation partner to assist with the completion of the PAL project, in particular with the development of the technical solution and systems integration.
A legislation management system has some superficial similarity to any other office document management environment, but a number of factors distinguish legislation from other document environments, including the longevity of the documents, the importance of the documents, the regular structure of the documents, the temporal nature of the documents (the requirement to access current and old versions that reflect the effect of amendment documents as made into law), and the unique relationship between the drafters and the Parliament. In New Zealand, these are further complicated by the split functions of the PCO (as both drafter of legislation and publisher of legislation—a common development in many jurisdictions), and the time pressures placed on the drafters and publishers (because of the long sitting period and the extensive use of select committees). Producing a system that takes into account these factors is a complex task.
1.3 Stakeholders
The PAL system affects a large number of key stakeholders and all of their interests must be balanced. The key driving force behind the project is to make legislation accessible to the New Zealand public. This can assist access to:
- the democratic process: by providing context to debate on new legislation and by assisting the Parliament and the Office of the Clerk to perform their functions,
- the judicial process: by reducing the cost of gaining access to the law both directly to the public and indirectly through legal publishers, and
- the executive process: by assisting the PCO and IRD to support the Government's legislative agenda, by reducing the cost to the public service of access to the law, and by improving the public's knowledge of the powers and obligations of the various officers of the Crown.
Access to the law means both electronic access and paper access. Electronic access provides a fast and cost-effective delivery medium (hosted by Datacom and Parliamentary Service) and assists the visually and physically impaired. Paper access fulfils the legislative requirements of the Parliament (typesetting provided by PCO and printing outsourced to SecuraCopy) and serves those who are unable to access electronic versions. Access must also be timely, covering draft legislation (prepared by PCO, IRD, and OC), the output of select committees (prepared by OC and PCO), legislation as made (prepared by PCO), and consolidated legislation (prepared by PCO and Brookers).
Unisys, as implementation partner, has put its reputation on the line to deliver a system that supports the needs of the other stakeholders.
1.4 Process
InQuirion has consulted widely amongst the stakeholders to ensure a complete and thorough coverage of the issues raised by this crucial project. These stakeholders cooperated willingly in providing documents, information, and time to facilitate the review. The process of the review included:
- an initial review of the system and project documentation,
- an initial on-site visit incorporating:
  - focus groups and user interviews,
  - demonstrations of the system, with and without direction from PCO and Unisys,
  - a system architecture briefing, and
  - formal and informal code reviews and walk-throughs,
- vendor interviews to ascertain future product directions,
- preparation and presentation of a draft report, and
- response to feedback to produce this final report.
1.5 Summary of recommendations
The system and the architecture behind it are generally sound. Although there are a number of issues identified in this document, InQuirion is confident that, if these issues are addressed together with those identified in the draft Stage 2 Phase 1 and Phase 2 Scoping document, New Zealand can be confident in deploying the PAL application. Below is a summary of the recommendations that come out of this review.
1.5.1 Reassess rendering engine choice
As described in section 6.2.2, InQuirion has some concerns about the current print rendering solution. These include both performance concerns—can it render large documents fast enough, and functionality concerns—can it render documents the way PCO and other stakeholders have specified. These concerns must be addressed before the system can be deployed.
InQuirion recommends that the User Acceptance Team prepare a set of documents in XML and PDF that can be provided to Unisys and ArborText for confirmation of their project plans and against which to validate candidate releases. This set of documents together with the layout specifications should be provided to selected vendors of alternative rendering engines to ensure that a fallback position is preserved and to explore the cost-effectiveness of alternative solutions.
Further details about the desired contents of that demonstration set appear in section 7.2.
1.5.2 Consider updating to Documentum 5
As described in section 6.3, there are potentially considerable advantages to upgrading from Documentum 4i to 5i for the Content Management System (CMS). These include considerable performance improvements with the throughput of XML documents that are chunked, or split into Parts and Subparts to allow multiple drafters to work on a single draft simultaneously, and better support for the rules that define when an element should be treated as a separate chunk (for instance, Parts inserted by amendment wording should probably not be in a separate chunk).
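The chunking rule described above can be illustrated with a minimal sketch. It is not the Documentum mechanism or the PCO DTD: the element names (`act`, `part`, `subpart`, `amend`, `section`) and the function `find_chunks` are hypothetical, chosen only to show the idea that Parts and Subparts become separate chunks unless they sit inside amendment wording.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real PCO DTDs may differ.
CHUNK_TAGS = {"part", "subpart"}   # elements normally split into their own chunk
AMENDMENT_TAGS = {"amend"}         # wrappers around text inserted by an amendment


def find_chunks(root):
    """Return, in document order, the elements to store as separate chunks.

    Parts and Subparts are chunked unless they appear inside amendment
    wording, in which case they stay with the enclosing chunk.
    """
    chunks = []

    def walk(elem, inside_amendment):
        for child in elem:
            in_amend = inside_amendment or child.tag in AMENDMENT_TAGS
            if child.tag in CHUNK_TAGS and not in_amend:
                chunks.append(child)
            walk(child, in_amend)

    walk(root, False)
    return chunks


SAMPLE = """<act>
  <part id="p1">
    <subpart id="p1s1"><section id="s1"/></subpart>
    <section id="s2">
      <amend><part id="inserted"><section id="s9"/></part></amend>
    </section>
  </part>
  <part id="p2"><section id="s3"/></part>
</act>"""

if __name__ == "__main__":
    for chunk in find_chunks(ET.fromstring(SAMPLE)):
        print(chunk.tag, chunk.get("id"))
```

In this sample the Part with id "inserted" is left inside its parent chunk because it is amendment wording, while the two top-level Parts and the Subpart are each selected as a chunk that a different drafter could work on.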
PCO should consider the feasibility of upgrading for Phase 1 release, and certainly for Phase 2 release. To test the feasibility of Phase 1 upgrade, PCO should install Documentum 5i on a separate machine and, either using their own IT staff or Unisys, with assistance from Documentum as required, attempt to install the PAL application to identify the issues and obstacles to migrating. If the install is successful, the User Acceptance Testing team (UAT) should be used to exercise the install to identify any components of the system that are not functioning as they should. This process should inform the decision on whether or not to upgrade before deployment.
1.5.3 Upgrade screens
As described in section 6.2.1, drafters and other users of the Authoring Tool are often editing documents with the tags showing. Because of the density of tags in the New Zealand PCO DTDs, the amount of text visible is quite small on smaller screens. Drafter and other user productivity is likely to drastically improve with no modifications to the software, if drafters and other users are provided with large, high-definition screens with which to prepare and edit drafts.
While these upgrades are not crucial to deployment and could take place at any time before or after deployment, obviously the productivity improvements will not be realized until the screens are installed on the users' desktops.
1.5.4 Use native Unicode for macrons throughout the system
As described in sections 6.5.1.2 and 6.5.4, Maori macrons are not being managed in the system in a way that supports data and application portability. InQuirion recommends that these macrons, and other characters above code point 127 (that is, outside the ASCII range), be managed as native Unicode rather than using entities. The Authoring Tool can be configured to import and export macrons using the native Unicode encoding. This will ensure that documents can be passed through any conformant XML tools and still be readily edited in Epic Editor. This change is best implemented before go-live to avoid later clean-up tasks.
Should the New Zealand Government wish to deliver Maori macrons correctly on the website, the web server should be configured to send HTTP headers, and the HTML pages should carry metadata fields, identifying the encoding as UTF-8, and native Unicode should be delivered in the web pages. A canonical decomposition of the Unicode data should be applied (for example, separating "ā" into "a" followed by the combining macron, U+0304) so that older web browsers incapable of displaying the macrons still display the vowels appropriately.
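As an illustration only, the two steps recommended here (replacing entities with native Unicode, then applying canonical decomposition for web delivery) might look like the following sketch. The entity names in `MACRON_ENTITIES` are an assumption based on the common ISO entity names; the names actually used in the PCO DTDs may differ, and the function names are hypothetical.

```python
import re
import unicodedata

# Assumed entity names (ISO-style); the PAL DTDs may use different ones.
MACRON_ENTITIES = {
    "amacr": "\u0101", "emacr": "\u0113", "imacr": "\u012b",
    "omacr": "\u014d", "umacr": "\u016b",
    "Amacr": "\u0100", "Emacr": "\u0112", "Imacr": "\u012a",
    "Omacr": "\u014c", "Umacr": "\u016a",
}

_ENTITY_RE = re.compile(r"&([A-Za-z]+);")


def entities_to_unicode(text):
    """Replace known macron entities with native Unicode characters.

    Unknown entities (for example &amp;) are left untouched so that the
    XML markup stays well-formed.
    """
    return _ENTITY_RE.sub(
        lambda m: MACRON_ENTITIES.get(m.group(1), m.group(0)), text
    )


def decompose_for_web(text):
    """Apply canonical decomposition (NFD): a macron vowel becomes the
    base vowel followed by the combining macron U+0304."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    native = entities_to_unicode("M&amacr;ori &amp; P&amacr;keh&amacr;")
    web = decompose_for_web(native)
    print(native.encode("utf-8"))           # bytes as stored or transmitted
    print([hex(ord(ch)) for ch in web[:3]])  # 'M', 'a', combining macron
```

A browser that cannot render the combining macron would still show the plain vowel, which is the fallback behaviour the recommendation is aiming for.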
1.5.5 Consolidation of the Authoring Tool documentation
As described in section 6.10.1, the documentation of the Authoring Tool has been augmented by a number of separate documents for each new function added to the Authoring Tool, with separate installation instructions for each new function. Ongoing maintenance of the Authoring Tool and the whole system would be aided by consolidating these into three documents: a single User Requirements Document, describing what the Authoring Tool and its various functions deliver to the user; a Detailed Design Document, describing how the functionality was achieved in the coding environment; and an Installation Manual, describing how to install the whole application as well as how to install the additional functions separately in an existing install.
InQuirion believes that the system is not complete until it has been delivered in a maintainable form and that this consolidation of the documentation is necessary to deliver a maintainable application. If Unisys is responsible for operational maintenance between Phase 1 and Phase 2, delivery of this consolidated documentation could be deferred to Phase 2 deployment.
1.5.6 Make user acceptance testing team available throughout development
As described in sections 6.10.2 and 7.1, the User Acceptance Testing team (UAT) has developed considerable expertise in exercising candidate releases and identifying issues and reporting errors. InQuirion recommends that the UAT team continue to be used throughout the development for Phase 1 and Phase 2 to test candidate releases for Unisys and its subcontractors to reduce the risk that the final candidate release will be rejected by UAT.
For ongoing development, InQuirion recommends that the UAT team be regularly reconvened to test candidate releases of newly developed functionality as part of the ongoing enhancement of the PAL application.
1.5.7 Consider more redundancy for the systems critical for publication
As described in section 7.5, while there is considerable redundancy in the production environment within PCO, external access to the legislation is subject to a few key points of failure. Those accessing the internal repository through a Virtual Private Network (VPN) must access through a single piece of hardware to support that VPN. If that box were to fail, a replacement could take some time to procure. InQuirion recommends that arrangements be made (including possibly purchasing a spare) to ensure that such a hardware failure does not interrupt the ability of IRD, SecuraCopy, Brookers, or Datacom to interact with the PCO repositories in the appropriate ways.
The public web site as hosted by Datacom sits on a single machine. While the original project envisaged relocating the testing machine from PCO to Datacom as the back-up machine, this machine will continue to be needed for ongoing testing. The PCO should either contract with Datacom to arrange a short-term replacement machine, or provide a warm backup machine to ensure that any hardware failure in the web server machine does not interrupt public access to the web site for more than an acceptable period of time. Any purchase of a new server should be deferred as late as possible to maximize value for money. This purchase should initiate a reshuffle of the hardware to ensure that the most powerful servers are deployed where performance is most critical to meet the objectives of the PAL system. The most likely candidates are to support print rendering or to support the public web site.
1.5.8 Consider upgrading the web engine to include an XML repository capability
As described in section 6.4.4, the web application that forms part of the PAL system is a simple, robust solution. But search is limited to full text and metadata fields extracted into the HTML data and fails to take full advantage of the additional value contained in, and potential of, the XML markup. For ongoing development, InQuirion recommends that the PCO explores using an XML repository linked to a web server or application server to deliver more extensive XML-based search capabilities, the ability to dynamically generate HTML appropriate for different browser versions from the XML, and support for point-in-time access to the consolidated legislation.
The current website architecture is adequate to meet the current requirements (although some outstanding issues need to be addressed before deployment). Because the web site is an output or delivery mechanism only, the impact of changes in the web site to the production environment will be minimal. Upgrades could occur before, during, or after deployment of the production environment.
1.6 Summary
The PAL system as implemented contains a number of strengths outlined in this report. The weaknesses described in this report are not insurmountable and should be addressed by implementing these recommendations and the draft Stage 2 Phase 1 and Phase 2 Scoping document.
Generally, the integrity of the modules has been preserved with customizations appropriate to the applications on which they have been built.
The modularity of the system follows similar systems deployed or in development around the world. While there are still some outstanding issues to do with integration of the system components, the system components seem to be interacting well and reliably.
Scope remains within the applications for further customization and upgrade of system components. With such a large and complex software system, the impact of any upgrades or enhancements will always need to be thoroughly tested before deployment but this is not unusual for developments of this type.
The use of industry standards and coding practices is evident within the development. The variety of subcontractors and development environments deployed on the project has led to inconsistency between coding practices in different modules. Again, this is not unusual for projects of this size and complexity. Some minor recommendations have been made for improving the application of standards and coding practices throughout this report.
The system is generally robust. At no time during demonstrations of the system to InQuirion, or while InQuirion was interacting with the system, did any of the components crash, although there were problems with some of the outputs of the modules.
Before deployment, a number of issues relating to ongoing maintenance and support need to be addressed. These have been outlined in this report.
Provided these issues and other issues identified within this report are addressed to the satisfaction of the stakeholders, the New Zealand Government can be assured that the PAL system, when implemented, will be operationally stable, maintainable, and capable of supporting future enhancement and development.
2. Introduction
2.1 Background
From the Terms of Reference for this Technical Review:
The Parliamentary Counsel Office (PCO), in conjunction with the Office of the Clerk (OC) and the Inland Revenue Department (IRD), is undertaking the Public Access to Legislation project (the PAL Project). The project is designed to improve the way in which all New Zealand legislation, drafted by PCO, OC, and IRD, is made available, and to provide the basis for public access to up-to-date legislation in both printed and electronic form. The project scope covers authoring and pre-publication tools; content management system; Document Type Definitions (DTDs); database of New Zealand legislation; website and other electronic access solution; change management and communication; systems integration; secure connectivity between the PCO and other agencies (OC, IRD, Brookers Ltd, and Securacopy); establishment of capability for pre-publication, compilation, and database officialisation; and arrangements for printing of hard copy output. (See: www.pco.parliament.govt.nz).
PCO engaged Unisys New Zealand (Unisys) in 2001 as its implementation partner to assist with the completion of the PAL Project, in particular development of the technical solution and systems integration.
The PAL system will operate over a virtual private network (VPN) and, on implementation, will support around 80 users at PCO, OC, IRD and Brookers Ltd.
A web site will support public and subscriber access to legislation in XML, HTML, and PDF format. SecuraCopy, a contract printer, will perform the commercial printing of legislation.
The project was commissioned over two stages:
- Stage 1, covering Planning, Scoping, and Analysis, and the evaluation and selection of system components, was completed at the end of 2001, and
- Stage 2, Implementation, began in March 2002.
Following several plan revisions during 2002, the parties agreed a go-live date of 17 February 2003. The planned Go-live date has been deferred, for a number of technical and commercial reasons. The PCO and Unisys have subsequently produced a revised statement of requirements for the completion of Stage 2. The PCO now wishes to commission a technical review of the PAL solution before the New Zealand Government makes any decisions about the future of the project.
InQuirion Pty Ltd was engaged by the New Zealand PCO to undertake this technical review in order to inform decisions about the future of the project. The primary role of the review is to enable the PCO to advise the New Zealand Government, as sponsors of the PAL project, whether the PAL system, if implemented, would be operationally stable, maintainable, and capable of supporting future enhancement and development.
The lead consultant on this project, Dr Timothy Arnold-Moore, has extensive experience in developing complex SGML and XML-based solutions for legislative drafting and has consulted to the governments of Tasmania, Canada (federal), New South Wales, Ontario, and Papua New Guinea on legislative drafting solutions. Neal Landry, Neda Miskov, and Michael Fuller assisted him.
2.2 A construction project analogy
Producing an information technology (IT) system has many similarities to constructing a building. A small renovation might require only limited planning and one or two skilled technicians a week as with a small change to an existing IT system. Larger changes like adding a second storey might involve more specialized skills, more detailed planning and coordination, and a longer time frame.
Building a house is a well-understood problem, and many firms, large and small, can provide a service with varying degrees of customization and cost, as with well-understood IT functionality like payroll systems. In both environments there are a wide variety of "off-the-shelf" solutions available. Predicting the cost and time necessary for such a project should be, and usually is, relatively simple.
On the other hand, large public building projects usually require considerable custom design and construction. Rather than a small construction firm, such projects usually involve a reputable architectural firm and are typically built by one of a few companies specializing in large-scale projects. Similarly, there are a small number of organizations that specialize in large-scale public IT projects. Unisys is one such company that has developed a well-deserved reputation for successfully completing large-scale public IT systems. Like successful large-scale construction companies, Unisys has built many large IT systems for government and large corporate clients and has developed methodologies based on considerable experience in large-scale IT project development.
In construction projects, the architect is employed to represent the client's interests with the construction firm, to ensure that the client's requirements are fully understood, to produce designs that capture all of the client's requirements, and to ensure that building commences and continues on a design that meets the client's needs and is signed off by the client. Architecture is a well-understood discipline that encapsulates centuries of experience in construction. In the medieval period, an architect or construction firm was employed to build a cathedral if the last one built didn't fall down. Now architecture courses ensure that qualified architects combine technical knowledge (about how to build buildings that stay up) with the ability to elicit and understand the clients' needs, the experience to design constructions that satisfy those needs, and the management skills to plan and coordinate their construction.
Designing and building software systems is a much newer discipline. It is rare to find a single person with the technical skills, the business analysis skills (both interacting with potential users and output sources), and the project management skills to perform the equivalent task as an architect for a software project. These tasks are often divided between a project manager, a technical architect, and one or more business or document analysts and this has been the case with the Unisys team. Even though courses in software engineering exist, the reasons for project success or failure are still not well understood. While the software industry is not medieval, often the safest way to ensure a successful IT project is to employ a firm that has built a system like the proposed system before, or for the firm to involve in the role of architect someone with experience in building systems of that kind. Based on InQuirion's experience in producing end-to-end legislation systems, InQuirion considers a critical success factor in such projects is the involvement of someone with that kind of experience as architect.
2.3 Why legislation is different
A legislation drafting and management system on the face of it looks like a typical office document authoring, management, and delivery application. But there are a number of factors related to legislation that distinguish it from typical office environments. These include:
- The longevity of the documents: Some legislation being managed today was drafted many hundreds of years ago—portions of the Magna Carta are still part of the law in New Zealand. By contrast, most office documents are created for immediate use. Few are retained for more than a year or two.
- The importance of the documents: After the Treaty of Waitangi, the documents recording the legislation of New Zealand are arguably the most important documents in the land. 1 This importance justifies particular care in ensuring the highest quality of print production when rendering the documents. The main driver for office documents is efficiency. Providing that the documents look sufficiently professional, minor variations in typography can be overlooked. While efficiency and timeliness are important for legislation, presentation is particularly important, and variations in typography, however small, may be noticed and trigger unnecessary delays in the Parliamentary process.
- The regular structure of the documents: Legislation in New Zealand, as in most English-speaking jurisdictions, largely follows the English tradition, breaking Acts into numbered sections, subsections, paragraphs, and subparagraphs, each with distinctive numbering conventions, typographic markers, and consistent nomenclature, and grouping the sections into Parts and Subparts and under various unnumbered headings. While some of the typographic markers have changed subtly over the years and vary from jurisdiction to jurisdiction, the structure is largely unchanged and uniform across most English-speaking jurisdictions. Office documents vary much more widely in structure.
- Temporal nature of the documents: The vast majority of legislation made in New Zealand, as in most English-speaking jurisdictions, is amending legislation; that is, the wording of the legislation describes textual changes to existing legislation. While these amendments have the force of law, users of legislation care more about the wording that results from applying the amendments to the existing substantive provisions (referred to as "consolidating" the amendments, hence the term "consolidation" to describe these documents) than the actual text describing the amendment. This results in multiple versions of the substantive legislation, and different versions will be valid at different times. Users of legislation require immediate access to the most current consolidation of a piece of legislation, but they also need access to past versions (when preparing legal advice or hearing a case about a past incident) and, to the extent possible, future versions (when preparing advice about future activity). Office documents rarely exist long enough for multiple versions to be valid and, where multiple versions exist, usually only the latest version is of interest. The time period of validity of most office documents is rarely as clearly defined as for legislation.
- The unique relationship between the drafters and the Parliament: The drafters provide legal advice to the Parliament and owe fiduciary duties, including a duty of confidence, to their clients—the Members of Parliament and the officers of the Crown. The documents, while being drafted, are highly confidential and may contain politically sensitive (and occasionally militarily sensitive) information. Once the drafts are tabled, they are public documents. In a normal office environment, most documents either stay sensitive throughout their life cycle or are designed from the beginning for public distribution.
- Split functions of the PCO: Drafting offices such as the PCO have existed for many years solely to draft bills and associated material and regulations and other subordinate legislation for the government. As in many other jurisdictions, the NZ PCO is also now responsible for publishing the legislation and related materials. This is sensible because the PCO is uniquely positioned to maintain many of the products useful to the broader public for its own internal use (including up-to-date consolidations on which to base amendment Bills or draft Regulations). A culture of providing a discrete and largely confidential service to the Government does not always sit well with the more public service of publishing.
- Time pressures placed on the PCO, OC, and IRD: The New Zealand Parliament has an extremely long sitting season. Parliament rises a little before Christmas and resumes in early February. With the exception of this 5-6 week break, there are only 3 or 4 breaks of no more than 3 weeks scattered throughout the year. There is little opportunity for respite for the services that support the operation of Parliament, including the PCO, OC, and IRD drafting. Unlike many Westminster Parliaments, where a government can often predict with considerable certainty what the final form of an Act will be well in advance of its third reading or Assent, the extensive use of committees within the New Zealand parliamentary process means that changes in the text of a Bill are common and less predictable. Committees may work on reports to the Parliament up until a day or two before they are due to be presented, and Assent to a Bill follows on fairly quickly after third reading. Therefore, the OC and the PCO are under considerable pressure to prepare paper versions of a Bill or Act very quickly. Other jurisdictions may be able to pre-prepare Bills or Acts with some certainty many days or even weeks before they are required, but this is not often the case in New Zealand.
These features add to the complexity of constructing a legislation drafting and management environment. They require both unique features and novel development techniques to ensure a lasting and usable system.
3. Stakeholders
This section describes the various stakeholders in the outcome of the PAL project. It does not attempt to comment on the extent to which the existing PAL implementation satisfies the requirements of those stakeholders. This section aims to provide a succinct statement of the dominant interests of those parties with respect to this project.
3.1 People of New Zealand
The PAL project is driven by one major overriding force—that is to make New Zealand legislation accessible. Since 1989, the New Zealand Statute Book has been owned or controlled by private corporations and complete sets of consolidations were only available from commercial legal publishers for a fee.
In a democratic government, where the citizens have a voice in the creation and modification of the laws that govern them, it is fundamental to the correct operation of the legislature that all citizens have access to the text of the law so that their influence is informed.
In a common law jurisdiction, where ignorance of the law is no excuse, it is fundamental to the correct operation of the judiciary that all citizens have equal access to the text of the law so that all citizens, criminal or law-abiding, can readily discover their obligations and rights under the law.
In a welfare state, where taxation funds a wide variety of services and wealth redistribution, it is fundamental to the correct operation of the executive that all citizens have ready access to the legislation that creates and guides the exercise of the powers and obligations of the public service, to ensure that citizens are receiving the intended benefit of the services provided.
Effective access to the law by the citizens of New Zealand must be:
- complete—it must include the entire statute book, including all Acts and Regulations as made, and to support the legislative process, Bills and proposed amendments to Bills (including SOPs), and consolidations, to ensure an authoritative source of the text of the law as in force at a particular time,
- timely—it must be available as soon as is practicable after it becomes law, and in the case of Bills, SOPs and slip amendments, for a reasonable time during the parliamentary process to ensure adequate opportunity to lobby elected representatives, and
- accessible—it must be available to citizens with disabilities, whether visual impairments (rendering paper publications inaccessible), physical impairments (rendering centralized distribution facilities such as government information kiosks or government bookshops inaccessible), or technological impairments (rendering web-only delivery inaccessible to those who require traditional paper publications).
3.2 The Government of New Zealand
The primary historical role of the PCO is to support the government's legislative agenda by providing a timely, high-quality drafting service for primary (Bills, slips and SOPs) and subordinate legislation. The government want to ensure that the new system is introduced without adversely affecting the legislative timetable, either by slowing down the current drafting service or turn-around time, or by introducing minor formatting errors that distract the Members from the task of debating the content of the legislation. Despite acquiring new responsibilities as custodians and publishers of the Statute Book, the PCO must continue to provide a high level of service to the government.
3.3 Parliament
The PCO, the OC, and Legislation Direct between them have in the past provided a service to mark up committee amendments, split Bills, and undertake other potentially time-consuming tasks with very quick turn-around times. As PCO assumes more of the responsibilities formerly taken by Legislation Direct, it is imperative that it provides a service as good as or better than that provided in the past, so that the legislative process goes smoothly and Members have timely information available to them for consideration before debating or voting.
3.4 Parliamentary Counsel Office (PCO)
The primary sponsor of the PAL project is the Parliamentary Counsel Office. The PAL system needs to support the historical role of the PCO in delivering services to the government and Parliament, and the new role of the PCO as custodian of the Statute Book and publisher of legislation.
3.4.1 PCO Drafters
PCO drafters will spend most of their time working in the Authoring Tool fulfilling their primary function of drafting proposed legislation or amendments to legislation or drafts for the government. While the primary focus of the PAL system has been to provide public access to the outputs of the drafters, for the PAL system to be truly effective, it must deliver this public access without interfering with or adversely affecting the task of drafting the legislation.
3.4.2 Secretaries and support staff
The secretaries and support staff work with drafters to support the production of draft legislation and amendments. The PAL system will be most effective if it avoids imposing additional burdens on the secretaries and support staff but instead provides useful tools to improve the productivity or quality of their output.
3.4.3 Proofreaders
Because of the high importance of the legislative documents and the potential costs, political and financial, of mistakes in them, New Zealand, like most jurisdictions, engages proofreaders to edit and correct draft legislation and other documents to ensure that the documents that leave the office are of the highest possible quality. The PAL system must continue to support the role of the proofreaders, making the drafts available to the proofreaders as soon as the drafter requests a proofread, and making their comments and suggestions available to the drafters as soon as the proofreaders complete their task.
3.4.4 PPU
The Pre-Publication Unit will be taking on the publication role formerly undertaken by Legislation Direct. They will bear the primary responsibility for ensuring that the correct stylesheets are applied to the correct documents, that the output of the print process produces high-quality page presentation, and that the documents are made available in a timely fashion, both in print and electronic form, once the content has been created. They will also be responsible for managing the more unusual inclusions in legislation such as graphics, treaties and deeds, and complex tables. A well-crafted print and electronic rendering solution as part of the PAL system should eliminate much of the stress of this typesetting role and allow the PPU to concentrate on the presentation of the more unusual elements.
3.4.5 Compilers
One of the objectives of the PAL project is to make not only sessional or as made legislation available to the public, but also timely consolidations of amendments to supplement the official reprints. The PCO are engaging Brookers, the provider of the consolidated statute book, to maintain the consolidations while PCO staff are validating the consolidations to facilitate the officialization of the consolidated statute book. Brookers staff will be housed both within the PCO and on Brookers premises connected by a Virtual Private Network (VPN). When all of the consolidations are officialized, the Reprints Unit (RU) within PCO will assume responsibility for maintaining the consolidations.
The PAL project needs to provide tools to support the compilers, whether working on official or unofficial consolidations, to allow them to track all amendments and make sure that they are applied at the right time in the right way, to ensure that appropriate history notes are generated and to assist with the generation of the various publications that report on the commencements and amendments in particular time periods or applied to particular legislation.
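The amendment-tracking requirement described above can be illustrated with a small sketch. It is not drawn from the PAL implementation: the `Amendment` record, the provision identifiers, the Act names, and the dates are all hypothetical. It shows one possible way to derive the text in force on a given date, together with history notes, by applying only those amendments that have commenced by that date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Amendment:
    """Hypothetical record of one textual amendment to a provision."""
    provision: str   # e.g. "s 4"
    new_text: str    # replacement text for the provision
    commences: date  # date on which the amendment takes effect
    source: str      # citation of the amending Act (illustrative)


def consolidate(as_made, amendments, as_at):
    """Return (text, history) for the law as in force on `as_at`."""
    text = dict(as_made)
    history = {}
    # Apply amendments in commencement order; skip those not yet in force.
    for am in sorted(amendments, key=lambda a: a.commences):
        if am.commences > as_at:
            continue
        text[am.provision] = am.new_text
        history.setdefault(am.provision, []).append(
            f"{am.provision} amended on {am.commences.isoformat()} by {am.source}"
        )
    return text, history


as_made = {"s 3": "Original section 3.", "s 4": "Original section 4."}
amendments = [
    Amendment("s 4", "Section 4 as amended in 2003.", date(2003, 7, 1), "Example Amendment Act 2003"),
    Amendment("s 4", "Section 4 as amended in 2002.", date(2002, 1, 1), "Example Amendment Act 2001"),
]

text, history = consolidate(as_made, amendments, date(2002, 6, 30))
```

Calling `consolidate` with a later date picks up the later amendment as well, which is the behaviour a compiler needs when preparing a consolidation for a particular point in time.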
3.4.6 IT Team
The IT Team's role is to contribute to the effectiveness and efficiency of the PCO by assisting with the reliable operation, effective use, and development of the PCO's computer systems.
The PAL system will be most effective if it integrates seamlessly into the current IT infrastructure, and if the current high level of user satisfaction continues.
3.5 Inland Revenue Department (IRD) Drafters
In addition to the legislative drafters in PCO, Inland Revenue Department (IRD) has a small team of drafters dedicated to drafting tax and associated legislation. Their requirements are similar to those of the PCO drafters, except they will be accessing the repository of legislation and work in progress from IRD's offices across a virtual private network (VPN). It is important for their productivity that the network provides a reliable connection and sufficient bandwidth so that they do not have to wait much longer than users physically located in the PCO to check in documents or to request and receive print previews or other renditions.
3.6 Office of the Clerk (OC)
The Office of the Clerk (OC) is responsible for supporting the business of the House. They manage the progress of Bills through the Parliament and manage and record the decisions of the Parliament and reports from Select Committees to the Parliament. They need to be able to access the print proofs of documents to ensure that they match the decisions made by the House (and electronic versions to ensure no changes have been introduced in unamended text) and need a responsive print publication service from the PPU for the preparation of the various versions of the Bills for each stage of the legislative process, and assent copies to bring the Bills into law. Since PCO drafters are only available to the Government, private members' Bills are drafted by OC.
3.7 SecuraCopy
SecuraCopy is a private firm that is effectively taking the role of an outsourced government printer. In the new PAL system, the typographic page setting work will be done primarily by the system and managed by the PCO's PPU. SecuraCopy will be responsible for preparing the print versions from PDF documents and an XML electronic "envelope" that gives instructions about what should be done with each PDF—how many copies and to whom to send them. SecuraCopy will also be responsible for distributing the print copies through normal commercial channels. It is most important to SecuraCopy that the PAL system provide a reliable connection for the communication of the proofs and that the instructions about volume and distribution are correct so that the right documents get sent to the right people at the right time.
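The idea of an XML "envelope" carrying print instructions alongside each PDF can be sketched as follows. This is only an illustration: the element and attribute names (`printEnvelope`, `document`, `deliver`), the file name, and the recipients are assumptions, not the actual interface between PCO and SecuraCopy.

```python
import xml.etree.ElementTree as ET


def build_envelope(jobs):
    """Build a print-instruction envelope for a batch of PDFs.

    Element and attribute names are illustrative assumptions, not the
    real PCO/SecuraCopy interface.
    """
    root = ET.Element("printEnvelope")
    for job in jobs:
        item = ET.SubElement(root, "document", file=job["pdf"])
        for recipient, copies in job["recipients"].items():
            ET.SubElement(item, "deliver", to=recipient, copies=str(copies))
    return root


def total_copies(envelope):
    """Sum the copies requested per PDF so a print run can be sanity-checked."""
    return {
        doc.get("file"): sum(int(d.get("copies")) for d in doc.findall("deliver"))
        for doc in envelope.findall("document")
    }


# Hypothetical job: one PDF, two recipients.
envelope = build_envelope([
    {"pdf": "example-bill.pdf", "recipients": {"Bills Office": 120, "Bookshop": 30}},
])
xml_text = ET.tostring(envelope, encoding="unicode")
```

A check such as `total_copies` is one way the sending side could confirm that the volume and distribution instructions are consistent before the envelope leaves the PCO.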
SecuraCopy is connected to the PCO's network via dark fibre to ensure that communications between SecuraCopy and PCO are secure (so that a request received by SecuraCopy is known to have been sent by PCO) and that the confidentiality of the more sensitive documents is not breached. Replistor technology is used to transfer documents.
3.8 Datacom
Just as SecuraCopy provide an outsourced distribution channel for paper delivery, Datacom, as the Internet Service Provider (ISP), provide the outsourced distribution channel for electronic delivery of legislation. Datacom host the machines responsible for the public website as well as the service for downloading XML and PDF updates (see section 3.10 below). Updates to these services are uploaded on a daily basis (or by request) through a VPN using the Replistor technology to communicate only those files that have changed since the last replication.
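The principle of sending only files that have changed since the last replication can be sketched in a few lines. This is not a description of how Replistor works internally; it is a generic illustration that compares content digests between two snapshots of a directory, and the function names are hypothetical. Deleted files are deliberately ignored to keep the sketch short.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's relative path to a SHA-256 digest of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_since(previous, current):
    """Return paths that are new or modified relative to the last replication."""
    return sorted(
        path for path, digest in current.items() if previous.get(path) != digest
    )
```

A daily job would keep the previous snapshot, take a new one, and transfer only the paths returned by `changed_since`.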
3.9 Parliamentary Service
The Parliamentary Service, amongst other things, is responsible for providing the network infrastructure to both PCO and OC. It provides firewall services to prevent unwanted intruders from breaching these sensitive networks, and hosts the PCO end of the VPN services to limit outside access. Its main interest in PAL is to protect the integrity of the network service to PCO, OC, and the other agencies that share their network infrastructure.
3.10 Legal publishers
In addition to the public website that will provide HTML-based search and browse capability of as made and consolidated legislation, the PCO is also making available a free subscriber service to sessional and consolidated legislation in XML and PDF form, primarily for legal publishers. This service will maintain the last month's as made and consolidated documents so that third party publishers can access authoritative XML and PDF versions of New Zealand legislation for repurposing and republishing.
3.11 Unisys
Unisys, as implementation partner on the PAL project, has a significant financial and reputational investment in the PAL project. While this interest is likely to diminish over time, they have a current interest in ensuring the successful deployment of the PAL system, and a potential continuing interest in securing an ongoing operational and developmental maintenance contract for the PAL system.
3.12 Summary
The legislative process is complex and adopting a centralized system for managing the total document lifecycle of legislative documents requires co-ordination of the needs of a large number of different interest groups and stakeholders to ensure that a complete system is successfully deployed.
4. Process
Given the large number of organizations and requirements for the PAL system, it was necessary for this review to consult widely to ensure a complete and thorough coverage of all the issues raised by this crucial project. The efficiency of this process largely relied on the cooperation of the various stakeholders in providing information to InQuirion in the form of documents, code, and interviews during a short but intensive on-site period and with subsequent phone and email clarifications and confirmations. InQuirion would like to state at the outset that all of the stakeholders, including Unisys, the vendors, and subcontractors, were extremely cooperative and helpful with InQuirion personnel during the course of the technical review. The following summarizes the processes followed to identify the issues of relevance to the PCO and other stakeholders.
4.1 Initial review of system documentation
Prior to visiting the PCO offices in Wellington, InQuirion was provided with extensive system and project documentation including the following documents:
| Parliamentary Counsel Office |
| PCO Test Scripts; Website Test Scripts; UAT Plan; UAT Process; UAT Requirements; Elements of Legislation Specification; DTD QA Report by Complete Data Solutions Ltd |
| Unisys |
| Content Management System | CMS Design Document v1.1; CMS Configuration and Customisation Guide v1.1; Transform Process Technical Documentation v1.1; XSL System Documentation v1.0; Metadata Guidelines v0.2; Metadata Specification v1.1 |
| Database | Legislative Database—Markup & Re-purposing Specification v1.4 |
| Document Type Definitions | DTD History v1.2; DTD Design Specification v2.1; DTD Guide v2.9; ID Schema v1.41 |
| Website | PCO Website Detailed Design v3.0; PCO Website Style Guide v3.0; Website UI Design and Supplementary Specification; Website System Documentation; Subscribers Website Configuration; Indexing Options |
| Systems Integration | Book of Requirements & Functional Specifications; Technical Architecture Plan v1.1; Remote Access Architecture v0.6; Remote Access Detailed Design Document v1.2; PAL VPN Client Software Configuration v1.2; Stage 2 Phase 1 and Phase 2 Scoping Statement 1.0; Server Documentation Callisto v1.1; Server Documentation Europa v1.1; Server Documentation Jupiter v1.1; Server Documentation Leda v1.1; Server Documentation Leo v1.1; CMS Standby Server Configuration Guide v.1.0; Legato Replistor System Documentation v1.0; Interface between PCO and SecuraCopy Configuration v1.1; Graphics Features v1.1; Training Materials PCO PPU |
| Authoring Tool | ADG Autotext v1.0.doc; Authoring Tool—Custom Fonts.doc; Authoring Tool End User Specification v1.0 Final.doc; Authoring Tool Matrix 7 August 2002 v1.0 Final.xls; Authoring Tool Workstation Installation Notes 1.1 with track.doc; Epic Authoring Application v1.0.doc; Epic Editor 4.4 release notes (4.3.1 start at chapter 4).pdf; NZ PCO Autonumbering Utility Installation and User Guide v1.doc; NZ PCO ID Schema Utility Installation Guide v1.2.doc; NZ PCO Initial Startup Dialog v1.4.doc; NZ PCO Licence Release Utility Installation and User Guide v.doc; NZ PCO Numbering Restart Utility Installation and User Guide.doc; NZ PCO Publish Utility for Drafters Installation and User Gu.doc; NZ PCO Publish Utility for PPU Installation and User Guide v.doc; NZ PCO Return Quick Key Installation and User Guide v1.2.doc; NZ PCO Stylesheet Movement Utility Installation and User Guide.doc; NZ PCO_Startup DialogV1.4.doc; Revision Tracking Tool Installation and User Guide v1.4.doc |
4.2 On-site visit
After an initial familiarization with the system via the documentation, InQuirion then commenced an on-site visit in Wellington. InQuirion was keen to elicit information from as many stakeholders and participants in the PAL project as possible without jeopardizing the timeliness of the review.
4.2.1 Focus groups and user interviews
A number of different focus groups and interviews were convened in order to explore and understand any reservations different categories of user have about the PAL system and to elicit details about any perceived or actual limitations of the system. The review was commenced by a focus group representing PCO (including drafters, secretaries, IT, and PPU), OC, IRD, and Unisys. A summary of the other interviews and focus groups appears in the table below in section 4.2.5.
4.2.2 System demonstration
The bulk of the first day on site was taken up by an initial system demonstration. The User Acceptance Testing Team (primarily Melanie Bromley, a drafter, and Michelle Antoine, the head of PPU) led the demonstration. This included exploring the full document life cycle of a Bill from initial draft through to publication as an Act. While there were a number of formatting issues visible throughout the demonstration (including not being able to generate Bills with line numbers), the demonstration showed support for the bulk of the lifecycle. However, the demonstration was designed to avoid areas of the system where known defects prevented completion of a document lifecycle. Throughout the on-site visit, other parts of the PAL system and the existing systems used by PCO to manage the drafting process were also demonstrated to InQuirion.
4.2.3 System architecture briefing
In order to augment and explain the system documentation, InQuirion required a system architecture briefing from the PCO technical staff and then from Unisys. The purpose of this briefing was both to explore the extent to which the developed system matched the architecture documentation and to ascertain the depth of knowledge the various personnel had of the overall system architecture. InQuirion is satisfied that the developed system substantially corresponds with the architecture documentation. While PCO technical staff demonstrated a good overall understanding of the system architecture and how the pieces fitted together, Unisys was not able to present one person who both understood the architecture and the individual technologies, and understood the business requirements of a legislative drafting office. The knowledge seemed to be split between numerous Unisys staff and subcontractors with no single system architect uniting the vision.
4.2.4 Code reviews and walk-throughs
A thorough technical review of a project of this size and complexity requires an examination of the code that implements it to ensure consistency and quality. In addition to simply looking at the code, InQuirion had members of the development team (Unisys employees and subcontractors) walk its staff through a number of areas of the code, explaining the design decisions and layout of the code. While time prevented InQuirion from reviewing every single line of code, a combination of formal and informal reviews with the aid of developers and InQuirion staff examining the code directly without assistance ensured thorough coverage. There was a focus on the highest-risk areas of the code, and the developers responded immediately on any code or practices perceived by InQuirion to be unusual.
4.2.5 Summary of the on-site schedule
| Date | Purpose | Participants |
| WEEK 1 | | |
| Monday 18th | Initial meeting | Geoff Lawn |
| | Team meeting | PCO Team; Alan Grainer; OC; IRD |
| | System demonstration | PCO Team (primarily Melanie Bromley and Michelle Antoine) |
| Tuesday 19th | Review user issues | PCO Team |
| | Systems architecture review | PCO Team, Judy Heaphy, Devon Heaphy, Tim Woodill |
| | Unisys architecture review | Unisys (Alan Grainer, Dermot O'Brien, Lesley Jones, Lee Shelton) |
| Wednesday 20th | Website review (including code review) | Unisys (Helen Hunt, Todd Mansill) |
| | Transforms review | Unisys (Ben Horan, Lesley Jones) |
| | CMS review | Unisys (Lesley Jones) |
| Thursday 21st | Publishing issues | Brookers (Mark Bacon, Jenny Sinclair, Jane Pilkington) |
| | DTD review | e-Gloo (Alan Burton); Brookers (Mark Bacon) |
| | Stakeholder interviews | Office of the Clerk (Fay Paterson, Donna Tunnicliffe); SecuraCopy (Chris Eales); IRD (Warren Cole, Margaret Nixon); Datacom (Mark Muru, Sarah Weavers) |
| | Website code review | Unisys (Helen Hunt, Todd Mansill) |
| | Transforms code review | Subcontractors (Bevan Souster, Mike Player) |
| Friday 22nd | Authoring tool users focus group | PCO, OC and IRD |
| | Stakeholder interview | Parliamentary Service (John Preval) |
| | Following week planning | Geoff Lawn, Anthony Baker, Alan Grainer |
| WEEK 2 | | |
| Monday 25th | Progress meeting | Geoff Lawn, |
| | Authoring Tool and rendering engine issues | Alan Grainer; ADG (Tammy Halter) |
| | Office of the Clerk issues and background | Donna Tunnicliffe |
| | Print production unit | PPU Team |
| Tuesday 26th | Code reviews | - |
| | System familiarization | - |
| | Transform issues | Unisys (Lesley Jones, Mike Player) |
| Wednesday 27th | Stage 2 Scoping Document review | PCO, |
| | Systems Testing review | Unisys (Lee Shelton) |
| Thursday 28th | Stage 2 Scoping Document review (continued) | PCO, |
| | Future enhancements | PCO Team |
| | CMS Technical review | Unisys (Alan Grainer, Tao Zhang); Documentum (Mike Pomponio, Debra Bordignon) |
| Friday 29th | Legislation tracking | PPU |
| | Support issues | PCO, OC and IRD |
| | Documentum future directions | Unisys (Alan Grainer); Documentum |
| | Review meeting | Geoff Lawn, Anthony Baker |
4.3 Vendor directions
In order to assess the ongoing viability of the PAL project and ensure that it is maintainable into the future, it was necessary to interview representatives of the vendors of the main components of the PAL system: ArborText (vendors of Epic Editor and the Epic Print Composer and Epic E3 rendering engines) and Documentum (vendors of the content management system). The vendors were very cooperative in providing access to local representatives as well as senior executive and technical staff in order to provide all the relevant information about future product directions.
The findings that InQuirion makes in this area rely on the information provided to it by the vendors. PCO should be aware that the IT market can change very quickly and even plans outlined by the vendors in good faith to InQuirion may change drastically to accommodate unforeseen market developments.
4.4 Preparation of draft report
Subsequent to the on-site visit, InQuirion has spent a number of weeks collating the information gathered from the documentation and the various interviews and focus groups and code examinations to form an opinion on the strengths and weaknesses of the developed system, the integrity of the customizations and the core products, the integration and modularity of the solution components, the scope for future customization and upgrade, the use of industry standards and coding practices, the overall robustness of the solution, and any implications for ongoing operational management and support. These findings and recommendations have been collected together to form this report.
4.5 Presentation of draft report
After delivery of the draft report to the PCO, InQuirion presented the report to the various stakeholders in Wellington on October 1st. InQuirion's lead consultant remained in Wellington for the rest of that week in order to elicit feedback on the draft report and include any changes.
A second draft was delivered to PCO and additional feedback from the stakeholders was provided to InQuirion.
5. Architecture
5.1 Typical architecture for legislative drafting
There have been a number of efforts throughout the English-speaking world to apply SGML and XML technology to the task of legislative drafting, management, and publication. During the information-gathering phase, New Zealand PCO consulted a number of drafting offices in jurisdictions that had either commenced or completed projects to implement these technologies to learn from their experience and avoid their most obvious mistakes. There has been a considerable push amongst Anglo-American jurisdictions for higher quality electronic access to the law and more timely availability of the legislative artefacts. The catch-cry of e-government is not just a political tool; it has the potential to deliver real benefits to the community and enhance the democratic process valued throughout the common law world.
Common to all of these projects are a number of fundamental components. Not all jurisdictions have used technological solutions for each of the components, but each has developed a solution that fits their requirements and budgets.
5.1.1 Authoring Tool
The intent of a drafting system is to provide a set of tools to support the authoring of legislative documents. That is the primary task of the drafters and the tool in which they author drafts is the most fundamental piece of the solution. A number of different approaches exist. Some jurisdictions have adopted a structured authoring environment that validates SGML or XML drafts as they are created, such as ArborText Epic Editor (Canada Department of Justice, New Zealand PCO) or Corel XMetaL (US House of Representatives, California, Canada Parliamentary Counsel). Some have adopted a word processor environment for the drafters with downstream processing by additional staff (Singapore using WordPerfect and then FrameMaker+SGML, Western Australia using Word and a combination of SGML tools, Ontario using Word and in-house conversion, Quebec with Word and IroSGML) or interactive conversion on demand (Tasmania and PNG with Word and TeraText for Legislation, NSW PCO with FrameMaker native and FrameMaker+SGML).
5.1.2 Repository
The next most important piece of the architecture is the repository for managing multiple versions of the drafts and the statute book. Many jurisdictions have opted for the simpler solution at least in the short term of using the file system as the repository (Singapore, WA, US House, NSW although moving towards a repository technology). Others have used native XML repositories (Tasmania, PNG, Canada Department of Justice, and NSW using TeraText DBS) or traditional relational database systems (California, Quebec using Oracle, Canada House using MS SQL) or document management systems (New Zealand with Documentum, Tasmania and PNG with TeraText DMS).
5.1.3 Workflow
Legislative processes are complex with a large number of steps and the drafting processes that support them involve numerous additional quality assurance steps before entering the public legislative process. Jurisdictions need to track the progress of a file through these processes. Some do it with paper folders (Canada Department of Justice) or with simple database applications (NSW, Quebec). Others have adopted lifecycle support (New Zealand with Documentum) or full Workflow Management Coalition (WfMC) model workflow systems that are part of the repository technology (Tasmania and PNG with TeraText for Legislation, Hong Kong with Lotus Notes).
5.1.4 Consolidation tool
For most jurisdictions, the consolidation process is manual and typically uses either the same tool as for authoring (New Zealand) or a slightly more sophisticated authoring tool (Singapore, NSW PCO use XMetaL for consolidation). Other jurisdictions have adopted a semi-automated (Canada Department of Justice and Quebec with IroSGML) or fully automated (Tasmania and PNG with TeraText for Legislation) consolidation system.
5.1.5 Print delivery
Many users of legislation are fairly conservative and print publications are still an important part of the output process. In most jurisdictions, the legislative process still relies on paper artefacts. Numerous tools are being used by Westminster governments to render the legislation into print format including low-end solutions (Tasmania, PNG mapping XML to Word), medium-cost solutions linked to the editing environment (Singapore, NSW using FrameMaker, New Zealand using Print Composer) and high-end solutions (Canada using 3B2).
5.1.6 Electronic delivery
One of the major motivating forces for adopting an electronic legislative drafting environment is to provide for more complete electronic delivery solutions, particularly web-based searching and browsing. The level of sophistication of the web delivery solutions ranges from generating static HTML for delivery out of standard web servers (Ontario and New Zealand using XSLT to generate HTML for IIS), through dynamic generation of HTML content direct from the repository (Hong Kong using Lotus Domino) including delivery of the XML to the drafters (NSW and Canada using TeraText DBS) to full point-in-time with all of these services (Tasmania using TeraText for Legislation).
5.2 The PAL architecture
5.2.1 Authoring/Consolidation tool
New Zealand has selected ArborText Epic Editor as the Authoring Tool for PAL. This tool is also the basis of the consolidation tool, with a manual consolidation approach being adopted in the short to medium term. A template for Microsoft Word 2000 has also been developed to support authoring of commentaries by OC, and the Word interchange capability within Epic Editor is used to translate this into appropriate XML.
5.2.2 Repository
Because New Zealand PCO has been using document management systems for some time and therefore understands the benefits associated with formal document management, access control, version management, and workflow/lifecycle control, it has selected a major DMS solution, Documentum 4i, as the repository for managing the multiple versions of legislation, draft and published, and for controlling access to those versions.
5.2.3 Workflow
The advantage of choosing a major DMS such as Documentum for the repository is that workflow comes with it as a bonus. New Zealand has chosen to utilize the more flexible document lifecycle model supported by Documentum rather than a full Workflow Management Coalition workflow enactment service. The more complete workflow capability may be used later to support the functions currently managed by the Legislation Tracking System, a Microsoft Access application that will continue in use after PAL goes live.
5.2.4 Print delivery
Leveraging the stylesheet work needed for configuring the Authoring Tool, New Zealand has chosen a print-rendering tool from the same vendor as Epic Editor, ArborText's Epic Print Composer. This is a medium-level rendering tool that provides support for a number of industry standards including FOSI and XSL-FO.
5.2.5 Electronic delivery
New Zealand has opted for a fairly low-tech, low-risk web delivery solution using static HTML generated by exporting the XML out of Documentum through XSLT stylesheets. A relatively simple website implemented on MS IIS with ASP.NET infrastructure provides the web logic and dtSearch supports a simple set of searches over the text of the HTML and the metadata stored in HTML-specific metadata tags.
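The static-HTML approach, with metadata carried in HTML `<meta>` tags for the search engine to index, can be illustrated with a short sketch. PAL uses XSLT stylesheets for this step; Python's standard library has no XSLT engine, so the sketch substitutes ElementTree and string templates to show the same idea. The XML element names, the metadata names, and the sample Act are assumptions and do not reflect the PAL DTD.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; not the PAL DTD.
SOURCE = """<act id="example-act" year="2003">
  <title>Example Act 2003</title>
  <section><num>1</num><heading>Title</heading><para>This Act is the Example Act 2003.</para></section>
  <section><num>2</num><heading>Commencement</heading><para>This Act comes into force on 1 July 2003.</para></section>
</act>"""


def render_static_html(xml_text):
    """Render one Act to a static HTML page with searchable <meta> tags."""
    act = ET.fromstring(xml_text)
    title = act.findtext("title", default="")
    # Metadata copied from the XML into HTML meta tags for indexing.
    meta = {"doc-id": act.get("id", ""), "year": act.get("year", ""), "doc-type": act.tag}
    head = "".join(
        f'<meta name="{escape(k)}" content="{escape(v)}">' for k, v in meta.items()
    )
    body = "".join(
        f"<h2>{escape(s.findtext('num', ''))} {escape(s.findtext('heading', ''))}</h2>"
        f"<p>{escape(s.findtext('para', ''))}</p>"
        for s in act.findall("section")
    )
    return (
        f"<html><head><title>{escape(title)}</title>{head}</head>"
        f"<body><h1>{escape(title)}</h1>{body}</body></html>"
    )


page = render_static_html(SOURCE)
```

Because each page is generated once at publication time, the web server only has to serve files, which is what keeps this delivery model low-risk.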
5.3 PAL development environments
A number of different development environments and languages have been used to develop the PAL system.
Formatting Output Specification Instances (FOSIs) are used in Print Composer and Epic Editor to describe how to render the XML for display and print. This standard is an old Defence standard [2], designed for SGML, that allows a developer to describe the presentation of each element and attribute. Like the W3C's CSS standard, FOSIs have limited capability to present content occurring once in the document in multiple places in the document (e.g. the title on the title page and in the running header) or to reorder the content for presentation. Although this is an open standard, there are few vendors still supporting it (primarily ArborText and Datalogics). Most have migrated to the W3C standards CSS [3] or XSL-FO [4].
ACL is used in Epic Editor to customize the operations of the editor and to provide legislation- and DTD-specific capabilities to the users of the Authoring Tool. ACL is a proprietary scripting language supported only by ArborText products. ACL customizations are collected in one or more packages. Epic Editor also supports scripting in VBScript, Java, and JavaScript (although none of them were used in customizing the Authoring Tool, despite the use of similar languages elsewhere in the project).
DocBasic has been used for customizing the actions when promoting or demoting a document in a Documentum lifecycle. DocBasic is a proprietary language supported only by Documentum, but it is based on Microsoft's Visual Basic, which, while also proprietary, is very widely used. Documentum does not embed Visual Basic into the Documentum product, but an experienced Visual Basic programmer would find the code familiar and easy to modify or extend, although without the richness of the Visual Studio development environment.
DocBasic provides a number of APIs for accessing the Documentum features and managing the import and export of documents through Documentum. These APIs are increasingly being exposed through Java.
Java has been used for some additional customization of Documentum and for the agents controlling the rendering of the XML to PDF for print, HTML for the Web, and XML for export to 3rd party publishers. Java is an international standard, multi-platform language with the support of large vendors such as IBM, Oracle, and Sun Microsystems, and the open source community.
Borland's JBuilder has been used as the primary development environment for the Java code although any alternative Java development environment could easily be substituted.
Documentum provide a number of APIs specific to Documentum, and also generic XML APIs (DOM, call outs to XML Schema validators, XPath evaluators, XSLT engines, etc). Documentum version 5i leverages more Java than in version 4i.
XSLT has been used for the transformations from XML to HTML for the web. XSLT is the W3C standard for transforming XML and numerous engines are available to support the language including commercial and open source implementations.
A CSS stylesheet is made available by the PAL website to the user's browser in order to render the HTML page. CSS is the W3C standard for rendering structured documents, including HTML[5] and XML[6], where complex reordering or multiple use of content is not required. Numerous engines are available, including good support in most web browsers.
C# has been used to implement the website application logic. C# is an international standard language championed by Microsoft as part of their .NET initiative (principally designed to compete with Java). While no other major vendors appear to have embraced it as a standard platform, Microsoft have indicated that .NET with C# as the primary language in that framework is central to their future direction and Microsoft are unlikely to go away within the lifetime of the PAL system. There are some open source initiatives to implement C# compilers and the .NET framework on other operating system and hardware platforms.
ASP.NET has been used to provide the framework of the Website implementation and is used as a wrapper around the C# code and the HTML generated from the XSLT transformations. ASP.NET is only supported within Microsoft's operating systems and web platform, IIS.
As part of the PAL solution, the Office of the Clerk has been provided with Word templates to facilitate its preparation of commentaries. These templates capture some existing macros and provide some standard paragraph and character styles to facilitate easy exportation as XML. Word documents produced by OC using these templates will be converted to XML using the Word importation features of Epic Editor. This process will be managed by PPU.
5.4 Conclusions about the PAL architecture
The overall PAL system architecture follows quite closely the standard pattern of systems of this type. The components selected at the high end of the market (the authoring tool and the DMS) reflect the understanding of the importance of powerful drafting tools to the drafters and support staff, the positive experience that PCO has had in the past with document management solutions, and the complexity of the legislative processes that need to be managed and tracked by the PAL system. An efficient and well-managed back-end process is crucial to providing timely, high-quality service and these selections are entirely appropriate. The components selected at the mid-range or low end (the rendering engine and the web site architecture) seem a little out of place given the capability of the other components. Because this project focuses on public access, highlighting the newly acquired role of the PCO as publisher of legislation and related information, InQuirion has recommended considering upgrading both the print rendering and the website infrastructure.
With respect to the development environments, although a large number of languages and environments have been chosen, none of them is of particular concern from a maintenance point of view (other than the sheer number). Selecting a Java-based website solution, and using Java for the Authoring Tool customizations and Documentum lifecycle customizations, might have eliminated the use of C#, ACL, VB, and similar languages, but most programmers are used to switching from one language to another according to the suitability of the language to the task at hand. All of the languages chosen are suitable for the tasks to which they have been applied.
6. Specific questions
6.1 Is the application architecture implemented in a consistent, logical and understandable manner?
Section 5.1 above describes the typical architecture associated with an XML or SGML legislative drafting, management, and publication system. While New Zealand has selected a different tool combination from other jurisdictions, all of the pieces fit into the standard category of tools with some more advanced tools and some less advanced tools reflecting the different emphases, priorities, and experiences of the New Zealand environment. InQuirion is unaware of any other jurisdiction utilizing Documentum for XML legislation management. Perhaps this is because Documentum 4i was the first version of Documentum to support XML documents as anything more than another document type and provide chunking and XML metadata extraction. Most of the tool selections in these other projects took place before that of New Zealand.
While InQuirion has some reservations (set out below) about the capability or flexibility of a few components, the overall architecture is sound and has been implemented in a consistent, logical, and understandable manner. The resulting system architecture is no more complex than one would expect given the complexity of the domain and requirements.
6.2 Does the Epic software (Epic Editor, Print Composer, and E3) have the capability to meet PCO's requirements?
Note: Particular attention should be paid to the extent of stylesheet development in Phase One (pre-go-live) of the project, and the amount of further development planned for Phase Two (post-go-live to completion of the overall project).
6.2.1 Epic Editor—the Authoring Tool
Epic Editor appears to be a good fit to the PCO's requirements. In most jurisdictions where drafters have switched from a word-processor drafting environment to a structured editing environment such as Epic Editor (including Canada where they are migrating from Corel WordPerfect to ArborText Epic Editor), the reactions have been quite negative and drafters have required considerable convincing to migrate to the new environment. By contrast, the reaction of the New Zealand drafters both from PCO and IRD has been predominantly positive. A number of drafters have requested to use the new environment ahead of the system deployment! This openness and acceptance of the Authoring Tool is partly due to the provision of a number of extremely helpful, legislation- and DTD-specific features in the customizations of Epic Editor (worthwhile additions despite additional time and cost if only for this reason), and partly due to the drafters' familiarity and agreement with the aims of the PAL project and their understanding of their role not just as authors of legislative drafts but also as custodians of the statute book.
InQuirion notes that the drafters, secretaries, and PPU staff have generally expressed a preference to operate the Epic Editor in "tags visible" mode. This allows the user to position the cursor with greater accuracy ensuring that any content created is being placed within the right markup. With all the other windows and menus provided by Epic Editor open, and because of the density of the tags as specified by the DTD, this results in a fairly small amount of text visible as the user is drafting. User studies have shown that knowledge workers (of which legislative drafters are an example) generally perform more productively when they have more context available to them. In particular larger screen sizes and resolutions tend to make them more productive. An effective way to overcome this limitation would be to purchase large, high-quality monitors for the PCO staff.
There are some issues to do with management of non-English characters (Unicode), detailed in section 6.5.1.2, that can and should be addressed in the configuration of Epic Editor, but ArborText's documentation suggests that this is a relatively minor change.
6.2.2 Epic Print Composer—the Print Rendering Tool
The ArborText rendering engine, whether in Print Composer or E3, is a mid-range rendering engine. Its cost and its capabilities are lower than the high-end systems such as XyVision's XPP or Advent Publishing's 3B2 but higher than lower-end systems such as the Open Source FOP or a Microsoft Word-based solution. The initial evaluation included the high-end engine XPP from XyVision as well as the selected Print Composer and a number of other alternatives, but not Advent's 3B2. When InQuirion heard that Print Composer had been selected, based on viewing New Zealand Acts and Regulations, this seemed like a perfectly reasonable decision. The rendering requirements for consolidated Acts and Regulations were not particularly taxing, and, providing that Print Composer or E3 could satisfy the throughput and turnaround requirements of the PCO, it seemed that the increased costs for a high-end rendering engine were not really justified.
The ability to produce accurate, high quality, timely print renditions of legislation—in draft, final, and consolidated form—is crucial to the success of the PAL system. A number of different stakeholders have considerable reservations about the current print rendering solution. InQuirion shares their concern.
These issues focus around performance, functionality, and future fit, and the economics of continued deployment.
6.2.2.1 Performance
The rendering engine performance, the time taken to render large documents, is of considerable concern for a number of the stakeholders. InQuirion's interaction with the system suggested that rendering a single large document could take well over 20 minutes. Regardless of the lack of specific expectations in the formal requirements documents, this is clearly unacceptable by any reasonable subjective analysis.
In the current transform environment, only one such rendering can be done at a time so even small jobs might be delayed by more than 20 minutes if somebody else is working on a large Bill. Even with print previews (which are not bottlenecked like the transforms), most Authoring Tool users (whether drafters, secretaries, RU, or PPU) are likely to want to print preview a document many times a day. That performance will severely affect productivity if any user is working on a large Bill, Act, or Regulation.
Once the Parliament or a Committee has approved a document, there are often tight time frames in preparing it for the next stage, be it Assent or second or third reading. Minor changes early in a large document will necessitate repagination of the entire document. Any manually inserted page breaks will require user interaction. In order to set these breaks, a large document may need to be rendered many times, each time setting a new break and checking the resulting flow for subsequent pages. Therefore, what can be considered an acceptable rendering time is dependent on how the user invokes the stylesheet. A one-off invocation can be significantly slower than an interactive invocation and still be usable. But Unisys is proposing that some issues be dealt with using manual intervention, which makes rendering an interactive, repeated task.
The bottleneck for transforms may be addressed by duplicating the print server (probably requiring the purchase of additional Print Composer licences) or by migrating to Epic E3, which allows multiple simultaneous rendering jobs, unlike the Epic Print Composer/Java agent solution that serializes all print requests. However, neither PCO nor InQuirion has been shown any evidence to suggest that individual print previews (or indeed batch processes) will be any faster.
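The queueing effect described above can be illustrated with a minimal sketch. It is not a model of Print Composer or E3: the job durations are hypothetical, and it assumes that jobs are taken first-come, first-served and that adding renderers does not change how long any single job takes.

```python
import heapq


def completion_times(job_minutes, workers=1):
    """Return the finish time (in minutes) of each job, in submission order.

    Jobs are taken first-come, first-served; each goes to whichever
    renderer becomes free first.
    """
    free_at = [0.0] * workers  # min-heap: when each renderer is next free
    heapq.heapify(free_at)
    finished = []
    for minutes in job_minutes:
        start = heapq.heappop(free_at)  # earliest available renderer
        end = start + minutes
        heapq.heappush(free_at, end)
        finished.append(end)
    return finished


# Hypothetical workload: one large Bill (20 minutes) submitted just
# before three small jobs (1 minute each).
jobs = [20, 1, 1, 1]
serial = completion_times(jobs, workers=1)    # one job at a time
parallel = completion_times(jobs, workers=2)  # two simultaneous renderers
```

Under these assumptions, the small jobs no longer wait behind the large Bill once a second renderer is available, but the large Bill itself still takes just as long.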
For reprints and bound volumes, the issue may be even more difficult. Since bound volumes number pages from the beginning of the first Act or Regulation in the year and end with the last page of the last Act or Regulation in the year, and Epic Print Composer (and InQuirion presumes E3) cannot set an initial page number, the entire year's collection will need to be rendered as a single document. How long is it going to take to render many thousands of pages and what other jobs will be blocked while this is happening?
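To show what an initial page number would make possible, the following sketch computes the page on which each Act in a bound volume would start, given the page count of each. The page counts are hypothetical. This is arithmetic only and does not describe any capability of Print Composer or E3.

```python
def starting_pages(page_counts, first_page=1):
    """Return the page number on which each document in the volume starts."""
    starts = []
    next_page = first_page
    for count in page_counts:
        starts.append(next_page)
        next_page += count  # the next document begins on the following page
    return starts


# Hypothetical page counts for the first three Acts of a year.
acts = [10, 25, 7]
print(starting_pages(acts))
```

With a rendering engine that accepted these values as initial page numbers, each Act could be rendered as a separate job rather than as part of one document of many thousands of pages.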
It is possible that expanding the entity references in the FOSIs so that a single file is loaded rather than the large numbers of files currently loaded for each FOSI could reduce the rendering time for smaller documents. This is unlikely to be a significant factor in larger documents.
6.2.2.2 Functionality and future fit
InQuirion also has a number of concerns about the capability of FOSIs as implemented in the ArborText products as being sufficient to cope with the PCO's requirements. PCO's experience in the past has been that, when an issue is raised in one stylesheet, it is fixed there but the changes are frequently not propagated to other stylesheets or even similar elements in the same stylesheet. Frequently fixes to one aspect of the rendering have introduced new problems in other areas.
While it could be argued this is simply due to insufficient regression-testing infrastructure, InQuirion believes that it is symptomatic of the deeper issues discussed below.
Revision tracking
Although InQuirion has considerable experience with legislation and legislative drafting environments in a number of jurisdictions, every jurisdiction has its own unique requirements. New Zealand's extensive use of select committees (nearly every Bill is referred to a select committee) and committee of the whole House has led to a sophisticated markup scheme to track the changes from first reading through the various committee processes to the third reading. The current system uses strike-through and underline for revision-tracked versions provided to the select committee, and subsequent changes are tracked using a combination of vertical and horizontal bracketing and other typographic markers.
As part of the PAL project, it was decided to use a modified form of the strike-through and underline style to represent changes coming out of select committees and the committee of the whole House for the consideration of the next legislative stage. As part of this decision, the level of changes was to be restricted to 2 (formerly up to 5 levels of markup were represented). These changes were made partly as a concession to the technology and partly to aid readability. This uses a mixture of single strikethrough and underline (to represent the first level of changes) and double strikethrough and underline (to represent the second level). For select committee output, bold strikethrough or underline represents a unanimous decision and normal weight strikethrough or underline represents a majority decision. Since ArborText's Epic Print Composer was unable to change the weights of strikethrough or underline independently of the weight of the font, a number of additional fonts were created to simulate this capability.
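The two-level scheme described above can be summarized as a small lookup. This is a sketch only: the function and value names are hypothetical, and it assumes that insertions are underlined and deletions are struck through, which is the usual convention but is not stated explicitly here.

```python
def revision_style(level, change, decision="majority"):
    """Describe the typographic treatment for one tracked change.

    level    -- 1 or 2 (PAL restricts tracked changes to two levels)
    change   -- "insert" or "delete" (assumed: underline / strikethrough)
    decision -- "unanimous" or "majority" (select committee output)
    """
    if level not in (1, 2):
        raise ValueError("only two levels of change are supported")
    if change not in ("insert", "delete"):
        raise ValueError("change must be 'insert' or 'delete'")
    if decision not in ("unanimous", "majority"):
        raise ValueError("decision must be 'unanimous' or 'majority'")
    return {
        "decoration": "underline" if change == "insert" else "strikethrough",
        "lines": "single" if level == 1 else "double",
        "weight": "bold" if decision == "unanimous" else "normal",
    }
```

Even this reduced scheme yields eight distinct treatments (two levels, two kinds of change, two kinds of decision), each of which must be rendered distinctly.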
Although there are still outstanding issues with rendering revision tracking, this solution should be able to support the current PAL requirements for revision tracking.
The number of rules
One consequence of the revision tracking requirements is the very large number of rules. FOSIs typically contain many rules for each element, particularly if the element can appear in multiple contexts. ArborText products place limitations on the number of rules based on attribute values per element. This is because each rule has to support a single value for each attribute on each element. That means that an element with multiple values has to have a rule for each combination of different values of attributes. This can produce a huge volume of rules in some instances.
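The growth in the number of rules is multiplicative, as the following sketch shows. The attribute names and values are hypothetical and are not taken from the PAL DTD. The sketch assumes, as described above, that one rule is needed for each combination of attribute values on an element.

```python
from itertools import product
from math import prod

# Hypothetical attributes for a single element (not from the PAL DTD).
attributes = {
    "change-level": ["none", "1", "2"],
    "change-type": ["none", "insert", "delete"],
    "decision": ["none", "unanimous", "majority"],
    "context": ["body", "schedule"],
}


def rule_count(attrs):
    """Number of rules if every combination of values needs its own rule."""
    return prod(len(values) for values in attrs.values())


def enumerate_rules(attrs):
    """List every combination of attribute values as a dict."""
    names = list(attrs)
    return [dict(zip(names, combo)) for combo in product(*attrs.values())]


print(rule_count(attributes))  # 3 * 3 * 3 * 2 combinations
```

Adding one more attribute with three values would triple the total for that element, which is why the volume of rules grows so quickly.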
The volume of rules and the need to maintain identical, or worse, similar, rules in multiple stylesheets increases the development and maintenance overhead, the time taken for testing (as tests need to be applied to every different context), and the chance of human error in propagating changes in one stylesheet to all appropriate stylesheets and checking that new rules do not introduce unintended side-effects. While entity references have been used extensively in the FOSIs in an attempt to eliminate duplication, there are still numerous examples of very similar rules for an element appearing in multiple stylesheets. Most of the time, if one is altered, they all need to be altered in a similar way. ADG described a manual checklist that they use to propagate changes to 16 different contexts.
DTD issues
Legislation DTDs are necessarily complex. It is rare to have fewer than 100 elements in a legislation DTD. The markup is dense, typically 5-10 times more tags per word than normal prose. Even in a very carefully crafted legislation DTD, it is likely that the stylesheets will be large and complex.
But the PAL DTDs exacerbate this problem. The "Para" element appears in just about every possible context within the "Body" (and many others besides). However, most of the "Para" tags could be removed from the instances without affecting the representation of the logical structure of the document. While most tags in the PAL DTDs correspond to a concept or concepts that drafters have named (and use in reference and amendment wording), there is no such concept matching the "Para" tag. Ontario, Canada, Tasmania, PNG, and US Congress all have DTDs that are similar in structure to the PAL DTD but without the "Para" element. They support a wide variety of formatting and flexible editor interactions (including in some cases promote and demote). While New Zealand's revision tracking requirements are more extensive than these other jurisdictions, the revision-tracking markup in the New Zealand DTD is based on that in the Tasmanian and PNG DTDs. The "Para" element is completely independent of revision tracking.
Provisions within the body have a very regular structure (with a very regular formatting to match), but, within Schedules, this structure is much less rigid and the formatting is much more varied. The PAL DTD uses the same tag for provisions within the body as for provisions within Schedules.
These two factors are examples of how the existing DTD has made the rendering stylesheets more complex than they need to be. At this stage of the project, changing such fundamental parts of the DTD is not feasible. For instance, eliminating the "Para" element would require changes to most Authoring Tool customizations and to the stylesheets for authoring, print rendering, and transformation for the web. It would also require reconfiguration of the CMS and possibly changes to some of the Java code. Such a mammoth undertaking is not justified, as the benefits could be small and the costs in time and money considerable.
Large number of formatting issues outstanding
There are still a large number of formatting issues outstanding. A few minor formatting glitches are to be expected in introducing a new automated rendering system and some minor degradations in quality, such as unfortunate line or page breaks, are normal sacrifices in order to gain faster turnaround. But legislative documents are amongst the most important documents in the land. The sheer number of formatting issues raises concerns about compromising the authority and status of the documents. Even if the issues can be easily overcome, a vigilant Opposition with the desire to delay or prevent the passage of some Bill will not miss an opportunity to slow the government's legislative agenda by drawing attention to even minor presentation issues.
While the number of outstanding issues would be more easily resolved if there were one stylesheet not several, it is InQuirion's understanding that limitations of the rendering engine require the stylesheets to be split.
Line numbering
Among the unresolved formatting issues, there are still serious problems with the automatic line numbering of Bills. Any problem with line numbering (or any other rendering) is likely to distract the Members from the debate at hand. Committee amendments are worded using these line numbers and incorrect line numbers, particularly if two lines effectively receive the same number, will interfere with debate on the proposed amendment. PCO has currently been delivered version 4.3.1b of Epic Editor/Print Composer. Fixes for line numbering have been included in version 4.3.1d but these fixes have not been delivered to, or tested by, PCO.
Manual intervention
Manual intervention is currently required to avoid a number of issues with the rendering. At least one formatting issue must be corrected by manually inserting page breaks in the markup to avoid the occurrence of a particular combination of elements at the top of a page.
The Epic suite also has the capability to manually alter the rendition directly (Epic Editor's Touch-up Tool). This results in processing instructions being inserted into the markup. While this is a standard industry approach where the problem is application-specific, if the need for intervention is inherent in the document instance rather than being application-specific, the changes should be incorporated into the element and attribute markup.
The danger of this approach is that, unless the XML with the processing instructions is correctly inserted into the repository, although a correct PDF is produced, the next time that XML document is rendered the old issues will reappear. Even if the processing instructions make it into the versioned repository, using such processing instructions limits the mobility of the data between applications. The lifetime of a legislative document may be many hundreds of years. The lifetime of a typical information technology system is rarely more than 10 years. Other rendering processes (in particular the generation of HTML for the web) currently ignore these processing instructions so these changes are not reflected on the web site. If these changes are simply changing a font size (which web users can do in their browser anyway), this is not problematic, but if it involves correcting autotext or correcting indentation for sandwich text, the changes may have significant semantic connotations and must be propagated to all publishing formats.
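One way to see which documents carry such touch-ups is to list the processing instructions they contain. The sketch below uses Python's standard `xml.dom.minidom`. The `touchup` target and its data are hypothetical examples and are not the instructions that Epic Editor actually writes.

```python
from xml.dom import Node, minidom


def processing_instructions(xml_text):
    """Return (target, data) for every processing instruction in the document."""
    doc = minidom.parseString(xml_text)
    found = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
                found.append((child.target, child.data))
            walk(child)  # text and PI nodes have no children, so this is safe

    walk(doc)
    return found


# Hypothetical instance containing one manual touch-up.
sample = '<Act><Para>Text<?touchup indent="2"?> more</Para></Act>'
print(processing_instructions(sample))
```

A report of this kind could flag documents whose print rendition depends on instructions that the HTML transformation ignores.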
The importance of ensuring that the PDF created for the Parliament is the same as that sent to the website should not be underestimated. Currently, every time a document is promoted as part of the lifecycle, a new PDF rendition is created and, for those stages that involve tabling the document, sent to the website. This is currently done without checking for a previous PDF rendition whether created with or without manual intervention. At the very least, the CMS should check for a PDF document that has already been created and only create one if a previous rendition does not exist. This will prevent the possibility that the PDF sent to Parliament and to the web are different.
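The check recommended above amounts to "render at most once per version". The following is a minimal sketch using an in-memory store. The class, method, and identifier names are hypothetical and do not correspond to any Documentum API.

```python
class RenditionStore:
    """Hypothetical stand-in for the PDF renditions held by the CMS."""

    def __init__(self, render):
        self._render = render  # function that turns XML into PDF bytes
        self._pdfs = {}        # (doc_id, version) -> PDF bytes

    def pdf_for(self, doc_id, version, xml):
        """Return the existing rendition, creating one only if none exists."""
        key = (doc_id, version)
        if key not in self._pdfs:
            self._pdfs[key] = self._render(xml)
        return self._pdfs[key]


calls = []


def fake_render(xml):
    """Placeholder renderer that records how often it is invoked."""
    calls.append(xml)
    return b"%PDF- rendition " + str(len(calls)).encode()


store = RenditionStore(fake_render)
for_parliament = store.pdf_for("bill-123", "2.0", "<Bill/>")
for_website = store.pdf_for("bill-123", "2.0", "<Bill/>")
```

Because the second request returns the stored rendition instead of rendering again, the copy sent to the website is the same object as the one sent to Parliament, including any manual corrections it contains.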
This approach of manually correcting rendering is part of the tradition of typography and publishing, but it fundamentally undermines one of the major drivers for utilizing structured markup such as XML. A successful XML system should allow the content to be rendered automatically without human intervention to each of a number of output formats—in this case PDF for a Bill, PDF for the subsequent Act if any, PDF for a later consolidation or official reprint, and HTML of any or all of these. This should be possible by matching the complexity of the rendering requirements with the capabilities of the rendering tool.
6.2.2.3 Economics of continued deployment
To address these and other issues, there is considerable stylesheet work proposed for both Phase 1 and Phase 2. This work combined with the editor customization is currently on the critical path for the Phase 1 deployment. Although review of the project plans and contractual arrangements is not within the scope of this review, the cost of this work alone, without factoring in the down-stream maintenance costs, is substantial.
Unisys suggests that Epic E3, if used as a replacement for Epic Print Composer, may address the concerns that InQuirion has raised. In order to replace the current rendering engine, Unisys has identified a number of sources of additional cost including:
- evaluation of the alternative,
- acquiring the new software,
- constructing the stylesheets,
- new integration work,
- project management,
- specification workshops,[7]
- doing the remaining work on the Authoring Tool, and
- additional training costs.
Of these additional costs, using Epic E3 over another alternative rendering engine such as XyVision's XPP or Advent Publishing's 3B2, potentially saves only the additional stylesheet work (since the same FOSIs as used in Epic Print Composer apply) and some of the additional training (since the FOSI training already received presumably still applies).
While Unisys claims that much of the stylesheet work proposed for Phase 2 is for the Authoring Tool environment rather than print rendering (specifically around Reprints and Bound Volumes), there are a number of level 1 and 2 issues required to be fixed before deployment of Phase 1 and a number of level 3 and 4 issues that are currently scheduled for Phase 2. This is in addition to the reprint and bound volume work (which requires both editor and print rendering customization). The argument in favour of using ArborText technologies for both authoring and print rendering relies on being able to share that work. However, if the print rendering is still not satisfactory, a new replacement solution may still be required. The economic argument of shared work between the authoring tool and the print-rendering tool loses much of its appeal.
The complexity of the FOSI rules and the reappearance of old issues in new contexts in past work suggests that work on reprints and bound volumes is just as likely to create new issues as to resolve them. Higher-end tools such as XPP and 3B2 use more flexible rule models than FOSI, and the effort to create the stylesheets, and the effort for ongoing maintenance of the stylesheets could be considerably reduced. While these tools use proprietary stylesheet languages, not many vendors support the FOSI standard so the benefits of an open standard are limited and tend to be outweighed by its limitations.
XPP was evaluated in the initial rendering engine selection process but, because of the high cost of the product and the lack of a local distributor, 3B2 was not considered. Since the evaluation, Advent Publishing has appointed an Australasian distributor (Allette Systems based in Sydney) and is aggressively pursuing the Australasian market with local support and significantly reduced licence fees. This reduces the costs and risks of development and ongoing maintenance of a solution based on 3B2.[8]
While it was not within the scope of this review to examine project plans and cost and schedule implications, InQuirion's experience with the 3B2 rendering engine as used in the Canadian legislation project with ArborText's Epic Editor suggests that it is possible that the costs of acquiring and configuring 3B2 could be lower than the costs of addressing the current issues with the ArborText products and the development time frame should not be adversely affected. Further cost savings in lower maintenance and ongoing development costs are also likely. The same is possibly true of the XPP engine from XyVision (although InQuirion's familiarity with this engine is not as great).
It has been claimed that ArborText have never reached the attribute limitation in any other implementations despite implementing a number of legislation solutions. This suggests to InQuirion that perhaps the ArborText tools are being used beyond their capabilities, either because the New Zealand requirements are more complex than other jurisdictions (which is certainly true of the revision tracking requirements), because the DTD is more complex than other jurisdictions' DTDs (through a mixture of requirements and design decisions), or because the stylesheets have not properly utilized the product's capabilities.
Unisys subcontracted out development of the document analysis and DTD design to eGloo and the FOSIs and other Authoring Tool customization to Absolute Data Group (ADG). The eGloo principals have worked on a number of legislation-related projects including the NSW PCO. While there are some issues with the DTDs as described above, the general philosophy behind the DTD design is sound. Addressing these issues is likely to create large amounts of additional work without any guarantees of significant change in complexity of the FOSIs or improvements in reliability or throughput. ADG are trained in the ArborText products and experienced in developing FOSIs and editor customizations with ArborText and other products. InQuirion has not identified any major issues or inconsistencies in their customization. This suggests that New Zealand's requirements possibly justify a more flexible rendering engine.
For these reasons, InQuirion has strong reservations about the ability of Print Composer to support the current rendering needs of the PCO, let alone the future needs. InQuirion is happy to entertain that E3 may be able to satisfy PCO's requirements. Our concern is that, while E3 may address some or all of the performance issues, the complexity and maintenance overheads of FOSIs will remain along with the outstanding functional and formatting issues.
In order to address the issues of specification which have been raised as problematic by Unisys, InQuirion recommends that PCO should prepare a set of sample documents, the rendering specification, and example PDF or paper output showing what PCO consider the output should look like (see section 7.2) together with some indication of what issues PCO feel are most problematic in the current environment (e.g. throughput, line numbering, revision tracking).
This should address the issue of specification workshops without additional cost to Unisys. This test set is necessary for acceptance testing, but could also be used for unit, system, and integration testing. However, InQuirion recommends that the test set be provided to a selected number of vendors to demonstrate their capabilities with those examples on candidate replacements for Print Composer including ADG (ArborText) with E3, Allette (Advent Publishing) with 3B2, and possibly XyVision with XPP. PCO and Unisys should seek a fixed price quote from subcontractors based on this example set so it is in PCO's interest to ensure complete coverage of all of the issues.
While InQuirion acknowledges that the risks of switching rendering engines at this late stage of the project are not trivial, we consider that the risks of retaining the existing rendering engine and customizations are comparable.
6.3 Does Documentum 4i have the capability to meet PCO's requirements?
Note: Particular attention should be paid to chunking.
The current release of the PAL system utilizes Documentum's version 4.3.2 client and 4.2.3d server. InQuirion believes that PCO could go live with these versions; however, it may be prudent to consider upgrading to 5i earlier rather than later (5.2 seems the most appropriate candidate). The PCO and IRD require the ability to chunk Bills at Parts and Subparts so that multiple drafters can work on different Parts or Subparts simultaneously, increasing the throughput for drafting large, complex Bills (see Content Management System Design section 7). The main issues are:
- 1. Documentum 4.3 does not support the ability to turn-off chunking just for those Parts and Subparts that appear in amendment wording (as provisions to be inserted) without inserting additional attributes, which would require additional work in the Authoring Tool and add more application-specific markup to the XML data;
- 2. Documentum 4.3 does not provide a mechanism for autotext to be used in the metadata describing the chunk, meaning that there is no simple way to display autotext Part numbers when browsing through the list of chunks that make up a particular Bill (once the Part number is fixed, this problem vanishes);
- 3. Documentum 4.3 can be significantly slower at checking in large XML documents with large numbers of chunks than the current release; and
- 4. Documentum 4.3 relies on four attributes that it inserts into the elements on which it chunks to identify the chunk when later versions are checked in. If in the editing environment, users naïvely copy and paste an element that is a chunk, these attributes are reproduced even if the vast majority of the content is changed. This is actually a feature rather than a limitation of the product but it is not the desired behaviour in a legislative drafting context and can only be addressed by Authoring Tool modifications. A package has been created by ADG for the Authoring Tool but it does not replace the existing cut-and-paste functionality. It is merely an additional tool. ArborText claim in their release notes that this is fixed in ArborText Epic Editor 4.4 for the Documentum/XML Adapter.
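The copy-and-paste problem described in issue 4 can be illustrated with a small check for duplicated chunk identifiers. This is only a minimal sketch in Python: the element names (`bill`, `part`) and the single `chunk-id` attribute are hypothetical stand-ins for illustration, not the four attributes that Documentum actually inserts.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical Bill fragment: the second <part> was pasted from the first,
# so it carries the same (illustrative) chunk identifier.
SAMPLE = """
<bill>
  <part chunk-id="c-001"><heading>Preliminary provisions</heading></part>
  <part chunk-id="c-001"><heading>Offences</heading></part>
  <part chunk-id="c-002"><heading>Miscellaneous</heading></part>
</bill>
"""

def duplicated_chunk_ids(xml_text, attr="chunk-id"):
    """Return the chunk identifiers that occur on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(attr) for el in root.iter() if el.get(attr) is not None
    )
    return sorted(cid for cid, n in counts.items() if n > 1)

print(duplicated_chunk_ids(SAMPLE))
```

A check of this kind could run in the Authoring Tool before check-in, so that a pasted element is given fresh identifiers rather than being matched to the chunk it was copied from.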
With the exception of issue 4, these chunking limitations are potentially annoying for the users but do not affect data integrity. Issue 4 cannot really be addressed in the CMS and, as noted above, requires Authoring Tool modifications.
Documentum have assured InQuirion that Documentum 5 has addressed issue 1 allowing fragmentation rules to refer to their parent elements without causing exceptions. They have also spent considerable effort optimizing the XML capabilities of the product (including fragmentation) and suggest that considerable performance improvements can be gained by upgrading, therefore addressing issue 3. They have also suggested that upgrading from 4 to 5 should require very little if any coding changes. Going live with a release that is already more than a year old (4.3) means that PCO is likely to have to upgrade within a year of going live anyway (Documentum typically support a release for about 2 years after releasing). PCO already have licences to the upgrades included in their software maintenance agreement. Given that the CMS is not on the critical path for either of the Stage 2 phases as currently planned, despite the belief that 4.3 is an acceptable release on which to go into production, the benefits of upgrading earlier are compelling.
Note that, in order to receive the performance and functionality benefits, both the client and the server installations would need to be upgraded.
6.4 Does the mix of package customisation and bespoke development support future development and package upgrade without major rewrites or design changes?
Note: Particular attention should be paid to the amount of development required for Phase Two (post-go-live) of the PAL project, and the ability to further develop the system beyond Phase Two.
It is inevitable in a large IT project such as this one that no one package will do everything that the users require "out-of-the-box". The PCO sought an Implementation Partner because they recognized a need to configure and customize the generic applications (Epic Editor, Epic Print Composer/E3, Documentum) to support the business processes and operational requirements of the users.
Such configuration and customization always carries the risk that future versions of the software on which the system is based will require changes to that customization. The extent to which changes are required is largely in the hands of the vendors of the underlying software components. Comments in this section rely heavily on information supplied to InQuirion by ArborText and Documentum. Some speculation is also included based on InQuirion's knowledge of the broader XML market.
The lifetime of such a large IT system is usually of the order of 5-10 years, and predicting anything in the IT market beyond a year or two is highly speculative at best. An unequivocal answer to this question is therefore neither prudent nor practicable. InQuirion has identified a number of issues pertinent to future development and package upgrade for each of the components of the PAL system.
The Authoring Tool configuration uses FOSIs for rendering the documents and ACL packages to implement custom tools for authors. These configurations and customizations cannot be readily migrated to alternative authoring tools from other vendors. In the unlikely event that PCO would choose to migrate to an alternative authoring tool in the near term, new stylesheets and customization would probably need to be developed from scratch for the new target authoring platform. But ArborText has supported both FOSI and ACL for many years and has a large installed user-base with customizations using these tools. They are unlikely to discontinue support for either in the near future. They also have released a tool that reads in existing FOSIs, allows modifications to be made, and outputs either FOSIs or XSL-FO based transformations. If they were to discontinue FOSIs in the future (presumably in favour of XSL-FO), this tool should minimize the human effort required to migrate. Migration from one version to the next of Epic Editor in the past has required little or no additional coding. ArborText also have an excellent record in supporting older versions of their product.
Looking ahead a little further, Microsoft has released beta versions of the next version of Word (2nd quarter 2003), which provides an XML editing solution as part of its Professional Office suite. While this product is not as mature or sophisticated as Epic Editor, it is likely to have an impact on the market for Epic Editor. In the short term, this may actually increase the market as XML solutions become widespread but power users require more than the new version of Word offers. As future versions of Word match the capabilities in Epic Editor, the long-term viability of the product may be in jeopardy, but this is likely to be many years into the future, probably beyond the useful life of the current PAL implementation. If it were to dominate earlier, Documentum are already working on integrations to the new XML capabilities so migrating to Word as an XML authoring platform would not have a significant impact on the CMS.
InQuirion's concerns about the rendering engine, Epic Print Composer, are explored above in section 6.2.2. The major concern with future development is the complexity of the FOSI stylesheets and the maintenance burden of managing so many similar stylesheets.
InQuirion agrees that maintenance of stylesheets would be much simpler if the number of stylesheets were reduced but understands that the number of FOSIs was increased partly because of limitations of Print Composer. Tasmania and PNG both have a single print rendering stylesheet that covers Bills at every stage, Acts for assent, for loose-leaf publication, annual volumes, and consolidations as well as draft, final and consolidated regulations.
Looking more long term, CSS has effectively replaced FOSI as a language for describing how to display XML documents, and XSL-FO is emerging as the standard page description language in the XML community. While XSL-FO still has a number of limitations that prevent it from being used in high-end publishing environments, these limitations are certain to be addressed within the next few years. ArborText, as a leader within the XML community, has already embraced XSL-FO and the rendering tools used in the PAL project support using XSL-FO to render XML to paper as well as the FOSI approach adopted. At some point in the future, it will become feasible to migrate to an XSL-FO solution, which would give the PCO more freedom to change rendering engines. InQuirion supports the conclusions in Authoring Tool End User Specification that FOSIs currently provide superior formatting capabilities to XSL-FO and, given the selected rendering technologies, were the appropriate choice, but notes that the XSL-FO standard is likely to evolve to a point where it is likely to support the PCO requirements some time in the future.
Note that both 3B2 and XPP support XSL as well as their own proprietary formatting languages. Neither support FOSIs directly.
While Documentum have assured InQuirion that migrating from 4i to 5i is a relatively straightforward exercise, InQuirion recommends that Unisys or the PCO involve Documentum technicians for any upgrades to ensure that the full benefit of their experience with other customers is available. A significant proportion of the configuration of Documentum, particularly relating to promoting and demoting documents in the lifecycles, is written in DocBasic. InQuirion understands that, in 4i, DocBasic was the only option for this configuration.
Documentum, following the lead of larger repository vendors such as Oracle and IBM, have embraced Java as the platform of choice for configuration and customization. While they have not stated when they will discontinue support for DocBasic, it is likely they will discontinue support for DocBasic in some future release. If this point is reached, the PCO will either have to remain with the version that supports DocBasic (Documentum typically support a released version for at least 2 years and will support it longer by negotiation), or migrate the DocBasic code to Java.
The current website is based on a simple, robust model. The transformations export XML out of the CMS, and transform it into static HTML as whole documents and as fragments, and as XML stripped of internal tags for delivery to 3rd party publishers, and PDF where available. These documents are placed in a directory structure known to the web application so that it can deliver the correct documents as pages on request. Search is supported by dtSearch, which indexes the text contained in the HTML documents, as well as the metadata embedded in the HTML. The file structure represents the current version and previous versions allowing navigation from the current version to previous versions.
The website is simple and usable but does not support any advanced features such as "point-in-time" searching or browsing, or search by particular features of the markup such as text occurring in Part headings or headnotes, or search with or without macrons in Maori text. The current web application is not sophisticated enough to support these requirements, should they be required in any future development. The support for XML in dtSearch seems to involve simply extracting text from an XML document and treating tag names as words for the purposes of search unless they are explicitly listed by name to be ignored. It does not appear to support the extraction of metadata fields using generic XML markup or limiting of the search scope based on generic markup.
Because static HTML is all that is available to the website, it is not possible to search and browse the legislation to find components of the legislation to download as XML for the drafters to incorporate into their drafts.
While the current website is adequate, there is little scope for using the existing framework to expand the capabilities of the system. Significant expansions in capabilities would require a system capable of parsing XML to extract relevant search fields, manage Unicode for handling the Maori macrons, and deliver either dynamically rendered HTML or XML as required.
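The kind of markup-aware search described above (for example, matching only text in Part headings, with or without macrons) can be sketched briefly. This is a minimal illustration in Python, not a description of dtSearch or of the PAL website; the element names (`act`, `part`, `heading`, `para`) are hypothetical and are not taken from the PAL DTDs.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical fragment; element names are illustrative only.
SAMPLE = """
<act>
  <part><heading>Te Ture Whenua Māori</heading><para>General provisions.</para></part>
  <part><heading>Offences</heading><para>Māori land is mentioned here.</para></part>
</act>
"""

def fold_macrons(text):
    """Lower-case text and strip combining marks so 'Māori' matches 'maori'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

def search_headings(xml_text, query, tag="heading"):
    """Return heading texts containing the query, ignoring macrons and case."""
    root = ET.fromstring(xml_text)
    needle = fold_macrons(query)
    return [
        el.text for el in root.iter(tag)
        if el.text and needle in fold_macrons(el.text)
    ]

print(search_headings(SAMPLE, "maori"))
```

Because the search is scoped by element name, the second Part is not returned even though its body text mentions the same word. That distinction is not available when only the extracted text of static HTML is indexed.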
Note that a complete audit against the New Zealand Government Web Guidelines (http://www.e.govt.nz/standards/web-guidelines/web-guidelines-v-2-1/) as outlined by the State Services Commission was not within the scope of this technical review. InQuirion does note however that the requirements in section 5.2.6 regarding Maori content and section 6.3.8 regarding macrons and Unicode do not appear to be met in the current web site.
6.5 Are recognisable standards applied and consistently implemented?
The PAL system is based around the dominant document standard, XML. This international, vendor-neutral standard, designed for representing documents but now also used for metadata and other data exchange, has been adopted and embraced as a core Web technology by virtually every major technology company including Microsoft, IBM, Oracle, Sun Microsystems, and numerous other organizations. XML is a simplified subset of the ISO standard SGML (itself widely used by legal publishers including Brookers, LexisNexis, and Legislation Direct) and was designed to support web applications. Both SGML and XML allow the logical structure of a document to be separated from its presentational forms, which are defined in stylesheets. This makes them ideal for representing legislation and similar documents, which are characterized by their dense and regular structure and typically require multiple presentation forms including print and electronic renditions.
SGML is virtually unchanged since becoming an ISO standard in 1986. In that time, Microsoft Word superseded WordPerfect as the dominant word processor, and Word itself has changed underlying representation at least twice. SGML, an ISO standard, and XML, a W3C standard, are both maintained by large, international organizations that are not controllable by any one vendor no matter how influential. Changing these standards involves lengthy consultation and review processes and participants in the standards process are likely to resist any changes that jeopardize backwards compatibility. This makes SGML, and particularly XML, an excellent choice for representing long-lived documents such as legislation.
While SGML was a leading standard and was adopted widely in the legal publishing and technical publishing (particularly defence and aerospace) industries, its complexity and the cost of tools implementing and supporting the standard made it inaccessible to many organizations. XML is a more mainstream technology, and support for XML can be found across computing environments: in web browsers, in standard office applications, in enterprise repository systems, and in most software development tools. Using XML rather than SGML should ensure the availability of quality tools well beyond the lifetime of the current PAL system.
PCO and Unisys have selected reputable vendors who are active participants in the development of XML and related standards. ArborText have been producing SGML products for many years and were part of the initiative to create XML. Brookers, and their parent company Thomson, have a long history in applying both SGML and XML to legal documents including legislation. Documentum is a more recent entrant into the XML standards community but has demonstrated a commitment to open standards throughout its products and its presence on W3C standards committees relating to XML.
Just using XML does not really give any benefit unless certain philosophical approaches associated with XML are also utilized. XML typically uses a Document Type Definition (DTD) or Schema to define the allowable markup. It specifies the elements that can make up a document, their names, their content (what elements they can contain and in what order and combination), and the attributes associated with them. A DTD can be defined to strictly enforce a regular structure (like the body of legislation), or to be loose to allow the maximum flexibility (like HTML).
A DTD can be defined to use purely presentational markup—like that found in older versions of HTML—as some traditional publishers have done. This simply uses XML as a replacement for the old typesetting languages where headings are marked-up not by their logical function identifying them as say a Part Heading but by their current typographic representation as say 14pt Roman Bold centred.
However, SGML and XML were created to allow documents to be marked with their logical structure, identifying the function of pieces of text rather than a single typographic representation of that function. This allows multiple different presentations of the same function in different contexts—varying how a document looks as a paper document or on the web, varying the font size up for partially visually impaired users or down for users of small screen web devices such as PDAs or mobile phones. Rather than embedding in the statute book the exact typographic conventions of 2003 and the limitations of the XML tools available in 2003, XML allows the fundamental information to be stored without fixing the data to a particular rendition. A DTD that prescribes structural markup rather than presentational markup is much more consistent with the XML standard and its intent.
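The benefit of structural markup can be shown with a small sketch in which one logically marked-up heading is given two different presentations. The element names (`part`, `heading`) and the two output styles are assumptions made for illustration; they are not taken from the PAL DTDs or stylesheets.

```python
import xml.etree.ElementTree as ET
from html import escape

# The source records what the text is (a Part heading), not how it looks.
SOURCE = "<part><heading>Part 1: Preliminary provisions</heading></part>"

def to_html(xml_text):
    """Web rendition: the heading becomes a styled HTML heading."""
    heading = ET.fromstring(xml_text).findtext("heading")
    return f'<h2 class="part-heading">{escape(heading)}</h2>'

def to_plain_text(xml_text):
    """Plain-text rendition: the same heading is shown in capitals."""
    heading = ET.fromstring(xml_text).findtext("heading")
    return heading.upper()

print(to_html(SOURCE))
print(to_plain_text(SOURCE))
```

Had the source recorded only "14pt Roman Bold centred", neither function could tell that the text was a Part heading, and every change of typographic convention would require changing the data rather than the stylesheet.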
The PAL system and the data to be managed by it generally follow these principles well. The DTD represents the logical, structural components of legislation rather than particular typographic features. The markup is generally free of application-specific artefacts that are likely to change as the tools to manage the data change. There are a few exceptions—the attributes inserted by Documentum 4i when it chunks elements, and some attributes to get around the limitations of the ArborText Print Composer in resizing running headers to fit extra long legislative titles—but these are isolated and could easily be removed programmatically if the tool sets for managing the collection were to change.
InQuirion has reviewed the DTDs and has had the opportunity to consult the report from Complete Data Solutions Ltd on the DTDs and agrees with many of its findings. The DTDs are overly complex, with a number of redundant tags (in particular the "Para" tag—see section 6.2.2.2). The decision to use the same tag set for marking the very regular provisions in the body of legislation as the far less regular provisions in Schedules substantially weakens the ability of the DTDs to validate the structure of those more regular components. Although the DTD documentation refers to separate validation tools or "contextual rules" to support validating the documents against the prevailing drafting style guidelines, no such tool is part of the current system. Nor is it proposed for stage 2 development.
While simplifying the DTD would enhance the productivity of the users by reducing the number of tags visible, and hence increasing the actual content visible as a drafter is writing, InQuirion feels that changing the DTD radically at this point would jeopardize project schedules and introduce additional costs. The visibility problem can be readily addressed by introducing larger screens for users, which has other benefits (see section 6.2.1).
One important difference between SGML and XML is the character code set and encoding. While SGML supports a wide variety of character encodings and does not mandate support for any particular character set (except perhaps for ASCII), XML mandates support for a single, universal character set, the Unicode 3.0 standard (ISO 10646), and requires processors to support both the UTF-8 (8-bit) and UTF-16 (16-bit) encodings of that set. This character set, in addition to the standard Roman characters used in Western European languages, provides character support for most widely used languages including Chinese, Japanese, and, of more interest to the New Zealand context, macrons as used in Maori.
Unicode represents traditional Roman characters using the same code points as traditional ASCII, code points 0-127. While traditional 7-bit ASCII has been extended with a number of 8-bit variants to 256 characters, this is not nearly enough to encompass all of the characters in a language such as Chinese or Japanese, let alone to manage a number of different languages in a single encoding. Systems that rely on 8-bit character sets have to switch fonts or encodings to manage the additional characters. Because there are so many alternative encodings, this introduces a number of problems. Unicode offers up to 2 billion code points (it assumes a native 32-bit encoding, but variable-length 16-bit and 8-bit encodings are possible). In practice, significantly fewer code points are required: the standard Unicode 3.0 character set can be represented in just 16 bits. The macrons used in Maori appear at code points 256 and following.
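The relationship between code points and encoded size can be checked directly. The following minimal Python sketch is an illustration only and is not part of the PAL system; it reports the code point of a character together with the number of bytes it occupies in UTF-8 and in a 16-bit encoding.

```python
def describe(ch):
    """Return (code point, UTF-8 byte length, 16-bit byte length) for one character."""
    return ord(ch), len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

# Plain "a", then capital and small "a" with macron (U+0100 and U+0101).
for ch in ["a", "\u0100", "\u0101"]:
    print(repr(ch), describe(ch))
```

The plain vowel sits in the ASCII range and needs one byte in UTF-8, while the macron vowels sit just above 255 and need two bytes in either encoding.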
The SGML basic set assumes a 7-bit character set. The standard way to represent characters above 127 in SGML is to use entities. There are two types of entities: named general entities that allow the DTD to specify to what character a symbolic name is mapped ("&amacr;"), and numeric entities that represent a character by using its numeric code in figures ("&#257;"). SDATA entities are named general entities that can encapsulate application-specific instructions such as to switch a font or encoding. These are not supported in XML at all. While XML supports named general entities and numeric entities, a conforming parser is required to expand all entities, which means that macrons represented as entities are replaced by an encoding of that character that cannot be easily returned to their entity form. ArborText Epic Editor as a traditional SGML tool supports SDATA entities and by default preserves entity references (even for XML which is not strictly conformant). It can be configured to expand entity references on export and to convert characters with codes above 127 into entities on import but this has not been done in the PAL configuration. That means that importing an XML document into Epic Editor (as currently configured) where the named general entities have been parsed into their corresponding numeric encoding results in errors.
The CMS parses these entities as a conformant parser (provided that the appropriate entity tables are made available as they have been) but, because it stores the XML exactly as it is provided rather than returning the result of a parse, it preserves the XML documents with the entities intact. However, other components within the system that must parse the XML and output results derived from the parse are not free to preserve the entities. An approach more consistent with XML and Unicode for higher code points would be to configure ArborText Epic Editor to ensure that documents going in and out of Epic Editor used native Unicode encoding rather than relying on preserving the named general entities currently used. ArborText documentation suggests that this is a simple configuration.
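The loss of named entities on a parse-and-output cycle is easy to demonstrate. The sketch below uses Python's standard XML library purely as an example of a parser that expands entities; the `doc` element and the inline declaration of `amacr` are assumptions for illustration (the PAL system supplies its entity tables separately).

```python
import xml.etree.ElementTree as ET

# A document that declares the named entity inline and then uses it.
DOC = """<!DOCTYPE doc [
  <!ENTITY amacr "&#257;">
]>
<doc>M&amacr;ori</doc>"""

root = ET.fromstring(DOC)

# After parsing, only the character itself remains; the name "amacr" is gone.
text = root.text

# Serializing again yields either a numeric reference or native Unicode,
# never the original named entity.
ascii_out = ET.tostring(root)                      # ASCII output, numeric reference
unicode_out = ET.tostring(root, encoding="unicode")  # native Unicode output

print(text, ascii_out, unicode_out)
```

Any component that outputs the result of a parse behaves this way, which is why a tool that expects the named entities to be preserved cannot reliably exchange documents with such components.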
Note that the New Zealand Government Web Guidelines section 6.3.8 suggests using UTF8 and the combined macron characters ("ā", "ē" etc.) for encoding macrons. These guidelines wrongly describe UTF8 as encoding a subset of Unicode (it encodes all Unicode characters) and fail to consider the use of canonical decomposition to deliver macrons to those browsers that understand them while still retaining the underlying vowels for those that do not. A canonical decomposition of the Unicode (e.g. "ā" becomes "a" followed by the combining macron diacritic, U+0304) will produce just the vowel in older browsers or on systems without Unicode fonts and the vowel with the macron on newer browsers with appropriate fonts.
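Canonical decomposition is a standard Unicode operation and can be sketched in a few lines. The example below uses Python's standard `unicodedata` module as an illustration; it is not drawn from the PAL code or from the Web Guidelines.

```python
import unicodedata

precomposed = "M\u0101ori"  # "a with macron" as a single code point (U+0101)

# NFD is the canonical decomposition: base vowel plus combining macron (U+0304).
decomposed = unicodedata.normalize("NFD", precomposed)

# What a system that ignores combining marks would effectively display.
fallback = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# NFC recomposes the text, so nothing is lost by storing the decomposed form.
recomposed = unicodedata.normalize("NFC", decomposed)

print([hex(ord(ch)) for ch in decomposed], fallback, recomposed == precomposed)
```

The decomposed form keeps the plain vowel as a separate character, so a system that cannot draw the combining mark still has the underlying vowel to display.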
Most legislation in New Zealand is in English. XML defines a standard way to represent the language of a piece of text. Using the "xml:lang" attribute (which must appear in the DTD), the language of the text can be specified using a two letter mnemonic in lower case for the language (in this case "en" for English),9 a "-" separator, and a two letter mnemonic in upper case for the country (in this case "NZ" for New Zealand).10,11 However, many pieces of legislation, particularly those to do with Maori land rights or treaties, also contain elements with text contents entirely in Maori, particularly Preambles. For the purpose of searching and possibly of spell-checking, it may be useful to default the "xml:lang" attribute to "en-NZ" for most elements but set it to "mi-NZ" on those elements whose text is in Maori. This can be directly output as a "lang" attribute in HTML 4.0 or "xml:lang" attribute in XHTML 1.0 and many web search engines will make use of this information for more directed searching.
This feature should be considered as part of the Phase 2 work. At the very least, a record of which elements in which legacy documents contain all Maori text should be kept during officialization.
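The way "xml:lang" values are inherited and overridden can be sketched as follows. The element names (`act`, `preamble`, `body`, `para`) and the sample text are placeholders chosen for illustration and do not come from the PAL DTDs; only the "xml:lang" attribute and its namespace are standard.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """
<act xml:lang="en-NZ">
  <preamble xml:lang="mi-NZ"><para>Tēnā koutou katoa</para></preamble>
  <body><para>This Act may be cited as the Example Act.</para></body>
</act>
"""

def languages(xml_text):
    """Map each <para> text to its effective language, inherited from ancestors."""
    root = ET.fromstring(xml_text)
    result = {}

    def walk(el, inherited):
        lang = el.get(XML_LANG, inherited)  # an explicit value overrides the inherited one
        if el.tag == "para":
            result[el.text] = lang
        for child in el:
            walk(child, lang)

    walk(root, None)
    return result

print(languages(SAMPLE))
```

Only the element that differs from the document default needs the attribute, which keeps the additional markup to a minimum.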
The FOSI standard was originally developed by the US Department of Defense as part of the CALS suite of standards.12 It was designed to provide a means of mapping a logically encoded SGML document onto an appropriate presentation of that document—for paper or screen. While it was the first stylesheet language developed for SGML, it was never adopted widely outside of the defence/aerospace community, and has since been superseded by DSSSL (the forerunner in many ways to XSL), CSS, and XSL-FO.
FOSIs are the way in which ArborText Epic Editor and Print Composer are usually configured to render SGML or XML. They are being applied appropriately in the PAL system. Although ArborText products also support XSL-FO as an alternative mechanism for rendering, as discussed above in section 6.2.2, XSL-FO has its own limitations. The ArborText FOSI capability is being stretched to its limit in the PAL application but this is partly due to application limitations, partly due to limitations of the DTD, and partly due to the complexity of the formatting requirements all of which have been discussed at length in section 6.2.2.
The risk here is that the customization for print rendering and even interactive authoring, may simply not support the addition of sufficient numbers of new rules to correct some of the existing outstanding rendering issues as stated in the Phase 1 and Phase 2 scoping document, let alone on-going development and enhancement.
XSLT has emerged as the standard way to transform an XML document into another XML document or an HTML document. It is currently being used to produce HTML and XML for the PAL website and subscription service. The XSLT code is clean and well structured and being used entirely for its intended purpose.
The XSLT engine included in the Documentum product quite rightly expands all general entity references (used for some more unusual characters including macrons) into their native Unicode encodings. Any system that did not would not be conformant. This has caused some problems for the current configuration of the Epic Editor.
Java is a general purpose, multi-platform programming language. Java is particularly compatible with XML as it also mandates support for Unicode 3.0 character set (in particular the UCS-2 16-bit encoding). The Java code examined appears to follow the Java standard and its use is consistent with industry standard practice.
The one notable exception is that there are a number of places within the PAL system where XML transformations are being performed by Java code accessing a standard set of libraries for manipulating XML known as DOM. InQuirion is of the view that it would be more consistent to use either just Java code and the DOM to do all transformations or just XSLT. XSLT is being avoided because XML conformance requires output of a parsed XML document to expand named general entities (representing macrons and other non-ASCII characters). The use of these DOM manipulations could cause problems in later versions of software because a DOM API should be implemented on a conformant XML parser that should be expanding the entities anyway—later versions of the DOM API may change the behaviour with entities compromising the integration between the system components. A better approach would be to ensure that all components of the PAL system were configured to accept all valid XML and to conform to the XML specification when importing or exporting XML.
Note that there is no current problem with the existing configuration, in fact the use of Java instead of XSLT was adopted specifically to ensure reliable round-tripping between system components.
The .NET standard is being utilized in the PAL website. .NET is a Microsoft standard framework which includes the programming language C#, a virtual machine, and an approach to publishing interfaces that allows modules written in a variety of programming languages and potentially communicating over a web interface (a web service) to interoperate. The website is an appropriate use of the .NET standard.
One question is why an additional framework was introduced, given that there are also Java-based web frameworks such as JSP and J2EE and that Java is already being used extensively in the configuration. Such criticism is easy with the benefit of hindsight, and the issue may not have been apparent at the time the technologies were selected.
6.6 Is the use of a coding standard apparent, and has it been universally and consistently applied?
A number of different languages have been used to customize the various software components and different subcontractors have developed different parts of the system. Therefore, a single, consistent coding standard across all system components is not really apparent or appropriate. Within code of the same language, some consistency should be apparent.
InQuirion has viewed a significant portion of the code with and without the developers present. The coding is generally laid out well and reasonably easy to understand. There is some variation in the level of inline documentation in the code, particularly in the Java code. Unisys employees and at least two different subcontractors developed different parts of this code. It is certainly possible to distinguish a number of different styles within this code.
It is normal in most widely adopted coding standards to begin a file containing code either with a brief comment describing the purpose of the file and the interfaces or services provided by the code contained in it, or the definition of the interface itself. This allows a developer to quickly determine the relevance of the interface and whether he or she needs to explore the implementation in the file further in the absence of the full development environment and without resorting to paper documentation. In the PAL code, many files begin with a lengthy boilerplate disclaimer and no description of the purpose of the file at all. More consistency in this regard would aid the ongoing maintenance of the code base.
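A file header of the kind described might look like the following. This is a hypothetical example written in Python for brevity; the module, its function, and the lifecycle state names are invented for illustration and do not correspond to any file in the PAL code base.

```python
"""Lifecycle helpers for Bill documents (hypothetical example module).

Purpose:   decide which lifecycle state a document moves to when promoted.
Interface: next_state(current) returns the following state name, or None.
Notes:     the state names below are illustrative, not PAL's real lifecycle.
"""

STATES = ["draft", "proofread", "approved", "published"]

def next_state(current):
    """Return the state after `current`, or None if it is already final."""
    index = STATES.index(current)
    return STATES[index + 1] if index + 1 < len(STATES) else None

print(next_state("draft"))
```

A reader opening the file sees its purpose and interface in the first few lines and can decide whether to read further. Any legal boilerplate can follow the description rather than replace it.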
6.7 Are APIs well defined and documented?
The documentation for the PAL system is extensive. While InQuirion has not examined every line of code, the code that we have examined exhibits well-defined APIs (including those defined by the vendors ArborText and Documentum). In general, these are well documented. However, given the extent and volume of the paper documentation, it is also necessary and advisable to provide good in-lined documentation in the code itself so that developers are not forced to go to the paper documentation all the time. The code would be potentially more maintainable if the documentation in-lined in the code matched the level of paper documentation.
The other advantage of in-lined documentation is that, while it is easy to allow paper documentation to fall out of date, documentation in-lined in the code is very simple for the developer to update as they change the code. The comments are right there reminding them to do that. This in-lined documentation can then augment and supplement the paper documentation, which is a more manageable way to come up to speed on an application without any prior knowledge. In-lining documentation is a long-established practice in the software community, and the level of documentation within some parts of the PAL code is well below normal industry practice.
6.8 Is sufficient information available in terms of documentation and coding structure for a developer or designer to be able to make modifications without the need to rewrite components?
Where documentation exists, it is usually quite consistent and of a fairly high quality. The code is generally readable and understandable. As with all large systems the quality of the code and documentation does vary, particularly with regard to in-lined documentation but it is generally very readable, and would not be difficult to modify if necessary. Section 6.10.1 describes some gaps in the system documentation, which, if addressed, would make the system more maintainable.
The main reservation in this regard is the stylesheets or FOSIs for the print rendering of Bills. Because of the limitation ArborText places on the number of attribute rules per element, the stylesheets will have to be split (see section 6.2.2). Where refinements to the existing rules are made, as proposed in phase 1 and 2, and going forward with future enhancements, it is possible that this hard limit will be reached again. There is no obvious basis on which to split the stylesheets again. InQuirion's concern is that this limitation may eventually prevent incremental extension of the stylesheets without considerable reworking and rewriting. While this is not a documentation limit, it may prevent a developer or designer from making modifications without rewriting the rendering stylesheets (and most likely changing the DTD and editing environment).
6.9 Given the manner in which XML technology has been implemented, are there any implications in terms of technology development, maintenance and upgrade of components, and data portability?
Note: Particular attention should be paid to the balance of use of the Epic Editor package functions and XML.
As described above in section 6.5.1, XML has generally been applied consistently with how most vendors and experienced users of XML would choose to manage XML. There are a few exceptions that may cause data portability issues or component upgrade issues but these are relatively minor and should be easily remedied:
- 1. As discussed in section 6.5.1.2, non-ASCII characters are handled in Epic Editor through SGML-style general named entities. In fact, the DTDs delivered to InQuirion contained entity declarations without SYSTEM identifiers (SGML permits a PUBLIC identifier on its own, but XML requires a SYSTEM identifier whether or not a PUBLIC identifier is present)13 but Epic Editor happily accepted them. The XML standard requires conformant validating parsers to expand entities, and Epic Editor has a feature for managing the import and export of documents containing non-ASCII characters. Using that feature would ensure that the parsing problems experienced when importing characters above 127 are handled correctly.
- 2. The use of Java code and the DOM library instead of XSLT for transforms (discussed also in sections 6.5.1.2, 6.5.3, and 6.5.4) relies on a potential weakness in the XML parser in Documentum, which appears not to expand general named entities as configured, although the XSLT engine does expand them correctly. This code cannot be relied on to perform the same way in all XML environments.
- 3. The insertion of application-specific attributes by Documentum on elements that have been chunked with XML validated by DTD cannot simultaneously satisfy both the XML specification and the Namespace specification. This inconsistency is well known in the XML community and the choice by Documentum is common practice. Epic Editor even supports validating against documents containing those attributes even if they are not present in the DTD!14 The choice by Documentum would be consistent with the use of Schema rather than DTDs but the use of a processing instruction would be more consistent with the original XML specification.
The inconsistencies in item 1 can be addressed by correct configuration of Epic Editor to do the right thing with general named entities in XML. This would allow round-tripping documents regardless of whether they are transformed using XSLT, the Java DOM library, or any other validating XML parser. Item 2 does not cause the same problems but it does require the maintainers to learn the DOM API as well as XSLT when either would have sufficed. While InQuirion does not consider a coding change to be justified at this point, the inconsistency is noted. Documentum are unlikely to change how they embed chunk information in XML documents, and since this approach is common in the XML community, despite being strictly incorrect, it should not cause any major issues for ongoing migration. Even though these attributes can be ignored by Epic Editor for the purposes of DTD validation, they are correctly included in the DTD (to allow validation by other stricter validating parsers).
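The round-tripping behaviour described above can be sketched with any parser that expands general entities. The example below uses Python's standard `xml.etree.ElementTree` module rather than Epic Editor or Documentum. The entity name `amacr`, the element name `title`, and the sample text are assumptions made for illustration and are not taken from the PAL DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a general named entity declared in the internal
# DTD subset stands in for a macronised vowel.
SOURCE = (
    '<!DOCTYPE title [<!ENTITY amacr "&#257;">]>'
    "<title>M&amacr;ori Language Act</title>"
)

root = ET.fromstring(SOURCE)

# The parser has already replaced &amacr; with U+0101 (a with macron).
text = root.text

# Serialised as Unicode, the character is written natively ...
as_unicode = ET.tostring(root, encoding="unicode")

# ... and serialised as ASCII it becomes a numeric character reference,
# which any XML parser can read back without access to the DTD.
ascii_bytes = ET.tostring(root, encoding="us-ascii")
round_trip = ET.fromstring(ascii_bytes).text
```

Once the entity has been expanded, the content no longer depends on the entity declarations travelling with the document, which is the property that makes the data portable between XML tools.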
Some concerns were expressed that some of the text appearing on the page in final printed versions of Bills, Acts and subordinate legislation does not actually appear in the content of the XML markup but is sometimes inferred or inserted by the various stylesheets. This includes text like "Preamble", "Compare", "Part", various quotation marks, and brackets around numbers in subprovisions, paragraphs, and similar elements.
There is some debate between legislative drafters (and others) in jurisdictions implementing SGML or XML solutions as to the appropriateness of this action. For instance, the Canadian Department of Justice insists on every single character that appears on the page also appearing in the content of the XML. Tasmania, by contrast, uses attributes to represent Part, section, paragraph and similar numbers inferring the text "Part X—", "X. ", or "(X)" (where X is the attribute value) and other similar text from the markup context.
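The attribute-based approach can be sketched as follows. This is an illustration only: the element names, the `number` attribute, and the sample content are hypothetical and do not reproduce the Tasmanian or PAL DTDs, and in a production system the equivalent logic would live in the stylesheet.

```python
import xml.etree.ElementTree as ET

# Text inferred from markup context; {n} is the value of the number attribute.
LABELS = {
    "part": "Part {n}\u2014",   # "Part X" followed by a dash
    "section": "{n}. ",
    "paragraph": "({n}) ",
}


def render(element: ET.Element) -> list[str]:
    """Return one output line per numbered element, in document order."""
    lines = []
    for node in element.iter():
        template = LABELS.get(node.tag)
        if template is not None:
            label = template.format(n=node.get("number", ""))
            lines.append(label + (node.text or "").strip())
    return lines


doc = ET.fromstring(
    '<part number="2">Preliminary'
    '<section number="5">Interpretation'
    '<paragraph number="a">in this Act</paragraph>'
    "</section></part>"
)
lines = render(doc)
```

None of the label text appears in the XML content. Renumbering a provision therefore changes one attribute value, and the rendered label follows.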
A number of factors influence conclusions on this issue:
- 1. Most jurisdictions still regard the paper version that is presented to the Parliament and eventually receives the Royal assent as the normative source of the law. In such jurisdictions, any electronic version, whether it contains every character of text on this page or not, is only a reflection of that document when rendered in a visual form to resemble that paper product. With XML, such rendering is only possible by associating a stylesheet with the content. Rather than considering the XML instance as a reflection of the law, the instance together with the stylesheet should be considered as a reflection of the law. As long as the stylesheet is considered with the instance, whether the characters are generated from the stylesheet or the content is irrelevant. In jurisdictions where the electronic version has been given force of law, it is still important to consider the XML document together with the stylesheet but there is considerably more freedom to determine what is and what isn't normative.
- 2. In many jurisdictions (including Tasmania and New Zealand but not including Canada), Parliament has conferred on the Clerks, the Parliamentary Counsel, or the Government Printer considerable powers to correct or modify the numbering and formatting of provisions and often punctuation and other minor issues that do not affect interpretation. In these jurisdictions, it is arguable that the numbering and other inferred text are ephemeral in a legal sense so it is appropriate to treat them as ephemeral technologically as well.
- 3. With respect to quotation marks (typically around defined terms or text omitted or inserted by amendments), normally quoted text is surrounded by double quotes. If that text itself appears within quoted material (say in an amending provision which itself amends an amending provision), the outside quotes are double quotes and the inside quotes are single quotes (facilitating matching opening and corresponding closing quotes). If the letter of the law were strictly followed, when consolidating that amendment to an amendment, a distinction between single and double quotes would be needed, but currently those single quotes are replaced by double quotes once the enclosing context is removed and the amendment consolidated. The most appropriate way of managing this is to use tags to mark the beginning and end of the quotation as has been done in the PAL system and infer which of single or double quotes is appropriate when rendering (not currently being done by the stylesheets but on the outstanding formatting issues list).
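The quotation-mark inference described in item 3 can be sketched as below. The `quote` and `provision` element names are hypothetical, and the rule that deeper levels keep alternating between double and single quotes is an assumption that goes beyond the two levels discussed above.

```python
import xml.etree.ElementTree as ET

DOUBLE = ("\u201c", "\u201d")
SINGLE = ("\u2018", "\u2019")


def render(node: ET.Element, depth: int = 0) -> str:
    """Flatten node to text, choosing quote marks from the nesting depth."""
    inside = depth + 1 if node.tag == "quote" else depth
    body = (node.text or "") + "".join(
        render(child, inside) + (child.tail or "") for child in node
    )
    if node.tag != "quote":
        return body
    # The outermost quotation takes double quotes, the next level single.
    opening, closing = DOUBLE if depth % 2 == 0 else SINGLE
    return opening + body + closing


amendment = ET.fromstring(
    "<provision>Omit <quote>the <quote>Minister</quote> may</quote>.</provision>"
)
in_context = render(amendment)

# After consolidation the inner quotation stands alone, so the same markup
# is rendered with double quotes without any change to the content.
consolidated = render(amendment.find("quote/quote"))
```

Because the marks are chosen at rendering time, consolidating an amendment to an amendment needs no rewriting of the quoted text itself.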
6.10 Are there any demonstrable gaps or deficiencies in the system documentation and system test plans?
The following are some issues that have been identified in the system documentation:
- Various user requirements documents appear to be inconsistent with the developed application in a number of places. User requirements should change as the system design and requirements evolve (with relevant gatekeeper processes to prevent scope creep or pruning). Even conservative software engineering methodologies like IEEE Std 730-1998, IEEE Standard for Software Quality Assurance Plans, prescribe change management procedures for documents produced during the software development lifecycle, including user requirements documents. More progressive methodologies such as Extreme Programming or Feature Driven Development advocate iterative development and requirements refinement, making the feedback loop between early releases and user requirements explicitly part of the project plan and development cycle.
- There are a number of documents describing additional functions for the Authoring Tool. Some of these documents cover user requirements, design decisions, and installation procedures, but many do not cover all three. It would be easier to identify gaps in the documentation and easier to manage maintenance if all of these documents were collected into a single, consolidated End User Specification (which specifies end user requirements not design decisions), Detailed Design (which describes design decisions including summarizing the various utilities and why they were developed), and an Installation Manual (describing how to install the entire environment from scratch as well as how to install the extra utilities assuming the rest of the application is installed).
- The Authoring Tool End User Specification seems an odd place to discuss design decisions such as selecting FOSI over XSL-FO for screen (and print) rendering. This discussion should probably be in a missing overall design document.
- InQuirion has been unable to identify a document describing why additional fonts were required, what characters are and are not defined (e.g. macrons) and what would be required to add additional features to change tracking. The Authoring Tool—Custom Fonts document is little more than a list of font names and a description of how to install them.
- The source code is managed in a mixture of Visual Source Safe, JBuilder, and Documentum. It would make sense to consolidate all of the source code into a single repository so that it could be managed together as a whole.
- Section 6.1.1 of the Content Management System Design describes the chunking as being at the Sub-part level but further detail in 7.1 (and the behaviour of the system) indicates that documents are chunked at Part and Subpart and Schedule. InQuirion also notes that, for the purposes of the CMS, chunking also takes place on Explanatory Notes and Preambles but not in the website. This is an example of a system change that is not reflected in the design documentation (and hence explains why it was not propagated correctly to the website).
- InQuirion has been unable to identify a document describing why ACL packages were used to customize ArborText Epic Editor instead of JavaScript or VBScript, which are supported by other vendors and have a broader programmer base.
The purpose of system testing is to ensure that all of the components that make up a system work together to allow the business processes to progress from beginning to end. It typically includes end-to-end lifecycle testing as well as targeted testing on specific integrations and load testing of the production environment or a reasonable replica. While this testing is not typically performed by real end-users, it must ensure that sufficient realistic scenarios are run from beginning to end with all of the steps taken in between to ensure that every system component is exercised in all its likely contexts of use. User acceptance testing is normally intended to be a gatekeeper process to allow real users to sign off on the functionality of the delivered system, and to prevent the deployment of an inadequate system to the whole user community. A candidate release should not even be released to user acceptance testing until at least partial system testing has validated that it is likely to pass the user acceptance testing.
The original Unisys system testing strategy as implemented in early 2003 had significant weaknesses. The main problem was that it failed to expose the system to users or testers familiar with the drafting task and the legislative process before acceptance testing. By running the system testing concurrently with acceptance testing, the implementers failed to ensure that the system would manage representative documents adequately through the document lifecycle to completion before delivering a system to the client for user acceptance testing and training.
The main problem with system testing to date has been the lack of knowledge of the drafting task and legislative process by the constructors of the system test plans or the testing staff. The revised system testing strategy for the phase 1 release is to make available to the acceptance testing team a number of candidate releases that address outstanding issues to ensure that feedback is provided to developers during the development phase in time for fixes to make it into the final candidate release for phase 1. An alternative would be to actually include real users (borrowed from the User Acceptance Team) on the system testing team. These additional testing phases are intended to provide progressive feedback to the development team from real users in plenty of time to ensure that the issues are adequately addressed before the final candidate release is produced.
Some limitations evident in the current system testing plans include:
- throughput and load testing of the rendering solution and the CMS with typical and peak user loads in the operational environment—what happens when all of the staff request a print preview at the same time? check-in a large document?
- end-to-end testing of print requests to SecuraCopy—are SecuraCopy receiving all the right documents with all the right job tickets at each stage in the lifecycle?
- lack of variety in the structure of the Bills processed through the system test plans—are all the system tests being run with the same basic sample document or is a collection of candidate sample documents being selected from randomly?
- a test case for splitting a Bill—does the system readily support splitting a Bill for second or third reading?
- a test case for integrity of the VPN/Replistor characteristics—do changes in the system configuration preserve the security characteristics of the VPN? can SecuraCopy or Datacom change files on PCO's servers via Replistor?
- a system test for line numbering and rendering throughput: is line numbering functioning correctly on large Bills? is throughput affected by rendering changes?
- scalability of the solution—what happens to the performance of Documentum when it contains 200 chunked Bills? 1000 chunked Bills and 7,600 consolidated documents? 50,000 different versions of 7,600 consolidated documents?
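The first and third items in the list above (simultaneous print previews, and variety in the sample documents) could be exercised by a harness along the following lines. This is a sketch under stated assumptions: `request_preview` is a stand-in that only sleeps, the document names are invented, and the figure of 54 users is the number of PCO staff expected to use the PAL system. A real test would replace the stand-in with a call to the rendering service in the operational environment.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def request_preview(document: str) -> float:
    """Stand-in for one print-preview request; returns elapsed seconds.

    The sleep only simulates work so that the sketch runs on its own.
    """
    started = time.perf_counter()
    time.sleep(0.01)
    return time.perf_counter() - started


def run_load_test(documents, users, seed=0):
    """Issue one simultaneous request per user and summarise the timings."""
    rng = random.Random(seed)
    # Each simulated user is given a randomly selected sample document.
    chosen = [rng.choice(documents) for _ in range(users)]
    with ThreadPoolExecutor(max_workers=users) as pool:
        timings = sorted(pool.map(request_preview, chosen))
    return {
        "requests": len(timings),
        "slowest": timings[-1],
        "median": timings[len(timings) // 2],
        "documents_used": sorted(set(chosen)),
    }


summary = run_load_test(["small-bill", "large-bill", "regulations"], users=54)
```

Recording the slowest and median times for each candidate release would also show whether rendering changes have affected throughput.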
7. Other issues
7.1 Ongoing support, maintenance and development
One of the issues that the terms of reference asked InQuirion to consider is the ongoing support and maintenance of the PAL system and its extension. This section discusses InQuirion's view of the business requirements, the contextual information, the skill sets required, and the risks associated with potential options.
7.1.1 Business requirements
The main capabilities required for ongoing support and maintenance are:
- the operational maintenance issues—what to do when one component stops working? how to install a new machine for use by a drafter?
- upgrades to the commercial software packages—ArborText, Documentum, SQL Server, IIS etc
- ongoing development—adding utilities to the Authoring Tool to increase drafter or PPU productivity, extending the website capabilities, adding elements to the DTD and propagating the changes through the system components
- system and user testing of any enhancements or fixes.
The preparation of the Bar 1-5 Bills often has a short turnaround time and the preparation of the Assent copy is even shorter. Most stakeholders need to ensure that print can be reliably produced on short notice. The service levels required of the operational support would require immediate technical response whenever Parliament is sitting (approaching 40 weeks a year) with possible late night support to ensure overnight turnaround even if Parliament does not sit late into the evening.
The expectation once PAL goes live will be that PPU will provide a turnaround service similar to that provided by Legislation Direct. Legislation Direct are currently doing that using mature technology and staff with considerable experience with the relevant tools. Will PPU be able to provide that level of service with less experience in the tools and while there are still concerns about the ability of the PAL system to reliably render documents in a timely fashion?
Regardless of who provides the support, maintenance, and development, upgrades and enhancements to the production system will need thorough testing before deployment. Currently a user acceptance testing cycle requires more than 80 days of effort for each test iteration. There is already an excellent test infrastructure in place in the current user acceptance team (UAT) and the duplicate servers. While the UAT have developed the necessary skills to provide an appropriate level of testing to ensure that any release is production ready, they must be released for other work.
InQuirion suggests that they be made available for 3-4 week periods a few times a year. A development and release cycle can be scheduled around this requirement to continue the ongoing development and enhancement of the application.
However, user expectations with the website are not likely to be compatible with a release every 4-6 months. User expectations for websites are that feedback receives a fairly immediate response. To avoid user retraining, many small incremental changes are preferred to fewer releases containing more changes at a single time. The website is technically less complex than most of the other elements of the PAL system. It is also a purely output system currently without any connection back into the production side of the PAL implementation. For this reason, a full UAT cycle is not required if the only changes are made to the web code or the XSLT transforms that produce the HTML. Only acceptance testing on the website itself is required.
Regardless of who provides the ongoing support and development, it is vital to the level of service and productivity of the personnel implementing the requests that they understand the legislative process and each stakeholder's role in it, including what each stakeholder group considers important and what external pressures are placed upon them. It is also vital that the team understand the internal business processes of each group: not only what they normally are but how and when they can be bypassed. Lastly, they should understand how each of the pieces of the PAL system fits together to ensure that changes made in one area don't jeopardize processing in other parts of the system.
The unique nature of legislative documents must also be apparent to the support team. They must understand the longevity of the documents—requiring tool independence in the underlying data, the status of the documents—requiring high quality formatting, the complexity and consistency of the document structure—requiring a different authoring style to that used for disposable office documents, and the temporal nature of the collection—that old versions retain value and importance. Those involved in maintaining the rendition tools or the DTD should be familiar with the current legacy data, know what legislation should look like, what structures are possible and what structures are desirable, and ideally have some knowledge about what legislation looks like in other jurisdictions to maximize the future possibilities.
Each of the capabilities described above will be needed during different time periods of the project:
- operational maintenance will be needed between Phase 1 and Phase 2 deployment,
- operational maintenance will be needed during the warranty period after Phase 2 delivery,
- operational maintenance will be needed past the warranty period,
- ongoing development will be required past the warranty period,
- deployment of commercial software upgrades will be required past the warranty period.
A number of different skill sets are required for ongoing support, development, and maintenance of the PAL application.
The main requirement here is the ability to troubleshoot draft and compilation markup, and the ability to identify which changes and extensions to the data set will necessitate modifications to the existing application.
The skills required include a general XML knowledge, the ability to modify DTDs and realize the consequences of those changes, an understanding of the markup rules (how to mark up all the different structures that are supported by the system), and familiarity with the existing markup (how legacy data has been marked up).
7.1.3.2 Authoring tool/print
To maintain and enhance the Authoring Tool and the print rendering stylesheets, the team will need the ability to identify editor and print layout issues and track them down to the rules activated in the stylesheets. Modification of the existing application will be needed to support new markup, fix the layout of existing markup and to add new tools and customizations.
The skills required to perform this are a good knowledge of the existing markup and layout requirements, proficiency in ACL and FOSI, and a good general knowledge of XML.
The Word templates for use by OC for producing commentaries will also require maintenance. While they are not overly complicated, a working knowledge of paragraph and character styles, VBA (the language for programming macros in Word) and an understanding of the Word import capabilities of Epic Editor/E3 will also be required.
7.1.3.3 CMS
For operational support, the support team will need to be able to troubleshoot problems with the repository and lifecycle migration issues. Modifications to the application will be needed to support new markup, to modify attributes or conditions on promote or demote, and to modify the dialogues for the various lifecycle stages.
The skills required will be mainly related to Documentum training including knowledge of the configuration files, knowledge of DQL/XDQL, and proficiency in Java, DocBasic or Visual Basic.
7.1.3.4 Transforms
Operational maintenance on the transforms requires the ability to troubleshoot the Java agents and the XSLT scripts that generate the output. Modification of the existing application will be necessary to support new markup and to support additional website functionality.
The skills required are Java, XSLT, HTML, CSS, and probably a little JavaScript in a web environment. It may be necessary to purchase licences to JBuilder to support the current Java development environment (InQuirion is unsure whether the licences belong to Unisys or PCO).
Operational maintenance of the website is mainly about troubleshooting the website logic (which is relatively simple and should not raise any issues after thorough testing). Modification of the existing application will be necessary to support new markup and to extend the website functionality.
Skills required include C#, ASP.NET, HTML, CSS and possibly JavaScript.
Operational maintenance requires project management to ensure that an operational capability is fully resourced. The ability to manage resources and schedule deployment is required for ongoing development primarily to support scoping and scheduling development releases, to manage the resulting business process changes, and to prioritize the development schedules.
Those responsible for project management should have a background in resource management and change management, some technical project management background, and a good understanding of the context of the PAL solution.
7.1.4 Options
There are two clear alternative strategies for ongoing support and maintenance:
- 1. Outsource support to Unisys.
- 2. Support the system with existing in-house or augmented resources.
A number of hybrids of these are possible, including:
- 3. Outsource for a short time to Unisys to ensure continuity of service after the warranty period with the intent to migrate to in-house support once the system is bedded down.
- 4. Support the system in-house but outsource the development either to Unisys or to selected contractors.
Regardless of which of these models is selected, InQuirion is assuming that support and maintenance will be paid on the purchased licences (ArborText, Documentum, SQL Server, and IIS) to ensure access to patches and upgrades, that ongoing support is fundamental and ongoing development is desirable, and that there is sufficient funding available for both.
While section 7.1.3 might seem to describe a lot of different languages and environments, most developers familiar with web development using XML will be quite familiar with either one or both of the Java or .NET APIs and languages for manipulating XML. XSLT, CSS, HTML, and JavaScript are all necessary to develop any serious XML web application. Programmers typically come with groups of related skills, not just a single programming language. While ArborText or Documentum experience is harder to come by in Australasia, an experienced developer with a good knowledge of the standard XML APIs will pick up the Documentum and ArborText APIs fairly quickly. The legal publishing community is a good potential source of experienced XML and ArborText users, who probably come with some level of understanding of legislation. Where the specific skills are not available, a competent programmer should be able to learn a new language in 2 weeks and be productive in a new programming paradigm in 6 weeks.
Many of the skills are already available in-house in PCO and the level of understanding of the legislative drafting requirements is far greater for internal resources than outsourced resources. The one technical person that InQuirion encountered that truly understood the whole PAL architecture at a technical level and was most likely to be able to make changes to any system component while understanding the consequences to other system components was one of PCO's IT staff, not one of Unisys's.
There are a number of risks that PCO and other stakeholders face with ongoing operational maintenance, and to a lesser extent ongoing development. These can be grouped into a few categories.
Staff risks include maintaining staff continuity. Because of the steep learning curve for understanding the legislative drafting context and the complex business processes, and the potentially large training investment to acquire the appropriate technical skills, it will be important to retain the existing internal PCO resources and attract and retain any new additional PCO resources. Continuity of any outsourced resources is also problematic but PCO have less control over the issues. If support is in-house, operational support requirements may swamp the resources, initially making fewer resources available for ongoing development. By outsourcing, the excess operational capability is not available to be used for system enhancement and the stakeholders may end up paying twice for the same service. The latter risk can be managed contractually.
Regardless of who provides the support, a high service level is going to be required. The responsiveness of operational maintenance, the timely access to resources, the redundancy of operational support resources (managing leave, illness etc) are all factors affecting the decision. If there is a likelihood of failure to achieve a particular service level then outsourcing may be preferable simply to have somebody else to blame.
However, this application is mission critical to the PCO and the effective operation of the whole government. The support infrastructure must have the trust of the PCO and the other stakeholders (OC, IRD, the relevant ministers, and the Members). A third party is unlikely to accept a mixture of in-sourcing and out-sourcing. Outsource contracts are usually all or nothing. New Zealand is also painfully aware of the dangers of outsourcing control of the statute book. Outsourcing control of the system to manage it has similar risks.
There are also risks related to value for money. If maintenance is outsourced, PCO may end up with the same subcontractor that they might have engaged directly themselves but pay a margin on top of their rate to the intermediary. If maintenance is in-house, PCO may end up committing to surplus additional staff that may be difficult or costly to shed if not required. This risk can be managed contractually. The level of demand for operational maintenance is likely to be high early in the deployment and taper off later in the life cycle of the system. A low development target initially with increased development load as the operational maintenance load tapers off could balance the resources required.
There are continuity and consistency risks associated with in-house support in that PCO support might be more responsive to PCO issues than those of other users, and the handover period will be awkward with a rush to identify all of the outstanding issues within the warranty period. It will also require a higher internal management load with increased staff.
If support is outsourced to Unisys, the difference between before and after warranty period disappears, as Unisys should be contractually bound to fix issues before and after. It does simplify the management for PCO—simply pay the money and get the service. Using Unisys may be an advantage for IRD who already have a service level agreement with Unisys. There is a neutral third party so no preference would necessarily be given to PCO over IRD or OC.
All these issues make the selection of an ongoing maintenance and support strategy a complex issue. The PAL system should not be deployed until an operational maintenance strategy is in place. If PCO and other stakeholders choose an outsourcing strategy, negotiating an appropriate service level agreement can be a lengthy and complicated task. PCO should involve other agencies with experience in this area and other stakeholders to ensure an outcome satisfactory to all PAL users. Unisys already has infrastructure in place for such support including a call centre, knowledge-base software to support call centre operators, and issue tracking software to report on and manage issue resolution.
The in-house support strategy also requires considerable lead-time. The current IT team is already experienced in supporting users within PCO (currently 63 users, 54 of whom will use the PAL system), both in legislative drafting and administrative environments. The PAL system also has users in IRD (approximately 6 users), OC (approximately 5 users), and Brookers (3 users operating PCO equipment). The PAL system also has relationships with SecuraCopy and Datacom. The public is supported via Datacom. The PCO IT team already has established relationships with OC, SecuraCopy, Datacom, Parliamentary Service and, to a lesser extent, Brookers.
PCO already has an infrastructure in place that includes issue-tracking software to manage issue resolution, and a knowledge-base.
An in-house support strategy would require additional IT resources to support both development and operational matters.
More formality and infrastructure would be necessary to ensure user satisfaction including additional staff to cover illness and other leave, after hours call support, controlled handover to ensure that all information associated with an issue is communicated to the person responsible for resolving it, and then back to the user who raised the issue (and any other affected users).
Regardless of the preferred solution, the lead-time necessary to get operational maintenance in place is such that these decisions need to be made sooner rather than later.
InQuirion is of the opinion that PCO will get better value for money and a better service level in the long term by augmenting the existing in-house resources with a few new recruits, but care will have to be taken to retain existing staff with appropriate capabilities and to recruit for redundancy.
InQuirion believes that a rendering test set needs to be developed that will allow the PCO to move forward confident that the rendering solution, whether the existing Print Composer solution or a replacement (possibly E3, 3B2 or XPP), can satisfy all of the print rendering needs of the PCO and other stakeholders. This test set should include a set of XML documents together with PDF files (or paper documents) showing how each XML document should be rendered in each of the different contexts in which it will be rendered, and the format specification documents. Ideally, the change markup should be shown with examples of the current style and the proposed new style (to confirm that alternative rendering engines could support either look).
The test document set should include:
- a large Bill (more than 100 pages) containing examples of every single element possible in the Body or Schedules in as many different contexts as is practicable. This document should be made available in:
  - draft Bill form;
  - Bar 1 introduction;
  - revision tracked draft for consideration of a select committee;
  - Bars 2 to 5 (including a commentary) with change markup applied within and to as many element combinations as is practicable, preferably with matching SOP and slip amendments (with line numbering where required);
  - the Assent copy;
  - the Act as it would be printed in loose leaf form;
  - the Act as it would appear within the annual volumes;
  - the Act as a reprint.
- a regulation with a body similar to that of the Bill above, in draft and final publication form,
- at least one substantial example of each and every other document type that needs to be rendered by the system (one each of the different types of regulation, slip amendments, different types of SOP, etc).
- a smaller set of documents that exercise each and every type of problem that has so far been encountered (including those that have been addressed in the current system) with the rendering engine including:
  - documents with different size running headers,
  - documents with and without final Schedules, and
  - documents with notes (to demonstrate the arranging of notes at the end of each provision).
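One way to keep such a test set auditable is a simple manifest that records, for each XML source, the contexts in which it must be rendered and the reference PDF supplied for each. The Python sketch below is illustrative only: the short context names, the file names, and the `RenderCase` and `missing_references` helpers are assumptions made for this example and are not part of the PAL deliverables.

```python
from dataclasses import dataclass, field

# Rendering contexts taken from the list above; the short names are hypothetical.
BILL_CONTEXTS = [
    "draft", "bar1", "select-committee", "bars2-5",
    "assent", "loose-leaf", "annual-volume", "reprint",
]


@dataclass
class RenderCase:
    """One XML test document and the reference renditions collected for it."""
    xml_file: str
    contexts: list
    references: dict = field(default_factory=dict)  # context -> reference PDF


def missing_references(case):
    """Return the contexts that still lack a reference rendition, in order."""
    return [c for c in case.contexts if c not in case.references]


# Hypothetical entry: only two reference PDFs have been supplied so far.
big_bill = RenderCase(
    xml_file="large-bill.xml",
    contexts=BILL_CONTEXTS,
    references={
        "draft": "large-bill-draft.pdf",
        "bar1": "large-bill-bar1.pdf",
    },
)
```

Running `missing_references` over every entry gives a quick view of which reference renditions still have to be produced before candidate rendering engines can be compared against the set.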
7.3 Legacy data
Because of the delay between the intended go-live date and the projected phase 1 deployment, the Brookers DTDs will have changed in that period, and Brookers will need to regenerate the legacy data set to import into the PAL system. This may involve additional development of the export scripts and may raise some additional issues with the application.
7.4 Security issues
While the scope of this technical review did not extend to a full security audit and InQuirion does not have the expertise to provide a full security audit, the terms of reference requested InQuirion to identify any security issues encountered during the course of the review.
The only secure network is a network without connections to the outside world. No security infrastructure is completely unbreakable just like no safe is completely impenetrable. The only solution is to provide an appropriate level of security to protect the level of importance or sensitivity of the material.
Any set of networks connected by VPN is as vulnerable as the weakest network in the link. InQuirion understands that a review of the firewall capabilities and other network security precautions taken at each of the VPN sites (IRD, Brookers, Datacom, and SecuraCopy) was undertaken in order to satisfy the Parliamentary Service that a VPN could be established with those sites. Assuming that these sites are worthy of trust, the mechanism used to secure the connections between them seems to be at the appropriate level of security.
The only major concern is that, although some security testing of the VPNs and the website has been conducted, it is not clear to InQuirion that the system tests include provision to retest security whenever changes are made to security-relevant infrastructure. In particular, changes to the configuration of Replistor should trigger a retest of the security arrangements between SecuraCopy, Datacom, and the PCO.
7.5 Uptime issues
7.5.1 Website
InQuirion notes that there is currently no provision for machine redundancy at the website. If a board or CPU fails on the production website machine, there is no hot backup to step in to replace it. Disk failures can be managed by using appropriate RAID configurations but CPU or board failures will require a replacement machine or board. While the platform used is a commodity machine, it is likely to take at least a few hours to source a replacement machine, and another few hours to install all the relevant software, and to replicate the data from the PCO master machine. If PCO is not prepared to allow possible down time of a day or so, possibly two or three days if the outage occurs on a weekend or public holiday, a redundant server for the public website should be given serious consideration.
InQuirion understands that the original plan anticipated moving the current test server to Datacom and using it as the warm or hot back-up to the public web server. In the light of the recommendation that an ongoing test infrastructure be preserved after Phase 1 deployment and even beyond Phase 2 deployment, InQuirion believes that it would be prudent to acquire a new server and 'shuffle' the existing servers to ensure that the newer (and most likely more powerful) servers are available in the production environment—either within PCO or on the website.
7.5.2 VPN
Currently there is only one box to support access via VPN between any one of IRD, SecuraCopy, Brookers, or Datacom and the Parliamentary Service network. If that box were to fail, sourcing a replacement could take weeks, during which time the VPN would be unavailable and IRD drafters and Brookers compilation staff would have to work physically located in the PCO. Transfers of data to SecuraCopy and Datacom would have to be managed by couriering CDs or similar means. A less serious failure would occur if the boxes at IRD, Brookers, SecuraCopy or Datacom failed. If this is an acceptable mode of operation for that time period then there is no problem, but, if it is not, InQuirion suggests some risk mitigation strategy, such as ensuring the availability of spare hardware to replace faulty VPN hardware, either by procuring a spare unit to be available to any of those nodes when required or by ensuring that spares are available in Wellington at short notice.
7.6 Compilation and reprints
The current PAL solution does not provide support within the Authoring Tool or rendering engine for compilations or consolidations or official reprints. While it is acceptable to deploy the system for a short period of time without a capability to produce reprints (say between Phase 1 and Phase 2), even during this interim period, the Brookers staff will need to maintain the consolidated legislation database with amendments that commence during this period.
In addition to the markup required for Acts and regulations as made, consolidations require markup for recording information about the legislative history of a principal document including information about what Acts or regulations have amended it, when they commenced, and what provisions were affected. It appears that accommodation for this markup has already been made in the DTDs but the Authoring Tool is yet to be configured to support inserting this markup. Facility has been made to include this configuration within the Phase 1 development. Ideally, this configuration should also include the facility to print an unofficial consolidation (if only for proofreading) although conformance to the requirements for official reprints is unnecessary.
In addition to modifications to the Authoring Tool, the compilation team require an Editorial Diary. This should take the form of a database into which they can insert entries recording the amendments to be applied and when they are to be applied. This database should allow compilers to record when each amendment is incorporated to ensure that no amendments are missed and that the timing of amendments is correct. More complete specifications of the requirements for this tool appear within the draft Stage 2 Phase 1 and Phase 2 scoping document.
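As a rough illustration of the kind of record the Editorial Diary would hold, the Python sketch below uses an in-memory SQLite database. The table layout, column names, and function names are assumptions made for this example; the actual requirements are those in the scoping document.

```python
import sqlite3


def open_diary(path=":memory:"):
    """Open (or create) a diary database with one row per pending amendment."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS diary (
               id INTEGER PRIMARY KEY,
               principal TEXT NOT NULL,
               amending TEXT NOT NULL,
               provision TEXT NOT NULL,
               commences TEXT NOT NULL,
               incorporated_on TEXT
           )"""
    )
    return conn


def record_amendment(conn, principal, amending, provision, commences):
    """Record an amendment to be applied; dates are ISO strings (YYYY-MM-DD)."""
    cur = conn.execute(
        "INSERT INTO diary (principal, amending, provision, commences) "
        "VALUES (?, ?, ?, ?)",
        (principal, amending, provision, commences),
    )
    conn.commit()
    return cur.lastrowid


def mark_incorporated(conn, entry_id, on_date):
    """Record the date on which a compiler applied the amendment."""
    conn.execute(
        "UPDATE diary SET incorporated_on = ? WHERE id = ?", (on_date, entry_id)
    )
    conn.commit()


def overdue(conn, as_at):
    """List amendments in force on as_at that have not yet been incorporated."""
    rows = conn.execute(
        "SELECT principal, amending, provision FROM diary "
        "WHERE commences <= ? AND incorporated_on IS NULL "
        "ORDER BY commences, id",
        (as_at,),
    )
    return rows.fetchall()
```

A query such as `overdue(conn, "2004-08-01")` is the check that no commenced amendment has been missed; storing dates as ISO strings keeps the comparison a plain string comparison.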
On an ongoing basis, PCO requires the capability to generate official reprints and additional capabilities will be needed in the Authoring Tool and the print rendering engine to support these (most likely an additional stylesheet only to be used for official reprints). Much of the development required for consolidations will satisfy the requirements for official reprints. Reprints also require the ability to include document "skeletons". Where amending Acts or regulations contain transitional provisions relevant to the principal Act, a "skeleton" for each amending document is typically included at the end of the reprinted principal. This "skeleton" contains only those transitional provisions and the relevant framework to identify the amending document.
Given the status of these documents, it would be prudent to create a document lifecycle to manage the creation and publication of these official reprints to ensure that they are released in a controlled fashion only after all the appropriate quality assurance procedures are covered including checking for completeness (all in force amendments have been applied, all uncommenced amendments are appropriately noted, all changes have matching history notes), accuracy (all changes have been checked against the original amending document) and output quality (the PDF and HTML renditions have been checked for correct formatting).
This latter capability should be included in the Phase 2 development. Note that PCO's ability to produce official reprints between Phase 1 and Phase 2 deployment is likely to be severely curtailed (or even suspended). It is important that this time period be kept manageably short to ensure an appropriate level of service.
7.7 Cross referencing tool
Legislation contains extensive references, both explicit and implicit, and the value of being able to navigate through an electronic copy of the legislation simply by following hypertext links cannot be overestimated. The data acquired from Brookers contains extensive links reflecting the wording of cross-references in the text of the legislation. These links capture identifiers of the target elements described in the cross-reference wording. Given that the data purchased by the New Zealand Government already has support for this navigation, it would be poor stewardship of the asset not to maintain those links as the consolidated collection is updated.
While the model that New Zealand has adopted involves manual consolidation at least initially, manually copying and pasting these attributes from the target XML into the current working consolidation would be an extremely onerous task. In order to make link maintenance manageable, RU, PPU, and Brookers will need a tool, integrated into the Authoring Tool environment, to insert links into a working consolidation. This tool should provide the ability for a user to browse a hierarchy of legislation and navigate to the target of a cross-reference (at least to the containing section). By clicking on that provision, the appropriate attributes would be associated with the selected text or cross-reference element within the Authoring Tool. This would vastly improve the accuracy of maintaining these references. The need for this tool has been identified within the draft Stage 2 Phase 1 and Phase 2 scoping document as necessary for Phase 1 deployment, although the requirements probably need to be more tightly specified.
An alternative would be to use an attribute on the cross-reference element to distinguish between validated and unvalidated links. A tool could be written to scan candidate compilations (or just newly inserted elements) to identify text that matches cross-reference templates. These templates could then be mapped to likely identifiers. InQuirion has written such tools in the past for data conversion projects, as have the AustLII maintainers. The compilers could then traverse the links to ensure that they have correctly identified the link targets setting the validated attribute to true for those that have and manually overriding the automatic selection for those that have not. InQuirion recommends this be considered as possible future development if not for Phase 2 deployment.
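The template-matching approach can be sketched in a few lines of Python. The single template shown, the identifier format, and the `candidate_links` helper are hypothetical and chosen only to illustrate the idea; a production tool would need many more templates and would write the results back as attributes on the cross-reference elements.

```python
import re

# Hypothetical template matching wording such as
# "section 12 of the Example Act 1993".
XREF = re.compile(
    r"section (?P<sec>\d+[A-Z]*) of the "
    r"(?P<act>(?:[A-Z][\w']* )+Act \d{4})"
)


def candidate_links(text, known_ids):
    """Return one candidate link per template match.

    known_ids maps (act title, section number) to a target identifier.
    Every candidate starts unvalidated; a compiler confirms or overrides it.
    """
    candidates = []
    for match in XREF.finditer(text):
        key = (match.group("act"), match.group("sec"))
        candidates.append({
            "text": match.group(0),
            "target": known_ids.get(key),  # None when no identifier is known
            "validated": False,
        })
    return candidates


# Illustrative data only; the identifier scheme is an assumption.
ids = {("Example Act 1993", "12"): "example-act-1993/s12"}
sample = ("See section 12 of the Example Act 1993 and "
          "section 4A of the Other Thing Act 2001.")
links = candidate_links(sample, ids)
```

In this sample the first reference resolves to a known identifier and the second does not, so the second is left with no target for a compiler to resolve by hand. Both remain unvalidated until a compiler has traversed them.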
7.8 Annual volumes
Most jurisdictions, including New Zealand, produce annual volumes of the legislation passed in each calendar year. New Zealand produces one set containing all Acts and another containing all regulations. Most years, the Act set contains 2 or 3 volumes. It contains the numbered Public Acts in number order (reflecting the order in which they were made), followed by the Private and Local Acts. Typically, the set is broken up into volumes of around 1000 pages each. Each volume contains tables of contents for the entire set, with the last volume typically also containing divider pages between the different types of Acts. The pages of the current volumes are numbered from 1 without restarting at the beginning of each volume.
While it would be preferable to preserve this behaviour, it is acceptable (because of limitations of Epic Print Composer) for the pages to renumber within each volume. The next set of annual volumes will be prepared for release in March 2004 and since the Acts and regulations will not have been prepared using the PAL system, it is impractical to prepare the 2003 volumes using the PAL system. The first set of volumes to be prepared using PAL will be for release in March 2005.
The PCO therefore requires the capability to prepare these annual volumes early in 2005. A number of solutions are possible to support this process.
1. The simplest development task would be to allow the PCO to print Acts and regulations in annual volume form (very similar to loose leaf form) one by one with the ability to set the starting page number. The tables of contents, covering and dividing pages could then be prepared manually using standard word processing software.
2. Alternatively, the tables of contents, covering and dividing pages could be prepared by a separate tool that ran over the legislation collection for that year extracting the relevant information, although the page numbers would probably have to be inserted manually unless the rendering tool could provide information about the size of each individual PDF output it created.
3. The simplest operational tool would take a collection of all of the legislation for the year with a single wrapper document around it, and generate the entire volume set automatically, choosing break points at the end of Acts close to the desired volume size.
InQuirion understands that there is no mechanism to override the starting page number for a document rendered in Print Composer. Any evaluation of alternative rendering tools should consider this capability.
Print Composer also will only render a single XML document, so a gigantic document containing the whole year's legislation would have to be created in order to automate the generation of the entire volume set. InQuirion has doubts as to whether the Print Composer engine can render such a large XML document. Note that timeliness is not such an issue for the production of these volumes as several months currently pass from the time the last Act is assented to until the time the volumes are normally available, but the PCO has been steadily working to reduce this time lag. Again, any alternative rendering engine should be assessed either on the basis of whether it would be able to render a document that contains the entire set of Acts or regulations for a single year, or alternatively, the ability to read in documents one by one to create either one large PDF or a series of PDFs that reflected a whole volume or volume set. Currently, annual volume pages are actually printed and provided to proofreaders incrementally throughout the year. However, the necessity to manage workload in this way may be reduced or eliminated if the amount of checking is minimized as a result of greater confidence in the integrity of text and format as between loose enactments and bound volumes.
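Whichever rendering approach is chosen, the volume planning itself is simple once the page count of each Act is known. The Python sketch below illustrates choosing break points at the end of Acts close to a target volume size while numbering pages continuously across the set. It assumes the page counts are available from an earlier rendering pass, which, as noted above, may not be the case with Print Composer; the function name and sample figures are hypothetical.

```python
def plan_volumes(acts, target_pages=1000):
    """Split Acts into volumes and assign starting page numbers.

    acts is a list of (title, page_count) pairs in publication order.
    Returns a list of volumes, each a list of (title, start_page) pairs.
    Page numbers run on from one volume to the next.
    """
    volumes, current = [], []
    pages_in_volume, next_page = 0, 1
    for title, count in acts:
        # Close the volume when stopping here lands at least as close to
        # the target as adding this Act would.
        stop_here = abs(target_pages - pages_in_volume)
        keep_going = abs(target_pages - (pages_in_volume + count))
        if current and stop_here <= keep_going:
            volumes.append(current)
            current, pages_in_volume = [], 0
        current.append((title, next_page))
        pages_in_volume += count
        next_page += count
    if current:
        volumes.append(current)
    return volumes


# Hypothetical page counts for five Acts.
plan = plan_volumes([("A", 400), ("B", 500), ("C", 300), ("D", 600), ("E", 300)])
```

The same starting page numbers are what the first option above would need as input when Acts are printed one by one, and they also supply the page references for the tables of contents.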
7.9 Naming of transform output files
When a document is promoted to the next stage in the Documentum lifecycle, in many situations the Java agents generate two files: a PDF rendition of the document and an XML version with private markup removed for distribution to legal publishers. These are added to the document cabinet in Documentum. Regardless of the name of the document being promoted, or the stage of promotion, these files are always given the names "whole.pdf" and "whole.xml".
The current practice results in hundreds of "whole.pdf" documents in the document repositories making it extremely difficult to identify which document is which.
When a PDF is migrated to the web site, similar Java agents give it the same name prefix as the source XML document with a ".pdf" extension, so it is clearly possible to preserve a more representative file name. The transforms should be modified to ensure that a more meaningful name is given to each file before it is saved in the Documentum repository. Ideally, the name will be different at different stages in the legislative process so that different versions of the Bill in the same folder can be distinguished easily.
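A naming rule of this kind is straightforward to express. The sketch below is written in Python for brevity, although the agents themselves are Java; the naming pattern, the stage labels, and the `rendition_name` function are assumptions made for illustration rather than a proposed PCO convention.

```python
import re


def rendition_name(source_name, stage, extension):
    """Derive a file name from the source document name and lifecycle stage."""

    def slug(value):
        # Lower-case and replace runs of non-word characters with a hyphen.
        # Unicode letters (including macronised vowels) are kept.
        return re.sub(r"\W+", "-", value.lower()).strip("-")

    stem = source_name.rsplit(".", 1)[0]  # drop the source extension, if any
    return f"{slug(stem)}_{slug(stage)}.{extension}"
```

For example, `rendition_name("Example Bill.xml", "Bar 2", "pdf")` gives `example-bill_bar-2.pdf`, so the Bar 1 and Bar 2 renditions of the same Bill can sit side by side in one folder.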
8. Conclusion
The PAL system as implemented contains a number of strengths outlined in this report. The weaknesses described in this report are not insurmountable and should be addressed by implementing these recommendations and the draft Stage 2 Phase 1 and Phase 2 Scoping document.
Generally, the integrity of the modules has been preserved with customizations appropriate to the applications on which they have been built.
The modularity of the system follows similar systems deployed or in development around the world. While there are still some outstanding issues to do with integration of the system components, the system components seem to be interacting well and reliably.
Scope remains within the applications for further customization and upgrade of system components. With such a large and complex software system, the impact of any upgrades or enhancements will always need to be thoroughly tested before deployment but this is not unusual for developments of this type.
The use of industry standards and coding practices is evident within the development. The variety of subcontractors and development environments deployed on the project has led to inconsistency between coding practices in different modules. Again, this is not unusual for projects of this size and complexity. Some minor recommendations have been made for improving the application of standards and coding practices throughout this report.
The system is generally robust. At no time during demonstrations of the system to InQuirion, or while InQuirion was interacting with the system, did any of the components crash, although there were problems with some of the outputs of the modules.
Before deployment, a number of issues relating to ongoing maintenance and support need to be addressed. These have been outlined in this report.
Providing these issues and other issues identified within this report are addressed to the satisfaction of the stakeholders, the New Zealand Government can be assured that the PAL system, when implemented, will be operationally stable, maintainable, and capable of supporting future enhancement and development.
9. Glossary
ACL
ArborText Control Language or Access Control List. In the context of ArborText products, ACL is the language used to create the custom "packages" to add legislation- and PAL-specific capabilities to the Authoring Tool. In the context of the CMS, it describes the method used by Documentum to control who can access which documents.
API
Application Programming Interface. A set of objects and related methods, or structures and associated functions, that are made available from one system module (or application) to other modules in a computer system.
ASCII
American Standard Code for Information Interchange. ASCII is an old ANSI (American National Standards Institute) standard for representing unaccented Latin letters, digits, and common punctuation in 7 bits. Virtually all commonly used character encodings include the standard ASCII characters in the code points from 0-127.
auto-generated text
A distinction has been made in some of the documentation between text that is generated (primarily by the Authoring Tool but also by transforms) and inserted in the markup, and text that is generated purely for display or rendering. Auto-generated text is inserted in the markup and becomes a permanent part of the document instance. Autotext is generated only for display or printing and does not appear explicitly in the underlying markup.
autotext
See auto-generated text above.
code point
A numeric value in a character set assigned to a particular character.
CSS
Cascading Style Sheets. A W3C recommendation for mapping XML and other structured markup onto presentation rules. This stylesheet language cannot reorder the document but merely describes how to lay the elements out on a screen or page. A CSS stylesheet for legislation provided by the PAL application is used by the web browser to render the HTML in the web-site.
DocBasic
DocBasic is a programming language supported only in the Documentum product similar to Microsoft's VisualBasic.
DOM
Document Object Model. The DOM is an interface or set of methods for manipulating a parsed (XML) document. DOM conformant libraries exist in a number of languages including Java.
DTD
Document Type Definition. A set of rules that describe a set of elements and how they can be combined to form a class of documents.
FOSI
Format Output Specification Instance. A US Department of Defense standard for mapping XML and other structured markup onto presentation rules. This stylesheet language cannot reorder the document but merely describes how to lay the elements out on a screen or page.
IRD
Inland Revenue Department. The IRD maintain a small team of drafters for drafting tax and related legislation. These drafters are physically located in the IRD's offices in Wellington and access the PAL system via a VPN.
ISO
International Organization for Standardization. An independent international standards body that works on the principle of national representation to establish international standards.
Java
Java is a general multi-platform programming language supported in the Documentum product and many other products.
OC
Office of the Clerk. The OC are responsible for managing the progress of documents through the legislative process.
PCO
Parliamentary Counsel Office. The office responsible for government drafting in New Zealand.
PPU
Pre-Publication Unit. The division of the PCO responsible for preparing legislation for publication in print and electronic form.
RU
Reprints Unit. The division of the PCO responsible for officializing the statute book and compiling reprints.
SGML
Standard Generalized Markup Language. ISO 8879:1986. An ISO standard for document interchange. This standard was the forerunner of XML.
slip
A proposed amendment to a Bill for a select committee.
SOP
Supplementary Order Paper. An SOP typically describes an amendment proposed by a Member to a Bill.
stylesheet
This is a computer file that defines a transformation of an SGML or XML document into another format—generally a more readable one. The most typical use of a stylesheet is to transform XML into HTML or another dialect of XML.
VPN
Virtual Private Network. A VPN is a means of connecting two or more physically remote networks together in a way that enables one or more computers on each network to be "trusted" by computers on the other network as though they were physically on the same network.
XML
eXtensible Markup Language. A recommendation of the W3C. This standard is a document interchange standard that has been widely adopted in the web community for exchanging documents and other data. XML is a strict subset of SGML.
XSL
Extensible Stylesheet Language. A set of W3C standards for stylesheets for formatting XML.
XSL-FO
Extensible Stylesheet Language Formatting Objects. A W3C recommendation describing an XML namespace for representing layout objects of a page; essentially a page description language in XML.
XSLT
Extensible Stylesheet Language Transformations. A W3C recommendation describing XML namespaces and semantics for transforming XML documents into an output format, usually but not necessarily HTML or another XML encoding such as XSL-FO.
Unicode
An international standard managed by the Unicode Consortium (and kept aligned with ISO/IEC 10646) for representing characters and encoding them in 8-bit, 16-bit, or 32-bit encodings. It essentially describes a set of code points and their semantics and a number of ways of encoding those code points.
W3C
World Wide Web Consortium. An international consortium of vendors and other organizations that collaborate to produce standards for the web. A recommendation of the W3C is a standard approved by the W3C for use on the web.
10. Declaration of interest
As with most experts in the area of legislative drafting support technology, in addition to consulting services related to legislative drafting, InQuirion has interests in particular technology relevant to the solution. InQuirion provides a whole-of-lifecycle software solution for legislative drafting, management, and delivery, TeraText for Legislation, based on the TeraText Database System and Document Management System also developed and sold by InQuirion. This product is the basis of the upgrade for the well-known Tasmanian EnAct system. This product includes an authoring environment based on Microsoft Word and workflow and document version management from the TeraText DMS, which competes directly with Documentum. InQuirion responded to both the RfEoI and the RfP for the PAL system. InQuirion's technologies (TeraText for Legislation, utilizing both the TeraText Document Management System (DMS) and the TeraText Database System (DBS)) were assessed by Unisys as a potential CMS (TeraText DMS) and website solution (TeraText DBS) as part of the technology evaluation.
While it could be argued that InQuirion has a conflict here, and possible grounds for ill feeling with Unisys (they selected Documentum over TeraText DMS), we understand the pressures informing the technology selection at the time and bear no such ill feeling. Regardless of any concern in this area, InQuirion is aware that it would not be in our best interests for such a large, high-profile, legislative development based on XML to fail. A large failure would discourage other jurisdictions from moving forward with their plans for implementing XML-based solutions, shrinking the market for everybody. It is therefore in InQuirion's best interest to provide the most honest and open response to this technical evaluation in order to maximize the chance of a successful deployment. The New Zealand Government need not rely just on InQuirion's integrity, but can rely on our self-interest to provide a fair and honest assessment of the PAL system.
Footnotes
1 Note that the New Zealand Constitution Act 1852 (Imp) and other related Acts of Imperial Parliament no longer have legal effect by virtue of the Constitution Act 1986 (NZ) and the Constitution is part of the legislation of New Zealand.
2 Originally defined in US Defense standards MIL-HDBK-59A [28 Sep 1990] (superseded by MIL-HDBK-59B [10 Jun 1994]), MIL-HDBK-28001 [30 Jun 1995], and MIL-PRF-28001B [20 Apr 1995] (superseded by MIL-PRF-28001C [2 May 1997]) Appendix B.
3 Note that the PAL website makes some use of CSS in rendering the HTML in the web browser.
4 Note that the Epic products have support for XSL-FO also.
5 CSS 1 was aimed specifically at HTML.
6 CSS 2, CSS 2.1 and the proposed CSS 3 all explicitly state their intended application to XML and are widely used in the industry for rendering XML.
7 Either the specification is complete or it isn't. Either way, this work needs to be done regardless of the platform including if it is Print Composer!
8 InQuirion's familiarity with the 3B2 product is through our involvement in the Canadian LIMS project where it has been deployed by both the Department of Justice and Parliament to render bi-lingual legislation, Bills, and related material. The bi-lingual requirements and multi-column displays made a high-end rendering engine necessary in Canada. InQuirion has no formal relationship with or interest in either Allette, Advent Publishing, or any other vendor or representative of an XML rendering engine.
9 The two letter code is determined by ISO 639:1988.
10 The two letter code is determined by ISO 3166 Alpha-2 list.
11 This approach follows IETF RFC 1766.
12 Originally defined in US Defense standards MIL-HDBK-59A [28 Sep 1990] (superseded by MIL-HDBK-59B [10 Jun 1994]), MIL-HDBK-28001 [30 Jun 1995], and MIL-PRF-28001B [20 Apr 1995] (superseded by MIL-PRF-28001C [2 May 1997]) Appendix B.
13 Therefore the DTDs as shipped were not conformant XML DTDs although they were conformant SGML. Note that a conformant version was shipped but that was not the version referred to by the top level DTDs—NZAct.dtd, NZBill.dtd, and NZReg.dtd.
14 Again, strictly a breach of the XML standard but a useful compromise when mixing Namespaces with DTDs.
