Many have promised transparency, few have delivered. The LegCo Secretariat is leading the way to develop a legislature for Web 3.0 – a LegCo for the people.
Technology has grown at such a rate that famed scientists and entrepreneurs like Stephen Hawking and Elon Musk have warned that artificial intelligence “could spell the end of the human race.” In the public sector, however, over-rapid implementation of new technology is usually less of a concern. Governments are often laggards, but at the LegCo Secretariat, efforts to overhaul the presentation of its massive data with the latest technology are about to redefine practice in the public sector.
A total overhaul
“There is a need to help our own people (LegCo staff) as well as LegCo members’ assistants to dig up information more efficiently,” says Mr Kenneth Chen, LegCo Secretary General. It was not long ago that HKU’s Journalism and Media Studies Centre came to him and pointed out the difficulty of analysing LegCo data in its current format. That prompted Mr Chen to start looking into a long-term plan to upgrade the LegCo database to enable ‘smart search’.
The idea was further developed after Ms Elyssa Wong, Head of the Information Services Division in LegCo, visited the Parliament of Canada in February 2014 and discovered their IT project, Prism, which has allowed parliamentary staff and the public to access and extract data easily. Ms Wong came back with an inconvenient truth – for LegCo to undertake Mr Chen’s plan, there had to be an overhaul of the whole content management system. It is a monumental task and even more so when the Secretariat is now busy handling the filibusters.
“What we are trying to do is to make sense of the documents for the posterity. [The goal is so that] 100 years from now when people look at these documents, they will understand it”
LegCo Secretary General Kenneth Chen
The primeval work in the 21st century
Already, major improvements have been made following the debut of the Hansard Database by the Secretariat. This new service makes it easier to search members’ speeches at the Council meetings in this term (from 2013 onwards). A workshop was organised last month for media and members’ assistants to learn how to use the Hansard Database. Users can now narrow down their search to see when members have spoken on a specific topic and what they said, and even to identify the members who have voiced concern on a particular issue.
“Getting the Hansard available for all Council meetings since 1858 online is a major accomplishment,” Mr Chen says. He calls it that because this very useful tool came at a great cost: labour cost, to be precise.
Data conversion is a painstaking task and requires manual work. Currently, LegCo staff have to convert all documents to PDF form. They will then have to arrange the data by sections before uploading them to the Database. The prevalent use of computers in the 21st century has dramatically reduced the work of the Secretariat, but managing antique documents that date back to the 19th century is a monumental task.
LegCo staff have to literally reproduce the old documents. If they could simply scan the papers, their jobs would be much easier, but current technology is unable to deliver accurate word recognition of antique script, and that means the staff have to type it all in manually. The fact that documents such as the colonial Opium Ordinance from years as early as 1885 and 1891 have come to light, and that detailed records of proceedings of Council meetings before 1997 are available to the public, is a testament to the contribution of the Secretariat.
Getting the staff to type up the lengthy meeting documents and present them in PDF form is only part of the picture. The different rules of procedure in colonial times can be confusing, and the obscure handwriting makes the transcription work extremely challenging. In parts where the handwriting is impossible to identify, LegCo staff leave a remark explaining that their reproduction may merely be a guess.
“What we are trying to do is to make sense of the documents for the posterity. [The goal is that] 100 years from now when people look at these documents, they will understand it,” says Mr Chen. The work to transfer over 330,000 pages of LegCo documents to structured PDF form is a significant achievement, yet it still falls well short of his plan.
20th century language
“We never get high up on the search directory because everybody is talking in the 21st century [computer search] language and we are stuck in the 20th.” The problem with the PDF format, the 20th century language, is that it can be read by PDF reader applications but is not a useful machine-readable format. As a result, the LegCo data is not ‘smart’ enough to be picked up by search engines or to be developed into further open data applications.
The Web 3.0
The tech-savvy will know that 2015 lands in the middle of the development of Web 3.0, the age of the Semantic Web. What it means is that the “web has a concept and reason,” says Mr Ian Leong, Chief Information Technology Officer of LegCo. In simple terms, the Semantic Web emphasises the use of a common framework, RDF/XML, to allow data to be shared and used by everyone, but the data first has to be structured and produced in a machine-readable format.
The goal of Mr Chen’s mega plan is to improve the linkage of LegCo data to search engines and allow the data to be used for open data developments. One major difficulty is, again, data conversion. Data has to be converted from the existing PDF to RDF/XML (a markup format that is both human-readable and machine-readable). However, when converting data into PDF is already cumbersome, the task of re-converting all the data to RDF/XML is quite unthinkable.
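To give a rough sense of what “machine-readable” could mean here, the following is a minimal sketch that describes a single speech as RDF/XML using Python’s standard library. It is not the Secretariat’s schema: the `ex` vocabulary, the `example.org` addresses, the property names and the sample values are all hypothetical placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
EX_NS = "http://example.org/legco#"  # hypothetical vocabulary, not a real LegCo schema

# Ask the serialiser to use readable prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def speech_to_rdf_xml(speech_uri, speaker, date, topic, text):
    """Return one speech record as an RDF/XML string."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:about names the thing being described (here, one speech).
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": speech_uri}
    )
    # Each property becomes a labelled child element that software can read.
    for name, value in (
        ("speaker", speaker),
        ("date", date),
        ("topic", topic),
        ("text", text),
    ):
        ET.SubElement(desc, f"{{{EX_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values only; none of this is real Hansard data.
record = speech_to_rdf_xml(
    "http://example.org/legco/speech/1",
    speaker="Member A",
    date="YYYY-MM-DD",
    topic="Bill X",
    text="(speech text would go here)",
)
print(record)
```

Unlike a page in a PDF, every field in such a record carries a label, so a search engine or an open data application could in principle pick out the speaker or the topic without guessing from the layout.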
What this would do is not only make data more searchable, but make it searchable by more refined, useful searches. For example, if a journalist wanted to pull up quotes from Charles Mok supporting the ITB, he might search, with quotes “Charles Mok supports the ITB” and get no results, but get 951,000 results without quotes. In the future, a search might be “Find instances where Charles Mok spoke in LegCo supporting the ITB” and get a narrow range of extremely relevant results. This is the promise, yet to be realised, of Web 3.0. It exists in a trial form, as described above. A culmination of this effort could see all LegCo records be that useful.
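The difference between the two kinds of search can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any LegCo system works: the records, the field names (`speaker`, `topic`, `stance`) and the sample text are invented, and it assumes each speech has already been tagged with that structured information.

```python
# Invented sample records; a real system would hold tagged Hansard speeches.
records = [
    {"speaker": "Member A", "topic": "Bill X", "stance": "support",
     "text": "I urge colleagues to back this proposal."},
    {"speaker": "Member A", "topic": "Budget", "stance": "oppose",
     "text": "I cannot accept this allocation."},
    {"speaker": "Member B", "topic": "Bill X", "stance": "oppose",
     "text": "This proposal is not needed."},
]


def phrase_search(items, phrase):
    """Keyword-style search: match only if the exact phrase appears in the text."""
    return [r for r in items if phrase.lower() in r["text"].lower()]


def structured_search(items, **criteria):
    """Structured search: match on labelled fields rather than on wording."""
    return [r for r in items if all(r.get(k) == v for k, v in criteria.items())]


# The exact phrase never appears in anyone's speech, so nothing is found.
exact = phrase_search(records, "Member A supports Bill X")

# Asking by meaning (who, about what, which side) finds the one relevant speech.
structured = structured_search(
    records, speaker="Member A", topic="Bill X", stance="support"
)

print(len(exact), len(structured))  # prints "0 1"
```

The hard part, as the article notes, is not the query itself but getting every record tagged in a structured form in the first place.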
Having this power to find relevant information faster would improve the quality of political discourse (something this publication is much in favour of). Confusion could be cleared up faster, politicians and bureaucrats held to account faster and debates resolved in a more expeditious fashion. Ideally.
Not the new world, just better
“We are not really creating anything new but recasting what we already have in a new vocabulary,” says Mr Chen. Learning a new coding language is another problem, and the mega plan will also require altering the way data is submitted from members’ offices and the Government.
Currently, this mega plan is in its nascent stage and the Secretariat is still researching and testing the idea. For years, the Office of the Government Chief Information Officer has been improving its Data.One project to release simple public data, like weather and traffic information, in machine-readable format. This new initiative by the LegCo Secretariat could hasten the Government’s plan to implement an open and transparent public data policy, ideally fundamentally changing the way the public monitors, and works with, the Government.