Smart Statistics: C&SD's Digital Transformation in the Era of Big Data
Census and Statistics Department
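To promote digital transformation, C&SD is taking forward significant developments in two areas: applying AI to the validation of trade declarations, and making wider use of administrative data in population censuses.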
C&SD processes around 70,000 import/export declarations daily, totalling 18 million or so annually for compilation of external merchandise trade statistics. In 2023, this process involved the verification of nearly 3.6 million commodity classifications and their declared unit values across 8,000 statistical categories.
The challenge lies in accurately validating these raw inputs within the very short timeframe available for the monthly compilation of trade statistics. For many years, considerable manual effort has been required to review the commodity descriptions, which are generally in free-text format and difficult for traditional rule-based computer systems to process.
Since 2018, C&SD has been exploring the use of AI models to analyse large volumes of unstructured textual data, with the aim of enhancing the quality and efficiency of the data validation process.
Our in-house AI models use deep learning algorithms trained on millions of labelled commodity descriptions to predict the commodity code and validate the unit value for each trade declaration.
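To make the idea of predicting a commodity code from a free-text description concrete, the sketch below trains a toy classifier on labelled descriptions and flags declarations whose declared code differs from the predicted one. It is an illustration only: it substitutes a simple naive Bayes model for the deep learning models described above, and the class name NaiveBayesCoder, the example descriptions and the codes "A01" and "B02" are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a free-text description and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesCoder:
    """Toy multinomial naive Bayes mapping descriptions to commodity codes.

    A stand-in for illustration; not the deep learning model in the text.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # code -> token frequencies
        self.doc_counts = Counter()              # code -> number of examples
        self.vocab = set()

    def fit(self, descriptions, codes):
        for text, code in zip(descriptions, codes):
            tokens = tokenize(text)
            self.word_counts[code].update(tokens)
            self.doc_counts[code] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_code, best_score = None, -math.inf
        for code, n_docs in self.doc_counts.items():
            counts = self.word_counts[code]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical labelled examples; the codes are made up for this sketch.
train_texts = [
    "fresh apples in cartons",
    "frozen apple slices",
    "mens cotton shirts",
    "cotton t-shirts printed",
]
train_codes = ["A01", "A01", "B02", "B02"]

coder = NaiveBayesCoder().fit(train_texts, train_codes)
predicted = coder.predict("cartons of fresh apples")
declared = "B02"
needs_review = predicted != declared  # mismatch goes to manual checking
```

In a workflow of this shape, only declarations where the predicted and declared codes disagree would need a human reviewer, which is one way such a model could reduce manual checks.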
This innovative approach to automated commodity coding and unit value anomaly detection greatly reduces the need for manual checks, while enhancing data quality at the same time. The new initiative also helped us cope with the extreme challenges during the COVID-19 epidemic.
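As a rough illustration of unit value anomaly detection, the sketch below flags declared unit values that lie far from the typical value within one statistical category. It assumes a plain statistical rule (a modified z-score based on the median and the median absolute deviation) rather than the AI models described above; the function name, the threshold of 3.5 and the sample values are hypothetical.

```python
from statistics import median


def flag_unit_value_outliers(unit_values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(unit_values)
    # Median absolute deviation: a spread measure robust to extreme values.
    mad = median(abs(v - med) for v in unit_values)
    if mad == 0:
        return []  # no spread to compare against
    return [
        i
        for i, v in enumerate(unit_values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical declared unit values for a single commodity category.
values = [10.2, 9.8, 10.5, 10.0, 9.9, 97.0]
flagged = flag_unit_value_outliers(values)
```

The median-based rule is used here because a single extreme declaration would distort a mean and standard deviation computed from the same data.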
Since early this year, we have fully implemented the new approach in our workflows and reduced the manpower required by 40%. The resources released were re-allocated to establish two strategic branches in C&SD: the Data Science Branch and the Social Data Development Branch. Manpower has also been strengthened in other statistical domains that involve the use of big data. This leaves us better prepared for the dynamic era of big data and able to deliver more sophisticated statistical analyses across various areas.
Aside from conventional sample surveys, C&SD has also been exploring new data sources for statistical compilation, aiming to reduce data collection costs and the burden on respondents, while ensuring the quality of statistics compiled.
C&SD plans to utilise administrative data collected from various government departments more extensively and systematically starting from the 2026 Population Census, primarily in the following two areas:
First, we aim to replace some census questions (such as those on the rents of public housing and amounts of welfare payments) with administrative data, so as to reduce data collection costs and the burden on respondents. C&SD has employed self-developed AI-based record linkage tools to efficiently and accurately match census sample data with administrative records at the living quarters level.
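The sketch below illustrates the general idea of record linkage at the living quarters level: addresses from two sources are normalised and then paired by string similarity. It assumes simple character-based matching from the Python standard library, not the AI-based tools mentioned above; the function names, the 0.85 threshold and the sample addresses are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalise(address):
    """Uppercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", address.upper())
    return " ".join(cleaned.split())


def link_records(census, admin, min_score=0.85):
    """Map each census id to the best-matching admin id, or None."""
    admin_norm = {aid: normalise(addr) for aid, addr in admin.items()}
    links = {}
    for cid, addr in census.items():
        target = normalise(addr)
        best_id, best_score = None, 0.0
        for aid, candidate in admin_norm.items():
            score = SequenceMatcher(None, target, candidate).ratio()
            if score > best_score:
                best_id, best_score = aid, score
        # Keep the link only when the similarity is high enough.
        links[cid] = best_id if best_score >= min_score else None
    return links


# Hypothetical living-quarters addresses from the two sources.
census = {
    "C1": "Flat 12B, 3/F, Block 2, Example Court",
    "C2": "Room 5, 88 Sample Road",
}
admin = {
    "A1": "FLAT 12B 3/F BLOCK 2 EXAMPLE COURT",
    "A2": "Unit 7, 10 Other Street",
}
links = link_records(census, admin)
```

Comparing every census record with every administrative record is quadratic in the number of records, so a production tool would presumably narrow the candidates first, for example by district or building.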
Second, we aim to replace the “Short Form” questionnaire covering around 90% of all households in the 2031 Population Census with administrative data. Through comprehensive utilisation of anonymised immigration records, C&SD can now compile more precise population estimates without relying on the “Short Form” questionnaire, thereby significantly reducing the scale of operation and the costs involved.
The new workflow, which incorporates more administrative data and re-engineers work processes, is expected to reduce costs significantly. C&SD estimates that the total costs incurred for the 2026 and 2031 Population Censuses will be reduced by 40%, saving around HK$680 million at current prices.
In addition to the above two new initiatives, our forward-looking big data strategy encompasses the exploration and application of cutting-edge technologies, such as drone-assisted and web scraping-based intelligent data collection, and computer vision technology for document processing. Through a comprehensive capacity building programme covering data science training, inter-departmental data science project collaborations and information technology infrastructure upgrades, C&SD has greatly enhanced its capabilities to capitalise on digital transformation opportunities.
These initiatives underscore our dedication to harnessing the power of AI, ensuring that C&SD remains at the forefront of smart statistics in an increasingly digital world.