Nicholson Consulting Submission: Modernising our Approach to the 2028 Census Consultation

On the 18th of June 2024, Nicholson Consulting provided our submission on Stats NZ's proposal to adopt an admin-data-first census for 2028 onwards.

Our full submission is provided below.

The move to an admin-data-first approach presents risks that cannot be ignored - risks to equity, social licence and data quality. These are of particular interest to us at Nicholson Consulting, given our vision of creating a more equitable Aotearoa and our expert knowledge of the IDI.

Some of these risks are not new. The 2015 Cabinet paper Census transformation – A promising future[1] investigated the options for transforming the New Zealand census. This paper detailed risks and actions that would be subject to “considerable further work and an extensive testing programme”. Various other sources have since provided recommendations on how to address these risks, including the Statutory Reviews of the 2018[2] and 2023[3] censuses and advice from Te Mana Raraunga (for example, Kukutai & Cormack 2018[4]).

There is little publicly available information on which to assess whether sufficient progress has been made to mitigate and manage these risks. It is also unclear whether Stats NZ has assessed the real-life impact of changing from the status quo census process to an admin-data-first census. We recommend more transparency on both fronts.

Given the nature of some of the risks identified below, and the problematic impact of not having adequate mitigations in place in time, we recommend that the census continues in its current form until these risks have been addressed.

A detailed discussion of our key concerns is provided below.

1. A move to an admin-data-first approach will likely exacerbate existing inequities

As a result of colonisation and complex intergenerational experiences, groups within Aotearoa have different ways of interacting with the census, with admin data and with survey data. This is evidenced by the low participation of Māori and Pacific Peoples (among other groups) in the 2018 and 2023 censuses. An increased focus on admin data, which is often deficit-based and incomplete, will likely further exacerbate existing inequities and does not address the likely low participation in supplementary surveys from these groups.

Whilst we anticipate that those who are difficult to capture within the existing census will also be hard to count in admin data, this is not a justification for accepting lower quality in the representation of those people in the admin-data-first census.

Defining who one is and how one exists is an inherently emancipatory activity. A survey-based individual and dwelling census offers people the opportunity to articulate their personhood, their relationships with others, and their day-to-day existence. This is independent of the pressures, incentives, and imperfect systems that can be associated with data collected for other administrative purposes. For communities that are over-surveilled and therefore overrepresented in administrative data, such as Māori and Pacific Peoples, there will be a disproportionate focus on administratively collected data in admin-census profiles. For example, tamariki Māori are much more likely to appear in care and protection data, and Māori are dramatically overrepresented in corrections data. This runs counter to the notion of self-determination and to Te Tiriti o Waitangi. For Māori and Pacific Peoples, both the low participation rates and the overrepresentation in the source data used to derive attributes will lead to real-world systematic biases in the allocation of resources. They will also lead to other unexpected biases that will be difficult to detect and rectify and may therefore persist for multiple admin-data-first censuses.

Specific practical examples of biases in admin data leading to inequity include:

Ethnicity data: Alongside data from Te Tari Taiwhenua – Department of Internal Affairs, census ethnicity records have been noted as the highest-quality source of ethnicity data[5]. The loss of high-quality ethnicity data collected via a survey-based census will lead to a reduced understanding of the complex ethnic make-up of Aotearoa, disproportionately affecting people with multiple ethnicities or with less common ethnicities. The admin data sources are likely to be limited in two ways:

  • Because the delivery of services does not depend on ethnicity data, individuals are less likely to list all their ethnicities or to give the same level of detail as in the survey-based census.

  • Source systems may not be capable of capturing several ethnicities or highly detailed ethnicity levels.

Both of these limitations will lead to an underrepresentation of the true detail of ethnicity information. This is likely to be most severe at the detailed ethnicity levels (levels 2 to 4). Care will need to be taken to ensure that agencies are not simply collecting data at the least-detailed level and that people have the opportunity to list all their ethnicities with source agencies. Underrepresentation will likely lead government agencies, which rely on this data to form evidence-based decisions and policy, to arrive at flawed conclusions, for example by estimating inadequate funding for a programme targeting people of Māori ethnicity with a disability.

Attributes for immigrants: Immigrants who moved to Aotearoa recently will be missing vital information. Two examples are the education data required to ascertain qualification level and the health data required to have an immunisation history. Thus, getting a full and accurate picture of immigrants as a subpopulation group will not be possible. This will impact government functions such as workforce planning and community risk profiling for communicable disease.

Language data: Census data is used to provide an understanding of language utilisation and to develop real-world solutions for revitalisation. For example, the New Zealand Sign Language (NZSL) Board uses data captured by the census on languages spoken to count the number of people using sign language[6]. Also, the use of census information about te reo Māori proficiency has meant that it is now possible to forecast the number of te reo Māori speakers out to 2040[7] at a community level. This is important from a Te Tiriti perspective to ensure that te reo Māori is protected and thriving. The implications for surveys on attitudes towards, and use of, te reo Māori have not been clearly articulated, and changes will take time to implement. These surveys include Te Kupenga, the General Social Survey, and work by Te Kāhui Raraunga to design a te reo Māori survey.

2. An elevated use of admin data creates ethical concerns relating to full and informed consent and an erosion of social licence

As detailed below, the proposed use of admin data does not align with existing ethical best practice, including Ngā Tikanga Paihere.

The move to an admin-data-first census raises concerns about whether people are adequately able to consent to how their data is being used. For Stats NZ to access the data needed to produce population-level statistics, it will be essential to use sources of admin data in ways not necessarily related to their original purpose of collection. While legally possible, this places the responsibility on the individual giving the data to ensure that they are truly informed about the uses of their data to which they are consenting.

The elevated use of the IDI to access admin data goes beyond the current research purpose of the IDI and moves it into more of a data collection and data processing role. This usage of the IDI poses significant risks in terms of consent and social licence that may impact trust in Stats NZ and in the government data system.

In 2018, a Stats NZ survey conducted by an external consultancy to measure the organisation’s social licence found that less knowledge resulted in less trust, and that people did not know enough about Stats NZ to give informed trust or any trust to the organisation[8].

A particularly concerning question is whether people could still access services without giving consent for their data to be used for non-administrative reasons. If they could not, this would have an adverse impact on access to government services and support. In this situation, people may feel pressured to give consent, which is not true consent, or may provide information which is incorrect.

Specific examples of ethical considerations include:

Using data beyond its original collection purpose:

  • We are already seeing an erosion of good practice in the use of data beyond the purpose for which it was collected. In 2023, the electoral roll was used for census purposes, with 0.6% of the census usual resident population having their Māori descent sourced from the electoral roll. Although this use was covered by a privacy impact assessment, there is a lack of population-wide transparency, which is problematic as it creates a risk that trust in the electoral system could be eroded[9],[10].

  • In 2020, Stats NZ worked with Maui Hudson to develop Ngā Tikanga Paihere as a framework to guide ethical and culturally appropriate data use, initially to build and maintain public trust and confidence in the way Stats NZ manages access to the microdata in the Integrated Data Infrastructure[11]. As described in Ngā Tikanga Paihere, the principle of Mauri requires the purposes of data collection to be clear and well understood, and for these uses to be acceptable to Māori. It is unlikely that all service users will find acceptable the collection of data beyond that required for the service to be provided.

Erosion of social licence: Informed trust and organisational trust must be high in order for individuals to freely share their personal data. This speaks to social licence - the permission that New Zealanders give to Stats NZ and other government agencies to make decisions surrounding the management, sharing and use of their data. For Stats NZ, loss of social licence will likely lead to greater difficulty in collecting reliable information and people may even deliberately provide incorrect information.

3. An admin-data-first census will create new data quality concerns

The proposed census transformation presents multiple challenges to data validity, reliability, and timeliness. We have summarised these under the themes of accuracy and procedural issues.

Accuracy

Linkage error: An admin-data-first census will require a linked dataset, such as the IDI or something akin to it. A linked-up dataset like the IDI, where there are not always common identifiers between datasets, requires some element of probabilistic linking. These methods bring with them the possibility of linkage error – false negatives and false positives. The rates of linkage error vary across sub-population and age groups[12],[13], impacting the accuracy of attribute data, which will likely lead to further inequities.

Households: Households are a key component of the census. Households are the unit about which much of the census and existing survey data (such as the HLFS) is collected. Without a reliable way to determine households, we lose key insights into life in Aotearoa such as household income and owner occupancy. We are particularly concerned about how households will be identified in an admin-data-first census. The definition of a household is heavily reliant on address data from multiple individuals and the address data is known to have issues around accuracy and how up-to-date it is across the source systems. Common situations where there are likely to be issues with households in the admin-data-first approach include:

Rental properties: Many people in Aotearoa do not own their own house. Due to the short-term nature of some flats, and the insecurity of renting in general, admin data may struggle to keep pace with people’s address changes. Younger people living away from home may keep their parents’ address. In this case, they would be counted under the parents’ household rather than the flatting household. This will disproportionately affect Māori, Pacific Peoples, and young people.

Family moving to another address: When different members of a household have their address identified from different source systems, this may cause problems with the stability of the household. If even one address is inconsistent with the others, then the household-level information could be wrong. Consider a situation in which a household moves location. Different members of the household will interact with different agencies at different times. The household will appear to break into different households (and will be assigned new household IDs) and then, over time, as different members interact with source agencies, they will merge back into a single household (again with a different household ID). This also presents issues where there is a genuine fragmenting of a household, for example parents separating or children leaving home and then returning.
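The household move described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the rule used here (people sharing the same most recently recorded address form one household) is a deliberate simplification and is not Stats NZ's method, and the names, addresses and update order are invented.

```python
from collections import defaultdict

# Hypothetical addresses for a family that moves house.
OLD, NEW = "1 Old St", "9 New Rd"


def derive_households(latest_address):
    """Group people by most recently recorded address (assumed, simplified rule)."""
    households = defaultdict(set)
    for person, address in latest_address.items():
        households[address].add(person)
    # Return a sorted list of sorted member lists so results are easy to compare.
    return sorted(sorted(members) for members in households.values())


# Each snapshot is the address held in admin data for each person at one point in time.
# Members update their address with different agencies at different times (assumed order).
snapshots = [
    {"parent_a": OLD, "parent_b": OLD, "child": OLD},  # before the move
    {"parent_a": NEW, "parent_b": OLD, "child": OLD},  # one member updated
    {"parent_a": NEW, "parent_b": OLD, "child": NEW},  # two members updated
    {"parent_a": NEW, "parent_b": NEW, "child": NEW},  # all members updated
]

household_counts = [len(derive_households(s)) for s in snapshots]

for step, snapshot in enumerate(snapshots):
    print(step, derive_households(snapshot))
```

Under these assumptions the one real household appears as one, then two, then two, then one derived household, even though the family never separated.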

The quality of household membership and household variable data are highly linked. These issues for households are summed up by the following quote, “Further work is needed to better understand how differences in household membership impact derived household variables”, and by the observation that recent work has substantially improved the ability to construct admin-based households and familial relationships, although there are still issues, particularly with placing more mobile groups in the correct households[14].

In addition, it is important to note that a high accuracy in determining households may not be sufficient to indicate that a method is suitable. This is because:

  • it is easy to identify households for people who have all been living together for a long time at the same address. This may cover 80% or more of all households.

  • it is hard to identify households for people who move around a lot or live in less stable situations.

The household will almost always be correct for the first group and will often be wrong for the second group. This situation, where the errors are highly concentrated in one group, is a significant issue.
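A simple calculation shows how a high overall accuracy figure can hide this concentration. The sketch below is hypothetical: the 80% share of stable households comes from the text above, while the per-group accuracy rates (99% and 60%) are assumed purely for illustration and are not Stats NZ estimates.

```python
# Hypothetical illustration only: the accuracy rates are assumptions, not measured values.
stable_share = 0.80               # share of households that are stable (from the text)
mobile_share = 1 - stable_share   # remaining, more mobile households

stable_accuracy = 0.99            # assumed: almost always correct for stable households
mobile_accuracy = 0.60            # assumed: often wrong for mobile households

# Overall accuracy is the share-weighted average of the two group accuracies.
overall_accuracy = stable_share * stable_accuracy + mobile_share * mobile_accuracy

# Share of all household errors that fall on the mobile group.
stable_errors = stable_share * (1 - stable_accuracy)
mobile_errors = mobile_share * (1 - mobile_accuracy)
mobile_error_share = mobile_errors / (stable_errors + mobile_errors)

print(f"Overall accuracy: {overall_accuracy:.1%}")
print(f"Share of errors in the mobile group: {mobile_error_share:.1%}")
```

Under these assumed rates, overall accuracy is about 91%, yet about 91% of all errors fall on the mobile group, which makes up only 20% of households.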

Procedural Issues

Enforcement: We are particularly concerned about the accuracy and completeness of admin data in the absence of national survey data, given that the census carries a legal requirement for residents in New Zealand to fill out the forms accurately. There is currently no ability to enforce accountability on the part of agencies for their data quality at source. Admin data is primarily collected for operational purposes, and people may choose not to provide information they deem irrelevant for the purposes of engaging with that agency.

Validation: Currently, the results produced by the admin-data-first approach are validated against the census data. For example, Stats NZ has determined that in 2018 the overall output quality rating for Māori descent is 82%[15]. It is unclear how Stats NZ will assess measurement accuracy in the future when the census is also based on admin data. There is a risk that over time the results of the admin-data-first approach may gradually diverge from what would have been obtained from a survey-based approach.

Self-judged attributes: Many attributes in the census are inherently self-judged or can be answered differently depending on the contexts in which they are asked.  Attempting to reconstruct these measures based on rules applied to admin data will be extremely difficult. As discussed in the equity section, this will disproportionately affect groups that do not interact directly with surveys or other mechanisms for self-determining data held about them.

Source system changes: Future (and past) policy and operational changes impact the consistency and comparability of source data over time. Under these circumstances agencies could change the data being collected, which will then flow on to potentially unexpected changes in the admin-data-first census. This affects the continuity of measures over time, since observed changes may simply be due to changes in the source systems. Another example is where changes in the eligibility for different government services lead to changes in the people about whom an agency collects data.

Other considerations

The arguments of respondent burden in completing the census and of duplicated data collection are not credible drivers for a move to an admin-data-first model. While completing a census form may not be completely effortless, it is not a large effort and is required only every five years. It is also independent of the access issues, biases, and other systemic effects associated with collecting data for administrative purposes.

Duplication of information collection and having information from multiple sources is a feature, not a bug. People respond differently to the same question at different times and in different contexts.  Providing information via census is rather unique in that it is not directly tied to other services or perceived consequences that come with data provided for administrative purposes.  Where discrepancies and duplications exist between different data sources, those discrepancies often carry valuable information that can provide insights into related systems.  As an example, it has been noted in health research that people may record their ethnicity differently in health user data compared to their preference in other contexts as a deliberate attempt to counteract what they perceive as different treatment based on ethnicity.

Recommendations

Nicholson Consulting would like to offer the following recommendations. We believe that Stats NZ should:

  1. Investigate and publish the expected impacts of the change on key decisions. For example, funding for hospitals is based on census data. If the admin-data-first census impacts the counting of young Māori and Pacific people then that will in turn impact funding for hospitals in areas with high proportions of Māori and Pacific people.

  2. Continue the census in its current form until a number of the key risks identified in this document have been addressed, especially as addressing these issues may take time, and not having a suitable alternative may lead to dimensions being completely omitted. For example, in order to continue to understand language usage in Aotearoa, a suitable new data source, such as the proposed te reo Māori survey, would need to be designed and thoroughly tested.

  3. Publish estimates of the number of New Zealanders who are not captured in either census data or admin data to understand the extent of the issue.

  4. Quantify the number of people who participate in the national survey-based census but who cannot currently be captured in admin data. This would be complemented by an analysis of descriptive statistics at the attribute level.

  5. Rank the sources of data by quality for each attribute and be transparent about the level of data quality. Commit to not including data of insufficient quality. Most importantly, we believe that it is critical not to rush this change and rather to understand potential adverse impacts properly.

  6. Work closely with Māori communities and Māori data experts to ensure that an admin-data-first approach is co-designed with Māori. This will require Māori Data Governance to be established as a prerequisite.

  7. Publish a list of organisations and experts that Stats NZ has consulted with around the use of admin data and note those who support and do not support the transformation to an admin-data-first census, including a summary of consultation content and submissions.

  8. Undertake an assessment of the potential impact any loss of social licence would have on data collection at source. This needs to include any impact it would have on the operational service delivery for government agencies.

  9. Continue to consider the requirements of informed consent and ensure that there are processes in place for those who do not consent to data collection to receive the services they require.

[1] Census transformation – A promising future. Cabinet paper (redacted). https://www.stats.govt.nz/corporate/cabinet-paper-census-transformation-a-promising-future/. Retrieved from www.stats.govt.nz.

[2] Report of the Independent Review of New Zealand’s 2018 Census (2019) https://www.stats.govt.nz/reports/report-of-the-independent-review-of-new-zealands-2018-census

[3] Report of the Statutory Review of New Zealand’s 2023 Census (2024). https://www.stats.govt.nz/reports/report-of-the-statutory-review-of-new-zealands-2023-census/. Retrieved from www.stats.govt.nz.

[4] Kukutai, T, & Cormack, D (2018). Census 2018 and Implications for Māori. New Zealand Population Review, 44, 131-151.

[5] Bycroft, C, Elleouet, J, & Tran, H (2023). Harmonising ethnicity from multiple administrative data sources using latent class modelling. Retrieved from www.stats.govt.nz.

[6] Stats NZ (2024). Data helps people access and use New Zealand Sign Language. https://www.stats.govt.nz/census/why-the-census-matters/data-helps-people-access-and-use-new-zealand-sign-language/. Retrieved from www.stats.govt.nz.

[7] Nicholson Consulting and Kōtātā Insight (2021). He Ara Poutama mō te reo Māori. Forecasting te reo Māori Speakers in Aotearoa, New Zealand. https://www.tematawai.maori.nz/en/research-and-evaluation/our-research/he-ara-poutama-mo-te-reo-maori/

[8] Nielsen and Stats NZ. (2018) A social licence approach to trust. https://www.stats.govt.nz/corporate/a-social-licence-approach-to-trust/. Retrieved from www.stats.govt.nz.

[9] Stats NZ (2023). Privacy impact assessment for the use of admin data in the 2023 Census. https://www.stats.govt.nz/privacy-impact-assessments/privacy-impact-assessment-for-the-use-of-admin-data-in-the-2023-census/. Retrieved from www.stats.govt.nz.

[10] Stats NZ (2024). Summary of admin data used in the 2023 Census dataset. Editing, data sources, and imputation in the 2023 Census. https://www.stats.govt.nz/methods/editing-data-sources-and-imputation-in-the-2023-census/. Retrieved from www.stats.govt.nz.

[11] Stats NZ (2020). Ngā Tikanga Paihere: a framework guiding ethical and culturally appropriate data use. https://data.govt.nz/assets/data-ethics/Nga-Tikanga/Nga-Tikanga-Paihere-Guidelines-December-2020.pdf. Retrieved from www.data.govt.nz.

[12] Milne et al. (2019). Data Resource Profile: The New Zealand Integrated Data Infrastructure (IDI). International Journal of Epidemiology. 48 (3). 677-677e.

[13] Stats NZ (2024). Linking 2023 Census responses to the Integrated Data Infrastructure. Retrieved from www.stats.govt.nz.

[14] Stats NZ (2023). Experimental administrative population (third iteration): Data sources, methods, and quality for household information. https://www.stats.govt.nz/research/experimental-administrative-population-census-third-iteration-data-sources-methods-and-quality-for-household-information/. Retrieved from www.stats.govt.nz.

[15] Stats NZ (2022). Experimental administrative population census: Data sources, methods, and quality (second iteration). Retrieved from www.stats.govt.nz.
