The aim is to make it easier to keep population values (and associated references) up to date in Australian place article Infoboxes. This module looks at the population claims in a linked Wikidata item and filters for the latest and most appropriate population value. It extracts this value, along with all of its referencing information, and passes both to the article Infobox.
Wikimedia Australia designed this project to coincide with the first release of the 2021 census data (in June 2022). This module was created as part of a funded project with work done by in collaboration with (really...HUGE amounts of help from), and . The project was coordinated by .
It is an ongoing project and we will continue to refine the module. Of course anyone is welcome to contribute!
Head to the talk page if you have anything to discuss.
We wrote a summary of the project for the Wikimedia Australia blog here.
The module is designed to be invoked from the Infobox Australian place template and gathers data from the Wikidata item linked to each article. The module may be modified and used in other places/cases in the future.
Currently, this module is invoked in such a way that it will only give the Infobox a population figure if one isn't manually given for the Infobox Australian place pop argument. This means that initially the module will not impact many articles. Over time, once we're certain it is working well, we can remove the manually added population figures in favour of the Wikidata figures brought in by the module.
See line 110 of the Infobox Australian place template for the module invoke.
Because the module only supplies a population figure when the pop field of the Infobox Australian place template is empty, follow these steps to see it in action for a particular place article:
Here's an example of an article with Infobox using the module, and the diff of the edit made.
The list of articles using population values from Wikidata (via this module) is here.
The module works with the following assumptions:
The high-level steps of the module workflow are outlined in the diagram below. There are three major steps in selecting the best population figure from a Wikidata item.
As a minimum, population claims are required to have:
After filtering for these requirements a subset of population claims is carried forward.
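The filtering step can be pictured with a short Python sketch (the module itself is written in Lua). The required fields used here (a point in time, an applies to part value and at least one reference) are assumptions inferred from the outputs described elsewhere on this page. The dictionary keys are hypothetical, and every figure other than 5,089 is made up for illustration.

```python
# Illustrative sketch only: the real module is written in Lua. The field
# names and the set of required fields are assumptions, not the module's rules.
REQUIRED_FIELDS = ("point_in_time", "applies_to_part", "references")


def is_valid_claim(claim):
    """Return True if every assumed required field is present and non-empty."""
    return all(claim.get(field) for field in REQUIRED_FIELDS)


def filter_valid_claims(claims):
    """Keep only the population claims that meet the minimum requirements."""
    return [claim for claim in claims if is_valid_claim(claim)]


claims = [
    {"value": 5089, "point_in_time": 2021, "applies_to_part": "SAL",
     "references": ["Australian Census 2021 QuickStats"]},
    # Made-up claim with no reference: dropped by the filter.
    {"value": 4900, "point_in_time": 2016, "applies_to_part": "SAL",
     "references": []},
    # Made-up claim with no point in time: dropped by the filter.
    {"value": 4700, "applies_to_part": "SAL",
     "references": ["Australian Census 2011 QuickStats"]},
]
valid_claims = filter_valid_claims(claims)
```

Only the first claim survives, so it is the only one carried forward to the next step.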
The next part of the module separates the valid population claims into those whose applies to part values (defined ABS geography types) match the Infobox type and those whose values don't. For Infobox types that can map to multiple ABS geography types (e.g. type = town), the most common mapping is treated as the match initially, and the other mappings are considered later in the module if the first preference isn't available. For example, type = town is matched to Urban Centres and Localities (UCL) as a first preference; if no UCL value exists, the module returns population values for Suburbs and Localities (SAL) and Indigenous Locations (ILOC) instead, where they exist.
The mappings are based on summary SPARQL queries that compare the Infobox place type with the ABS geography types specified in the linked Wikidata item, across all Australian place articles. The module uses the following mappings.
Infobox type | ABS geographic area
---|---
City | Urban Centres and Localities (UCL)
Suburb | Suburbs and Localities (SAL)
Town | Urban Centres and Localities (UCL) (or SAL or Indigenous Locations (ILOC))
LGA | Local Government Areas (LGA)
Region | Local Government Areas (LGA) (for now)
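The mappings in the table above, and the split into matched and unmatched claims, can be sketched in Python as follows. This is an illustration rather than the module's Lua code: the lower-case keys, the `applies_to_part` field name and the function name are all assumptions of the sketch.

```python
# Preference-ordered ABS geography codes for each Infobox type, following the
# table above. The lower-case keys and short codes are assumptions of this sketch.
TYPE_TO_GEOGRAPHY = {
    "city": ["UCL"],
    "suburb": ["SAL"],
    "town": ["UCL", "SAL", "ILOC"],  # UCL first, then the fallbacks
    "lga": ["LGA"],
    "region": ["LGA"],
}


def split_by_match(claims, infobox_type):
    """Split claims into (matched, unmatched) on the first-preference geography."""
    preferences = TYPE_TO_GEOGRAPHY.get(infobox_type.lower(), [])
    first_choice = preferences[0] if preferences else None
    matched = [c for c in claims if c["applies_to_part"] == first_choice]
    unmatched = [c for c in claims if c["applies_to_part"] != first_choice]
    return matched, unmatched


claims = [
    {"applies_to_part": "UCL", "point_in_time": 2016},
    {"applies_to_part": "SAL", "point_in_time": 2021},
]
matched, unmatched = split_by_match(claims, "town")
```

For type = town the UCL claim lands in `matched` and the SAL claim in `unmatched`, where it stays available as a second preference.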
The next step is to check within the two sets of claims (applies to part geography matched or not) and find the most recent population figure for each applies to part value. For example, the list of claims whose applies to part geography does not match the Infobox is likely to contain multiple applies to part values (UCL, SA1 etc.) and multiple point in time values (2006, 2011, 2016 etc.). This step finds the most recent population figure for each geography type (e.g. 2016 UCL; 2021 SA1).
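A minimal Python sketch of this "most recent per geography" step is shown here. It is not the module's Lua implementation, and the field names are hypothetical.

```python
def latest_per_geography(claims):
    """Return the most recent claim for each applies to part value."""
    latest = {}
    for claim in claims:
        geography = claim["applies_to_part"]
        current = latest.get(geography)
        # Keep the claim with the highest point in time year seen so far.
        if current is None or claim["point_in_time"] > current["point_in_time"]:
            latest[geography] = claim
    return latest


claims = [
    {"applies_to_part": "UCL", "point_in_time": 2011},
    {"applies_to_part": "UCL", "point_in_time": 2016},
    {"applies_to_part": "SA1", "point_in_time": 2016},
    {"applies_to_part": "SA1", "point_in_time": 2021},
]
latest = latest_per_geography(claims)
years = {geo: claim["point_in_time"] for geo, claim in latest.items()}
```

With these inputs `years` pairs UCL with 2016 and SA1 with 2021, matching the example in the text.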
There are then three different types of outputs depending on the outcomes of the Step 2 and Step 3 filtering.
This is Output Scenario 1 and gives the Infobox one formatted population figure, with the relevant applies to part, point in time year and full Cite web reference(s). E.g. 5,089 (Suburb and Locality 2021)[1]
This is Output Scenario 2 and gives the Infobox up to two formatted population figures, each with the relevant applies to part, point in time year and full Cite web reference(s). This happens when there is no valid UCL population claim and is the second preference output for type = town places. E.g.
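The shape of a formatted figure can be illustrated with a small Python sketch. The function name and arguments are hypothetical, and the sketch leaves out the reference marker that the module appends.

```python
def format_population(value, geography_label, year):
    """Format a figure with thousands separators, geography label and year."""
    return f"{value:,} ({geography_label} {year})"


formatted = format_population(5089, "Suburb and Locality", 2021)
print(formatted)  # 5,089 (Suburb and Locality 2021)
```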
OR
This is Output Scenario 3 and gives the Infobox (possibly) multiple formatted population figures (one for each applies to part value), each with the relevant applies to part, point in time year and full Cite web reference(s). E.g. Infobox type = city is mapped to UCL (and leads to Output Scenario 1), but if there are no UCL population values you might get this output:
The references are formatted using the Cite web template.
The census population figure references take this form: Australian Bureau of Statistics (28 June 2022). "Cosmo Newberry (Indigenous Locations)". Australian Census 2021 QuickStats. Retrieved 28 June 2022.
The non-census derived population figure references take this form: Australian Bureau of Statistics (29 March 2022). "Population estimates by SA2 and above, 2001 to 2021 (Greater Capital City Statistical Areas)". Australian Regional Population. Retrieved 28 June 2022.
The references are named, using this method (for now): name = refwork.."_"..pointintime.."_"..appliespart.."_"..reftitle. This is long-winded because we are aiming for a unique reference name for each population value.
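The naming scheme is plain concatenation with underscores. A Python equivalent of the Lua expression might look like the sketch below; the argument values are assumptions chosen to resemble the census reference shown earlier, and the exact form of each component (for instance, whether pointintime is a year or a full date) is not specified on this page.

```python
def build_ref_name(refwork, pointintime, appliespart, reftitle):
    """Join the four components with underscores, as the Lua expression does."""
    return "_".join(str(part) for part in (refwork, pointintime, appliespart, reftitle))


# Hypothetical component values, for illustration only.
ref_name = build_ref_name(
    "Australian Census 2021 QuickStats",
    2021,
    "Indigenous Locations",
    "Cosmo Newberry (Indigenous Locations)",
)
```

Two claims produce the same name only if all four components agree, which is what makes the long form useful for keeping reference names unique.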
There are some example outputs in the Infobox Australian place Sandbox Test Cases page here.
There are some issues that we are aware of and have considered, but haven't dealt with yet. These will be tackled over time in collaboration with other place article contributors. (No doubt there are many more to add to the list; please do.)
All the references produced by this module are followed by an Edit at Wikidata pencil icon that links to the relevant Wikidata item (and the specific population claim). This is where people should go to fix any errors in the population figure outputs or references. See the next section for lists of what should ideally be included in a Wikidata population claim.
In parallel to development of this module and have been working on ensuring all Australian place Wikipedia articles are linked to corresponding Wikidata items (describing that same place). This has largely been done. This enables the use of this module.
Population data has historically been manually entered to individual Wikidata items. Recently (since ~2017), and others have used QuickStatements to do bulk imports of population data from Australian Bureau of Statistics datasets. Part of developing this module was to refine the list of metadata (qualifiers and reference fields) that should be imported alongside the population values.
As at July 2022 the first release of the 2021 census population data has been uploaded for the geographic areas relevant to Australian place Infoboxes. This includes data for Suburbs and Localities (SAL), Indigenous Locations (ILOC) and Local Government Areas (LGA). The Urban Centres and Localities (UCL) data is due to be released in October 2022.
The module requires these qualifiers and reference components to have values in the Wikidata population claim.
An example of a Wikidata item with a correctly filled 2021 population claim (using Census data) is:.
Bulk uploads have been done for census data. They have not yet been done for other figures, such as between-census estimated resident population (ERP) or Data by Region figures. These estimates are useful for capital cities, LGAs and regions.
The module requires that non-census population claims have these components:
An example of a Wikidata item with a correctly filled 2021 estimated resident population claim (not the other population claims) is:. An example of a Wikidata item with a correctly filled 2020 LGA Data by Region population claim (not the other population claims) is:.
The module exposes one function.
{{#invoke:PopulationFromWikidata |ListForInfobox |type=t |wikidata=w }}
Parameters: