WDCM Biases Dashboard

This page presents the global gender bias in Wikidata usage across all WMF sister projects. The detailed overview of the gender bias in Wikidata usage per project is found on the following tabs.
Note. The WDCM Biases dashboard reports mainly on Wikidata re-use across the Wikimedia projects.
More precisely, it focuses on the wbc_entity_usage table of the Wikibase schema.
To obtain the statistics for items as present in Wikidata itself, see: Denelezh — Gender Gap in Wikidata

No. of items per gender currently in Wikidata:



Note. These statistics refer to the latest copy of the Wikidata JSON dump in HDFS; see Phab:T209655.



Wikidata Item Re-Use per Gender: Rank vs Usage


Description. All Wikidata Male and Female items are ranked from the most to the least used on the horizontal axis. The vertical axis is the Wikidata usage statistic given on a logarithmic scale.

Wikidata Item Re-Use per Gender


Description. Each Wikidata item is represented by a point. The vertical axis is the Wikidata usage statistic given on a logarithmic scale.

Male and Female items and item re-use distributions


No. of Wikidata Items per Gender


Wikidata Item Re-Use per Gender


Definition. These statistics are the counts of Wikidata items in the male and female gender categories. The Male and Female Wikidata items are all members of Q5 (Human) that are in a P21 (sex or gender) relation with Q6581097 (male) or Q6581072 (female).
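The item selection described above can be sketched in Python against a single entity record from the Wikidata JSON dump. This is a minimal illustration, not the dashboard's actual code. It assumes that membership in Q5 is expressed through P31 (instance of), which the definition does not state explicitly, and that items carrying both gender values (or neither) are left out. The function names are hypothetical.

```python
MALE, FEMALE, HUMAN = "Q6581097", "Q6581072", "Q5"


def claim_item_ids(entity, prop):
    """Return the item IDs that `entity` states for property `prop`."""
    ids = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        # "novalue" and "somevalue" snaks carry no datavalue; skip them.
        if snak.get("snaktype") != "value":
            continue
        value = snak.get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids


def gender_category(entity):
    """Return "male", "female", or None for one entity dict."""
    # Assumption: "member of Q5" means a P31 (instance of) statement.
    if HUMAN not in claim_item_ids(entity, "P31"):
        return None
    genders = set(claim_item_ids(entity, "P21"))
    # Assumption: items with both values, or with neither, are excluded.
    if genders == {MALE}:
        return "male"
    if genders == {FEMALE}:
        return "female"
    return None
```

Counting the items per gender then amounts to tallying the non-None results of gender_category over all entities in the dump.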

Definition. These are the WDCM usage statistics. Under WDCM, the usage statistic of a Wikidata item is currently defined as the number of pages in a particular client project where that item is used.



WDCM Biases :: Wikidata, WMDE 2019

Contact: Goran S. Milovanovic, Data Scientist, WMDE
e-mail: goran.milovanovic_ext@wikimedia.de
IRC: goransm



Gender bias in Wikidata usage per Project

Description of columns. M Usage and F Usage: the WDCM usage statistics for Male and Female items, respectively. M% and F%: the respective usage percentages for Male and Female items. Project Type: Wikipedia, Wikiquote, Wikivoyage, Wikinews, Wikisource, Wiktionary, Wikiversity, Wikibooks, or Other. Probability (M>F): the result of a Bayesian A/B test across the M and F usage statistics, i.e. the posterior probability that Male items are used more than Female items given the observed usage statistics. CI 5% and CI 95%: the bounds of the credible interval on (M-F)/F. M/F: the ratio of Male to Female usage statistics (i.e. how many times more often Male items are used than Female items).
Note. The test statistics are not computed when the M or the F usage statistic is zero, and the M/F ratio is not computed when the F usage statistic is zero.
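As a rough illustration of the percentage and ratio columns, the following Python sketch derives M%, F%, and M/F from a pair of usage statistics and leaves the ratio undefined when the F usage is zero, as the note describes. The function name and the returned keys are hypothetical, and returning None for an undefined value is an assumption about how a missing statistic might be represented.

```python
def usage_summary(m_usage, f_usage):
    """Derive M%, F%, and M/F from Male and Female usage counts."""
    total = m_usage + f_usage
    return {
        # Percentages are undefined when there is no usage at all.
        "m_pct": 100.0 * m_usage / total if total > 0 else None,
        "f_pct": 100.0 * f_usage / total if total > 0 else None,
        # The M/F ratio is not computed when the F usage is zero.
        "m_f_ratio": m_usage / f_usage if f_usage > 0 else None,
    }
```

For example, usage_summary(500, 100) gives an M/F ratio of 5.0: five uses of Male items for every use of a Female item.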




Gender Re-Use per Occupation

A selection of statistics and charts on gender bias in Wikidata usage per occupation. Scroll down for more results.
The 100 most mentioned occupations in Wikimedia projects are selected and placed on the horizontal axis. The vertical axis represents the WDCM usage statistic, for Male and Female items separately.


Gender Bias in Wikidata Re-Use per Occupation

Description of columns. Usage (M) and Usage (F): the WDCM usage statistics for Male and Female items, respectively, i.e. how much the M and F items having the respective occupation value are used. Usage (Total): the sum of Usage (M) and Usage (F). Wikidata Items (M) and (F): the number of male and female Wikidata items having the respective occupation value. Probability Usage(M) > Usage(F): the result of a Bayesian A/B test across the M and F usage statistics, i.e. the posterior probability that Male items are used more than Female items per occupation, given the observed usage statistics. CI 5% and CI 95%: the bounds of the credible interval on (Usage(M)-Usage(F))/Usage(F). Usage(M)/Usage(F): the ratio of Male to Female usage statistics (i.e. how many times more often Male items are used than Female items).
Note. The test statistics are not computed when the M or the F usage statistic is zero, and the M/F ratio is not computed when the F usage statistic is zero.
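The page does not say which model underlies the Bayesian A/B test, so the following Python sketch is only one plausible reading, not the dashboard's method. It assumes that each usage statistic is a Poisson count with a near-flat Gamma prior on its rate, so the posterior rate is Gamma(count + 1, 1), and it estimates the probability and the credible interval bounds by Monte Carlo sampling. The function name, the number of draws, and the seed are all hypothetical choices.

```python
import random


def bayes_ab(m_usage, f_usage, draws=20000, seed=0):
    """Monte Carlo sketch of P(M > F) and a 5%-95% interval on (M-F)/F."""
    # As in the note above, nothing is computed if either count is zero.
    if m_usage == 0 or f_usage == 0:
        return None
    rng = random.Random(seed)
    wins = 0
    relative_diffs = []
    for _ in range(draws):
        # Assumed model: posterior usage rate ~ Gamma(count + 1, scale 1).
        m_rate = rng.gammavariate(m_usage + 1, 1.0)
        f_rate = rng.gammavariate(f_usage + 1, 1.0)
        wins += m_rate > f_rate
        relative_diffs.append((m_rate - f_rate) / f_rate)
    relative_diffs.sort()
    return {
        "p_m_gt_f": wins / draws,
        "ci_5": relative_diffs[int(0.05 * draws)],
        "ci_95": relative_diffs[int(0.95 * draws) - 1],
    }
```

With counts that differ strongly, such as bayes_ab(500, 100), the estimated probability is close to 1 and the interval brackets the observed relative difference of 4. With equal counts the probability is close to 0.5.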



Occupations where Female items are mentioned more than Male items






Gender diversity in Wikidata

Description. Think of Wikidata usage as a value that can be distributed among a number of individuals in an economy, drawing the following analogy: Wikidata items are taken to be individuals, and the sum of their total usage across our projects is taken to represent the total wealth. So, each Wikidata item's "worth" is measured by its Wikidata usage: the number of pages that mention that item in our projects. Then rank all the items according to their usage in the Wikimedia universe and divide them into a number of equal-sized groups. For each group of items compute its share in the total Wikidata usage ("wealth"), and plot the cumulative percentage or proportion of Wikidata items covered by each successive group against their share (also expressed as cumulative percentage or proportion). The result is the Lorenz curve, a concept widely used in economics to express the distribution of economic inequality in a particular society. The following figure presents two Lorenz curves, one for the usage of male and one for the usage of female Wikidata human (Q5) items.

The diagonal represents the line of equality: a category of items with a straight, diagonal Lorenz curve would be one where all items are mentioned exactly the same number of times. The empirical Lorenz curves for the female (red) and male (blue) Wikidata items usage are found far away from the line of equality, which is neither strange nor unexpected. The finding is similar to, for example, the well-known facts about word usage frequency distributions (see: Zipf's Law) in any language, where a small fraction of words is used predominantly while the rest of the words are used rarely. The closer an empirical curve gets to the line of equality, the more equal the distribution of wealth (Wikidata usage, in this case) that it represents. Also, the more equal the distribution of Wikidata usage, the smaller the respective Gini coefficient (reported for both genders under the chart title): a coefficient of 0 corresponds to perfect equality.
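The construction described above can be sketched in a few lines of Python. This is a minimal illustration of a Lorenz curve and a Gini coefficient computed from a list of per-item usage counts, not the dashboard's own code. The function names and the default of ten groups are assumptions. The Gini coefficient uses the standard rank-based formula for sorted values.

```python
def lorenz_curve(usages, groups=10):
    """Return (cumulative share of items, cumulative share of usage) points."""
    ranked = sorted(usages)  # least used first
    n, total = len(ranked), sum(ranked)
    if n == 0 or total == 0:
        raise ValueError("need at least one item with non-zero usage")
    points = [(0.0, 0.0)]
    for g in range(1, groups + 1):
        k = round(g * n / groups)  # number of items in the first g groups
        points.append((k / n, sum(ranked[:k]) / total))
    return points


def gini(usages):
    """Gini coefficient: 0 for perfect equality, near 1 for extreme inequality."""
    ranked = sorted(usages)
    n, total = len(ranked), sum(ranked)
    if n == 0 or total == 0:
        raise ValueError("need at least one item with non-zero usage")
    weighted = sum(rank * x for rank, x in enumerate(ranked, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

When every item has the same usage, lorenz_curve returns points on the diagonal and gini returns 0. When a single item out of n holds all the usage, gini returns (n - 1) / n.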






M/F Wikidata usage ratio in large projects

Description. Wikimedia projects are represented on the horizontal axis and ranked from those that make the most use of Q5 (Human) items with P21 (sex or gender) defined (to the left) to those that make less use of them (to the right). Only the top 50 projects by usage of such items are presented. The M/F Usage Ratio, represented on the vertical axis, is the ratio of Male to Female Wikidata usage statistics (i.e. how many times more often Male items are used than Female items). For example, if a particular project has an M/F value of 5, then five mentions of Male items are made in it for every single mention of a Female item. The detailed M/F statistics for all Wikimedia projects under consideration are provided in the table on the previous tab.



Gender bias and the North-South Divide

Description. Each marker in the maps below represents the birthplace of a person referred to by some Wikidata Q5 (Human) item. The size of the marker corresponds to the total Wikidata usage of all Q5 items for persons born in the respective place.
Wikidata male item birthplaces


Wikidata female item birthplaces


Statistics

Description. Please note that the following statistics are based only on the usage data for those Q5 (Human) items with a geo-localized birthplace in Wikidata. All numbers are WDCM usage statistics, i.e. the number of pages across the projects where the respective Wikidata items are used.
Gender Distribution

North-South Divide

Gender Bias and North-South Divide



WDCM Navigation

Your orientation in the WDCM Dashboards System


  • WDCM Portal
    The entry point to WDCM Dashboards.

  • WDCM Overview
    The big picture. Fundamental insights in how Wikidata is used across the client projects.

  • WDCM Semantics
    Detailed insights into the WDCM Taxonomy (a selection of semantic categories from Wikidata), its distributional semantics, and the way it is used across the client projects. If you are looking for Topic Models - that’s where they live.

  • WDCM Usage
    Fine-grained information on Wikidata usage across client projects and project types. Cross-tabulations and similar.

  • WDCM Geo
    Wikidata items interactive maps.

  • WDCM Structure
    A method to investigate the WDCM Taxonomy and improve the choice of items that undergo analyses.

  • WDCM Biases
    The WDCM gender bias and north-south divide statistics.

  • WDCM (S)itelinks
    The WDCM (S)itelinks usage aspect statistics.

  • WDCM (T)itles
    The WDCM (T)itles usage aspect statistics.


  • WDCM System Technical Documentation
    The WDCM Wikitech Page.

  • WDCM Wikidata Project Page
    The WDCM Wikidata Project Page.

  • The WDCM Journal
    A regularly updated selection of the most interesting empirical findings from wmdeanalytics.







WDCM Biases Dashboard

Description


Introduction


This Dashboard is a part of the Wikidata Concepts Monitor (WDCM). The WDCM system provides analytics on Wikidata usage across the Wikimedia sister projects. The WDCM Biases Dashboard collects and visualizes usage statistics for Wikidata items that are members of Q5 (Human) and are in a P21 (sex or gender) relation with Q6581097 (male) or Q6581072 (female). All functions of this Dashboard are documented alongside the respective data visualizations and tables. To understand the WDCM usage statistics, check out the Definitions section.

If you are interested in gender statistics on Wikidata alone (reminder: the WDCM system tracks Wikidata usage across the WMF projects, not Wikidata item statistics per se), visit: Gender Gap in Wikidata on denelezh.org


Definitions


N.B. The Wikidata item usage statistic is currently defined as the number of pages in a particular client project where the respective Wikidata item is used. For example, if a Wikidata item is used once or more than once on a page, we take that as one mention of that item, no matter how many times it occurs on that page. The current definition thus ignores the usage aspects completely.
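A small Python sketch can make this definition concrete. It assumes a simplified, hypothetical row layout of (project, page_id, item_id, aspect), loosely modelled on what a usage-tracking table records. The function name is also hypothetical. An item used several times on one page, or under several usage aspects, still contributes a single page to the count.

```python
from collections import defaultdict


def wdcm_usage(usage_rows):
    """Count distinct pages per (project, item), ignoring usage aspects."""
    pages = defaultdict(set)
    for project, page_id, item_id, _aspect in usage_rows:
        # A set keeps each page once, however often the item occurs on it.
        pages[(project, item_id)].add(page_id)
    return {key: len(page_ids) for key, page_ids in pages.items()}


# Hypothetical rows: (project, page_id, item_id, aspect)
rows = [
    ("enwiki", 1, "Q42", "S"),
    ("enwiki", 1, "Q42", "T"),  # same page, different aspect: still one page
    ("enwiki", 2, "Q42", "S"),
    ("dewiki", 7, "Q42", "S"),
]
usage = wdcm_usage(rows)
```

Here usage[("enwiki", "Q42")] is 2, because the item appears on two distinct pages in that project, and usage[("dewiki", "Q42")] is 1.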



