Machine learning to fight inequalities in the workplace
There has been a lot of controversy about how bias in machine learning reflects social inequalities and how this might affect outcomes for minority groups (also see this article named “AI is sending people to jail”). To be blunt, I don’t agree. Looking past the headlines that try to stimulate the emotional part of our brain will show you that machine learning is a double-edged sword. This post was written to demonstrate the other edge of this sword. I will describe a short experiment in which I show how machine learning can be used to identify, and possibly also counteract, a lack of diversity within organisations.
The idea came to mind during a search for a Covid pocketbook. I ran into this medical book publishing firm. It’s a company that was started a couple of years ago by two medical students who have become successful publishers and medical doctors. As a way to demonstrate the medical community's involvement in creating content for their books, they posted a collection of pictures with the names of medical professionals who contributed to the pocketbooks. Looking at this long contributors page, I was shocked. I had expected that the ratio between white and non-white would have improved in the 5 years since I completed my studies. An erroneous assumption it was. The website seems to indicate that little has changed.
Maybe machine learning can be used to evaluate and monitor diversity within organisations and make this information public. This would create a strong push towards more cultural diversity in high positions within public and private organisations. In this post, I will show you the steps I took to get something like this running. The full code for this project can be found on my GitHub page.
1. Choosing the first target
The website discussed above looked like a good first test target to do a diversity check.
2. Scraping like crazy
I started with scraping the website links of all the subpages of the homepage. I explained this scraping algorithm in detail in this tutorial. The subpage links were saved in a JSON file.
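The tutorial covers the actual scraper. As a rough idea of what this step does, here is a minimal sketch using only the standard library. The names `LinkCollector`, `subpage_links`, `save_links` and the file name `links.json` are my own placeholders, and the sample HTML and URLs are made up:

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def subpage_links(html, base_url):
    """Return the sorted, de-duplicated links that stay on the same site."""
    parser = LinkCollector()
    parser.feed(html)
    site = urlparse(base_url).netloc
    links = {urljoin(base_url, href) for href in parser.hrefs}
    return sorted(link for link in links if urlparse(link).netloc == site)


def save_links(links, path="links.json"):
    """Store the links as a JSON list (hypothetical file name)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(links, f, indent=2)


# Made-up homepage snippet: one internal link (twice) and one external link.
homepage = (
    '<a href="/team">Team</a> '
    '<a href="https://other.example/x">Elsewhere</a> '
    '<a href="/team">Team again</a>'
)
print(subpage_links(homepage, "https://publisher.example/"))
# ['https://publisher.example/team']
```

External links are dropped and duplicates collapse into one entry, so the JSON file ends up holding every subpage once.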
3. Browse through web pages and download all images
We open the JSON file in which the links are stored, create a function for downloading images on webpages and use that function in a loop that goes through all the links from the JSON file. We use the rand_sleep_int function within the loop to create random time intervals between download requests. We use regex to search for jpg, jpeg, gif and png images:
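A minimal sketch of this step, under a few assumptions: the regex below is my own guess at a reasonable pattern, `rand_sleep_int` is re-implemented here in a simple hypothetical form, and the file and folder names are placeholders. Only the regex part is run in the demo at the bottom, so nothing is downloaded when you try it:

```python
import json
import os
import random
import re
import time
import urllib.request
from urllib.parse import urljoin, urlparse

# Matches the src of <img> tags ending in jpg, jpeg, gif or png (optionally with a query string).
IMG_PATTERN = re.compile(
    r'<img[^>]+?src=["\']([^"\']+?\.(?:jpe?g|gif|png)(?:\?[^"\']*)?)["\']',
    re.IGNORECASE,
)


def rand_sleep_int(low=1, high=5):
    """Sleep a random whole number of seconds (hypothetical version of the helper)."""
    seconds = random.randint(low, high)
    time.sleep(seconds)
    return seconds


def find_image_urls(html, page_url):
    """Return absolute URLs of all matching images in the page source."""
    return [urljoin(page_url, src) for src in IMG_PATTERN.findall(html)]


def download_images(page_url, out_dir="images"):
    """Download every matching image on one page and return the saved file names."""
    os.makedirs(out_dir, exist_ok=True)
    with urllib.request.urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="ignore")
    saved = []
    for url in find_image_urls(html, page_url):
        filename = os.path.join(out_dir, os.path.basename(urlparse(url).path))
        urllib.request.urlretrieve(url, filename)
        saved.append(filename)
    return saved


def download_all(links_file="links.json"):
    """Loop over the stored links with a random pause between requests."""
    with open(links_file, encoding="utf-8") as f:
        links = json.load(f)
    for link in links:
        download_images(link)
        rand_sleep_int()


# Offline demo of the regex on a made-up page snippet.
sample = (
    '<img class="a" src="/img/jane.JPG"> '
    '<img src="logo.svg"> '
    '<img src="https://cdn.example/p.png?w=200">'
)
print(find_image_urls(sample, "https://publisher.example/team"))
# ['https://publisher.example/img/jane.JPG', 'https://cdn.example/p.png?w=200']
```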
4. Filter images in which the number of faces is equal to one
Use an if statement to check if the length of the list of face locations is equal to one:
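A minimal sketch of that check. I assume here that the face detector returns a list with one bounding box per face, which is what `face_recognition.face_locations` does; the wrapper names are my own, and the detector is passed in as a function so the filter can be tried without any images:

```python
def keep_single_face_images(image_paths, locate_faces):
    """Keep only the images in which exactly one face is found."""
    kept = []
    for path in image_paths:
        face_locations = locate_faces(path)
        if len(face_locations) == 1:
            kept.append(path)
    return kept


def locate_with_face_recognition(path):
    """Real detector; needs the third-party face_recognition package."""
    import face_recognition  # imported lazily so the demo below runs without it

    image = face_recognition.load_image_file(path)
    return face_recognition.face_locations(image)


# Demo with made-up detections: (top, right, bottom, left) boxes per file.
fake_detections = {
    "solo.jpg": [(10, 60, 60, 10)],
    "group.jpg": [(1, 2, 3, 4), (5, 6, 7, 8)],
    "logo.png": [],
}
print(keep_single_face_images(fake_detections, fake_detections.get))
# ['solo.jpg']
```

With real images you would call `keep_single_face_images(paths, locate_with_face_recognition)` instead.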
5. Give an identity number to every face
We want to avoid letting our machine learning algorithm assess the same face multiple times, so we give an identification number to each face, and if the same face is encountered again it gets the same identification number. Each identification number is saved in a dictionary as a key, with the corresponding image file name as its value. In this way we can prevent the same person from being counted more than once:
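A minimal sketch of the idea, assuming each face has already been turned into a numeric encoding (a face recognition library would normally produce these). Two faces count as the same person when their encodings are closer than a tolerance; I use 0.6 here because that is the default tolerance of `face_recognition.compare_faces`, but treat it as a tunable assumption. The function name and the toy vectors are made up:

```python
import math


def assign_face_ids(encodings_by_file, tolerance=0.6):
    """Return {identification number: file name of the first picture of that face}."""
    known_encodings = {}  # identification number -> encoding of the first sighting
    id_to_file = {}
    for filename, encoding in encodings_by_file.items():
        face_id = next(
            (
                fid
                for fid, known in known_encodings.items()
                if math.dist(known, encoding) <= tolerance
            ),
            None,
        )
        if face_id is None:
            # New person: hand out the next number and remember the picture.
            face_id = len(known_encodings) + 1
            known_encodings[face_id] = encoding
            id_to_file[face_id] = filename
        # A known face keeps its existing number, so it is not stored again.
    return id_to_file


# Toy 3-dimensional "encodings"; real ones are much longer vectors.
toy_encodings = {
    "a.jpg": (0.0, 0.0, 0.0),
    "a_copy.jpg": (0.1, 0.0, 0.0),
    "b.jpg": (1.0, 1.0, 1.0),
}
print(assign_face_ids(toy_encodings))
# {1: 'a.jpg', 2: 'b.jpg'}
```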
6. The actual machine learning ethnicity check work
As Andrew Ng said in his interview with Lex Fridman about ML:
“In a software system the machine learning model is maybe five percent
or even fewer relative to the entire software system”
The guru’s expression is reflected in the second line of this code block (together with one line in an earlier step it’s the only line until now that involves ML!):
After doing the ML work we save the data in a CSV file for further analysis as shown in the last line.
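A minimal sketch of this step, not the exact code from the repository. It assumes `DeepFace.analyze` is called with the `race` and `gender` actions and that the result holds a `dominant_race` field; depending on the DeepFace version the result is a dict or a list of dicts, and gender sits under `gender` or `dominant_gender`, so the wrapper tries to handle both. The `Id` and `File` column names and the use of the `csv` module instead of pandas are my own choices:

```python
import csv
import os
import tempfile


def analyse_with_deepface(image_path):
    """The one ML call; needs the third-party deepface package."""
    from deepface import DeepFace  # imported lazily so the demo below runs without it

    result = DeepFace.analyze(img_path=image_path, actions=["race", "gender"])
    if isinstance(result, list):  # newer versions return one dict per detected face
        result = result[0]
    gender = result.get("dominant_gender", result.get("gender"))
    return {"Ethnicity": result["dominant_race"], "Gender": gender}


def analyse_faces(id_to_file, analyse, csv_path="faces.csv"):
    """Label every unique face and save the rows to a CSV file."""
    rows = []
    for face_id, filename in id_to_file.items():
        labels = analyse(filename)
        rows.append({"Id": face_id, "File": filename, **labels})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Id", "File", "Ethnicity", "Gender"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


def fake_analyse(path):
    """Stand-in for DeepFace so the sketch can be tried without the model."""
    return {"Ethnicity": "white", "Gender": "Woman"}


demo_csv = os.path.join(tempfile.gettempdir(), "faces_demo.csv")
rows = analyse_faces({1: "a.jpg"}, fake_analyse, demo_csv)
print(rows)
```

For the real run you would pass `analyse_with_deepface` instead of `fake_analyse`.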
7. First results
Okeeeey, time for some results. We count ethnicity with pandas:
latino hispanic 3
middle eastern 2
Name: Ethnicity, dtype: int64
We find 114 faces with an identified ethnicity, of which 102 (around 89%) are white according to the DeepFace module. This is an organisation in Amsterdam, a city where around 51% of the people have a migration background!!! My first intuition seems to be confirmed. Unbelievable… Let’s do a sanity check.
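For anyone who wants to check the percentage, the arithmetic with the two numbers above is simply (the helper name `share` is my own):

```python
def share(part, total):
    """Fraction of the total, e.g. white faces among all labelled faces."""
    return part / total


total_faces = 114
white_faces = 102

print(f"{share(white_faces, total_faces):.1%}")                # 89.5%
print(f"{share(total_faces - white_faces, total_faces):.1%}")  # 10.5%
```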
8. Accuracy of ethnicity
Now let’s check what happens if you add a semi-random sample of 10 pictures of black people: 7 black female models, two of my friends (shown below) and me (not shown below).
latino hispanic 4
middle eastern 2
Name: Ethnicity, dtype: int64
Gender
Name: Gender, dtype: int64
We find in total 123 faces with an identified ethnicity:
- 7 new black people: 3 are black men (me and my friends), 4 are black women (models).
- 1 new Latino Hispanic woman.
- 1 new Asian woman.
- The face of one woman could not be detected.
The confusion about the ethnicity of the two women is in my opinion understandable considering the mixed genetic makeup of many individuals that are classified as “black”. What DeepFace seems to be NOT confused about is the difference between white and non-white in this case. This reconfirms the initial expectations based on scrolling through the website.
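The bookkeeping of this sanity check can be written out as a small tally. The numbers come straight from the list above; the variable names are my own:

```python
added_pictures = 10
new_labels = {"black": 7, "latino hispanic": 1, "asian": 1}  # labels given to the added faces

detected = sum(new_labels.values())         # added faces that received a label
undetected = added_pictures - detected      # added faces that were not detected
total_faces = 114 + detected                # faces from the first run plus the new ones
labelled_white = new_labels.get("white", 0)

print(detected, undetected, total_faces, labelled_white)
# 9 1 123 0
```

None of the added faces that were detected ended up in the white class, which is the part that matters for the white versus non-white split.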
9. Assumptions for a feasible organisation diversity checker
The next step is to upscale and test the algorithm on a collection of organisations. We make a couple of potentially wrong assumptions to make this project feasible (I also have other things to do :-P):
- A potential target industry should have a work culture wherein the presentation of profile pictures of their employees on a company website is considered a good habit.
- The organisations in the collection display only profile pictures of their employees. There could be profile pictures of other people sprinkled on the website but we assume these do not make up a relevant part of the total number of profile pictures.
- The employees of these organisations have only one profile picture on the website. If there is more than one profile picture of the same employee, we assume this is either a) an incidental finding or b) a systematic pattern. In the case of (a) it will not make much of a difference. In the case of (b) it doesn’t make a difference because every person will appear the same number of times, which leaves the proportions unchanged.
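The reasoning behind the last assumption can be illustrated with a toy example. The staff list below is made up and only serves to show that uniform duplication leaves the shares untouched, while a single stray duplicate shifts them only slightly:

```python
from collections import Counter


def shares(labels):
    """Fraction of each label in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


staff = ["white"] * 8 + ["black"] * 2  # made-up organisation of 10 people

# Case (b): every picture appears three times, the shares stay identical.
print(shares(staff) == shares(staff * 3))  # True

# Case (a): one accidental duplicate moves the white share from 0.8 to about 0.818.
print(round(shares(staff + ["white"])["white"], 3))  # 0.818
```

The larger the organisation, the smaller the effect of a single duplicate becomes.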
10. Identifying an industry for analysis
We should also target an industry in which we think that performance would be optimal if its level of diversity reflected the diversity within society. The first thing that came to my mind was law firms. There were two reasons to consider them as a good target:
- There is a tendency to display a list of pictures of friendly smiling employees that offer their legal support. (I don’t know how their smiley faces would look if they knew their pictures were used for this project, but let’s neglect that for a moment…).
- I would also expect that it would be beneficial for society if not only the defendants came from diverse backgrounds but also the other people in court, such as their respective lawyers.
11. Performing the analysis
Let’s take a look at what happens when we process a list of law firms in Amsterdam, the same area where the book publishing organisation is located. Let’s give them a convenient name and call them Law Firm 1 to 5 in a JSON file.
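Such a JSON file could look roughly like the snippet below. The URLs are placeholders that I made up for illustration, and `check_firms` is a hypothetical helper that runs one and the same pipeline function on every firm:

```python
import json

# Placeholder content of the JSON file; the real URLs are not shown here.
firms_json = """
{
  "Law Firm 1": "https://lawfirm1.example/team",
  "Law Firm 2": "https://lawfirm2.example/people",
  "Law Firm 3": "https://lawfirm3.example/lawyers",
  "Law Firm 4": "https://lawfirm4.example/our-team",
  "Law Firm 5": "https://lawfirm5.example/staff"
}
"""


def check_firms(firms, run_pipeline):
    """Apply the same diversity check to every firm and collect the results by name."""
    return {name: run_pipeline(url) for name, url in firms.items()}


firms = json.loads(firms_json)

# Demo with a stand-in pipeline that only echoes the URL it would process.
results = check_firms(firms, lambda url: f"would scrape {url}")
print(list(results))
# ['Law Firm 1', 'Law Firm 2', 'Law Firm 3', 'Law Firm 4', 'Law Firm 5']
```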
1. DeepFace machine learning performance
2. Manual check
When manually checking the result we get the following table:
The real numbers are even worse than those from the machine learning algorithm…
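One simple way to put a number on the gap between the manual check and the DeepFace labels is to compare the counts per class. The sketch below uses made-up labels purely for illustration; they are not the results of this experiment, and `label_shift` is my own helper name:

```python
from collections import Counter


def label_shift(manual, predicted):
    """Predicted count minus manual count for every class (positive = overestimated)."""
    manual_counts = Counter(manual)
    predicted_counts = Counter(predicted)
    labels = sorted(set(manual_counts) | set(predicted_counts))
    return {label: predicted_counts[label] - manual_counts[label] for label in labels}


# Illustrative labels for 10 faces, NOT the data from this post.
manual = ["white"] * 9 + ["black"]
predicted = ["white"] * 7 + ["asian", "latino hispanic", "black"]

print(label_shift(manual, predicted))
# {'asian': 1, 'black': 0, 'latino hispanic': 1, 'white': -2}
```

A negative number for white combined with positive numbers for other classes would mean that the algorithm makes an organisation look more diverse than it is.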
A Python script was created to evaluate diversity within organisations based on lists of online profile pictures. The first check showed that the algorithm accurately identifies 9 of the 10 manually added profile pictures of black people as non-white. It was, however, more confused about the exact origin of these persons, something quite understandable considering the mixed origin of many individuals that we identify in the vernacular as “black”.
The second check showed that the script misclassifies some white individuals as Asian, Latino or Middle Eastern, but not as black. So the algorithm tends to overestimate non-black diversity. I haven’t delved into the potential causes of these discrepancies, but I guess that this has something to do with distinguishing white from black being a less difficult task than distinguishing white from other ethnicities.
So in conclusion, the current algorithm seems to perform well in identifying black individuals as non-white but overestimates the other non-white ethnicities. Nonetheless, we might be able to improve the configuration, create a better training set, take names into account or sharpen the scraping function, and by doing so get a wider and better-delineated picture of diversity within different organisations and monitor progress toward more diversity.
A public website that incorporates such an algorithm could help citizens, journalists and political institutions assess the diversity of organisations and by doing so act as a stimulus for more diversity in high positions within their hierarchies, thereby potentially making them more receptive to the needs of a multicultural society.
Feel free to leave a comment or fork the code on GitHub.