This notebook is an exercise in the AI Ethics course. You can reference the tutorial at this link.
In the tutorial, you learned about six different types of bias. In this exercise, you'll train a model with real data and get practice with identifying bias. Don't worry if you're new to coding: you'll still be able to complete the exercise!
Introduction
At the end of 2017, the Civil Comments platform shut down and released their ~2 million public comments in a lasting open archive. Jigsaw sponsored this effort and helped to comprehensively annotate the data. In 2019, Kaggle held the Jigsaw Unintended Bias in Toxicity Classification competition so that data scientists worldwide could work together to investigate ways to mitigate bias.
The code cell below loads some of the data from the competition. We'll work with thousands of comments, where each comment is labeled as either "toxic" or "not toxic".
Begin by running the next code cell:
- Click inside the code cell.
- Click on the triangle (in the shape of a "Play button") that appears to the left of the code cell.
The code will run for approximately 30 seconds. When it finishes, you should see output with a message saying that the data was successfully loaded, along with two example comments: one toxic, and one not.
# Set up feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.ethics.ex3 import *
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
# Get the same results each time
np.random.seed(5555)
# Load the training data
data = pd.read_csv("../input/jigsaw-snapshot/data.csv")
comments = data["comment_text"]
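# Label a comment as toxic (1) if its toxicity score is above 0.7, and not toxic (0) otherwise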
target = (data["target"]>0.7).astype(int)
# Break into training and test sets
comments_train, comments_test, y_train, y_test = train_test_split(comments, target, test_size=0.30, stratify=target)
# Get vocabulary from training data
vectorizer = CountVectorizer()
vectorizer.fit(comments_train)
# Get word counts for training and test sets
X_train = vectorizer.transform(comments_train)
X_test = vectorizer.transform(comments_test)
# Preview the dataset
print("Data successfully loaded!\n")
print("Sample toxic comment:", comments_train.iloc[22])
print("Sample not-toxic comment:", comments_train.iloc[17])
Data successfully loaded!
Sample toxic comment: The Montana guy I saw. The reporter was from the Guardian.
I'd forgotten about Doug Bruce. Excellent recall.
Sample not-toxic comment: Awesome article. More like this please!
Run the next code cell without changes to use the data to train a simple model. The output shows the accuracy of the model on some test data.
from sklearn.linear_model import LogisticRegression
# Train a model and evaluate performance on test dataset
classifier = LogisticRegression(max_iter=2000)
classifier.fit(X_train, y_train)
score = classifier.score(X_test, y_test)
print("Accuracy:", score)
# Function to classify any string
def classify_string(string, investigate=False):
    prediction = classifier.predict(vectorizer.transform([string]))[0]
    if prediction == 0:
        print("NOT TOXIC:", string)
    else:
        print("TOXIC:", string)
Accuracy: 0.9250119174214367
Roughly 92.5% of the comments in the test data are classified correctly!
1) Try out the model
You'll use the next code cell to write your own comments and supply them to the model: does the model classify them as toxic?
- Begin by running the code cell as-is to classify the comment "I love apples". You should see that it was classified as "NOT TOXIC".
- Then, try out another comment: "Apples are stupid". To do this, change only "I love apples" and leave the rest of the code as-is. Make sure that your comment is enclosed in quotes, as below.
my_comment = "Apples are stupid"
- Try out several comments (not necessarily about apples!) to see how the model performs: does it perform as expected?
# Comment to pass through the model
my_comment = "I love apples"
# Do not change the code below
classify_string(my_comment)
q_1.check()
NOT TOXIC: I love apples
Once you're done with testing comments, we'll move on to understand how the model makes decisions. Run the next code cell without changes.
The model assigns each of roughly 58,000 words a coefficient, where higher coefficients denote words that the model thinks are more toxic. The code cell outputs the ten words that are considered most toxic, along with their coefficients.
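# Pair each vocabulary word with its learned coefficient
# (the alphabetically sorted words line up with the feature order in classifier.coef_)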
coefficients = pd.DataFrame({"word": sorted(list(vectorizer.vocabulary_.keys())), "coeff": classifier.coef_[0]})
coefficients.sort_values(by=['coeff']).tail(10)
| | word | coeff |
|---|---|---|
| 34263 | morons | 6.189421 |
| 34260 | moron | 6.435590 |
| 38322 | pathetic | 6.504789 |
| 16974 | dumb | 6.562298 |
| 13019 | crap | 6.686620 |
| 26012 | idiotic | 6.867389 |
| 49762 | stupidity | 7.639925 |
| 26008 | idiot | 8.699593 |
| 26020 | idiots | 8.719420 |
| 49746 | stupid | 9.476870 |
2) Most toxic words
Take a look at the most toxic words from the code cell above. Are you surprised to see any of them? Are there any words that seem like they should not be in the list?
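If you're curious, the same coefficients DataFrame can be sorted the other way to see the ten words with the most negative coefficients (the words the model treats as the strongest evidence that a comment is not toxic). This is an optional check, not part of the graded exercise:

# Optional: the ten words the model considers strongest evidence of a NOT TOXIC comment
coefficients.sort_values(by=['coeff']).head(10)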
# Check your answer (Run this code cell to get credit!)
q_2.check()
Solution: None of the words are surprising. They are all clearly toxic.
3) A closer investigation
We'll take a closer look at how the model classifies comments.
- Begin by running the code cell as-is to classify the comment "I have a christian friend". You should see that it was classified as "NOT TOXIC". In addition, you can see what scores were assigned to some of the individual words. Note that not every word in the comment will necessarily appear in the output.
- Next, try out another comment: "I have a muslim friend". To do this, change only "I have a christian friend" and leave the rest of the code as-is. Make sure that your comment is enclosed in quotes, as below.
new_comment = "I have a muslim friend"
- Try out two more comments: "I have a white friend" and "I have a black friend" (in each case, do not add punctuation to the comment).
- Feel free to try out more comments, to see how the model classifies them.
# Set the value of new_comment
# new_comment = "I have a christian friend"
new_comment = "I have a black friend"
# Do not change the code below
classify_string(new_comment)
coefficients[coefficients.word.isin(new_comment.split())]
q_3.check()
TOXIC: I have a black friend
| | word | coeff |
|---|---|---|
| 6907 | black | 1.916404 |
| 21405 | friend | -0.201312 |
| 24176 | have | -0.069860 |
4) Identify bias
Do you see any signs of potential bias in the model? In the code cell above,
- How did the model classify "I have a christian friend" and "I have a muslim friend"?
- How did it classify "I have a white friend" and "I have a black friend"? (A short sketch after this list checks all four comments at once.)
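Here is a minimal sketch for checking all four comments in one place; it assumes only the classify_string function and the coefficients DataFrame defined earlier in this notebook:

# Classify each identity comment and look up the coefficient of its identity word
for comment in ["I have a christian friend", "I have a muslim friend",
                "I have a white friend", "I have a black friend"]:
    classify_string(comment)
    identity_word = comment.split()[3]  # "christian", "muslim", "white", or "black"
    print(coefficients[coefficients.word == identity_word], "\n")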
Once you have an answer, run the next code cell.
# Check your answer (Run this code cell to get credit!)
q_4.check()
Solution: The comment "I have a muslim friend" was marked as toxic, whereas "I have a christian friend" was not. Likewise, "I have a black friend" was marked as toxic, whereas "I have a white friend" was not. None of these comments should be marked as toxic, but the model seems to erroneously associate some identities with toxicity. This is a sign of bias: the model seems biased in favor of "christian" and against "muslim", and it seems biased in favor of "white" and against "black".
5) Test your understanding
We'll step away from the Jigsaw competition data and consider a similar (but hypothetical!) scenario where you're working with a dataset of online comments to train a model to classify comments as toxic.
You notice that comments that refer to Islam are more likely to be toxic than comments that refer to other religions, because the online community is Islamophobic. What type of bias can this introduce to your model?
Once you have answered the question, run the next code cell to see the official answer.
# Check your answer (Run this code cell to get credit!)
q_5.check()
Solution: Comments that refer to Islam are more likely to be classified as toxic, because of a flawed state of the online community where the data was collected. This can introduce historical bias.
6) Test your understanding, part 2
We'll continue with the same hypothetical scenario, where you're trying to train a model to classify online comments as toxic.
You take any comments that are not already in English and translate them to English with a separate tool. Then, you treat all posts as if they were originally expressed in English. What type of bias will your model suffer from?
Once you have answered the question, run the next code cell to see the official answer.
# Check your answer (Run this code cell to get credit!)
q_6.check()
Solution: By translating comments to English, we introduce additional error when classifying non-English comments. This can introduce measurement bias, since non-English comments will often not be translated perfectly. It could also introduce aggregation bias: the model would likely perform better for comments expressed in all languages, if the comments from different languages were treated differently.
7) Test your understanding, part 3
We'll continue with the same hypothetical scenario, where you're trying to train a model to classify online comments as toxic.
The dataset you're using to train the model contains comments primarily from users based in the United Kingdom.
After training a model, you evaluate its performance with another dataset of comments, also primarily from users based in the United Kingdom -- and it gets great performance! You deploy it for a company based in Australia, and it does not perform well, because of differences between British and Australian English. What types of bias does the model suffer from?
Once you have answered the question, run the next code cell to see the official answer.
# Check your answer (Run this code cell to get credit!)
q_7.check()
Solution: If the model is evaluated based on comments from users in the United Kingdom and deployed to users in Australia, this will lead to evaluation bias and deployment bias. The model will also have representation bias, because it was built to serve users in Australia, but was trained with data from users based in the United Kingdom.
Learn more
To continue learning about bias, check out the Jigsaw Unintended Bias in Toxicity Classification competition that was introduced in this exercise.
- Kaggler Dieter has written a helpful two-part series that teaches you how to preprocess the data and train a neural network to make a competition submission. Get started here.
- Many Kagglers have written helpful notebooks that you can use to get started. Check them out on the competition page.
Another Kaggle competition that you can use to learn about bias is the Inclusive Images Challenge, which you can read more about in this blog post. The competition focuses on evaluation bias in computer vision.
Keep going
How can you quantify bias in machine learning applications? Continue to learn how to measure fairness.