This notebook is an exercise in the Pandas course. You can reference the tutorial at this link.

Introduction

Now you are ready to get a deeper understanding of your data.

Run the following cell to load your data and some utility functions (including code to check your answers).

import pandas as pd
pd.set_option("display.max_rows", 5)
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.summary_functions_and_maps import *
print("Setup complete.")

reviews.head()

Setup complete.

	country	description	designation	points	price	province	region_1	region_2	taster_name	taster_twitter_handle	title	variety	winery
0	Italy	Aromas include tropical fruit, broom, brimston...	Vulkà Bianco	87	NaN	Sicily & Sardinia	Etna	NaN	Kerin O’Keefe	@kerinokeefe	Nicosia 2013 Vulkà Bianco (Etna)	White Blend	Nicosia
1	Portugal	This is ripe and fruity, a wine that is smooth...	Avidagos	87	15.0	Douro	NaN	NaN	Roger Voss	@vossroger	Quinta dos Avidagos 2011 Avidagos Red (Douro)	Portuguese Red	Quinta dos Avidagos
2	US	Tart and snappy, the flavors of lime flesh and...	NaN	87	14.0	Oregon	Willamette Valley	Willamette Valley	Paul Gregutt	@paulgwine	Rainstorm 2013 Pinot Gris (Willamette Valley)	Pinot Gris	Rainstorm
3	US	Pineapple rind, lemon pith and orange blossom ...	Reserve Late Harvest	87	13.0	Michigan	Lake Michigan Shore	NaN	Alexander Peartree	NaN	St. Julian 2013 Reserve Late Harvest Riesling ...	Riesling	St. Julian
4	US	Much like the regular bottling from 2012, this...	Vintner's Reserve Wild Child Block	87	65.0	Oregon	Willamette Valley	Willamette Valley	Paul Gregutt	@paulgwine	Sweet Cheeks 2012 Vintner's Reserve Wild Child...	Pinot Noir	Sweet Cheeks

Exercises

1.

What is the median of the points column in the reviews DataFrame?

#median_points = ____

median_points = reviews['points'].median()

# Check your answer
q1.check()

Correct

#q1.hint()
q1.solution()

Solution:

median_points = reviews.points.median()
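To see what median() does with missing values and with an even number of entries, here is a minimal sketch on a small, hypothetical points Series (the values are made up for illustration, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for reviews['points']; one value is missing
points = pd.Series([87, 85, 90, 92, np.nan])

# median() skips NaN by default (skipna=True). The four remaining values
# 85, 87, 90, 92 have two middle values, so the median is (87 + 90) / 2.
print(points.median())  # 88.5

# With skipna=False, a single missing value makes the result NaN
print(points.median(skipna=False))  # nan
```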

2.

What countries are represented in the dataset? (Your answer should not include any duplicates.)

#countries = ____

countries = reviews['country'].unique()

# Check your answer
q2.check()

Correct

#q2.hint()
q2.solution()

Solution:

countries = reviews.country.unique()
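unique() and nunique() treat missing values differently, which can matter if some reviews have no country recorded. A minimal sketch on a hypothetical toy Series (not the real column):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for reviews['country'], including one missing value
country = pd.Series(["Italy", "US", "US", np.nan, "Portugal", "Italy"])

# unique() keeps the order of first appearance and includes the missing value
distinct = country.unique()
print(len(distinct))       # 4: 'Italy', 'US', the missing value, 'Portugal'

# nunique() counts distinct values but excludes NaN by default (dropna=True)
print(country.nunique())   # 3
```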

3.

How often does each country appear in the dataset? Create a Series reviews_per_country mapping countries to the count of reviews of wines from that country.

#reviews_per_country = ____

reviews_per_country = reviews['country'].value_counts()

# Check your answer
q3.check()

Correct

#q3.hint()
q3.solution()

Solution:

reviews_per_country = reviews.country.value_counts()
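A minimal sketch of value_counts() on hypothetical toy data, showing that the result is sorted by count in descending order and that missing values are left out unless dropna=False is passed:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for reviews['country']
country = pd.Series(["US", "Italy", "US", np.nan, "US", "Italy", "Portugal"])

# Sorted by count, most frequent first; NaN is dropped by default
counts = country.value_counts()
print(counts.index[0], counts.iloc[0])  # US 3
print(counts.sum())                     # 6 (the missing value is not counted)

# dropna=False adds a row for the missing value, so every review is counted
with_missing = country.value_counts(dropna=False)
print(with_missing.sum())               # 7
```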

4.

Create a variable centered_price containing a version of the price column with the mean price subtracted.

(Note: this 'centering' transformation is a common preprocessing step before applying various machine learning algorithms.)

#centered_price = ____

centered_price = reviews['price'] - reviews['price'].mean()

# Check your answer
q4.check()

Correct

#q4.hint()
q4.solution()

Solution:

centered_price = reviews.price - reviews.price.mean()
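The centering step can be checked by hand on the five prices visible in the reviews.head() output above, one of which is missing. This sketch uses only those five values, so its mean is not the dataset's real mean price:

```python
import numpy as np
import pandas as pd

# The five prices from the reviews.head() output (row 0 has no price)
price = pd.Series([np.nan, 15.0, 14.0, 13.0, 65.0])

# mean() skips NaN: (15 + 14 + 13 + 65) / 4 = 26.75
mean_price = price.mean()

# Subtracting a scalar broadcasts it over every row; NaN stays NaN
centered = price - mean_price

print(mean_price)         # 26.75
print(centered.tolist())  # [nan, -11.75, -12.75, -13.75, 38.25]
print(centered.mean())    # 0.0: a centered column has mean zero
```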

5.

I'm an economical wine buyer. Which wine is the "best bargain"? Create a variable bargain_wine with the title of the wine with the highest points-to-price ratio in the dataset.

reviews[reviews['points']/reviews['price'] == (reviews['points']/reviews['price']).max()]['title']

64590                         Bandit NV Merlot (California)
126096    Cramele Recas 2011 UnWineD Pinot Grigio (Viile...
Name: title, dtype: object

#bargain_wine = ____
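# This answer is flawed: two wines tie for the highest ratio, so the question should have two answers
bargain_wine = reviews[reviews['points']/reviews['price'] == (reviews['points']/reviews['price']).max()][['title']]
bargain_wine
# The reference solution is wrong for the same reason (idxmax() returns only the first tied wine), but it is the answer q5.check() accepts
bargain_idx = (reviews.points / reviews.price).idxmax()
bargain_wine = reviews.loc[bargain_idx, 'title']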

# Check your answer
q5.check()

Correct

#q5.hint()
q5.solution()

Solution:

bargain_idx = (reviews.points / reviews.price).idxmax()
bargain_wine = reviews.loc[bargain_idx, 'title']
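The exploration cell above shows two wines tied for the best points-to-price ratio, while idxmax() reports only one of them. A minimal sketch on hypothetical toy data (the titles are made up) makes the difference visible:

```python
import pandas as pd

# Hypothetical toy data with a deliberate tie in points / price
toy = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C"],
    "points": [86, 90, 86],
    "price": [4.0, 30.0, 4.0],
})

ratio = toy["points"] / toy["price"]  # 21.5, 3.0, 21.5

# idxmax() returns only the first row label holding the maximum
first_best = toy.loc[ratio.idxmax(), "title"]
print(first_best)  # Wine A

# Comparing against max() keeps every tied row
all_best = toy.loc[ratio == ratio.max(), "title"].tolist()
print(all_best)    # ['Wine A', 'Wine C']
```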

6.

There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be "tropical" or "fruity"? Create a Series descriptor_counts counting how many times each of these two words appears in the description column in the dataset. (For simplicity, let's ignore the capitalized versions of these words.)

#descriptor_counts = ____

n_tropical = reviews["description"].map(lambda desc: "tropical" in desc).sum()
n_fruity = reviews["description"].map(lambda desc: "fruity" in desc).sum()
descriptor_counts = pd.Series([n_tropical, n_fruity], index=['tropical', 'fruity'])

# Check your answer
q6.check()

descriptor_counts

Correct

tropical    3607
fruity      9090
dtype: int64

#q6.hint()
q6.solution()

Solution:

n_trop = reviews.description.map(lambda desc: "tropical" in desc).sum()
n_fruity = reviews.description.map(lambda desc: "fruity" in desc).sum()
descriptor_counts = pd.Series([n_trop, n_fruity], index=['tropical', 'fruity'])
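The same counts can also be computed with the .str accessor instead of map(). This is an alternative sketch, not the course's reference solution, run on a few hypothetical descriptions:

```python
import pandas as pd

# Hypothetical descriptions (not taken from the dataset)
desc = pd.Series([
    "Bright tropical fruit and citrus.",
    "Soft and fruity with a smooth finish.",
    "Fruity, with tropical notes on the palate.",
    "Firm tannins and dark berries.",
])

# regex=False makes this a plain substring test, like `word in desc`.
# The match is case-sensitive, so "Fruity" in the third row is not counted.
descriptor_counts = pd.Series({
    word: desc.str.contains(word, regex=False).sum()
    for word in ["tropical", "fruity"]
})

print(int(descriptor_counts["tropical"]), int(descriptor_counts["fruity"]))  # 2 1
```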

7.

We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand; we'd like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, and a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star.

Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.

Create a Series star_ratings with the number of stars corresponding to each review in the dataset.

def rate_star(row):
    # Canadian wines get 3 stars regardless of points
    if row['country'] == 'Canada':
        star = 3
    elif row['points'] >= 95:
        star = 3
    elif row['points'] < 85:
        star = 1
    else:
        star = 2
    return star

#star_ratings = ____
star_ratings = reviews.apply(rate_star, axis=1)
# Check your answer
q7.check()

Correct

#q7.hint()
q7.solution()

Solution:

def stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

star_ratings = reviews.apply(stars, axis='columns')
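On a dataset of roughly 130k rows, a row-by-row apply() can be slow. A vectorized alternative, sketched here with numpy.select on hypothetical toy data, should give the same ratings as the stars() function above:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data covering every rule
toy = pd.DataFrame({
    "country": ["Italy", "Canada", "US", "France", "US"],
    "points": [96, 82, 90, 84, 85],
})

# Row-wise version, mirroring the stars() solution above
def stars(row):
    if row.country == "Canada":
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

row_wise = toy.apply(stars, axis="columns")

# Vectorized version: np.select takes the choice of the first True condition,
# falling back to `default` when none match
conditions = [
    toy["country"] == "Canada",
    toy["points"] >= 95,
    toy["points"] >= 85,
]
vectorized = pd.Series(np.select(conditions, [3, 3, 2], default=1), index=toy.index)

print(vectorized.tolist())  # [3, 3, 2, 1, 2]
```

Because np.select checks the conditions in order, listing the Canada rule first reproduces the "regardless of points" override.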

Keep going

Continue to grouping and sorting.

03.exercise-summary-functions-and-maps (Exercise: Summary Functions and Maps)
