Flashield's Blog

Just For My Daily Diary

05.exercise-data-types-and-missing-values [Exercise: Data Types and Missing Values]

This notebook is an exercise in the Pandas course. You can reference the tutorial at this link.


Introduction

Run the following cell to load your data and some utility functions.

import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.data_types_and_missing_data import *
print("Setup complete.")
Setup complete.
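
The setup cell reads the CSV with index_col=0, which turns the file's first (unnamed) column into the row index instead of keeping it as a data column. The sketch below uses a tiny in-memory CSV whose rows are made up for illustration (they are not taken from the wine dataset) to show that effect, and to show that an empty field is read as NaN, which makes pandas store that column as a float.

```python
import io

import pandas as pd

# Hypothetical miniature CSV: the first, unnamed column holds row labels,
# and the first data row has an empty price field.
csv_text = """,country,points,price
0,Italy,87,
1,Portugal,87,15.0
2,US,87,14.0
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.index.tolist())              # row labels come from the first column: [0, 1, 2]
print(df.dtypes)                      # points is an integer column; price is float64 because of the NaN
print(df["price"].isnull().tolist())  # [True, False, False]
```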

Exercises

1.

What is the data type of the points column in the dataset?

# Your code here
#dtype = ____

dtype = reviews['points'].dtype

# Check your answer
q1.check()

dtype

Correct

dtype('int64')
#q1.hint()
q1.solution()

Solution:

dtype = reviews.points.dtype
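
Series.dtype reports the storage type of a single column (DataFrame.dtypes reports all of them at once). One detail that matters for this lesson: a NumPy-backed integer column cannot hold NaN, so pandas upcasts the integers to float64 as soon as a value is missing. The toy Series below are made up for illustration; the last step uses pandas' nullable "Int64" extension dtype, which keeps integer values and marks the gap as <NA>.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with no gaps: pandas infers an integer dtype.
complete = pd.Series([87, 90, 85], name="points")
print(complete.dtype)              # int64 on typical platforms

# The same scores with one missing entry: NaN is a float, so the column becomes float64.
with_gap = pd.Series([87, np.nan, 85], name="points")
print(with_gap.dtype)              # float64

# The nullable "Int64" dtype (capital I) keeps integers and stores the gap as <NA>.
nullable = with_gap.astype("Int64")
print(nullable.dtype)              # Int64
print(nullable.isnull().tolist())  # [False, True, False]
```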

2.

Create a Series from the entries in the points column, but convert the entries to strings. Hint: in native Python, strings have the type str.

#point_strings = ____

# Equivalent alternative that calls str() once per element:
# point_strings = reviews['points'].apply(lambda x: str(x))
point_strings = reviews['points'].astype(str)
# Check your answer
q2.check()

point_strings

Correct

0         87
1         87
2         87
3         87
4         87
          ..
129966    90
129967    90
129968    90
129969    90
129970    90
Name: points, Length: 129971, dtype: object
#q2.hint()
q2.solution()

Solution:

point_strings = reviews.points.astype(str)
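
The astype(str) call and the commented-out apply(lambda x: str(x)) produce the same strings: astype performs a dtype conversion on the whole Series, while apply calls the Python function on each element in turn. A minimal comparison on a made-up Series (the values are hypothetical):

```python
import pandas as pd

points = pd.Series([87, 90, 100], name="points")

via_astype = points.astype(str)   # dtype conversion of the whole Series
via_apply = points.apply(str)     # calls the built-in str() on each element

print(via_astype.tolist())                        # ['87', '90', '100']
print(via_astype.tolist() == via_apply.tolist())  # True

# The conversion can be reversed with astype(int).
print(via_astype.astype(int).tolist())            # [87, 90, 100]
```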

3.

Sometimes the price column is null. How many reviews in the dataset are missing a price?

#n_missing_prices = ____

n_missing_prices = reviews['price'].isnull().sum()

# Check your answer
q3.check()

n_missing_prices

Correct

8996
#q3.hint()
q3.solution()

Solution:

missing_price_reviews = reviews[reviews.price.isnull()]
n_missing_prices = len(missing_price_reviews)
# Cute alternative solution: if we sum a boolean series, True is treated as 1 and False as 0
n_missing_prices = reviews.price.isnull().sum()
# or equivalently:
n_missing_prices = pd.isnull(reviews.price).sum()
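
The approaches in the solution all count the same thing. The sketch below checks that on a made-up price column; it also uses isna(), which pandas provides as an alias of isnull(), and notnull() for the complementary count of present values.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with two missing entries.
prices = pd.Series([15.0, np.nan, 65.0, np.nan, 14.0], name="price")

by_filter = len(prices[prices.isnull()])  # keep only the null rows, then count them
by_sum = prices.isnull().sum()            # True counts as 1, False as 0
by_function = pd.isnull(prices).sum()     # top-level function, same result
by_alias = prices.isna().sum()            # isna() is an alias of isnull()

print(by_filter, by_sum, by_function, by_alias)  # 2 2 2 2
print(prices.notnull().sum())                    # 3 prices are present
```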

4.

What are the most common wine-producing regions? Create a Series counting how many times each value occurs in the region_1 field. This field is often missing data, so replace missing values with Unknown. Sort in descending order. Your output should look something like this:

Unknown                    21247
Napa Valley                 4480
                           ...  
Bardolino Superiore            1
Primitivo del Tarantino        1
Name: region_1, Length: 1230, dtype: int64
#reviews_per_region = ____

# reviews_per_region = reviews[['region_1']].fillna('Unknown').groupby(['region_1']).size().sort_values(ascending=False)
reviews_per_region = reviews.loc[:,'region_1'].fillna('Unknown').value_counts().sort_values(ascending=False)

# Check your answer
q4.check()

Correct

#q4.hint()
q4.solution()

Solution:

reviews_per_region = reviews.region_1.fillna('Unknown').value_counts().sort_values(ascending=False)
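
On a small made-up region_1 column (the values are hypothetical), the solution's fillna + value_counts chain and the commented-out groupby(...).size() variant give the same counts. value_counts() already sorts by count in descending order by default, so the trailing sort_values(ascending=False) only makes that order explicit.

```python
import pandas as pd

# Hypothetical regions; None marks a review with no region_1 value.
region_1 = pd.Series(
    ["Napa Valley", None, "Napa Valley", "Sonoma", None, None],
    name="region_1",
)

# Solution approach: label missing values, then count each distinct value.
counts = region_1.fillna("Unknown").value_counts().sort_values(ascending=False)
print(counts)  # Unknown -> 3, Napa Valley -> 2, Sonoma -> 1

# groupby alternative: group the filled values by themselves and count group sizes.
filled = region_1.fillna("Unknown")
sizes = filled.groupby(filled).size().sort_values(ascending=False)

print(counts.to_dict() == sizes.to_dict())  # True
```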

Keep going

Move on to renaming and combining.
