Introduction
介绍
Selecting specific values of a pandas DataFrame or Series to work on is an implicit step in almost any data operation you'll run, so one of the first things you need to learn in working with data in Python is how to go about selecting the data points relevant to you quickly and effectively.
选择要处理的 pandas DataFrame 或 Series 的特定值是您将运行的几乎所有数据操作中的隐式步骤,因此在 Python 中处理数据时需要学习的第一件事就是如何选择数据,从而快速有效地获取与您相关的要点。
import pandas as pd
reviews = pd.read_csv("../00 datasets/zynicide/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
pd.set_option('display.max_rows', 5)
/tmp/ipykernel_13479/3156387366.py:1: DeprecationWarning:
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
import pandas as pd
To start the exercise for this topic, please click here.
要开始本主题的练习,请单击此处。
Native accessors
原生访问器
Native Python objects provide good ways of indexing data. Pandas carries all of these over, which helps make it easy to start with.
原生 Python 对象提供了索引数据的好方法。 Pandas 继承了所有这些,这有助于轻松开始。
Consider this DataFrame:
考虑这个DataFrame:
reviews
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
129971 rows × 13 columns
In Python, we can access the property of an object by accessing it as an attribute. A book
object, for example, might have a title
property, which we can access by calling book.title
. Columns in a pandas DataFrame work in much the same way.
在Python中,我们可以通过将对象的属性作为属性来访问它。 例如,book
对象可能有一个title
属性,我们可以通过调用book.title
来访问该属性。 pandas DataFrame 中的列的工作方式大致相同。
Hence to access the country
property of reviews
we can use:
因此,要访问reviews
的country
属性,我们可以使用:
reviews.country
0 Italy
1 Portugal
...
129969 France
129970 France
Name: country, Length: 129971, dtype: object
If we have a Python dictionary, we can access its values using the indexing ([]
) operator. We can do the same with columns in a DataFrame:
如果我们有一个 Python 字典,我们可以使用索引 ([]
) 运算符访问它的值。 我们可以对 DataFrame 中的列执行相同的操作:
reviews['country']
0 Italy
1 Portugal
...
129969 France
129970 France
Name: country, Length: 129971, dtype: object
These are the two ways of selecting a specific Series out of a DataFrame. Neither of them is more or less syntactically valid than the other, but the indexing operator []
does have the advantage that it can handle column names with reserved characters in them (e.g. if we had a country providence
column, reviews.country providence
wouldn't work).
这是从 DataFrame 中选择特定 Series 的两种方法。 它们在语法上都或多或少比另一个有效,但是索引运算符[]
确实有一个优点,它可以处理其中包含保留字符的列名(例如,如果我们有一个country Providence
列,reviews.country Providence
不起作用)。
Doesn't a pandas Series look kind of like a fancy dictionary? It pretty much is, so it's no surprise that, to drill down to a single specific value, we need only use the indexing operator []
once more:
pandas 系列看起来不像一本精美的字典吗? 几乎是这样,所以毫不奇怪,要深入到单个特定值,我们只需要再次使用索引运算符[]
:
reviews['country'][0]
'Italy'
Indexing in pandas
在 pandas 中建立索引
The indexing operator and attribute selection are nice because they work just like they do in the rest of the Python ecosystem. As a novice, this makes them easy to pick up and use. However, pandas has its own accessor operators, loc
and iloc
. For more advanced operations, these are the ones you're supposed to be using.
索引运算符和属性选择很好,因为它们的工作方式就像在 Python 生态系统的其他部分一样。 作为新手,这使得它们很容易上手和使用。 然而,pandas 有自己的访问器运算符loc
和iloc
。 对于更高级的操作,您应该使用这些操作。
Index-based selection
基于索引的选择
Pandas indexing works in one of two paradigms. The first is index-based selection: selecting data based on its numerical position in the data. iloc
follows this paradigm.
To select the first row of data in a DataFrame, we may use the following:
Pandas 索引以两种范式之一工作。 第一个是基于索引的选择:根据数据在数据中的数字位置来选择数据。 iloc
遵循这个范式。
reviews.iloc[0]
country Italy
description Aromas include tropical fruit, broom, brimston...
...
variety White Blend
winery Nicosia
Name: 0, Length: 13, dtype: object
Both loc
and iloc
are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second.
loc
和 iloc
都是行第一,列第二。 这与我们在原生 Python 中所做的相反,即列在前,行在后。
This means that it's marginally easier to retrieve rows, and marginally harder to get retrieve columns. To get a column with iloc
, we can do the following:
这意味着检索行稍微容易一些,而检索列稍微困难一些。 要获取包含iloc
的列,我们可以执行以下操作:
reviews.iloc[:, 0]
0 Italy
1 Portugal
...
129969 France
129970 France
Name: country, Length: 129971, dtype: object
On its own, the :
operator, which also comes from native Python, means "everything". When combined with other selectors, however, it can be used to indicate a range of values. For example, to select the country
column from just the first, second, and third row, we would do:
就其本身而言,同样来自原生 Python 的:
运算符意味着所有
。 但是,当与其他选择器结合使用时,它可以用于指示值的范围。 例如,要从第一、第二和第三行中选择国家/地区
列,我们可以如此操作:
reviews.iloc[:3, 0]
0 Italy
1 Portugal
2 US
Name: country, dtype: object
Or, to select just the second and third entries, we would do:
或者,要仅选择第二个和第三个条目,我们会这样做:
reviews.iloc[1:3, 0]
1 Portugal
2 US
Name: country, dtype: object
It's also possible to pass a list:
也可以传递一个列表:
reviews.iloc[[0, 1, 2], 0]
0 Italy
1 Portugal
2 US
Name: country, dtype: object
Finally, it's worth knowing that negative numbers can be used in selection. This will start counting forwards from the end of the values. So for example here are the last five elements of the dataset.
最后,值得注意的是负数可以用于选择。 这将从值的 末尾 开始向前计数。 例如,这里是数据集的最后五个元素。
reviews.iloc[-5:,-2:]
variety | winery | |
---|---|---|
129966 | Riesling | Dr. H. Thanisch (Erben Müller-Burggraef) |
129967 | Pinot Noir | Citation |
129968 | Gewürztraminer | Domaine Gresser |
129969 | Pinot Gris | Domaine Marcel Deiss |
129970 | Gewürztraminer | Domaine Schoffit |
Label-based selection
基于标签的选择
The second paradigm for attribute selection is the one followed by the loc
operator: label-based selection. In this paradigm, it's the data index value, not its position, which matters.
属性选择的第二个范例是后跟loc
运算符的范例:基于标签的选择。 在此范例中,重要的是数据索引值,而不是其位置。
For example, to get the first entry in reviews
, we would now do the following:
例如,要获取reviews
中的第一个条目,我们现在需要执行以下操作:
reviews.loc[0, 'country']
'Italy'
iloc
is conceptually simpler than loc
because it ignores the dataset's indices. When we use iloc
we treat the dataset like a big matrix (a list of lists), one that we have to index into by position. loc
, by contrast, uses the information in the indices to do its work. Since your dataset usually has meaningful indices, it's usually easier to do things using loc
instead. For example, here's one operation that's much easier using loc
:
iloc
在概念上比loc
更简单,因为它忽略数据集的索引。 当我们使用iloc
时,我们将数据集视为一个大矩阵(列表的列表),我们必须按位置对其进行索引。 相比之下,loc
使用索引中的信息来完成其工作。 由于您的数据集通常具有有意义的索引,因此使用loc
通常更容易完成操作。 例如,这里有一个使用loc
更容易的操作:
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]
taster_name | taster_twitter_handle | points | |
---|---|---|---|
0 | Kerin O’Keefe | @kerinokeefe | 87 |
1 | Roger Voss | @vossroger | 87 |
... | ... | ... | ... |
129969 | Roger Voss | @vossroger | 90 |
129970 | Roger Voss | @vossroger | 90 |
129971 rows × 3 columns
Choosing between loc
and iloc
在 loc
和 iloc
之间进行选择
When choosing or transitioning between loc
and iloc
, there is one "gotcha" worth keeping in mind, which is that the two methods use slightly different indexing schemes.
在loc
和iloc
之间选择或转换时,有一个值得记住的要诀
,那就是这两种方法使用略有不同的索引方案。
iloc
uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. So 0:10
will select entries 0,...,9
. loc
, meanwhile, indexes inclusively. So 0:10
will select entries 0,...,10
.
iloc
使用 Python stdlib 索引方案,其中包含范围的第一个元素,排除最后一个元素。 因此0:10
将选择条目0,...,9
。 同时,loc
包含索引。 因此0:10
将选择条目0,...,10
。
Why the change? Remember that loc can index any stdlib type: strings, for example. If we have a DataFrame with index values Apples, ..., Potatoes, ...
, and we want to select "all the alphabetical fruit choices between Apples and Potatoes", then it's a lot more convenient to index df.loc['Apples':'Potatoes']
than it is to index something like df.loc['Apples', 'Potatoet']
(t
coming after s
in the alphabet).
为什么要改变? 请记住,loc
可以索引任何 stdlib 类型:例如字符串。 如果我们有一个带有索引值苹果,...,土豆,...
的 DataFrame,并且我们想要选择苹果和土豆之间的所有按字母顺序排列的水果选择
,那么索引df.loc['Apples':'Potatoes']
会方便得多, 比索引类似 df.loc['Apples', 'Potatoet']
的东西要好(t
在字母表中位于 s
之后)。
This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000
. In this case df.iloc[0:1000]
will return 1000 entries, while df.loc[0:1000]
return 1001 of them! To get 1000 elements using loc
, you will need to go one lower and ask for df.loc[0:999]
.
当 DataFrame 索引是一个简单的数字列表(例如 0,...,1000
。 在这种情况下,df.iloc[0:1000]
将返回 1000 个条目,而df.loc[0:1000]
则返回其中的 1001 个! 要使用 loc
获取 1000 个元素,您需要向下一级并请求 df.loc[0:999]
。
# 注意loc含最后一项,iloc不含
reviews.loc[0:1000].shape, reviews.iloc[0:1000].shape
((1001, 13), (1000, 13))
Otherwise, the semantics of using loc
are the same as those for iloc
.
否则,使用loc
的语义与iloc
的语义相同。
Manipulating the index
操作索引
Label-based selection derives its power from the labels in the index. Critically, the index we use is not immutable. We can manipulate the index in any way we see fit.
基于标签的选择的力量来自于索引中的标签。 至关重要的是,我们使用的索引并不是一成不变的。 我们可以以任何我们认为合适的方式操纵索引。
The set_index()
method can be used to do the job. Here is what happens when we set_index
to the title
field:
可以使用set_index()
方法来完成这项工作。 当我们将set_index
设置为title
字段时,会发生以下情况:
reviews.set_index("title")
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
title | ||||||||||||
Nicosia 2013 Vulkà Bianco (Etna) | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | White Blend | Nicosia |
Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Portuguese Red | Quinta dos Avidagos |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Pinot Gris | Domaine Marcel Deiss |
Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caroline Gewurztraminer (Alsace) | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Gewürztraminer | Domaine Schoffit |
129971 rows × 12 columns
This is useful if you can come up with an index for the dataset which is better than the current one.
如果您可以为数据集找到一个比当前索引更好的索引,那么这将很有用。
Conditional selection
条件选择
So far we've been indexing various strides of data, using structural properties of the DataFrame itself. To do interesting things with the data, however, we often need to ask questions based on conditions.
到目前为止,我们已经使用 DataFrame 本身的结构属性对各种数据进行索引。 然而,为了利用数据做有趣的事情,我们经常需要根据条件提出问题。
For example, suppose that we're interested specifically in better-than-average wines produced in Italy.
例如,假设我们对意大利生产的优于平均水平的葡萄酒特别感兴趣。
We can start by checking if each wine is Italian or not:
我们可以首先检查每种葡萄酒是否是意大利的:
reviews.country == 'Italy'
0 True
1 False
...
129969 False
129970 False
Name: country, Length: 129971, dtype: bool
This operation produced a Series of True
/False
booleans based on the country
of each record. This result can then be used inside of loc
to select the relevant data:
此操作根据每个记录的国家/地区
生成一系列True
/False
布尔值。 然后可以在 loc
内部使用该结果来选择相关数据:
reviews.loc[reviews.country == 'Italy']
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
6 | Italy | Here's a bright, informal red that opens with ... | Belsito | 87 | 16.0 | Sicily & Sardinia | Vittoria | NaN | Kerin O’Keefe | @kerinokeefe | Terre di Giurfo 2013 Belsito Frappato (Vittoria) | Frappato | Terre di Giurfo |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129961 | Italy | Intense aromas of wild cherry, baking spice, t... | NaN | 90 | 30.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | COS 2013 Frappato (Sicilia) | Frappato | COS |
129962 | Italy | Blackberry, cassis, grilled herb and toasted a... | Sàgana Tenuta San Giacomo | 90 | 40.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | Cusumano 2012 Sàgana Tenuta San Giacomo Nero d... | Nero d'Avola | Cusumano |
19540 rows × 13 columns
This DataFrame has ~20,000 rows. The original had ~130,000. That means that around 15% of wines originate from Italy.
We also wanted to know which ones are better than average. Wines are reviewed on a 80-to-100 point scale, so this could mean wines that accrued at least 90 points.
We can use the ampersand (&
) to bring the two questions together:
该 DataFrame 有约 20,000 行。 原数据集大约有 130,000 行。 这意味着大约 15% 的葡萄酒源自意大利。
我们还想知道哪些比平均水平更好。 葡萄酒的评分标准为 80 到 100 分,因此这可能意味着至少获得 90 分的葡萄酒。
我们可以使用与号(&
)将两个问题放在一起:
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
120 | Italy | Slightly backward, particularly given the vint... | Bricco Rocche Prapó | 92 | 70.0 | Piedmont | Barolo | NaN | NaN | NaN | Ceretto 2003 Bricco Rocche Prapó (Barolo) | Nebbiolo | Ceretto |
130 | Italy | At the first it was quite muted and subdued, b... | Bricco Rocche Brunate | 91 | 70.0 | Piedmont | Barolo | NaN | NaN | NaN | Ceretto 2003 Bricco Rocche Brunate (Barolo) | Nebbiolo | Ceretto |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129961 | Italy | Intense aromas of wild cherry, baking spice, t... | NaN | 90 | 30.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | COS 2013 Frappato (Sicilia) | Frappato | COS |
129962 | Italy | Blackberry, cassis, grilled herb and toasted a... | Sàgana Tenuta San Giacomo | 90 | 40.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | Cusumano 2012 Sàgana Tenuta San Giacomo Nero d... | Nero d'Avola | Cusumano |
6648 rows × 13 columns
Suppose we'll buy any wine that's made in Italy or which is rated above average. For this we use a pipe (|
):
假设我们要购买意大利生产的任何葡萄酒,或评级高于平均水平的葡萄酒。 为此,我们使用管道符(|
):
reviews.loc[(reviews.country == 'Italy') | (reviews.points >= 90)]
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
6 | Italy | Here's a bright, informal red that opens with ... | Belsito | 87 | 16.0 | Sicily & Sardinia | Vittoria | NaN | Kerin O’Keefe | @kerinokeefe | Terre di Giurfo 2013 Belsito Frappato (Vittoria) | Frappato | Terre di Giurfo |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
61937 rows × 13 columns
Pandas comes with a few built-in conditional selectors, two of which we will highlight here.
The first is isin
. isin
is lets you select data whose value "is in" a list of values. For example, here's how we can use it to select wines only from Italy or France:
Pandas 带有一些内置的条件选择器,我们将在这里重点介绍其中两个。
第一个是isin
。 isin
允许您选择其值位于
值列表中的数据。 例如,我们可以使用它来仅选择来自意大利或法国的葡萄酒:
reviews.loc[reviews.country.isin(['Italy', 'France'])]
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
6 | Italy | Here's a bright, informal red that opens with ... | Belsito | 87 | 16.0 | Sicily & Sardinia | Vittoria | NaN | Kerin O’Keefe | @kerinokeefe | Terre di Giurfo 2013 Belsito Frappato (Vittoria) | Frappato | Terre di Giurfo |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
41633 rows × 13 columns
The second is isnull
(and its companion notnull
). These methods let you highlight values which are (or are not) empty (NaN
). For example, to filter out wines lacking a price tag in the dataset, here's what we would do:
第二个是isnull
(及其同伴notnull
)。 这些方法可让您突出显示空(或非空)(NaN
)的值。 例如,要过滤掉数据集中缺少价格标签的葡萄酒,我们将执行以下操作:
reviews.loc[reviews.price.notnull()]
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
120975 rows × 13 columns
Assigning data
赋值数据
Going the other way, assigning data to a DataFrame is easy. You can assign either a constant value:
另一方面,将数据分配给 DataFrame 很容易。 您可以分配一个常量值:
reviews['critic'] = 'everyone'
reviews['critic']
0 everyone
1 everyone
...
129969 everyone
129970 everyone
Name: critic, Length: 129971, dtype: object
Or with an iterable of values:
或者使用可迭代的值:
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']
0 129971
1 129970
...
129969 2
129970 1
Name: index_backwards, Length: 129971, dtype: int64
Your turn
到你了
If you haven't started the exercise, you can get started here.
如果您还没有开始练习,可以从这里开始。