Flashield's Blog

Just For My Daily Diary

Flashield's Blog

Just For My Daily Diary

01.exercise-getting-started-with-sql-and-bigquery【练习:开始使用SQL和BQ】

This notebook is an exercise in the SQL course. You can reference the tutorial at this link.


Introduction

介绍

The first test of your new data exploration skills uses data describing crime in the city of Chicago.

对您的新数据探索技能的首次测试,使用描述芝加哥市犯罪的数据。

Before you get started, run the following cell. It sets up the automated feedback system to review your answers.

在开始之前,请运行以下单元格。 它设置自动反馈系统来审查您的答案。

# Set up feedack system
from learntools.core import binder
binder.bind(globals())
from learntools.sql.ex1 import *
print("Setup Complete")
Using Kaggle's public dataset BigQuery integration.
Setup Complete

Use the next code cell to fetch the dataset.

使用下一个代码单元来获取数据集。

from google.cloud import bigquery

# Create a "Client" object
client = bigquery.Client()

# Construct a reference to the "chicago_crime" dataset
# 构建对“chicago_crime”数据集的引用
dataset_ref = client.dataset("chicago_crime", project="bigquery-public-data")

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)
Using Kaggle's public dataset BigQuery integration.

Exercises

练习

1) Count tables in the dataset

1) 统计数据集中的表数

How many tables are in the Chicago Crime dataset?

芝加哥犯罪数据集中有多少张表?

# Write the code you need here to figure out the answer
tables = list(client.list_tables(dataset))
# Print names of all tables in the dataset (there are four!)
for table in tables:  
    print(table.table_id)
crime
num_tables = 1  # Store the answer as num_tables and then run this cell

# Check your answer
q_1.check()

Correct

For a hint or the solution, uncomment the appropriate line below.

# q_1.hint()
q_1.solution()

Solution:


# List all the tables in the "chicago_crime" dataset
tables = list(client.list_tables(dataset))

# Print number of tables in the dataset
print(len(tables))

num_tables = 1

2) Explore the table schema

2) 探索表模式

How many columns in the crime table have TIMESTAMP data?

crime表中有多少列有TIMESTAMP数据?

# Write the code to figure out the answer
table_ref = dataset_ref.table("crime")
table = client.get_table(table_ref)
field_count = 0
for schema in table.schema:
    if schema.field_type == 'TIMESTAMP':
        field_count += 1
print(field_count)
2
num_timestamp_fields = 2 # Put your answer here

# Check your answer
q_2.check()

Correct

For a hint or the solution, uncomment the appropriate line below.

如需提示或解决方案,请取消注释下面相应的行。

# q_2.hint()
q_2.solution()

Solution:


# Construct a reference to the "crime" table
table_ref = dataset_ref.table("crime")

# API request - fetch the table
table = client.get_table(table_ref)

# Print information on all the columns in the "crime" table in the "chicago_crime" dataset
print(table.schema)

num_timestamp_fields = 2

3) Create a crime map

3) 创建犯罪地图

If you wanted to create a map with a dot at the location of each crime, what are the names of the two fields you likely need to pull out of the crime table to plot the crimes on a map?

如果您想创建一个在每个犯罪地点都有一个点的地图,那么您可能需要从犯罪表中提取两个字段的名称来在地图上绘制犯罪情况?

# Write the code here to explore the data so you can find the answer
table.schema
[SchemaField('unique_key', 'INTEGER', 'REQUIRED', None, (), None),
 SchemaField('case_number', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('date', 'TIMESTAMP', 'NULLABLE', None, (), None),
 SchemaField('block', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('iucr', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('primary_type', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('description', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('location_description', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('arrest', 'BOOLEAN', 'NULLABLE', None, (), None),
 SchemaField('domestic', 'BOOLEAN', 'NULLABLE', None, (), None),
 SchemaField('beat', 'INTEGER', 'NULLABLE', None, (), None),
 SchemaField('district', 'INTEGER', 'NULLABLE', None, (), None),
 SchemaField('ward', 'INTEGER', 'NULLABLE', None, (), None),
 SchemaField('community_area', 'INTEGER', 'NULLABLE', None, (), None),
 SchemaField('fbi_code', 'STRING', 'NULLABLE', None, (), None),
 SchemaField('x_coordinate', 'FLOAT', 'NULLABLE', None, (), None),
 SchemaField('y_coordinate', 'FLOAT', 'NULLABLE', None, (), None),
 SchemaField('year', 'INTEGER', 'NULLABLE', None, (), None),
 SchemaField('updated_on', 'TIMESTAMP', 'NULLABLE', None, (), None),
 SchemaField('latitude', 'FLOAT', 'NULLABLE', None, (), None),
 SchemaField('longitude', 'FLOAT', 'NULLABLE', None, (), None),
 SchemaField('location', 'STRING', 'NULLABLE', None, (), None)]
fields_for_plotting = ['latitude', 'longitude'] # Put your answers here

# Check your answer
q_3.check()

Correct

For a hint or the solution, uncomment the appropriate line below.

# q_3.hint()
q_3.solution()

Solution:

fields_for_plotting = ['latitude', 'longitude']

Thinking about the question above, there are a few columns that appear to have geographic data. Look at a few values (with the list_rows() command) to see if you can determine their relationship. Two columns will still be hard to interpret. But it should be obvious how the location column relates to latitude and longitude.

思考上面的问题,有几列似乎有地理数据。 查看几个值(使用list_rows()命令),看看是否可以确定它们的关系。 两列仍然很难解释。 但位置列与纬度经度的关系应该是显而易见的。

# Scratch space for your code

client.list_rows(table, max_results=10).to_dataframe()[['latitude', 'longitude','location']]
latitude longitude location
0 41.885879 -87.630979 (41.885879253, -87.630979425)
1 41.882478 -87.637028 (41.882477553, -87.637028112)
2 41.886013 -87.629505 (41.886012928, -87.629505318)
3 41.885908 -87.626289 (41.885908101, -87.626289429)
4 41.883375 -87.629428 (41.883375237, -87.629427772)
5 41.885280 -87.626267 (41.885279532, -87.626266981)
6 41.878864 -87.638503 (41.878864086, -87.638503099)
7 41.885482 -87.627936 (41.885481891, -87.627935689)
8 41.885347 -87.627930 (41.885347387, -87.627929824)
9 41.878931 -87.639586 (41.878931004, -87.639585621)

Keep going

继续前进

You've looked at the schema, but you haven't yet done anything exciting with the data itself. Things get more interesting when you get to the data, so keep going to write your first SQL query.

您已经查看了架构,但还没有对数据本身做任何令人兴奋的事情。 当您获取数据时,事情会变得更加有趣,因此请继续编写您的第一个 SQL 查询

01.exercise-getting-started-with-sql-and-bigquery【练习:开始使用SQL和BQ】

Leave a Reply

Your email address will not be published. Required fields are marked *

Scroll to top