Welcome to Computer Vision!

Have you ever wanted to teach a computer to see? In this course, that's exactly what you'll do!

In this course, you'll:

Use modern deep-learning networks to build an image classifier with Keras
Design your own custom convnet with reusable blocks
Learn the fundamental ideas behind visual feature extraction
Master the art of transfer learning to boost your models
Utilize data augmentation to extend your dataset

If you've taken the Introduction to Deep Learning course, you'll know everything you need to be successful.

Now let's get started!

Introduction

This course will introduce you to the fundamental ideas of computer vision. Our goal is to learn how a neural network can "understand" a natural image well enough to solve the same kinds of problems the human visual system can solve.

The neural networks that are best at this task are called convolutional neural networks. (Sometimes we say convnet or CNN instead.) Convolution is the mathematical operation that gives the layers of a convnet their unique structure. In future lessons, you'll learn why this structure is so effective at solving computer vision problems.
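
To make the convolution operation concrete, here is a minimal NumPy sketch, not the way Keras implements it. The toy image, the hand-picked kernel, and the helper name `conv2d_valid` are all assumptions made for illustration. As in most deep-learning libraries, the kernel is applied without flipping, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image (hypothetical): dark columns (0) on the left, bright (1) on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=np.float32)

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # large values where the window covers the edge, zero elsewhere
```

In a real convnet the kernel values are not chosen by hand; they are learned during training.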

We will apply these ideas to the problem of image classification: given a picture, can we train a computer to tell us what it's a picture of? You may have seen apps that can identify a species of plant from a photograph. That's an image classifier! In this course, you'll learn how to build image classifiers just as powerful as those used in professional applications.

While our focus will be on image classification, what you'll learn in this course is relevant to every kind of computer vision problem. At the end, you'll be ready to move on to more advanced applications like generative adversarial networks and image segmentation.

The Convolutional Classifier

A convnet used for image classification consists of two parts: a convolutional base and a dense head.

The parts of a convnet: image, base, head, class; input, extract, classify, output.

The base is used to extract the features from an image. It is formed primarily of layers performing the convolution operation, but often includes other kinds of layers as well. (You'll learn about these in the next lesson.)

The head is used to determine the class of the image. It is formed primarily of dense layers, but might include other layers like dropout.

What do we mean by a visual feature? A feature could be a line, a color, a texture, a shape, a pattern, or some complicated combination of these.

The whole process goes something like this:

The idea of feature extraction.

The features actually extracted look a bit different, but the figure conveys the idea.

Training the Classifier

The goal of the network during training is to learn two things:

which features to extract from an image (base),
which class goes with what features (head).

These days, convnets are rarely trained from scratch. More often, we reuse the base of a pretrained model. To the pretrained base we then attach an untrained head. In other words, we reuse the part of a network that has already learned the first task (extracting features) and attach to it some fresh layers that learn the second (classifying).

Attaching a new head to a trained base.

Because the head usually consists of only a few dense layers, very accurate classifiers can be created from relatively little data.

Reusing a pretrained model is a technique known as transfer learning. It is so effective that almost every image classifier these days makes use of it.
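
The idea of training only a fresh head on top of a frozen base can be sketched with a toy NumPy example. Everything here is an assumption made for illustration: the "base" is just a fixed random projection followed by ReLU rather than a real convolutional network, and the data and labels are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained base: fixed weights that are never updated.
# (A real base is a stack of convolutional layers; this is only a toy.)
base_weights = rng.normal(size=(16, 8))
frozen_copy = base_weights.copy()

def base(x):
    return np.maximum(0.0, x @ base_weights)

# Synthetic data: 200 "images" of 16 values each, with made-up binary labels.
x = rng.normal(size=(200, 16))
y = (x[:, 0] > 0).astype(np.float64)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(w, b, feats):
    p = np.clip(sigmoid(feats @ w + b), 1e-7, 1 - 1e-7)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

# Fresh, untrained head: a single sigmoid unit on the extracted features.
head_w = np.zeros(8)
head_b = 0.0

feats = base(x)  # features come from the frozen base
initial_loss = loss_fn(head_w, head_b, feats)

for _ in range(500):
    p = sigmoid(feats @ head_w + head_b)
    grad_w = feats.T @ (p - y) / len(y)
    grad_b = float(np.mean(p - y))
    head_w -= 0.01 * grad_w  # only the head's parameters change
    head_b -= 0.01 * grad_b

final_loss = loss_fn(head_w, head_b, feats)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Because gradients are applied only to `head_w` and `head_b`, the base stays exactly as it was, which is the role `trainable = False` plays for a Keras model.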

Example - Train a Convnet Classifier

Throughout this course, we're going to be creating classifiers that attempt to solve the following problem: is this a picture of a Car or of a Truck? Our dataset is about 10,000 pictures of various automobiles, around half cars and half trucks.

Step 1 - Load Data

This next hidden cell will import some libraries and set up our data pipeline. We have a training split called ds_train and a validation split called ds_valid.

# Imports
import os, warnings
import matplotlib.pyplot as plt
from matplotlib import gridspec

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory

# Reproducibility
def set_seed(seed=55555):
    np.random.seed(seed)
    tf.random.set_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
set_seed(55555)

# Set Matplotlib defaults
plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')
warnings.filterwarnings("ignore") # to clean up output cells

# Load training and validation sets
ds_train_ = image_dataset_from_directory(
#     '../00 datasets/ryanholbrook/car-or-truck/train',
    '../input/car-or-truck/train',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=True,
)
ds_valid_ = image_dataset_from_directory(
#     '../00 datasets/ryanholbrook/car-or-truck/valid',
    '../input/car-or-truck/valid',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=False,
)

# Data Pipeline
def convert_to_float(image, label):
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    return image, label

AUTOTUNE = tf.data.experimental.AUTOTUNE
ds_train = (
    ds_train_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)
ds_valid = (
    ds_valid_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)

Found 5117 files belonging to 2 classes.
Found 5051 files belonging to 2 classes.
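
The `convert_to_float` step in the pipeline relies on `tf.image.convert_image_dtype`, which rescales integer pixel values into the range [0, 1] when converting to a float type. The NumPy sketch below shows the equivalent arithmetic, assuming the loader yields 8-bit integer images; the helper name `to_unit_float` and the tiny example image are hypothetical.

```python
import numpy as np

def to_unit_float(image_uint8):
    """Rescale 8-bit pixel values from [0, 255] to float32 values in [0, 1]."""
    return image_uint8.astype(np.float32) / 255.0

# Hypothetical 2x2 RGB image with 8-bit channels.
pixels = np.array([[[0, 128, 255], [64, 64, 64]],
                   [[255, 255, 255], [10, 20, 30]]], dtype=np.uint8)

scaled = to_unit_float(pixels)
print(scaled.dtype, scaled.min(), scaled.max())
```

Keeping inputs in a small, consistent range like this is a common preprocessing choice for neural networks.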

Let's take a look at a few examples from the training set.

import matplotlib.pyplot as plt

# Take one batch
# sample_data = ds_train.take(1)

plt.figure(figsize=(4,20))

# With this approach, training runs normally afterwards
for images, labels in ds_train:
    # 'images' is a batch of images
    # 'labels' is a batch of corresponding labels
    # You can access individual images in 'images' batch using indexing, e.g., images[0]
    # You can process the images here or save them to disk
    # Here, we plot the first five images of each batch:
    for i in range(5):
        image = images[i].numpy()  # Convert TensorFlow tensor to NumPy array
        label = labels[i].numpy()  # Convert TensorFlow tensor to NumPy array
#         image_file_name = f'image_{label}_{i}.jpg'  # Adjust file naming as needed
#         tf.keras.preprocessing.image.save_img(image_file_name, image) 
        plt.subplot(5, 1, i+1)
        plt.title(label[0])
        plt.imshow(image)

# for key, i in enumerate(list(ds_train_.as_numpy_iterator())[0][0][:5]):
#     plt.subplot(5, 1, key+1)
#     plt.imshow(i)

Step 2 - Define Pretrained Base

The most commonly used dataset for pretraining is ImageNet, a large dataset of many kinds of natural images. Keras includes a variety of models pretrained on ImageNet in its applications module. The pretrained model we'll use is called VGG16.

pretrained_base = tf.keras.models.load_model(
#     '../00 datasets/ryanholbrook/cv-course-models/cv-course-models/vgg16-pretrained-base',
    '../input/cv-course-models/cv-course-models/vgg16-pretrained-base',
)
pretrained_base.trainable = False

Step 3 - Attach Head

Next, we attach the classifier head. For this example, we'll use a layer of hidden units (the first Dense layer) followed by a layer to transform the outputs to a probability score for class 1, Truck. The Flatten layer transforms the two-dimensional outputs of the base into the one-dimensional inputs needed by the head. The code below also places a Dropout layer before each Dense layer.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    pretrained_base,
    layers.Flatten(),
    layers.Dropout(rate=0.3),
    layers.Dense(6, activation='relu'),
    layers.Dropout(rate=0.3),
    layers.Dense(1, activation='sigmoid'),
])
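
To see what the head computes, here is a rough NumPy sketch of its forward pass for a single image. It assumes the base outputs a 4 x 4 x 512 block of features for a 128 x 128 input, which is what a standard VGG16 convolutional stack would produce; the saved base used above may differ. Random weights stand in for learned ones, and the Dropout layers are left out because they have no weights and are inactive at prediction time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed output of the base for one 128x128 image: 4 x 4 x 512 features.
features = rng.random((4, 4, 512)).astype(np.float32)

# Flatten: unroll the feature maps into a single vector.
flat = features.reshape(-1)  # shape (8192,)

# Dense(6, activation='relu'): random weights stand in for learned ones.
w1 = rng.normal(scale=0.01, size=(flat.size, 6))
b1 = np.zeros(6)
hidden = np.maximum(0.0, flat @ w1 + b1)

# Dense(1, activation='sigmoid'): squash one score into a probability.
w2 = rng.normal(scale=0.01, size=(6, 1))
b2 = np.zeros(1)
prob_truck = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))

# Number of weights the head has to learn under these assumptions.
head_params = w1.size + b1.size + w2.size + b2.size
print(flat.shape, prob_truck.shape, head_params)
```

Under these assumptions the head has 49,165 parameters to learn, a small number next to the millions held in the frozen base.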

Step 4 - Train

Finally, let's train the model. Since this is a two-class problem, we'll use the binary versions of crossentropy and accuracy. The Adam optimizer generally performs well, so we'll choose it as well.

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['binary_accuracy'],
)

history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=50,
    verbose=False,
)
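
For reference, the loss and metric named in `model.compile` can be written out in a few lines of NumPy. This is a sketch of the underlying formulas rather than the Keras implementation; the labels and predicted probabilities are made-up values, and the 0.5 threshold matches the usual default for binary accuracy.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    return float(np.mean((y_prob > threshold) == (y_true == 1)))

# Made-up labels (1 = Truck, 0 = Car) and predicted probabilities.
y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.6])

loss = binary_crossentropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
print(f"loss={loss:.4f}, accuracy={acc:.2f}")
```

The loss penalizes confident wrong predictions heavily, while accuracy only counts how many predictions land on the right side of the threshold.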

When training a neural network, it's always a good idea to examine the loss and metric plots. The history object contains this information in a dictionary history.history. We can use Pandas to convert this dictionary to a dataframe and plot it with a built-in method.

import pandas as pd

history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot();
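
Besides plotting, the same dictionary can be inspected directly. The sketch below finds the epoch with the lowest validation loss; the numbers are hypothetical placeholders with the same layout as `history.history`, not results from this lesson.

```python
# Hypothetical values shaped like history.history; not results from this lesson.
history_dict = {
    'loss': [0.69, 0.52, 0.41, 0.33, 0.27],
    'val_loss': [0.66, 0.55, 0.49, 0.50, 0.53],
}

val_loss = history_dict['val_loss']

# Index of the epoch with the lowest validation loss (epochs are 0-indexed here).
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

# Gap between validation and training loss at the final epoch.
final_gap = val_loss[-1] - history_dict['loss'][-1]

print(f"lowest val_loss at epoch {best_epoch + 1}: {val_loss[best_epoch]:.2f}")
print(f"final val_loss - loss gap: {final_gap:.2f}")
```

A validation loss that bottoms out and then rises while the training loss keeps falling, as in these made-up numbers, is the usual sign of overfitting.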

Conclusion

In this lesson, we learned about the structure of a convnet classifier: a head that acts as a classifier atop a base that performs the feature extraction.

The head, essentially, is an ordinary classifier like you learned about in the introductory course. For features, it uses those features extracted by the base. This is the basic idea behind convolutional classifiers: that we can attach a unit that performs feature engineering to the classifier itself.

This is one of the big advantages deep neural networks have over traditional machine learning models: given the right network structure, the deep neural net can learn how to engineer the features it needs to solve its problem.

For the next few lessons, we'll take a look at how the convolutional base accomplishes the feature extraction. Then, you'll learn how to apply these ideas and design some classifiers of your own.

Your Turn

For now, move on to the Exercise and build your own image classifier!
