{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lab 3-3: Defining and Training a LeNet Model\n",
    "\n",
    "Objectives:\n",
    "\n",
    "* Gain initial familiarity with building and training a model\n",
    "\n",
    "\n",
    "### 1. Defining the network\n",
    "\n",
    "First, define a LeNet network:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LeNet(\n",
      "  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))\n",
      "  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n",
      "  (fc1): Linear(in_features=400, out_features=120, bias=True)\n",
      "  (fc2): Linear(in_features=120, out_features=84, bias=True)\n",
      "  (fc3): Linear(in_features=84, out_features=10, bias=True)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "import torch.nn as nn  # layer/module classes\n",
    "import torch.nn.functional as F  # stateless functions\n",
    "\n",
    "class LeNet(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        # 1 input image channel, 6 output channels, 5x5 square convolution\n",
    "        # kernel\n",
    "        self.conv1 = nn.Conv2d(1, 6, 5)  # channels: 1 => 6, kernel size: 5\n",
    "        self.conv2 = nn.Conv2d(6, 16, 5)\n",
    "        # an affine operation: y = Wx + b\n",
    "        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension\n",
    "        self.fc2 = nn.Linear(120, 84)\n",
    "        self.fc3 = nn.Linear(84, 10)\n",
    "\n",
    "    def forward(self, x):\n",
    "        '''Define the forward pass; the backward pass is derived from it\n",
    "        automatically by autograd.\n",
    "        '''\n",
    "        # Max pooling over a (2, 2) window\n",
    "        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n",
    "        # If the size is a square, you can specify it with a single number\n",
    "        x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n",
    "        x = torch.flatten(x, 1)  # flatten all dimensions except the batch dimension\n",
    "        x = F.relu(self.fc1(x))\n",
    "        x = F.relu(self.fc2(x))\n",
    "        x = self.fc3(x)\n",
    "        return x\n",
    "\n",
    "\n",
    "net = LeNet()\n",
    "print(net)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Questions:\n",
    "> 1. In the code block above, what kinds of modules are `nn.Conv2d` and `nn.Linear`?\n",
    "> 2. What arguments do the constructors of `nn.Conv2d` and `nn.Linear` take?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. A 2D convolutional layer and a fully connected (linear) layer, respectively.\n",
    "2. For `nn.Conv2d`: the number of input channels, the number of output channels, and the kernel size. For `nn.Linear`: the number of input features and the number of output features."
   ]
  },
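  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check, the sketch below (an illustrative addition, reusing the `nn` import from the cell above) prints the weight shapes implied by these constructor arguments:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: the constructor arguments are\n",
    "#   nn.Conv2d(in_channels, out_channels, kernel_size)\n",
    "#   nn.Linear(in_features, out_features)\n",
    "conv = nn.Conv2d(1, 6, 5)        # 1 input channel -> 6 output channels, 5x5 kernel\n",
    "fc = nn.Linear(16 * 5 * 5, 120)  # 400 input features -> 120 output features\n",
    "print(conv.weight.shape)  # expected: torch.Size([6, 1, 5, 5])\n",
    "print(fc.weight.shape)    # expected: torch.Size([120, 400])"
   ]
  },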
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You only need to override the `forward` method of the `nn.Module` class (the forward pass); there is no need to define a `backward` method (the backward pass). Inside `forward` you are free to use any tensor operation.\n",
    "\n",
    "This is because the `autograd` computational-graph mechanism defines the `backward` method automatically.\n",
    "\n",
    "\n",
    "### 2. Accessing the weights\n",
    "\n",
    "Once the network (`LeNet` in this example) has been defined, its learnable parameters (also called weights) can be retrieved with the `net.parameters()` method.\n",
    "\n",
    "The retrieved parameters can then be handed to PyTorch, which updates them all at once using its gradient machinery."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n",
      "torch.Size([6, 1, 5, 5])\n"
     ]
    }
   ],
   "source": [
    "params = list(net.parameters())\n",
    "print(len(params))\n",
    "print(params[0].size())  # conv1's .weight\n",
    "# print(params)"
   ]
  },
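  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see where the 10 entries come from, the parameters can also be listed by name (a small added sketch): each of the five layers (conv1, conv2, fc1, fc2, fc3) contributes a weight tensor and a bias tensor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List each learnable parameter with its name and shape;\n",
    "# 5 layers x (weight + bias) = 10 entries.\n",
    "for name, p in net.named_parameters():\n",
    "    print(name, tuple(p.size()))"
   ]
  },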
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Testing input and output\n",
    "\n",
    "Let's try a random 32x32 input.\n",
    "\n",
    "Note: the expected input size of this `LeNet` network is 32x32. To use this network on the MNIST dataset, resize the dataset's images to 32x32.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 0.0719,  0.0145, -0.1102, -0.0903, -0.0850, -0.0460,  0.0280, -0.0440,\n",
      "         -0.0256,  0.0834]], grad_fn=<AddmmBackward0>)\n"
     ]
    }
   ],
   "source": [
    "input = torch.randn(1, 1, 32, 32)\n",
    "out = net(input)\n",
    "print(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Questions:\n",
    "> 1. Test in code: what happens if the input is not a tensor of size (1, 1, 32, 32)? One way to observe this is sketched below."
   ]
  },
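  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An illustrative sketch (not the required answer): with a 28x28 input, the feature map after the two convolution/pooling stages is 4x4 rather than 5x5, so the flattened vector has 16 * 4 * 4 = 256 features instead of the 400 that `fc1` expects, and the forward pass fails with a shape-mismatch error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Feed a 28x28 input (MNIST's native size) instead of the expected 32x32;\n",
    "# the flattened features (256) no longer match fc1's in_features (400),\n",
    "# so PyTorch raises a RuntimeError.\n",
    "try:\n",
    "    net(torch.randn(1, 1, 28, 28))\n",
    "except RuntimeError as e:\n",
    "    print(e)"
   ]
  },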
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Zero the gradient buffers of all parameters, then compute a loss against a dummy target (random values) and backpropagate (BP).\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(0.8519, grad_fn=<MseLossBackward0>)\n"
     ]
    }
   ],
   "source": [
    "output = net(input)\n",
    "target = torch.randn(10)  # a dummy target, for example\n",
    "target = target.view(1, -1)  # make it the same shape as output\n",
    "criterion = nn.MSELoss()\n",
    "\n",
    "loss = criterion(output, target)\n",
    "print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point, the computation from `input` to `loss` can be represented by the following computational graph:\n",
    "\n",
    "    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n",
    "          -> flatten -> linear -> relu -> linear -> relu -> linear\n",
    "          -> MSELoss\n",
    "          -> loss\n",
    "\n",
    "So when `loss.backward()` is called, derivatives can be computed over this entire graph: every tensor in the network with `requires_grad=True` will have its gradient computed and stored in its `.grad` attribute."
   ]
  },
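  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tail of this graph can be inspected directly via the `grad_fn` attribute (a short illustrative sketch; the node names in the comments are what PyTorch typically prints):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Walk a few steps backward along the autograd graph, starting from the loss.\n",
    "print(loss.grad_fn)  # MSELoss\n",
    "print(loss.grad_fn.next_functions[0][0])  # Linear (Addmm)\n",
    "print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU"
   ]
  },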
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Backpropagation (Backprop)\n",
    "\n",
    "\n",
    "To backpropagate the gradients, all we have to do is call the `loss.backward()` method.\n",
    "\n",
    "Note, however, that any existing weight gradients must be cleared first; otherwise each weight's gradient will be the accumulation of several backward passes.\n",
    "\n",
    "We now call `loss.backward()` and observe how conv1's bias gradient changes before and after the call.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "conv1.bias.grad before backward\n",
      "None\n",
      "conv1.bias.grad after backward\n",
      "tensor([ 0.0115,  0.0092,  0.0135,  0.0001, -0.0111,  0.0149])\n"
     ]
    }
   ],
   "source": [
    "net.zero_grad()  # zeroes the gradient buffers of all parameters\n",
    "\n",
    "print('conv1.bias.grad before backward')\n",
    "print(net.conv1.bias.grad)\n",
    "\n",
    "loss.backward()\n",
    "\n",
    "print('conv1.bias.grad after backward')\n",
    "print(net.conv1.bias.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Questions:\n",
    "> 1. Describe how conv1's gradient changes before and after `loss.backward()` in the code block above, and explain why."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before the call, `conv1.bias.grad` is `None`; after `loss.backward()` it holds concrete gradient values. This is because backpropagation through the computational graph computes each parameter's gradient and stores it in that parameter's `.grad` attribute."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Updating the weights\n",
    "\n",
    "The simplest update rule used in practice is Stochastic Gradient Descent (SGD):\n",
    "\n",
    "    weight = weight - learning_rate * gradient\n",
    "\n",
    "SGD can be implemented with the following Python code:\n",
    "\n",
    "```python\n",
    "learning_rate = 0.01\n",
    "for f in net.parameters():\n",
    "    f.data.sub_(f.grad.data * learning_rate)\n",
    "```\n",
    "\n",
    "However, when working with neural networks you usually want to use a variety of update rules, such as SGD, Nesterov-SGD, Adam, RMSProp, and so on. The hand-written code above is therefore rarely used in practice.\n",
    "\n",
    "To support this, PyTorch provides the `torch.optim` package, which implements all of these methods. Using it is very simple.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.optim as optim\n",
    "\n",
    "# create your optimizer\n",
    "optimizer = optim.SGD(net.parameters(), lr=0.01)\n",
    "\n",
    "for i in range(10):\n",
    "    # input = ...\n",
    "    # in your training loop:\n",
    "    optimizer.zero_grad()  # zero the gradient buffers\n",
    "    output = net(input)\n",
    "    loss = criterion(output, target)\n",
    "    loss.backward()\n",
    "    optimizer.step()  # does the update"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Questions:\n",
    "> 1. Using the experimental results, explain what each line inside the `for` loop in the code block above does."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order:\n",
    "\n",
    "1. `optimizer.zero_grad()` clears the gradient buffers;\n",
    "2. `output = net(input)` feeds the data through the network to produce the output;\n",
    "3. `loss = criterion(output, target)` computes the loss;\n",
    "4. `loss.backward()` backpropagates the gradients;\n",
    "5. `optimizer.step()` updates the parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. Training LeNet end to end (optional)\n",
    "\n",
    "See https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html\n",
    " \n",
    "> Run one complete LeNet training here. Provide the code and the results. A minimal sketch follows."
   ]
  },
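  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal end-to-end sketch (illustrative only, not the required solution): it assumes `torchvision` is available, resizes MNIST images to 32x32 as noted in Section 3, and trains for a single epoch with placeholder hyperparameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: train LeNet on MNIST for one epoch (assumes torchvision is installed).\n",
    "import torchvision\n",
    "import torchvision.transforms as transforms\n",
    "\n",
    "transform = transforms.Compose([\n",
    "    transforms.Resize((32, 32)),  # LeNet expects 32x32 inputs\n",
    "    transforms.ToTensor(),\n",
    "])\n",
    "train_set = torchvision.datasets.MNIST(\n",
    "    root=\"./data\", train=True, download=True, transform=transform)\n",
    "train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)\n",
    "\n",
    "model = LeNet()\n",
    "loss_fn = nn.CrossEntropyLoss()  # classification loss instead of MSE\n",
    "opt = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)\n",
    "\n",
    "for batch, (X, y) in enumerate(train_loader):\n",
    "    opt.zero_grad()\n",
    "    loss = loss_fn(model(X), y)\n",
    "    loss.backward()\n",
    "    opt.step()\n",
    "    if batch % 200 == 0:\n",
    "        print(f\"batch {batch}, loss {loss.item():.4f}\")"
   ]
  },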
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "0733c54d9044ea299f7b7f48049f3576c8ad4e6ff5a97e2c60d8a9e3bff0bc54"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}