Deep learning notes: enabling deeper networks, an introduction to ResNet with a PyTorch implementation

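I implemented ResNet18 in PyTorch a while ago, but I spent last week preparing for my National Scholarship defense and had no time to write up the process and what I learned. Today is the 70th anniversary of the founding of the country, so I am using the festive mood to finally finish this post.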

References

Online sources:
https://blog.csdn.net/weixin_43624538/article/details/85049699
https://blog.csdn.net/u013289254/article/details/98785869

Papers:
[1] Deep Residual Learning for Image Recognition (arXiv:1512.03385)


Introduction

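ResNet, the residual network, was proposed by Kaiming He and three co-authors, all Chinese researchers at Microsoft Research; I was quite happy to see the Chinese names under the paper title. In the introduction, the authors ask whether improving a deep neural network is as simple as stacking more layers. It is not. The first problem that comes with depth is exploding or vanishing gradients: as the number of layers grows, the gradient propagated backward is a product of many factors and becomes unstable, ending up either very large or very small. Vanishing is the more common case. Notably, the paper also points out that the accuracy drop seen in deeper networks (the degradation problem) is not caused by overfitting, since the training error rises as well.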
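To make the "product of many factors" point concrete, here is a toy sketch. It assumes, purely for illustration, that every layer scales the backward gradient by the same scalar factor; real networks multiply by Jacobian matrices instead, and the function name is hypothetical.

```python
def backprop_scale(per_layer_factor: float, num_layers: int) -> float:
    """Total scale applied to a gradient after passing backward through num_layers layers."""
    scale = 1.0
    for _ in range(num_layers):
        scale *= per_layer_factor
    return scale


# Factors slightly below 1 shrink the gradient, factors slightly above 1 blow it up.
vanishing = backprop_scale(0.9, 50)
exploding = backprop_scale(1.1, 50)

print(f"factor 0.9 over 50 layers: {vanishing:.5f}")
print(f"factor 1.1 over 50 layers: {exploding:.1f}")
```

Even a factor as mild as 0.9 leaves well under 1% of the gradient after 50 layers, while 1.1 amplifies it more than a hundredfold.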
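Researchers have tried many things to address vanishing and exploding gradients; representative examples are the ReLU activation function and Batch Normalization. For these two methods, see material online or my posts "deep-learning notes: AlexNet, the start of the deep learning boom" and "deep-learning notes: learning rate decay and batch normalization".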
For the degradation problem, ResNet's contribution is the skip (identity) connection. The paper's figure at this point shows the two basic residual blocks.

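Such a block can be written as $ F(x) = H(x) - x $. Here x is the output of the shallower layers, that is, the block's input, and H(x) is the output the deeper layers should produce, so the block computes $ H(x) = F(x) + x $. When the features represented by x are already mature enough, meaning that any change to x would increase the loss, F(x) tends to learn to be 0 and x simply passes through along the identity path.
In this way, and without extra computational cost, if the shallow output is already optimal, the later layers only need to act as an identity mapping during the forward pass.
When x and F(x) have different numbers of channels, the authors tried two ways of building the shortcut. The first zero-pads the missing channels of x so that the shapes line up; it is simple and needs no extra parameters. The second uses a 1x1 conv to project x to the matching number of channels.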
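The two shortcut options can be sketched as follows. This is a minimal illustration and not the paper's or torchvision's exact block: the class name `ToyResidualBlock` and its `projection` flag are hypothetical, and batch normalization and strided downsampling are left out to keep the focus on the shortcut.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyResidualBlock(nn.Module):
    """Hypothetical block computing relu(F(x) + shortcut(x))."""

    def __init__(self, in_channels: int, out_channels: int, projection: bool = True):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        # Residual branch F(x): two 3x3 convolutions that keep the spatial size.
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        # 1x1 projection, created only when the channel counts differ.
        self.proj = None
        if projection and in_channels != out_channels:
            self.proj = nn.Conv2d(in_channels, out_channels, 1, bias=False)

    def shortcut(self, x):
        if self.in_channels == self.out_channels:
            return x  # plain identity, nothing to do
        if self.proj is not None:
            return self.proj(x)  # learned 1x1 projection
        # Zero padding: append all-zero channels, no extra parameters.
        # Assumes out_channels > in_channels.
        extra = self.out_channels - self.in_channels
        return F.pad(x, (0, 0, 0, 0, 0, extra))

    def forward(self, x):
        residual = self.conv2(F.relu(self.conv1(x)))  # F(x)
        return F.relu(residual + self.shortcut(x))    # relu(F(x) + x)


x = torch.randn(2, 16, 8, 8)
same = ToyResidualBlock(16, 16)
padded = ToyResidualBlock(16, 32, projection=False)
projected = ToyResidualBlock(16, 32, projection=True)
print(same(x).shape, padded(x).shape, projected(x).shape)
```

If the residual branch outputs zero, the block reduces to the ReLU of its input, which is the "let the later layers act as an identity" behaviour described above.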


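The paper

As usual, I start with the original paper, with the points I consider important highlighted in yellow. I only have the English version.
If you want a copy without my annotations, you can simply search for it. I only provide the file this once; click here to download.
Although there is no Chinese PDF, English, Chinese, and side-by-side versions of many classic deep learning CV papers are available at Deep Learning Papers Translation, which is very convenient.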


My implementation

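In the paper, the authors list the architectures of several ResNet variants.

Here I implement ResNet18.
Since this is not my first implementation in PyTorch, I leave basic operations uncommented; if you want comments to follow along, see my earlier VGG implementation.
Because of the residual connections, ResNet's structure is fairly complex, and the paper does not describe it in full detail. Only after studying the official source code did I fully understand the structure, so I drew it out for reference.

Note: this block was improved in a 2016 paper by He et al.; see my post "deep-learning notes: my first hands-on ResNet project".

Each layer of ResNet18 contains two such basic blocks. The 1x1 convolution is applied only when x and F(x) have different numbers of channels. In my code, the shortcut module covers every case where the channels already match and nothing needs to be done.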

import torch
import torch.nn as nn
import torch.nn.functional as F


class ResNet(nn.Module):
    def __init__(self):
        super(ResNet, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3, bias=False)
        self.max = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # NOTE: most conv/bn modules below are called several times in forward(),
        # so those calls share one set of weights and BN statistics.
        # torchvision's ResNet-18 creates a separate Conv2d and BatchNorm2d
        # for every convolution instead.
        self.bn1 = nn.BatchNorm2d(64)
        self.bn2 = nn.BatchNorm2d(64)
        self.bn3 = nn.BatchNorm2d(128)
        self.bn4 = nn.BatchNorm2d(256)
        self.bn5 = nn.BatchNorm2d(512)

        # Identity shortcut for blocks whose input and output shapes already match
        self.shortcut = nn.Sequential()
        # Projection shortcuts (1x1 conv + BN) for blocks that change channels and resolution
        self.shortcut3 = nn.Sequential(nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False), nn.BatchNorm2d(128))
        self.shortcut4 = nn.Sequential(nn.Conv2d(128, 256, kernel_size=1, stride=2, bias=False), nn.BatchNorm2d(256))
        self.shortcut5 = nn.Sequential(nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False), nn.BatchNorm2d(512))

        self.conv2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1, bias=False)

        self.conv3_1 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2, padding=1, bias=False)
        self.conv3_2 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1, bias=False)

        self.conv4_1 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1, bias=False)
        self.conv4_2 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1, bias=False)

        self.conv5_1 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=2, padding=1, bias=False)
        self.conv5_2 = nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1, bias=False)

        self.avg = nn.AdaptiveAvgPool2d((1, 1))
        # "Adaptive": only the output size is given; the kernel size and stride
        # are derived from the input size automatically.

        self.fc = nn.Linear(512, 1000)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x1 = self.max(x)

        # layer1
        x = F.relu(self.bn2(self.conv2(x1)))
        x = self.bn2(self.conv2(x))
        x += self.shortcut(x1)  # for PyTorch 0.4.0 and later, the author notes this should be x = x + self.shortcut(x1)
        x2 = F.relu(x)

        x = F.relu(self.bn2(self.conv2(x2)))
        x = self.bn2(self.conv2(x))
        x += self.shortcut(x2)
        x3 = F.relu(x)

        # layer2
        x = F.relu(self.bn3(self.conv3_1(x3)))
        x = self.bn3(self.conv3_2(x))
        x += self.shortcut3(x3)
        x4 = F.relu(x)

        x = F.relu(self.bn3(self.conv3_2(x4)))
        x = self.bn3(self.conv3_2(x))
        x += self.shortcut(x4)
        x5 = F.relu(x)

        # layer3
        x = F.relu(self.bn4(self.conv4_1(x5)))
        x = self.bn4(self.conv4_2(x))
        x += self.shortcut4(x5)
        x6 = F.relu(x)

        x = F.relu(self.bn4(self.conv4_2(x6)))
        x = self.bn4(self.conv4_2(x))
        x += self.shortcut(x6)
        x7 = F.relu(x)

        # layer4
        x = F.relu(self.bn5(self.conv5_1(x7)))
        x = self.bn5(self.conv5_2(x))
        x += self.shortcut5(x7)
        x8 = F.relu(x)

        x = F.relu(self.bn5(self.conv5_2(x8)))
        x = self.bn5(self.conv5_2(x))
        x += self.shortcut(x8)
        x = F.relu(x)

        # ending
        x = self.avg(x)

        # Reshape: one dimension may be -1 and is then inferred automatically,
        # but only one dimension can be -1 at a time.
        x = x.view(-1, self.num_flat_features(x))
        x = self.fc(x)

        x = F.softmax(x, dim=1)

        return x

    def num_flat_features(self, x):
        # Number of features per sample: product of all dimensions except the batch dimension
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = ResNet()

As before, we can feed in a randomly generated tensor to check the model:

input = torch.randn(1, 3, 48, 48)
out = net(input)
print(out)

If the output prints without errors, the layer shapes fit together and the model is basically fine.
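To see why a 48x48 input is large enough, the spatial size can be traced by hand with the usual output-size formula for convolution and pooling. The sketch below assumes the kernel, stride, and padding values from the code above; the helper names are hypothetical, and only the first convolution of each layer is listed because the remaining 3x3, stride 1, padding 1 convolutions keep the size unchanged.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling window along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet18(size: int) -> dict:
    """Spatial size after each stage of ResNet18 for a square input."""
    stages = [
        # (name, kernel, stride, padding)
        ("conv1", 7, 2, 3),
        ("maxpool", 3, 2, 1),
        ("layer1", 3, 1, 1),
        ("layer2", 3, 2, 1),
        ("layer3", 3, 2, 1),
        ("layer4", 3, 2, 1),
    ]
    sizes = {"input": size}
    for name, kernel, stride, padding in stages:
        size = conv_out(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


print(trace_resnet18(48))
print(trace_resnet18(224))
```

For a 48x48 input the feature map is still 2x2 after layer4, and the adaptive average pooling then reduces it to 1x1 regardless of the input resolution, which is why the fully connected layer always receives 512 features.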


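Problems I ran into

I want to write down a few simple pitfalls I fell into, as a reminder to myself.

  1. softmax outputs are all 1

    After adding F.softmax, I ran into this problem: every entry of the output was 1.

    After some searching, I found that I had confused softmax over each row with softmax over each column. I had copy-pasted the softmax line from code I wrote for a Kaggle problem, and what I copied was x = F.softmax(x, dim = 0). Here dim = 0 means softmax is applied down each column of the tensor. In my earlier use case the tensor was one-dimensional, that is, there was only one pair of "[]" inside tensor(), so it effectively had a single column and softmax over that column was fine.
    Here, however, the network output has shape (1, 1000), so each column contains a single element, and the softmax of a single element is always 1, that is, 100%. The fix is to set dim to 1.
    The short snippet below shows how softmax behaves for each choice of dim.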

    import torch
    import torch.nn.functional as F

    x1 = torch.tensor([[1., 2., 3., 4.], [1., 3., 4., 5.], [3., 4., 5., 6.]])
    y11 = F.softmax(x1, dim=0)  # softmax down each column: every column sums to 1
    y12 = F.softmax(x1, dim=1)  # softmax along each row: every row sums to 1
    x2 = torch.tensor([1., 2., 3., 4.])
    y2 = F.softmax(x2, dim=0)   # a 1-D tensor has only dim 0

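    Printing each result shows the difference: every column of y11 sums to 1, every row of y12 sums to 1, and the entries of y2 sum to 1.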

  2. bias

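    You may have noticed that every convolution in my code sets bias = False, which I added after consulting the official source. So what is this bias, and what is it for?
    The first neural network most of us meet when learning deep learning is the perceptron. To activate a perceptron we need x1 * w1 + x2 * w2 + ... + xn * wn > T, where T is a threshold; the larger T is, the harder the perceptron is to activate.
    With many samples, I cannot hand-pick a threshold that gives the best overall performance, so we make T learnable and let training find a value that does. Moving T to the left-hand side turns it into the bias: x1 * w1 + x2 * w2 + ... + xn * wn - T > 0. The bias therefore controls how easy it is to activate the perceptron.
    The same holds in networks more advanced than the perceptron.
    However, if the convolution is followed by a normalization step, the bias no longer has any effect.
    Take the BN layer after each convolution in ResNet as an example.
    As covered in my previous post, BN includes the step $ \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} $. A bias adds the same constant to every x_i in a channel, so the batch mean $ \mu_B $ shifts by the same constant and the numerator is unchanged; the denominator is built from the variance, which a constant shift does not affect either. Since every convolution in ResNet is followed by BN, there is no need to enable the bias: it would have no effect and would only take up some extra GPU memory.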
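The cancellation can be checked numerically. The sketch below is only an illustration with made-up layer sizes: it runs the same convolution with and without its bias, then passes both results through a BatchNorm2d layer in training mode, where normalization uses the statistics of the current batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=True)
nn.init.normal_(conv.bias, mean=0.0, std=5.0)  # exaggerate the bias on purpose
bn = nn.BatchNorm2d(8)  # a freshly created module is in training mode

x = torch.randn(4, 3, 16, 16)
with torch.no_grad():
    with_bias = conv(x)
    without_bias = F.conv2d(x, conv.weight, bias=None, padding=1)
    out_with = bn(with_bias)
    out_without = bn(without_bias)

raw_gap = (with_bias - without_bias).abs().max().item()
bn_gap = (out_with - out_without).abs().max().item()
print(f"largest difference before BN: {raw_gap:.4f}")
print(f"largest difference after BN:  {bn_gap:.2e}")
```

Before BN the two feature maps differ by exactly the per-channel bias; after BN the difference is only floating-point rounding noise, because the mean subtraction removes the constant offset.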

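Official source code

If you now have a fairly clear picture of ResNet's structure, you can try to follow the reasoning behind the official source code. I find that reading a straightforward implementation like mine first, and the official source afterwards, is both easier to understand and more efficient.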

import torch
import torch.nn as nn
from .utils import load_state_dict_from_url


__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
           'resnet152', 'resnext50_32x4d', 'resnext101_32x8d',
           'wide_resnet50_2', 'wide_resnet101_2']


model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
    'resnext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth',
    'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth',
    'wide_resnet50_2': 'https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth',
    'wide_resnet101_2': 'https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth',
}


def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion = 1
    __constants__ = ['downsample']

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None,
                 norm_layer=None):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x


def _resnet(arch, block, layers, pretrained, progress, **kwargs):
    model = ResNet(block, layers, **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls[arch],
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model


def resnet18(pretrained=False, progress=True, **kwargs):
    r"""ResNet-18 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet18', BasicBlock, [2, 2, 2, 2], pretrained, progress,
                   **kwargs)


def resnet34(pretrained=False, progress=True, **kwargs):
    r"""ResNet-34 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained, progress,
                   **kwargs)


def resnet50(pretrained=False, progress=True, **kwargs):
    r"""ResNet-50 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet50', Bottleneck, [3, 4, 6, 3], pretrained, progress,
                   **kwargs)


def resnet101(pretrained=False, progress=True, **kwargs):
    r"""ResNet-101 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet101', Bottleneck, [3, 4, 23, 3], pretrained, progress,
                   **kwargs)


def resnet152(pretrained=False, progress=True, **kwargs):
    r"""ResNet-152 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet152', Bottleneck, [3, 8, 36, 3], pretrained, progress,
                   **kwargs)


def resnext50_32x4d(pretrained=False, progress=True, **kwargs):
    r"""ResNeXt-50 32x4d model from
    `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['groups'] = 32
    kwargs['width_per_group'] = 4
    return _resnet('resnext50_32x4d', Bottleneck, [3, 4, 6, 3],
                   pretrained, progress, **kwargs)


def resnext101_32x8d(pretrained=False, progress=True, **kwargs):
    r"""ResNeXt-101 32x8d model from
    `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['groups'] = 32
    kwargs['width_per_group'] = 8
    return _resnet('resnext101_32x8d', Bottleneck, [3, 4, 23, 3],
                   pretrained, progress, **kwargs)


def wide_resnet50_2(pretrained=False, progress=True, **kwargs):
    r"""Wide ResNet-50-2 model from
    `"Wide Residual Networks" <https://arxiv.org/pdf/1605.07146.pdf>`_

    The model is the same as ResNet except for the bottleneck number of channels
    which is twice larger in every block. The number of channels in outer 1x1
    convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048
    channels, and in Wide ResNet-50-2 has 2048-1024-2048.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['width_per_group'] = 64 * 2
    return _resnet('wide_resnet50_2', Bottleneck, [3, 4, 6, 3],
                   pretrained, progress, **kwargs)


def wide_resnet101_2(pretrained=False, progress=True, **kwargs):
    r"""Wide ResNet-101-2 model from
    `"Wide Residual Networks" <https://arxiv.org/pdf/1605.07146.pdf>`_

    The model is the same as ResNet except for the bottleneck number of channels
    which is twice larger in every block. The number of channels in outer 1x1
    convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048
    channels, and in Wide ResNet-50-2 has 2048-1024-2048.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['width_per_group'] = 64 * 2
    return _resnet('wide_resnet101_2', Bottleneck, [3, 4, 23, 3],
                   pretrained, progress, **kwargs)
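The part of the official code that is easiest to misread is the condition in `_make_layer` that decides whether the first block of a stage gets a `downsample` branch. The sketch below replays just that condition in plain Python. It assumes the default setting with no dilation, and the helper name `downsample_plan` is hypothetical, not part of torchvision.

```python
def downsample_plan(expansion: int) -> list:
    """Replay the downsample decision of _make_layer for the four stages.

    expansion is 1 for BasicBlock and 4 for Bottleneck.
    Returns tuples of (stage, in_channels, out_channels, stride, needs_downsample).
    """
    inplanes = 64  # channels produced by the stem (conv1 + maxpool)
    plan = []
    for index, planes in enumerate([64, 128, 256, 512]):
        stride = 1 if index == 0 else 2  # layer1 keeps the resolution
        out_channels = planes * expansion
        # Same test as in _make_layer: project whenever the shape changes.
        needs_downsample = stride != 1 or inplanes != out_channels
        plan.append((f"layer{index + 1}", inplanes, out_channels, stride, needs_downsample))
        inplanes = out_channels
    return plan


for row in downsample_plan(expansion=1):  # BasicBlock, as in ResNet-18/34
    print(row)
for row in downsample_plan(expansion=4):  # Bottleneck, as in ResNet-50/101/152
    print(row)
```

With BasicBlock, layer1 needs no projection because it maps 64 channels to 64 at stride 1, which corresponds to the shortcut3, shortcut4, and shortcut5 modules in my own implementation. With Bottleneck, even layer1 needs a 1x1 projection because the block expands 64 channels to 256.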


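.pth files

When reading the official source, you will notice that it provides a set of model_urls for the different variants, and every URL ends in .pth.
After downloading the corresponding file I did not know what to do with it, so I searched around to learn what a .pth file is and how to use it.
The extension has two unrelated uses. In Python itself, a .pth file is a text file describing the module search path: it lists directories, one per line, and when placed in a specific location it makes Python add those paths when importing modules. The files in model_urls are something different. They are PyTorch checkpoints: a serialized state_dict, that is, an ordered dictionary mapping parameter names to tensors, written with torch.save and read back with torch.load.
Below I load such a .pth file with PyTorch.
First, I downloaded the .pth file for ResNet18 to my desktop.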

import torch
import torchvision.models as models

# pretrained=True would download and use the pretrained weights directly
net = models.resnet18(pretrained=False)
# Note: the models.xxx call depends on the model, e.g. models.squeezenet1_1

pthfile = r'C:\Users\sheny\Desktop\resnet18-5c106cde.pth'  # path to the .pth file
net.load_state_dict(torch.load(pthfile))
print(net)

The output is:

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)

This shows the configuration of every layer in detail.
We can also load the file directly to see all the parameter values.

import torch

pthfile = r'C:\Users\sheny\Desktop\resnet18-5c106cde.pth'

net = torch.load(pthfile)

print(net)

The output is:

OrderedDict([('conv1.weight', Parameter containing:
tensor([[[[-1.0419e-02, -6.1356e-03, -1.8098e-03, ..., 5.6615e-02,
1.7083e-02, -1.2694e-02],
[ 1.1083e-02, 9.5276e-03, -1.0993e-01, ..., -2.7124e-01,
-1.2907e-01, 3.7424e-03],
[-6.9434e-03, 5.9089e-02, 2.9548e-01, ..., 5.1972e-01,
2.5632e-01, 6.3573e-02],
...,
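The OrderedDict printed above is a state_dict. To show where such a file comes from, here is a small round trip on a made-up two-layer model: it saves the state_dict with torch.save, reads it back with torch.load, and restores it into a second model of the same architecture. The model and the file name `tiny.pth` are only for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model() -> nn.Sequential:
    """A tiny stand-in model; any nn.Module works the same way."""
    return nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, bias=False), nn.BatchNorm2d(4))


model = make_model()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tiny.pth")
    torch.save(model.state_dict(), path)  # stores parameters and buffers only
    state_dict = torch.load(path, map_location="cpu")

# Keys are "<module name>.<parameter or buffer name>", values are tensors.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))

restored = make_model()
restored.load_state_dict(state_dict)  # architecture must match the saved keys
```

Because the file only holds tensors keyed by name, the model class has to be constructed first, which is why the earlier example calls models.resnet18 before load_state_dict.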



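Author: 高深远

Published: October 1, 2019, 18:42

Last updated: February 15, 2020, 22:09

Original link: https://gsy00517.github.io/deep-learning20191001184216/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.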
