Lightweight Deep Learning Model

이민예 2019-08-13

Prior Research

NO.	PAPER
1	Binarized Neural Networks: Training deep neural networks with weights and activations constrained to +1 or -1.
2	MobileNets: Efficient convolutional neural networks for mobile vision applications.
3	FINN: A framework for fast, scalable binarized neural network inference.
4	MobileNetV2: Inverted Residuals and Linear Bottlenecks.
5	XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.
6	Model compression via distillation and quantization.
7	AMC: AutoML for model compression and acceleration on mobile devices.
8	SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size.
9	BNN+: Improved binary network training.
10	SqueezeNext: Hardware-aware neural network design.
11	Loss-aware binarization of deep networks.
12	Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding.
13	Loss-aware weight quantization of deep networks.
14	Scalpel: Customizing DNN pruning to the underlying hardware parallelism.
15	DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients.
16	ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.
17	LQ-Nets: Learned quantization for highly accurate and compact deep neural networks.
18	Alternating Multi-bit Quantization for Recurrent Neural Networks.
19	Densely Connected Convolutional Networks.
20	DeepTwist: Learning model compression via occasional weight distortion.
21	All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification.
22	Analysis of Quantized Models.
23	EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.
24	Learning to quantize deep networks by optimizing quantization intervals with task loss.
25	+edge computing..

Benchmark Analysis of Deep Neural Networks

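Benchmark Analysis of Representative Deep Neural Network Architectures is a study that compares about 43 deep learning models on factors such as forward-pass time, memory usage, total parameter count, and accuracy. It offers insight for choosing an efficient model that is suitable for deployment in a given application.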

Accuracy vs. Computational Complexity vs. Model Complexity

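Accuracy: Top-1 and Top-5 accuracy are measured on the ImageNet-1k validation set, which classifies images into 1000 categories. Top-1 accuracy is the fraction of images for which the category the model assigns the highest probability is the correct one. Top-5 accuracy is the fraction for which the correct category is among the five categories with the highest probability.
Computational complexity: the number of convolution operations, measured in FLOPs (floating-point operations, in # of multiply-adds)
Model complexity: the total number of parameters; more precisely, the size in MB of the file that stores the parameters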
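The Top-1 and Top-5 definitions above can be sketched in a few lines. This is a minimal illustration, not the benchmark's own evaluation code: the function name `top_k_accuracy` and the toy scores are assumptions, and k=2 over 4 classes stands in for Top-5 over 1000 classes.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical scores for 3 images over 4 classes (not real model outputs).
scores = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1 is ranked first
    [0.40, 0.30, 0.20, 0.10],  # true class 1 is ranked second
    [0.70, 0.15, 0.10, 0.05],  # true class 3 is ranked last
]
labels = [1, 1, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

Top-k accuracy can never be lower than Top-1 accuracy on the same predictions, which is why Top-5 numbers are always at least as high as Top-1 numbers.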

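The model with the highest accuracy, and also the highest computational complexity, is NASNet-A-Large (Neural Architecture Search Network).
Among models with a computational complexity below 5 G-FLOPs, SE-ResNeXt-50 (Squeeze-and-Excitation, 32x4d) reaches the highest accuracy while keeping model complexity low, at about 2.76M.
Computational complexity does not determine accuracy (e.g., SENet-154 has about 3 times the computational complexity of SE-ResNeXt-101, yet the two reach almost the same accuracy).
Model complexity does not determine accuracy either (e.g., VGG-13 has a much higher model complexity than ResNet-18, yet the two reach almost the same accuracy).
MobileNet achieves higher accuracy than SqueezeNet.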

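The size initially allocated for the parameters (Memory Complexity) and the total memory utilization are linearly related.
As a rule of thumb, a higher Memory Complexity can be taken to mean a higher memory utilization.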

Model	Memory Complexity	MAX Memory Utilization
MobileNet V1	17MB	650MB
MobileNet V2	14MB	648MB
SqueezeNet V1.0	5MB	943MB
SqueezeNet V1.1	5MB	921MB

[ Reference ]

https://arxiv.org/pdf/1810.00736.pdf

SqueezeNet

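SqueezeNet is a lightweight deep learning model that

reduces the number of model parameters (Strategies 1 and 2), and
maximizes accuracy within that limited parameter budget (Strategy 3).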

SqueezeNet Strategy

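[ Strategy#1 ] 1*1 filters
Replace 3*3 filters with 1*1 filters. With N as the number of filters, the weights per input channel drop from 3*3*N to 1*1*N, so the parameters to learn shrink to 1/9.
[ Strategy#2 ] Fewer input channels
With 3*3 convolutions, the total number of parameters is (number of filters) * (number of input channels) * (3*3). Strategy#1 reduces parameters by changing 3*3 filters to 1*1; Strategy#2 reduces the total further by lowering the number of input channels as well as the filter size.
When we use Fire modules we set s1x1 to be less than (e1x1 + e3x3), so the squeeze layer helps to limit the number of input channels to the 3x3 filters, as per Strategy 2 from Section 3.1.
[ Strategy#3 ] Delayed downsampling
Placing downsampling late in the network lets the convolution layers keep large activation maps. Larger activation maps can lead to higher model accuracy.
Related paper
- (He & Sun, 2015) applied delayed downsampling to four different CNN architectures and obtained higher classification accuracy.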
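The parameter arithmetic behind Strategies 1 and 2 can be checked with a short sketch. The helper `conv_weights` and the layer sizes (64 filters, 128 or 16 input channels) are hypothetical values chosen only for illustration; biases are ignored, as in the formula above.

```python
def conv_weights(num_filters, in_channels, kernel_size):
    """Weights in a conv layer: (filters) * (input channels) * (k * k). Biases ignored."""
    return num_filters * in_channels * kernel_size * kernel_size


# Hypothetical baseline: 64 filters of size 3*3 over 128 input channels.
baseline = conv_weights(num_filters=64, in_channels=128, kernel_size=3)

# Strategy 1: same layer, but with 1*1 filters instead of 3*3.
strategy1 = conv_weights(num_filters=64, in_channels=128, kernel_size=1)

# Strategy 2: keep 3*3 filters, but feed them only 16 input channels.
strategy2 = conv_weights(num_filters=64, in_channels=16, kernel_size=3)

print(baseline, strategy1, strategy2)
print("Strategy 1 reduction:", baseline // strategy1)
print("Strategy 2 reduction:", baseline // strategy2)
```

Strategy 1 always gives a 9x reduction for the layers it is applied to, while the gain from Strategy 2 depends on how far the input channel count is squeezed.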

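Convolution activation map size formula

O = (I-F+2P)/S + 1

Here O is the output size, I the input size, F the filter size, P the padding, and S the stride, all along one spatial dimension.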
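As a worked example of the formula, the sketch below applies it to a stride-2 3*3 convolution followed by a stride-2 3*3 max pooling, both without padding. The 227-pixel input size is an assumption made for illustration, and integer division is used for the case where the stride does not divide evenly.

```python
def conv_output_size(input_size, filter_size, padding, stride):
    """O = (I - F + 2P) / S + 1, rounded down, along one spatial dimension."""
    return (input_size - filter_size + 2 * padding) // stride + 1


# Assumed 227-pixel input; 3*3 convolution, stride 2, no padding.
after_conv = conv_output_size(input_size=227, filter_size=3, padding=0, stride=2)

# 3*3 max pooling, stride 2, no padding, follows the same formula.
after_pool = conv_output_size(input_size=after_conv, filter_size=3, padding=0, stride=2)

print(after_conv, after_pool)
```

Each stride-2 layer roughly halves the activation map, which is why Strategy 3 pushes such layers toward the end of the network.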

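What is a Fire Module?

Fire Module = Squeeze Layer + Expand Layer

The squeeze layer uses only 1*1 filters. The expand layer applies 1*1 and 3*3 filters to the squeeze output and concatenates the results.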
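To see how much a Fire Module saves, the sketch below counts its parameters (weights plus biases) and compares them with a single 3*3 convolution that produces the same number of output channels. The helper names are hypothetical, and the sizes (64 input channels, squeeze 16, expand 64 + 64) are one example configuration rather than a statement about every layer.

```python
def conv_params(in_channels, num_filters, kernel_size):
    """Weights plus one bias per filter for a single conv layer."""
    return in_channels * num_filters * kernel_size * kernel_size + num_filters


def fire_params(in_channels, s1x1, e1x1, e3x3):
    """Parameters of a Fire Module: 1*1 squeeze, then parallel 1*1 and 3*3 expand."""
    squeeze = conv_params(in_channels, s1x1, 1)
    expand_1x1 = conv_params(s1x1, e1x1, 1)   # expand layers see only s1x1 channels
    expand_3x3 = conv_params(s1x1, e3x3, 3)
    return squeeze + expand_1x1 + expand_3x3


# Example configuration: 64 input channels -> 128 output channels (64 + 64).
fire = fire_params(in_channels=64, s1x1=16, e1x1=64, e3x3=64)
plain = conv_params(in_channels=64, num_filters=128, kernel_size=3)

print(fire, plain, round(plain / fire, 2))
```

Most of the saving comes from the 3*3 filters seeing 16 squeezed channels instead of all 64 input channels.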

Other features of SqueezeNet

Uses the ReLU activation function.

SqueezeNet implementation (tf.keras)

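from tensorflow.keras import backend as K
from tensorflow.keras.layers import (Activation, Convolution2D, Input,
                                     MaxPooling2D, concatenate)
from tensorflow.keras.models import Model

# Name fragments referenced by fire_module. The excerpt omits their
# definitions, so these values are assumed. The '/' in layer names follows
# the referenced repository; newer Keras releases may reject it.
sq1x1 = 'squeeze1x1'
exp1x1 = 'expand1x1'
exp3x3 = 'expand3x3'
relu = 'relu_'

# Modular function for Fire Node

def fire_module(x, fire_id, squeeze=16, expand=64):
    s_id = 'fire' + str(fire_id) + '/'

    # Concatenate along the channel axis, whichever layout is in use.
    if K.image_data_format() == 'channels_first':
        channel_axis = 1
    else:
        channel_axis = 3

    # Squeeze layer: 1x1 convolutions reduce the channel count.
    x = Convolution2D(squeeze, (1, 1), padding='valid', name=s_id + sq1x1)(x)
    x = Activation('relu', name=s_id + relu + sq1x1)(x)

    # Expand layer, 1x1 branch.
    left = Convolution2D(expand, (1, 1), padding='valid', name=s_id + exp1x1)(x)
    left = Activation('relu', name=s_id + relu + exp1x1)(left)

    # Expand layer, 3x3 branch ('same' padding keeps the spatial size).
    right = Convolution2D(expand, (3, 3), padding='same', name=s_id + exp3x3)(x)
    right = Activation('relu', name=s_id + relu + exp3x3)(right)

    x = concatenate([left, right], axis=channel_axis, name=s_id + 'concat')
    return x

def SqueezeNet(include_top=True, weights='imagenet',
               input_tensor=None, input_shape=None,
               pooling=None,
               classes=1000):
    """Instantiates the SqueezeNet architecture.

    This excerpt stops after fire9: the classification head and weight
    loading are not shown, so include_top, weights, pooling and classes
    are accepted but unused here.
    """
    if input_tensor is None:
        # Assumed default: 227x227 RGB input in channels_last layout.
        img_input = Input(shape=input_shape or (227, 227, 3))
    else:
        img_input = input_tensor
    inputs = img_input

    x = Convolution2D(64, (3, 3), strides=(2, 2), padding='valid', name='conv1')(img_input)
    x = Activation('relu', name='relu_conv1')(x)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool1')(x)

    x = fire_module(x, fire_id=2, squeeze=16, expand=64)
    x = fire_module(x, fire_id=3, squeeze=16, expand=64)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool3')(x)

    x = fire_module(x, fire_id=4, squeeze=32, expand=128)
    x = fire_module(x, fire_id=5, squeeze=32, expand=128)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool5')(x)

    x = fire_module(x, fire_id=6, squeeze=48, expand=192)
    x = fire_module(x, fire_id=7, squeeze=48, expand=192)
    x = fire_module(x, fire_id=8, squeeze=64, expand=256)
    x = fire_module(x, fire_id=9, squeeze=64, expand=256)

    model = Model(inputs, x, name='squeezenet')
    return model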

[ Reference ]

https://github.com/rcmalli/keras-squeezenet

MobileNet V1

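By using Depthwise Separable Convolution in place of standard convolution, MobileNet V1

reduces the computation of a conventional CNN (about 8 to 9 times less computation with 3*3 kernels), and
reduces the number of parameters.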

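A standard convolution operates over the spatial and channel dimensions at the same time. A depthwise separable convolution factorizes this into a spatial convolution (Depthwise) and a channel-wise convolution (Pointwise), computes them separately, and applies them in sequence. This approximates the standard convolution with fewer parameters and less computation.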
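The saving from this factorization can be estimated by counting multiply-adds. The sketch below uses the usual cost model for a k*k kernel, M input channels, N output channels, and an F*F output map; the function names and the concrete sizes (k=3, M=128, N=256, F=14) are assumptions chosen for illustration.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds of a standard convolution: k*k*M*N*F*F."""
    return k * k * m * n * f * f


def separable_conv_cost(k, m, n, f):
    """Multiply-adds of a depthwise (k*k per channel) plus pointwise (1*1) convolution."""
    depthwise = k * k * m * f * f
    pointwise = m * n * f * f
    return depthwise + pointwise


# Hypothetical layer: 3*3 kernel, 128 -> 256 channels, 14*14 output map.
k, m, n, f = 3, 128, 256, 14
standard = standard_conv_cost(k, m, n, f)
separable = separable_conv_cost(k, m, n, f)

# The cost ratio simplifies to 1/N + 1/k^2, independent of M and F.
print(standard, separable, round(standard / separable, 2))
```

With 3*3 kernels the ratio approaches 1/9 as N grows, which is where the 8 to 9 times reduction comes from.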

Depthwise Separable Convolution

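A factorized form of the convolution operation

Depthwise Conv
- Convolution along the spatial dimensions
- An independent convolution for each channel of the input
- Filter shape: width * height * depth (=1)
Pointwise Conv (1*1 Conv)
- Convolution along the channel dimension
- Filter shape: width (=1) * height (=1) * depth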
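The two steps above can be sketched directly in NumPy. This is a minimal illustration under stated assumptions (stride 1, no padding, channels-last layout, random inputs), not how a framework implements the layers; the function names and tensor sizes are hypothetical.

```python
import numpy as np


def depthwise_conv(x, dw_kernels):
    """x: (H, W, C), dw_kernels: (k, k, C). One k*k filter per channel, no channel mixing."""
    h, w, c = x.shape
    k = dw_kernels.shape[0]
    out = np.zeros((h - k + 1, w - k + 1, c))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[i:i + k, j:j + k, :]                    # (k, k, C)
            out[i, j, :] = np.sum(patch * dw_kernels, axis=(0, 1))
    return out


def pointwise_conv(x, pw_kernels):
    """x: (H, W, C), pw_kernels: (C, N). A 1*1 convolution that only mixes channels."""
    return x @ pw_kernels


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))    # 8*8 input with 3 channels
dw = rng.standard_normal((3, 3, 3))   # one 3*3 filter per input channel
pw = rng.standard_normal((3, 16))     # 1*1 filters mapping 3 -> 16 channels

y = pointwise_conv(depthwise_conv(x, dw), pw)
separable_params = dw.size + pw.size
standard_params = 3 * 3 * 3 * 16      # a standard 3*3 conv with the same in/out channels
print(y.shape, separable_params, standard_params)
```

The output has the same shape a standard 3*3 convolution would produce, but the filters hold 75 weights instead of 432.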