Logarithm Decreasing Inertia Weight Particle Swarm Optimization Algorithms for Convolutional Neural Network

The convolutional neural network (CNN) is a technique that is often used in deep learning, and various models have been proposed and improved for learning with CNNs. When training a CNN, it is important to determine the optimal parameters. This paper proposes an optimization of CNN parameters using a logarithm decreasing inertia weight (LogDIW) for particle swarm optimization. Two datasets are used, MNIST and CIFAR-10, and the accuracy of the proposed LogDIWPSO CNN is compared with a baseline CNN based on the LeNet-5 architecture. On the MNIST dataset, the baseline CNN reached 94.02% at the 5th epoch, while the LogDIWPSO CNN improved on this accuracy. On the CIFAR-10 dataset, the baseline CNN reached 28.07% at the 10th epoch, while the LogDIWPSO CNN reached 69.3%.


I. INTRODUCTION
Deep learning is a set of algorithms in machine learning that tries to learn at several levels, corresponding to different levels of abstraction. This is usually done using an artificial neural network (ANN). The levels in the learned statistical model correspond to different levels of concepts, where higher-level concepts are defined from lower-level concepts, and lower-level concepts can help define higher-level concepts. Deep learning is one of the fields of research based on artificial intelligence (AI). According to [1], deep learning can be grouped into three classes of models: discriminative models, generative models, and hybrid models. Discriminative models include the deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN). Generative models include deep belief networks (DBN), restricted Boltzmann machines (RBM), and others. Hybrid models are structures that combine discriminative and generative models.
For example, [2] proposed an algorithm based on particle swarm optimization, called psoCNN, that automatically searches for deep convolutional neural network architectures for image classification. However, when training machine learning models such as CNNs, it is important to determine the optimal hyperparameters. Deep learning involves many parameters, and it is difficult to determine their optimal values manually, so several studies have tried to determine them automatically.
One well-known class of automation methods is metaheuristic algorithms. Metaheuristic algorithms are a way to solve difficult optimization problems, and in recent years they have been used for CNN optimization [3][4][5][6]. These algorithms are inspired by nature and are based on animal behavior, physics, biology, and so on. Those based on biological phenomena include evolution strategies (ES) and genetic algorithms (GA) [7]; those based on physical phenomena include simulated annealing (SA) [8]; and those based on animal behavior include ant colony optimization (ACO) [9], the firefly algorithm (FA) [10], the bat algorithm (BA) [11], and particle swarm optimization (PSO) [12].
Among these metaheuristic algorithms, PSO is widely used for network optimization and has good convergence compared to GA. Several studies have used PSO for CNN optimization. Ref. [13] proposed a hybrid algorithm in which PSO is used to reduce the complexity of the overall algorithm; used in conjunction with a CNN, PSO reduces the number of epochs in the training process and the dependence on the GPU system. The proposed algorithm increased accuracy by 3-4% with a smaller number of epochs and also overcame the local-minima problem that is common in the backpropagation training methodology. PSO has also been used together with a CNN for image classification in psoCNN, where a new direct encoding strategy and velocity operators were designed to allow PSO to optimize a CNN; the experimental results show that psoCNN can quickly find good CNN architectures [14]. Similar combinations of PSO and CNN were also reported in [14][15]. One important element of PSO is the inertia weight, a parameter that strongly affects the optimization; several inertia weight schemes have been introduced [16][17][18][19][20][21]. In this study, the logarithm decreasing inertia weight (LogDIW) for PSO [22] is used to optimize the CNN hyperparameters. The aim is to improve the accuracy on the MNIST and CIFAR-10 datasets, which are often used as benchmarks, using a CNN based on the LeNet-5 architecture [23]. In summary, this study proposes an optimization algorithm for the convolutional neural network to handle its many hyperparameters.
This paper is structured as follows: Section 1 describes the background and previous research on metaheuristic algorithms and CNNs, especially the use of PSO for network optimization. Section 2 describes the proposed method. Section 3 presents the results and discussion. Section 4 describes the conclusion and future work.

II. METHOD
In this study, LogDIWPSO was used to optimize the CNN parameters and improve the accuracy obtained with the CNN. Figure 2 shows a flowchart of the proposed method.

A. Particle Swarm Optimization
Ref. [24] introduced the particle swarm optimization (PSO) algorithm, a stochastic optimization technique. The basic idea of PSO is modeled on a flock of birds searching for food in a particular area. The birds do not know exactly where the food source is; through iteration, they discover how far away it is. Each bird follows the individual that is closest to the food source as well as the best position it has previously reached itself. PSO looks for the optimal solution by updating the position and velocity of each particle through (1) and (2) [25]:

$v_{id}(t+1) = w \, v_{id}(t) + c_1 r_1 (pBest_{id} - x_{id}(t)) + c_2 r_2 (gBest_{d} - x_{id}(t))$ (1)

$x_{id}(t+1) = x_{id}(t) + v_{id}(t+1)$ (2)

where t is the t-th iteration and d is the d-th dimension of the optimization search space; w is the inertia weight; $c_1$ and $c_2$ are acceleration constants representing the weighting of the stochastic acceleration terms that pull each particle toward its personal best ($pBest$) and the global best ($gBest$); and $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1].
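To make the update rules concrete, the following is a minimal NumPy sketch of a single PSO step implementing (1) and (2); the acceleration constants c1 = c2 = 2.0 are common defaults assumed here, not values taken from this paper.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w, c1=2.0, c2=2.0):
    """One PSO iteration: velocity update, Eq. (1), then position update, Eq. (2).

    x, v, p_best have shape (n_particles, n_dims); g_best has shape (n_dims,).
    """
    r1 = np.random.rand(*x.shape)  # uniform in [0, 1], drawn per particle and dimension
    r2 = np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (1)
    x = x + v                                                    # Eq. (2)
    return x, v
```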

B. Logarithm Decreasing Inertia Weight
Previous research shows that PSO with a large inertia weight (w) has good global search ability, while a small inertia weight favors local search and faster convergence. Ref. [22] introduced an inertia weight called the logarithm decreasing inertia weight (LogDIW), written as (3):

$w = w_{max} + (w_{min} - w_{max}) \log_{10}\!\left(a + \frac{10t}{t_{max}}\right)$ (3)

where a is a constant for adjusting the evolutionary speed (here a = 1), $w_{min}$ and $w_{max}$ are the minimum and maximum inertia weights, t is the current iteration, and $t_{max}$ is the maximum number of iterations.
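As a sketch, (3) can be computed as follows; the bounds w_min = 0.4 and w_max = 0.9 are common choices in the inertia-weight literature and are assumptions here, not necessarily the values in Table II.

```python
import numpy as np

def log_diw(t, t_max, w_min=0.4, w_max=0.9, a=1.0):
    """Logarithm decreasing inertia weight, Eq. (3)."""
    return w_max + (w_min - w_max) * np.log10(a + 10.0 * t / t_max)

# The weight decays from w_max at t = 0 toward w_min as t approaches t_max:
print([round(log_diw(t, 100), 3) for t in (0, 25, 50, 75, 100)])
# [0.9, 0.628, 0.511, 0.435, 0.379]
```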

C. Convolutional Neural Network
Several models have been proposed for the CNN architecture, including LeNet-5 [23]. In this study, LeNet-5, one of the well-known basic architectures, is used. The LeNet-5 architecture is shown in Fig. 1 and the proposed algorithm's flowchart is presented in Fig. 2.

Fig. 1 The LeNet-5 architecture [23]

In the LeNet-5 architecture, each convolutional block consists of three parts: convolution, pooling, and a nonlinear activation function. Convolution is used to extract spatial features; the convolution regions were originally referred to as receptive fields. The LeNet-5 architecture also uses average-pooling subsampling layers, the tanh activation function, an MLP classifier, and sparse connections between layers to reduce the complexity of the computation. There are 7 layers in total: 3 convolution layers, 2 subsampling layers, and 2 fully connected layers.
In the experiment, the position and velocity of each particle are first initialized. Next, a CNN is trained for each particle, and the position, velocity, pBest, and gBest are updated based on the results obtained. This operation is repeated until the iteration limit is reached, and the gBest particle is taken as the result. The fitness of each particle in the experiment is the cross-entropy, written as $E = -\sum_{i} y_i \log \hat{y}_i$, where $y_i$ is the true label and $\hat{y}_i$ is the predicted probability.
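The following is a hedged sketch of that loop, reusing pso_step and log_diw from the sketches above; the particle encoding (hyperparameters such as activation, batch size, and optimizer mapped to real-valued dimensions) and the fitness function (training a CNN and returning its validation cross-entropy) are stand-ins for this paper's actual implementation.

```python
import numpy as np

def logdiwpso(fitness, n_particles, n_dims, t_max, bounds):
    """Minimize `fitness`, e.g. the validation cross-entropy of a trained CNN."""
    lo, hi = bounds
    x = np.random.uniform(lo, hi, (n_particles, n_dims))  # positions encode hyperparameters
    v = np.zeros_like(x)
    p_best = x.copy()
    p_val = np.array([fitness(p) for p in x])             # one CNN training run per particle
    g_best = p_best[p_val.argmin()]
    for t in range(t_max):
        w = log_diw(t, t_max)                             # inertia weight from Eq. (3)
        x, v = pso_step(x, v, p_best, g_best, w)          # updates from Eqs. (1) and (2)
        x = np.clip(x, lo, hi)                            # keep hyperparameters in range
        f = np.array([fitness(p) for p in x])
        improved = f < p_val
        p_best[improved], p_val[improved] = x[improved], f[improved]
        g_best = p_best[p_val.argmin()]
    return g_best
```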
The layers in the LeNet-5 architecture shown in Fig. 1 are as follows. Layer 1 is the input layer, which accepts 32 x 32 pixel images. Layer 2 is layer C1, a convolution layer with six 5 x 5 convolution kernels producing feature maps of size 28 x 28. Layer 3 is layer S2, a pooling layer producing six feature maps of size 14 x 14; each unit is connected to a 2 x 2 neighborhood in the corresponding feature map in C1. Layer 4 is the convolution layer C3, which consists of sixteen 5 x 5 convolution kernels whose inputs are derived from the six feature maps of S2. Layer 5 is layer S4, a pooling layer like S2 with 2 x 2 regions, which outputs sixteen feature maps of size 5 x 5. Layer 6 is layer C5, a convolution layer with 120 convolution kernels of size 5 x 5; the output size of C5 is 1 x 1, so S4 and C5 are in effect fully connected. Layer 7 is layer F6, which is fully connected to C5 and produces 84 features. In this study, the activation function of each layer is Sigmoid, ReLU, or Tanh, and the batch size is optimized in the range 10 to 50, as shown in Table I. The optimizer is Adam or stochastic gradient descent (SGD) with a learning rate of 0.01. The model was implemented using Google Colab.
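As an illustration, the following is a minimal Keras sketch of the LeNet-5 variant described above; the `activation` and `optimizer` arguments correspond to the hyperparameters searched by LogDIWPSO, while the use of average pooling and a softmax output are assumptions in this sketch.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lenet5(activation="tanh", optimizer="adam",
                 num_classes=10, input_shape=(32, 32, 1)):
    """LeNet-5-style CNN; activation/optimizer are searched hyperparameters."""
    model = models.Sequential([
        layers.Conv2D(6, (5, 5), activation=activation,
                      input_shape=input_shape),            # C1: 6 maps, 28 x 28
        layers.AveragePooling2D((2, 2)),                   # S2: 6 maps, 14 x 14
        layers.Conv2D(16, (5, 5), activation=activation),  # C3: 16 maps, 10 x 10
        layers.AveragePooling2D((2, 2)),                   # S4: 16 maps, 5 x 5
        layers.Conv2D(120, (5, 5), activation=activation), # C5: 120 maps, 1 x 1
        layers.Flatten(),
        layers.Dense(84, activation=activation),           # F6: 84 units
        layers.Dense(num_classes, activation="softmax"),   # output layer
    ])
    # Learning rate 0.01 as stated in the text; Adam or SGD as searched.
    opt = (tf.keras.optimizers.Adam(learning_rate=0.01) if optimizer == "adam"
           else tf.keras.optimizers.SGD(learning_rate=0.01))
    model.compile(optimizer=opt, loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```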
The main parameters of the LogDIWPSO are shown in Table II.

III. RESULT AND DISCUSSION
In this research, experiments were conducted with two datasets on the LeNet-5 CNN architecture: the MNIST dataset and the CIFAR-10 dataset. Samples from the two image datasets are shown in Fig. 3.
MNIST is an image database consisting of 60,000 handwritten digit images for training and 10,000 for testing. Each image is labeled from 0 to 9. In this study, optimization was carried out with LogDIWPSO every 5 epochs, and learning was performed with the parameters obtained. Each experiment was repeated 30 times, and the average value was used. CIFAR-10 is a labeled subset of the 80 Million Tiny Images dataset, collected by [26]. The CIFAR-10 dataset consists of 60,000 32 x 32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images, divided into five training batches and one test batch of 10,000 images each. The accuracy and variance are shown in Tables III and IV. After 10 training epochs, the accuracy of the CNN optimized with LogDIWPSO was 98.22%, higher than that of the CNN without optimization, 97.42%; these results are shown in Fig. 4.
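As an illustration of the experimental setup, the sketch below loads MNIST via Keras, pads the 28 x 28 images to the 32 x 32 input size LeNet-5 expects, and trains the model from the earlier sketch; the batch size of 32 is an example value within the searched 10-50 range, not the optimized value found by LogDIWPSO.

```python
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# LeNet-5 takes 32 x 32 inputs; MNIST digits are 28 x 28, so pad 2 pixels per side.
x_train = np.pad(x_train, ((0, 0), (2, 2), (2, 2)))[..., None] / 255.0
x_test = np.pad(x_test, ((0, 0), (2, 2), (2, 2)))[..., None] / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

model = build_lenet5(activation="tanh", optimizer="adam")  # from the sketch above
model.fit(x_train, y_train, epochs=10, batch_size=32,
          validation_data=(x_test, y_test))
```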

IV. CONCLUSION
In this research, a convolutional neural network method using a logarithm decreasing inertia weight (LogDIW) for particle swarm optimization (PSO) has been proposed, in which LogDIWPSO is used to optimize the CNN hyperparameters. The CNN architecture used is LeNet-5 with two datasets, MNIST and CIFAR-10, and its accuracy is compared with a standard CNN based on the LeNet-5 architecture. On the MNIST dataset, the baseline CNN reached 94.02% at the 5th epoch, while the LogDIWPSO CNN improved on this accuracy. On the CIFAR-10 dataset, the baseline CNN reached 28.07% at the 10th epoch, while the LogDIWPSO CNN reached 69.3%.
Future research will apply the proposed method to the classification of other, new datasets, where the effectiveness of the proposed method can be assessed in the image classification process. In addition, the stopping conditions for PSO when applied in the classification process need to be analyzed. Architectures different from the one used in this study can also be considered for future research.