SEARCHING SIMILARITY DIGITAL IMAGE USING COLOR HISTOGRAM


In the current era of globalization and modernization, information technology is widely used in education, trade, animal husbandry, agriculture, and even the legal sector. One rapidly growing branch of information technology is computer vision. One of its important roles in everyday life is selecting, from several images, those most similar to a given image. It can be applied to face recognition and object detection, and to grouping images in order of similarity, which eases the human work of choosing the most similar images from a collection. This study describes the process of finding the similarity between one image and other images through several research stages. The method converts RGB values to grayscale and then calculates the Euclidean distance to determine how close one image is to another; the accuracy of the algorithm is measured with a confusion matrix. The search trial, run on 30 random samples taken from a dataset of 1000 images, resulted in an accuracy of 0.42, a precision of 0.42, and a recall of 1. Some images were found that differ from the query in color and shape but whose histograms are nevertheless highly similar to the query's. The weakness of this research is therefore that images with histograms similar to the query's are displayed as similar images even though, in reality, they are very different in color and shape.

INTRODUCTION
In the current era of globalization and modernization, information technology is widely used in education, trade, farming, agriculture, and even the legal sector. One rapidly developing branch of information technology is computer vision, which is widely applied to tasks that require the ability to see. One of its important roles in everyday life is selecting, from several images, those that are most similar. This can be applied to face recognition and object detection, and to grouping images in order of similarity. Face recognition is one of the biometric fields of study and remains an interesting and challenging area of research. It has been widely used in applications such as system security, credit card verification, and criminal identification [1].
Histograms are a popular feature representation in computer vision. Example applications include object detection, human detection, texture analysis, and tracking. A histogram encodes the distribution of spatially unordered measurements in a region. More formally, a histogram is defined as a numeric array in which each element (termed a bin) corresponds to the frequency count of a range of values (e.g., image intensity, color, or gradient orientation) in the given image or a subset of it. From a probabilistic point of view, a normalized histogram can be seen as a probability distribution function. Intensity-based and color-based histograms are invariant to translation and in-plane rotation, and change slowly under out-of-plane rotation, changes in object distance, and occlusion. In addition, illumination invariance can be achieved by transforming the input image appropriately before the histogram is constructed (e.g., into the normalized RGB, HSV, or YUV color spaces) [2]. In image processing, the basic information that can be extracted from an image consists of color features, shape features, and texture features. Color is a fairly dominant feature because it is relatively insensitive to changes in the viewpoint, translation, and rotation of an image [3]. Data extracted from an image is usually numerical and ready for calculation, because a digital image is represented as a matrix of length m and width n, with the sizes measured in pixels.

Research Flow
The research flow is as follows. First, the image data to be searched for similarities is entered into the prototype. The RGB image is converted to grayscale, and the grayscale image is then converted to a histogram. From the histograms, the distance between the query and each image is calculated to find the closest images. The calculated distances are stored in the database and filtered by closest distance, and the filtered images are displayed as the final result. The flow of research is more clearly seen in

Degree of Gray (Grayscale)
Digital images can also be expressed as a two-dimensional matrix f(x, y), where x and y are the pixel coordinates in the matrix and f is the degree of intensity of the pixel at those coordinates [4]. An image of size m × n thus forms an m × n matrix of intensity values. Grayscale pixels are pixels whose color lies in the gradation range between black and white. Grayscale is a blend of black and white [5].
An RGB image can be converted to grayscale using equation (1) [6].
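For reference, the matrix of an m × n image described above is conventionally written as shown below. The zero-based indexing and the choice of x as the row index are assumptions, since the source does not reproduce its own matrix.

```latex
f(x, y) =
\begin{bmatrix}
f(0,0)   & f(0,1)   & \cdots & f(0,n-1)   \\
f(1,0)   & f(1,1)   & \cdots & f(1,n-1)   \\
\vdots   & \vdots   & \ddots & \vdots     \\
f(m-1,0) & f(m-1,1) & \cdots & f(m-1,n-1)
\end{bmatrix}
```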
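A minimal sketch of the conversion step in Python follows. The equation used in this study is not reproduced in the text, so the weights below (the common ITU-R BT.601 luma coefficients) are an assumption; a plain average of R, G, and B is another frequently used choice. The function names `rgb_to_gray` and `image_to_gray` are hypothetical.

```python
def rgb_to_gray(pixel, weights=(0.299, 0.587, 0.114)):
    """Convert one (R, G, B) pixel to a gray level in 0..255.

    The default weights are the ITU-R BT.601 luma coefficients,
    assumed here because the study's own equation is not shown.
    """
    r, g, b = pixel
    wr, wg, wb = weights
    gray = round(wr * r + wg * g + wb * b)
    # Clamp to the valid 8-bit range to guard against rounding overshoot.
    return max(0, min(255, gray))


def image_to_gray(image):
    """Convert an image given as rows of (R, G, B) tuples to rows of gray levels."""
    return [[rgb_to_gray(pixel) for pixel in row] for row in image]


# A 2 x 2 toy image: red, green, blue, and white pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = image_to_gray(image)
print(gray)
```

With these weights, green contributes most to the gray level and blue least, which is why pure green maps to a brighter gray than pure red or pure blue.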

Grayscale Color Histogram
The image histogram refers to the probability mass function of the image intensities. This is extended to color images to capture the joint probability of the intensities of the three color channels [7]. The gray histogram of an image has 256 points on the x-axis, numbered 0 to 255. The y-axis gives the number of occurrences of each gray level on the x-axis [8].
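The 256-bin gray histogram described above can be sketched in Python as follows. The function names and the toy image are hypothetical; the normalization step illustrates the probability mass function view and is not claimed to be part of the study's prototype.

```python
def gray_histogram(gray, bins=256):
    """Count how many pixels take each gray level 0..bins-1."""
    hist = [0] * bins
    for row in gray:
        for value in row:
            hist[value] += 1
    return hist


def normalize(hist):
    """Scale counts so they sum to 1 (a probability mass function)."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [count / total for count in hist]


# A 2 x 3 toy grayscale image.
gray_image = [
    [0, 0, 255],
    [128, 255, 255],
]
hist = gray_histogram(gray_image)
pmf = normalize(hist)
print(hist[0], hist[128], hist[255])
```

Because the histogram only counts gray levels, it discards where each pixel sits in the image; two images with the same pixels in a different arrangement produce the same histogram.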

Prototype of Image Similarity Test
Previous research on image similarity searched by shape and color and ordered the results by the threshold value of the sample image using a threshold algorithm [9], obtaining cases in which the comparison between the threshold value and the aggregation value is almost the same. Conversely, if the value approaches 0, the images being compared are very different. Other studies mention that color composition can be displayed as a histogram that represents the distribution of the number of pixels for each color intensity in the image. The ripeness of apples, for example, can be determined from color composition; experiments with the resulting program show how the difference in distance relates to the similarity of the images' color distributions [10]. The current research uses a simple prototype for the search process, shown in Figure 4 and written in the PHP programming language, which consists of several menus.

Figure 4. Image Similarity Prototype

In Figure 4, the search menu is used to find images similar to the image being tested; the image upload menu adds images to the dataset; the all-images menu displays every image that has been entered into the prototype; and the insert-to-DB menu stores uploaded images in the database.

Extraction Query Feature
The initial stage of the image search process is to extract the features of the query image. The extraction obtains the RGB values of the image in numeric form, which are then converted from RGB to grayscale using equation (1), as shown in Figure 5. The calculated distances between the query and the nearby dataset images can be seen in Table 6, which has the columns Query, Database, and Distance.
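The distance calculation can be sketched in Python as follows. This study names the Euclidean distance as its proximity measure; the function names, the file names, and the 4-bin toy histograms are hypothetical and chosen only to keep the example short (the study uses 256-bin grayscale histograms).

```python
import math


def euclidean_distance(hist_a, hist_b):
    """Euclidean distance between two histograms of equal length."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(hist_a, hist_b)))


def rank_by_similarity(query_hist, dataset):
    """Return (name, distance) pairs sorted from closest to farthest."""
    scored = [
        (name, euclidean_distance(query_hist, hist))
        for name, hist in dataset.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical 4-bin histograms standing in for stored database entries.
query = [4, 0, 0, 0]
dataset = {
    "c.jpg": [0, 0, 0, 4],
    "a.jpg": [4, 0, 0, 0],
    "b.jpg": [3, 1, 0, 0],
}
ranked = rank_by_similarity(query, dataset)
for name, distance in ranked:
    print(name, round(distance, 3))
```

A distance of 0 means only that the two histograms are identical, not that the images are. This is the weakness the study reports: visually different images with similar histograms are ranked as close to the query.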

Measurement of Algorithm Performance
Measuring performance is very important in a study, because it shows how accurate an algorithm is compared with other algorithms. To see how accurate the algorithm used in this study is, the confusion matrix method is used [12]. In the confusion matrix, each trial result is assigned to one of four outcomes, depending on whether the system's result is positive or negative and whether that result is correct: correct results are counted as true positives or true negatives, and incorrect results as false positives or false negatives. Where:
1. TP is True Positive, the number of positive data items that are correctly classified by the system.
2. TN is True Negative, the number of negative data items that are correctly classified by the system.
3. FN is False Negative, the number of positive data items that are incorrectly classified as negative by the system.
4. FP is False Positive, the number of negative data items that are incorrectly classified as positive by the system.
To measure the performance of the algorithm, samples were taken by selecting two images from each category, giving 30 random images, and formula (4) was applied to them. From the results, the values are TP=20, FP=11, FN=0, and TN=0. Based on the confusion matrix calculation, it can be concluded that the accuracy is 0.42, the precision is 0.42, and the recall is 1 for the 30 random samples.
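The standard definitions of accuracy, precision, and recall in terms of these four counts can be sketched in Python as follows. The function name and the counts in the example are hypothetical and are not taken from this study's experiment.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion matrix counts.

    A ratio whose denominator is zero is reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Illustrative counts only.
metrics = confusion_metrics(tp=6, fp=2, fn=2, tn=10)
print(metrics)
```

Recall depends only on TP and FN, so a system that never misses a relevant image reaches a recall of 1 regardless of how many irrelevant images it also returns.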

Figure 2. Converting RGB images to Grayscale

Figure 5. The Process of Converting RGB Images to Grayscale

The results of the calculations stored in the database are shown in Table 1 (of the 1000 dataset images tested, only 19 samples are shown in Table 1).

Figure 20. Classes in Confusion Matrix