Analysis and Implementation of the Apriori

- Sakinah Mart is a retail business that determines the layout of goods based on management perception and applies a discount system to specific items, but does not offer bundling packages. This research aims to provide recommendations for the layout of goods and for bundling packages, using the apriori algorithm as a decision-making tool. The apriori algorithm is a data mining technique for discovering association rules and analyzing customer purchases, specifically the likelihood that a customer who buys item X also buys item Y. Its rules are evaluated with two main measures: support and confidence. The research follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) method and applies the apriori algorithm to sales transaction data. The dataset contains 2,000 sales transactions with two attributes, from which 2-itemset and 3-itemset combinations were identified. For the layout of goods, 16 rules were found with a minimum support of 42% and a minimum confidence of 85%. For bundling packages, 5 rules were generated with a minimum support of 40% and a minimum confidence of 90%. These results give the company apriori-based recommendations for the layout of goods and for bundling packages.


I. INTRODUCTION
The development of world trade through the free-market economy, together with advances in information technology, has intensified competition, especially in the retail sector, as businesses strive to meet customer needs [1]. This is reflected in information processed by the Business Competition Commission, which places commercial competition nationally in the high category: the Business Competition Index was 4.65 points in 2020 and rose to 4.81 in 2021. The increase in the index is driven by customer demand and supply factors [2]. Companies must therefore implement sound business strategies in order to compete and increase their sales [3].
Sakinah Mart is one of the companies of the Hidayatullah As-Sakinah East Java Pesantren Cooperative and operates retail stores in the form of minimarkets and supermarkets. One Sakinah Mart branch is located at Jl. Kyai Tambak Deres No. 64, Bulak District, Surabaya City, East Java Province. More than 3,000 food and non-food items are available at competitive prices to meet customers' daily needs, and the store processes hundreds of transactions a day. However, although the store is supported by information technology and applications such as Microsoft Excel, its accumulated transaction data is not yet used effectively to produce information for sales strategies [4].
Sakinah Mart always strives to achieve customer satisfaction by offering quality goods, excellent service, and a pleasant shopping atmosphere. To compete with other minimarkets and stores, it needs a strategy that keeps the retail business running, which means the store must understand what customers actually want to buy [5]. For example, the arrangement of goods on the shelves should follow customers' purchasing patterns [6][7]. Currently, however, Sakinah Mart places goods on the shelves based on management perception alone. In addition, it applies a discount system only to certain products and does not offer bundling packages for products that are nearing expiration.
Arranging the layout of goods and offering bundling packages are sales strategies that can increase business profits [8][9]. If the current practice continues, the store will remain unaware of customers' consumption habits, which affect sales [10]. To inform sales strategies, it is therefore necessary to identify these consumption habits, visible as items that tend to be bought together, through association rules [11]. Association rules can also make shopping easier for customers, for example by placing items that are often purchased together close to each other, and can help recommend products that customers need [12].
One available strategy is to analyze the customer buying process based on sales transactions [13]. However, Sakinah Mart does not use its sales transaction data to extract information that could increase sales. In this study, sales transaction data collected over a period of time is analyzed to provide information on the layout of goods and bundling packages. Such data is processed with specific techniques, one of which is the apriori algorithm.
The apriori algorithm is a classic data mining algorithm for discovering association rules: it looks for relationship patterns in a data set and analyzes in-store purchases to estimate how likely a customer who buys item X is to also buy item Y [14]. It is an algorithm for mining association rules [15], which are evaluated by calculating the support and confidence of an item relationship. An association rule is considered good if its support value meets or exceeds the minimum support and its confidence value meets or exceeds the minimum confidence [16]. The apriori algorithm can therefore help management arrange goods and design bundling packages in the store, with the aim of providing optimal service so that customers feel comfortable when shopping, easily find the goods they want, and sales are maximized [17].
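The rule check described above can be illustrated with a minimal Python sketch. The five transactions and their category names are hypothetical, and the thresholds are only examples; the sketch shows how a rule X ⇒ Y is tested against a minimum support and a minimum confidence.

```python
from typing import FrozenSet, List

# Hypothetical toy data: each transaction is the set of item categories on one receipt.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"beverages", "snack & biscuit"}),
    frozenset({"beverages", "snack & biscuit", "instant noodle"}),
    frozenset({"beverages", "basic food"}),
    frozenset({"snack & biscuit", "milk"}),
    frozenset({"beverages", "snack & biscuit", "basic food"}),
]


def count(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(x: FrozenSet[str], y: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Share of all transactions that contain both X and Y."""
    return count(x | y, transactions) / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Share of transactions containing X that also contain Y, i.e. P(Y | X)."""
    n_x = count(x, transactions)
    return count(x | y, transactions) / n_x if n_x else 0.0


def is_strong(x, y, transactions, min_support: float, min_confidence: float) -> bool:
    """A rule X => Y is kept only if both measures reach their minimums."""
    return (support(x, y, transactions) >= min_support
            and confidence(x, y, transactions) >= min_confidence)


x, y = frozenset({"beverages"}), frozenset({"snack & biscuit"})
print(support(x, y, TRANSACTIONS), confidence(x, y, TRANSACTIONS))  # 0.6 0.75
print(is_strong(x, y, TRANSACTIONS, 0.42, 0.85))  # False: confidence is below 85%
```

Confidence is computed from raw counts rather than by dividing two supports, which keeps the arithmetic exact for small examples like this one.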
Several previous studies are related to this one. I Komang et al. (2022) analyzed sales transactions of goods using the apriori method, focusing on the placement of goods, and obtained 2-itemset combinations [18]. Surya Listanto et al. (2022) applied data mining with the apriori algorithm to sales data, focusing on bundling packages [1]. Alfie Nur et al. (2021) applied the apriori algorithm to determine customer purchasing patterns and also obtained 2-itemset combinations [19].

II. METHOD
This research was conducted at Sakinah Mart, located at Jl. Kyai Tambak Deres No. 64, Bulak District, Surabaya City, and lasted 16 weeks. Data were collected through interviews, observation, and documentation, and were analyzed using the apriori algorithm within the Cross-Industry Standard Process for Data Mining (CRISP-DM) method. CRISP-DM is a data mining process model that is still widely used in industry because it is effective in solving many problems in data mining projects; it is also the most popular method in research [18]. The CRISP-DM process has six stages: business understanding, data understanding, data preparation, modelling, evaluation, and deployment [20][21], as shown in Fig. 1.

A. Business Understanding
Business understanding is the first phase, in which the problems are identified in more detail. Once the problems are identified, this phase focuses on the research object, the Sakinah Mart branch on Jl. Kyai Tambak Deres. The problems identified are as follows: Sakinah Mart's sales strategy places goods based on perception only, and Sakinah Mart applies a discount system only to certain products rather than a bundling package system. From these problems, the research goal is to identify customers' purchasing patterns through apriori association rules and to recommend a layout of goods and bundling packages that improve the company's sales strategy.

B. Data Understanding
Data understanding is the second phase, in which sales transaction data at Sakinah Mart from April to September 2022 is collected. The data used is primary data, comprising a total of 2,000 records with 2 attributes.

C. Data Preparation
Data preparation is the third phase, in which the data is prepared by identifying the goods purchased, as in Table I. Before the modelling stage, the goods are identified again by category of goods (Table II).

D. Modelling
Modelling is the fourth phase, in which the data is modelled using the apriori algorithm. Based on observations and interviews, the attributes used in the apriori algorithm are the transaction number and the product purchased. The data is saved in CSV format and loaded into WEKA.
The data analysis consists of several steps that must be carried out for the problem to be solved appropriately. The steps taken in this study are illustrated in the block diagram in Fig. 2.
1) Analyzing Sales Transactions: From the transaction data obtained from April to September 2022, transactions containing at least 2 items were selected first. The items in the selected transactions were then grouped according to their category of goods.
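Step 1 can be sketched in Python as follows, assuming the raw data arrives as (transaction number, product) rows. The product names and the product-to-category lookup are hypothetical stand-ins for Table II, and the "other" fallback category is an illustrative assumption.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical product-to-category lookup standing in for Table II.
CATEGORY: Dict[str, str] = {
    "noodle A": "instant noodle",
    "tea bottle": "beverages",
    "biscuit B": "snack & biscuit",
    "cooking oil 1L": "cooking oil & margarine",
    "UHT milk": "milk",
}


def prepare(rows: Iterable[Tuple[str, str]],
            category: Dict[str, str]) -> Dict[str, FrozenSet[str]]:
    """Group (transaction_no, product) rows into baskets, keep baskets with at
    least 2 distinct products, then replace each product by its category."""
    baskets: Dict[str, Set[str]] = defaultdict(set)
    for tx_no, product in rows:
        baskets[tx_no].add(product)
    prepared: Dict[str, FrozenSet[str]] = {}
    for tx_no, products in baskets.items():
        if len(products) < 2:  # single-item transactions carry no co-purchase information
            continue
        # Unknown products fall into a hypothetical "other" category.
        prepared[tx_no] = frozenset(category.get(p, "other") for p in products)
    return prepared


rows = [
    ("T001", "noodle A"), ("T001", "tea bottle"),
    ("T002", "biscuit B"),
    ("T003", "cooking oil 1L"), ("T003", "biscuit B"), ("T003", "UHT milk"),
]
print(prepare(rows, CATEGORY))  # T002 is dropped; T001 and T003 keep category sets
```

The filter is applied to distinct products before categorizing, following the order of the step described above; a transaction can therefore end up with fewer categories than products.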

2) Setting the Minimum Support and Minimum Confidence Values:
In this step, the minimum support and minimum confidence values were agreed upon with Sakinah Mart. For the layout of goods, the agreed minimum support is 25% and the minimum confidence is 80%. For bundling packages, the agreed minimum support is 20% and the minimum confidence is 90%.
3) Defining Combinations of 2 and 3 Items: In this step, the 2-itemset and 3-itemset combinations are determined by searching for items that frequently appear together in the sales transaction data. The combinations are based on k-itemsets, where k is the number of items combined: 2-itemsets (k = 2) and 3-itemsets (k = 3).
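In the standard apriori algorithm, candidate k-itemsets are built from frequent (k-1)-itemsets and pruned using the rule that every subset of a frequent itemset must itself be frequent. The Python sketch below illustrates this join-and-prune step for k = 2 and k = 3; the frequent itemsets passed in are hypothetical, and this is a simplified illustration rather than WEKA's internal implementation.

```python
from itertools import combinations
from typing import FrozenSet, Set


def generate_candidates(frequent_prev: Set[FrozenSet[str]], k: int) -> Set[FrozenSet[str]]:
    """Build candidate k-itemsets from frequent (k-1)-itemsets.

    Join step: union pairs of frequent (k-1)-itemsets whose union has exactly k items.
    Prune step: drop any candidate with a (k-1)-subset that is not frequent,
    because every subset of a frequent itemset must itself be frequent.
    """
    prev = list(frequent_prev)
    candidates: Set[FrozenSet[str]] = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) != k:
                continue
            if all(frozenset(sub) in frequent_prev for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 1-itemsets (category names are illustrative only).
l1 = {frozenset({"beverages"}), frozenset({"snack & biscuit"}), frozenset({"basic food"})}
c2 = generate_candidates(l1, 2)  # all three pairs
l2 = {frozenset({"beverages", "snack & biscuit"}), frozenset({"beverages", "basic food"})}
c3 = generate_candidates(l2, 3)  # empty: {"snack & biscuit", "basic food"} is not frequent
```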

4) Support Value Calculation:
In this step, the support value is calculated. Support is a key measure in association rules: it is the percentage of all transactions that contain a particular combination of items. For a combination of items A and B, the support value is calculated as follows:
$$\text{Support}(A, B) = P(A \cap B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{total number of transactions}} \times 100\%$$
If the support of a combination meets or exceeds the predetermined minimum support, its confidence value is calculated next. If it does not meet the minimum support, the combination is eliminated.
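The support formula and the elimination step can be sketched in Python as below. For simplicity, this sketch counts every k-subset of each basket directly instead of counting only the candidates from the join-and-prune step. That brute-force shortcut is an assumption that is workable only when baskets contain a few categories. The single-letter baskets are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_k_itemsets(transactions: List[FrozenSet[str]], k: int,
                        min_support: float) -> Dict[FrozenSet[str], float]:
    """Support of every k-itemset occurring in the data, keeping only those
    whose support reaches `min_support` (the others are eliminated)."""
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for combo in combinations(sorted(t), k):  # every k-subset of the basket
            key = frozenset(combo)
            counts[key] = counts.get(key, 0) + 1
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}


# Hypothetical baskets using single-letter categories a, b, c, d.
SAMPLE = [frozenset(s) for s in ("ab", "abc", "ac", "bd", "abc")]
print(frequent_k_itemsets(SAMPLE, 2, 0.5))  # {a, b} and {a, c}, each with support 0.6
print(frequent_k_itemsets(SAMPLE, 3, 0.3))  # {a, b, c} with support 0.4
```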

5) Calculation of Confidence Value:
In this step, the confidence value is calculated. Confidence is another key measure in association rules: for a rule A ⇒ B, it is the percentage of transactions containing A that also contain B, and it therefore indicates how reliable the rule is. The confidence value is calculated as follows:
$$\text{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{number of transactions containing } A} \times 100\%$$
If the confidence meets or exceeds the predetermined minimum confidence, the rule is kept, and the rules with the largest support and confidence are selected. If the confidence does not meet the minimum confidence, the rule is eliminated.
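Generating rules from the frequent itemsets and filtering them by confidence can be sketched as follows. The support values are hypothetical, and sorting by confidence and then by support is one plausible way to "select the largest support and confidence". That sort order is an assumption, not a documented WEKA setting.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# (antecedent A, consequent B, support of A ∪ B, confidence of A => B)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def generate_rules(supports: Dict[FrozenSet[str], float],
                   min_confidence: float) -> List[Rule]:
    """Turn frequent itemsets into rules A => B with
    confidence = support(A ∪ B) / support(A).

    `supports` must also hold every subset of each frequent itemset,
    which apriori output always does (subsets of frequent sets are frequent).
    """
    rules: List[Rule] = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = sup / supports[a]
                if conf >= min_confidence:
                    rules.append((a, itemset - a, sup, conf))
    # Strongest rules first: by confidence, then by support.
    rules.sort(key=lambda rule: (rule[3], rule[2]), reverse=True)
    return rules


# Hypothetical supports for two categories.
supports = {
    frozenset({"beverages"}): 0.8,
    frozenset({"instant noodle"}): 0.6,
    frozenset({"beverages", "instant noodle"}): 0.5,
}
for a, b, sup, conf in generate_rules(supports, 0.8):
    print(sorted(a), "=>", sorted(b), round(sup, 3), round(conf, 3))
```

With these numbers, only {instant noodle} ⇒ {beverages} passes, with confidence 0.5 / 0.6 ≈ 83%. The reverse rule reaches only 62.5% even though both rules share the same support.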

E. Evaluation
Evaluation is the fifth phase, in which the apriori results are compared with the initial goal to determine whether it has been achieved. This stage also examines the support and confidence values obtained in WEKA.

F. Deployment
Deployment is the sixth phase. This stage was not carried out in this study.

A. Data Preparation
Based on the data collected previously, the next steps are as follows:
1) Data Selection: At this stage, sales transaction data from April to September 2022 is selected. The selected data can be seen in Table III.

B. Modelling
In this phase, the apriori algorithm is applied to the transaction data using the WEKA data mining software, version 3.9.5, to obtain layout and bundling package recommendations. WEKA takes tabular data in CSV format as input. The initial display of WEKA 3.9.5 is shown in Fig. 3.
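The paper's CSV (Table III) records a transaction number and a product. WEKA's Apriori works only on nominal attributes and finds co-occurring attribute values within a row. One common way to present basket data to it is a wide table with one row per transaction and one column per category, using 't' for present and '?' for absent; WEKA's CSV loader reads '?' as a missing value. This layout is an assumption for illustration, not the format the study reports. The sketch below converts prepared baskets into that layout.

```python
import csv
import io
from typing import Dict, FrozenSet


def to_basket_csv(transactions: Dict[str, FrozenSet[str]]) -> str:
    """Write one row per transaction and one column per category:
    't' when the category is in the basket, '?' (missing) otherwise.
    The transaction number only orders the rows and is not written as a column."""
    categories = sorted(set().union(*transactions.values()))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(categories)
    for tx_no in sorted(transactions):
        basket = transactions[tx_no]
        writer.writerow(["t" if c in basket else "?" for c in categories])
    return buf.getvalue()


print(to_basket_csv({
    "T001": frozenset({"beverages", "milk"}),
    "T003": frozenset({"beverages"}),
}))
```

Marking absence as missing rather than as a second value such as "f" keeps Apriori from producing rules about items that were not bought.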

1) Item Layout:
The item layout process uses a minimum support of 42% (0.42) and a minimum confidence of 85% (0.85); click OK to confirm these parameters. Clicking Start then processes the selected data and displays the results of the apriori algorithm. The WEKA calculation produces 2-itemset and 3-itemset combinations, and the test yields 16 rules that satisfy the specified minimum support and confidence. The items that are related to one another, for the 2-itemsets, are as follows:
2) Item Layout Recommendations:
Based on the WEKA test results, items that are related to one another should be placed on adjacent shelves. The recommended item layout is shown in Fig. 4.

Fig. 4 Layout recommendation results
Beverages on shelf 4, which were previously adjacent to breakfast food, were moved to the right of basic food on shelf 4, replacing canned & bottled food. This follows the association rules, in which beverages lead customers to also buy basic food, snacks & biscuits, and instant noodles.
Milk on shelf 2, which was originally adjacent to baby food, was moved to the left of instant noodles on shelf 4, replacing dairy food. This keeps milk close to beverages and snacks & biscuits, facing them.
Hair care and oral care on shelf 1, which were originally adjacent to stationery, were moved to the left of the bakery on shelf 3, replacing instant food. This keeps hair care and oral care close to snacks & biscuits, facing them.
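The layout logic can be sketched as a set operation: collect the category pairs that appear together in a rule, then subtract the pairs that are already neighbours. The rules below reflect the relationships described in this section and in the evaluation (beverages with basic food and with snacks & biscuits; cooking oil & margarine with snacks & biscuits). The current neighbours come from the same text. This is a simplified illustration; it ignores shelf capacity and other physical constraints.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def related_pairs(rules: List[Rule]) -> Set[FrozenSet[str]]:
    """Unordered category pairs that occur together in at least one rule."""
    pairs: Set[FrozenSet[str]] = set()
    for antecedent, consequent in rules:
        for a, b in combinations(sorted(antecedent | consequent), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def pairs_to_move(rules: List[Rule], current_neighbours: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Related pairs that are not yet on adjacent or facing shelves."""
    return related_pairs(rules) - current_neighbours


rules = [
    (frozenset({"beverages"}), frozenset({"basic food"})),
    (frozenset({"beverages"}), frozenset({"snack & biscuit"})),
    (frozenset({"cooking oil & margarine"}), frozenset({"snack & biscuit"})),
]
current = {
    frozenset({"beverages", "breakfast food"}),
    frozenset({"cooking oil & margarine", "chilled & frozen food"}),
}
for pair in sorted(pairs_to_move(rules, current), key=sorted):
    print(sorted(pair))
```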
3) Bundling Package:
The bundling package process uses a minimum support of 40% (0.4) and a minimum confidence of 90% (0.9); click OK to confirm these parameters. Clicking Start then processes the selected data and displays the results of the apriori algorithm. The WEKA calculation produces 2-itemset and 3-itemset combinations, and the test yields 5 rules that satisfy the specified minimum support and confidence. The goods that are related to one another are as follows:
4) Bundling Package Recommendations:
Based on the WEKA test results, bundling packages are recommended by combining goods that are associated with each other in purchases with goods that are close to expiration. These goods can then be offered as bundling packages so that they continue to sell. The recommended bundling packages are shown in Table VI.
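The bundling selection can be sketched as a filter over the mined rules. The sketch keeps rules that meet the 40% and 90% thresholds and involve at least one near-expiry category. The rules, their values, and the near-expiry list are hypothetical; in practice, the near-expiry list would come from the store's stock records.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]  # (A, B, support, confidence)
Bundle = Tuple[FrozenSet[str], float, float]  # (items, support, confidence)


def bundle_candidates(rules: List[Rule], near_expiry: Set[str],
                      min_support: float = 0.40,
                      min_confidence: float = 0.90) -> List[Bundle]:
    """Keep rules that meet both thresholds and involve at least one
    near-expiry category; each kept rule suggests the bundle A ∪ B."""
    bundles: List[Bundle] = []
    for a, b, sup, conf in rules:
        items = a | b
        if sup >= min_support and conf >= min_confidence and items & near_expiry:
            bundles.append((items, sup, conf))
    bundles.sort(key=lambda bundle: (bundle[2], bundle[1]), reverse=True)
    return bundles


# Hypothetical rules and expiry list, for illustration only.
rules = [
    (frozenset({"milk"}), frozenset({"snack & biscuit"}), 0.41, 0.92),
    (frozenset({"beverages"}), frozenset({"snack & biscuit"}), 0.45, 0.95),
    (frozenset({"milk"}), frozenset({"beverages"}), 0.38, 0.93),
]
print(bundle_candidates(rules, near_expiry={"milk"}))
```

Here, only the milk and snack & biscuit bundle remains. The beverages rule involves no near-expiry item, and the third rule falls below the minimum support.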

C. Evaluation
This stage compares the current layout calculation with the layout recommended by the apriori algorithm. Using WEKA, the apriori algorithm produced 16 rules for the layout of goods, with a minimum support of 42% and a minimum confidence of 85%. It produced 5 rules for bundling packages, with a minimum support of 40% and a minimum confidence of 90%. The current layout calculation is shown in Table VII, and the recommended layout calculation in Table VIII. The evaluation reveals several differences between the two. For example, the apriori recommendation yields the rule {cooking oil & margarine} → {snack & biscuit} (support = 43.3%, confidence = 91%). This means that 43.3% of all transactions contain both cooking oil & margarine and snacks & biscuits, and that 91% of the transactions containing cooking oil & margarine also contain snacks & biscuits. The current layout, by contrast, yields {cooking oil & margarine} → {chilled & frozen food} (support = 21.7%, confidence = 46%). Only 21.7% of all transactions contain both categories, and 46% of the transactions containing cooking oil & margarine also contain chilled & frozen food. Given this difference, cooking oil & margarine, currently adjacent to chilled & frozen food, is recommended to be moved next to snacks & biscuits. The recommended pairing has higher support and confidence than the current layout and meets the minimum values set by Sakinah Mart. The research therefore achieved its goal of generating recommendations with the apriori algorithm.
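The comparison can be checked with a short Python sketch that uses the support and confidence values reported in the evaluation and the layout thresholds of 42% and 85%. Dividing support by confidence gives the implied share of transactions containing cooking oil & margarine. Both pairings imply roughly 47%, which suggests the two reported figures are mutually consistent (the inputs are rounded, so this is only approximate).

```python
from dataclasses import dataclass


@dataclass
class Placement:
    antecedent: str
    consequent: str
    support: float      # share of all transactions containing both categories
    confidence: float   # share of antecedent transactions that also contain the consequent

    def meets(self, min_support: float, min_confidence: float) -> bool:
        """True when the pairing reaches both minimum thresholds."""
        return self.support >= min_support and self.confidence >= min_confidence

    def antecedent_share(self) -> float:
        """Implied support of the antecedent alone: support / confidence."""
        return self.support / self.confidence


# Values reported in the evaluation (rounded in the source).
current = Placement("cooking oil & margarine", "chilled & frozen food", 0.217, 0.46)
recommended = Placement("cooking oil & margarine", "snack & biscuit", 0.433, 0.91)

for p in (current, recommended):
    print(p.consequent, p.meets(0.42, 0.85), round(p.antecedent_share(), 3))
```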

IV. CONCLUSION
For the layout of goods, the apriori algorithm produced 16 rules with a minimum support of 42% and a minimum confidence of 85%. For bundling packages, it produced 5 rules with a minimum support of 40% and a minimum confidence of 90%. These rules provide the company with apriori-based recommendations for the layout of goods and for bundling packages.