Boyer-Moore String Matching Algorithm and SHA512 Implementation for JPEG/Exif File Fingerprint Compilation in DSA

Data integrity, authenticity, and non-repudiation are security parameters provided by the Digital Signature Algorithm (DSA). The hash value is an important element of DSA: a hash function generates a message digest that is used to verify the integrity of the data. JPEG/Exif is an image file format produced by digital cameras such as those in smartphones. Advances in hardware technology have given image files higher resolutions than before, so compiling a JPEG/Exif fingerprint takes more time. The purpose of this research is to develop a fingerprinting process for JPEG/Exif files using the Boyer-Moore string matching algorithm and SHA512. The research was conducted in four stages: JPEG/Exif file structure identification, segment content acquisition and hashing, image file modification experiments, and JPEG/Exif file fingerprint compilation. The results show that the JPEG/Exif file fingerprint comprises three hash values, taken from the SOI, APP1, and SOF0 segments. The JPEG/Exif file fingerprint can be used to detect six types of image modification: recoloring, image resizing, text addition, metadata modification, image cropping, and file type conversion.


INTRODUCTION
Image and video files are two types of files used in many forms of communication. Both are produced by digital cameras such as those in smartphones. The image files produced use the JPEG/Exif file format, which is structured in several parts, each storing specific information about the image [1]. Information security must ensure that the information received is in the same condition as when it was sent. The information must also arrive at the right recipient, and the recipient must be able to verify that it comes from a sender he or she knows.
The Digital Signature Algorithm (DSA) is a cryptographic method used to fulfill information security parameters [2]. DSA uses three elements to secure information: a data fingerprint, an asymmetric key pair, and a digital certificate [3]. The data fingerprint secures the integrity of the information. The asymmetric key pair identifies the rightful owner and receiver of the information. The digital certificate provides non-repudiation before the law [4]. DSA is performed on both sides, by the information sender and the receiver, as shown in Figure 1. The first through third processes in Figure 1 generate the information fingerprint. The fourth process encrypts the information fingerprint with the asymmetric key pair. The fifth process embeds the ciphertext from the previous process into the information [5]. The DSA processes on the receiver side are similar to those on the sender side: generating the fingerprint, extracting the ciphertext, decrypting the ciphertext, and verifying the two information fingerprints.
The information fingerprint is the most important of the DSA elements. The fingerprint generation process, also called fingerprinting, starts with identifying the structure of the information. The identification results are the locations of the data segments, each represented by a segment start index. These segment index values are used as boundary parameters for segment content acquisition in the second and twelfth DSA processes.
Segment identification in DSA is conducted using a string matching algorithm. String matching algorithms fall into two categories based on the comparison method: exact string matching and approximate string matching [6]. The Boyer-Moore (BM) string matching algorithm belongs to the exact string matching category. BM string matching starts comparing characters from the rightmost element of the pattern and then moves to the left. Figure 2 shows the BM searching process. The first step in Figure 2 compares the fifth element of the pattern to the fifth element of the string. If they do not match, the fourth element of the pattern is compared to the fifth element of the string. If the comparison reaches the first pattern element without a match, the pattern slides forward by the pattern length, as shown in the second step in Figure 2. Steps 2, 3, and 4 follow the same process as step 1.
Step 5 meets a condition called the Good-Character Rule (usually described in the literature as the bad-character rule), which allows the pattern to be aligned with the string. This happens when a matching character is found for the first time after a mismatch. Pattern alignment occurs in steps 6 and 8.
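The right-to-left comparison and the alignment shift described above can be sketched in Python as follows. This is a minimal sketch, not the implementation used in this research: it applies only the bad-character shift of Boyer-Moore and omits the good-suffix rule. The function name `bm_search` and the sample bytes are hypothetical.

```python
def bm_search(text: bytes, pattern: bytes) -> list:
    """Return every start index of pattern in text (bad-character shift only)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Rightmost index at which each byte value occurs in the pattern.
    last = {byte: i for i, byte in enumerate(pattern)}
    hits, s = [], 0
    while s <= n - m:
        j = m - 1
        # Compare from the rightmost pattern element towards the left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)  # every pattern element matched
            s += 1
        else:
            # Align the mismatched text byte with its rightmost occurrence in
            # the pattern, or slide past it if it does not occur at all.
            s += max(1, j - last.get(text[s + j], -1))
    return hits


# Hypothetical byte string containing the two-byte markers FFD8, FFE1, FFDB, FFD9.
sample = bytes.fromhex("ffd8ffe100104578696600ffdb0043ffd9")
print(bm_search(sample, b"\xff\xe1"))  # start index of the FFE1 marker
```

Because a mismatched byte that does not occur in the pattern lets the search skip several positions at once, the number of comparisons is often well below the length of the text.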
The pattern locations found are used as parameters to generate the substrings for the hashing process in the third and thirteenth DSA processes. Hashing generates a message digest by executing a hash function [7]. The Secure Hash Algorithm (SHA) is a family of cryptographic hash functions published by the National Institute of Standards and Technology (NIST) [8]. The family includes SHA0, SHA1, and SHA2; SHA1 follows design principles similar to those of the earlier MD4 and MD5 hash functions, and SHA2 comprises six hash functions, including SHA512. Each SHA variant has a different hashing process and a different output size. SHA512 produces a 512-bit message digest. Its output size is reported to give this variant better performance than its predecessors [9]. SHA512 has two main processes: padding and hashing. Padding appends bits to the input data so that its length is a multiple of the 1024-bit block length. The hashing process comprises three steps.
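The difference in output size and block length between SHA variants can be checked with Python's standard `hashlib` module. This is a small illustrative sketch; the helper name `sizes_in_bits` is an assumption, and the choice of variants is only an example.

```python
import hashlib


def sizes_in_bits(name: str) -> tuple:
    """Return (digest size, block size) in bits for a hashlib algorithm name."""
    h = hashlib.new(name)
    return h.digest_size * 8, h.block_size * 8


for name in ("sha1", "sha256", "sha512"):
    digest_bits, block_bits = sizes_in_bits(name)
    print(f"{name}: digest {digest_bits} bits, block {block_bits} bits")

# The SHA512 digest of a short input, written as 128 hexadecimal characters.
print(hashlib.sha512(b"abc").hexdigest())
```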
Step one is buffer value assignment, step two is register initialization, and step three is message block expansion. The buffer values for the SHA512 hash function are shown in Table 1.
Table 1. SHA512 Buffer Value
Register initialization assigns the buffer values to the eight registers (a, b, c, d, e, f, g, h). The six logic functions in equation 1 are used to initialize the eight registers.
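The six logic functions referred to as equation 1 are presumably those that the SHA-512 specification (FIPS 180-4) defines on 64-bit words. Under that assumption they can be written as follows, where ROTR^n is a right rotation by n bits, SHR^n is a right shift by n bits, and the operators are bitwise AND, XOR, and complement:

```latex
\begin{aligned}
Ch(x, y, z)  &= (x \wedge y) \oplus (\neg x \wedge z) \\
Maj(x, y, z) &= (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z) \\
\Sigma_0(x)  &= ROTR^{28}(x) \oplus ROTR^{34}(x) \oplus ROTR^{39}(x) \\
\Sigma_1(x)  &= ROTR^{14}(x) \oplus ROTR^{18}(x) \oplus ROTR^{41}(x) \\
\sigma_0(x)  &= ROTR^{1}(x) \oplus ROTR^{8}(x) \oplus SHR^{7}(x) \\
\sigma_1(x)  &= ROTR^{19}(x) \oplus ROTR^{61}(x) \oplus SHR^{6}(x)
\end{aligned}
```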
After the eight registers are initialized, message block expansion is executed as shown in Figure 3. The output of message block expansion is eight hash values stored in the eight registers. Figure 4 shows the eight registers with the hash values for the input string "abc". The eight hash values are then added to the eight buffer values: the "a" register to H1, the "b" register to H2, and so on. The message digest of the input, as the final hash value, is compiled by arranging the eight resulting hash values into one string. Figure 5 shows how the SHA512 message digest is compiled from the eight registers and the eight buffer values.
The time required to identify the information segments grows linearly with the size of the information data. Developments in digital camera technology have produced higher image quality, which increases image file size. The larger image file size becomes a problem when executing the fingerprinting process. The focus of this research is to develop a fingerprinting method for JPEG/Exif files using the Boyer-Moore string matching algorithm and SHA512.
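The padding, register initialization, message block expansion, and digest compilation steps described for SHA512 can be sketched in Python as follows. This is a minimal sketch written from the published SHA-512 specification, not the implementation used in this research. The helper names (`primes`, `icbrt`, `rotr`, `pad`, `sha512_hex`) are illustrative assumptions, and the buffer values and round constants are derived from the square and cube roots of the first primes instead of being typed in as tables.

```python
import hashlib
from math import isqrt

MASK = (1 << 64) - 1  # all arithmetic is modulo 2**64


def primes(count):
    """Return the first `count` primes by trial division."""
    found, c = [], 2
    while len(found) < count:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found


def icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Buffer values: first 64 fractional bits of the square roots of the first 8 primes.
H_INIT = [isqrt(p << 128) & MASK for p in primes(8)]
# Round constants: first 64 fractional bits of the cube roots of the first 80 primes.
K = [icbrt(p << 192) & MASK for p in primes(80)]


def rotr(x, n):
    """Rotate a 64-bit word right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK


def pad(msg):
    """Append 0x80, zero bytes, and the 128-bit message length (1024-bit blocks)."""
    zeros = (112 - (len(msg) + 1)) % 128
    return msg + b"\x80" + b"\x00" * zeros + (8 * len(msg)).to_bytes(16, "big")


def sha512_hex(msg):
    h = list(H_INIT)
    data = pad(msg)
    for off in range(0, len(data), 128):
        # Message block expansion: 16 input words extended to 80 words.
        w = [int.from_bytes(data[off + 8 * i: off + 8 * i + 8], "big") for i in range(16)]
        for t in range(16, 80):
            s0 = rotr(w[t - 15], 1) ^ rotr(w[t - 15], 8) ^ (w[t - 15] >> 7)
            s1 = rotr(w[t - 2], 19) ^ rotr(w[t - 2], 61) ^ (w[t - 2] >> 6)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
        # Register initialization from the current buffer values.
        a, b, c, d, e, f, g, hh = h
        for t in range(80):
            big_s1 = rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)
            ch = (e & f) ^ (~e & g)
            t1 = (hh + big_s1 + ch + K[t] + w[t]) & MASK
            big_s0 = rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK
            a, b, c, d, e, f, g, hh = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
        # Add each register to its buffer value.
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    # Arrange the eight 64-bit values into one string.
    return b"".join(x.to_bytes(8, "big") for x in h).hex()


# Cross-check against the standard library for the input string "abc".
assert sha512_hex(b"abc") == hashlib.sha512(b"abc").hexdigest()
```

The final assertion compares the sketch with `hashlib.sha512`, which is the practical choice for production use.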

METHODS
The research was conducted in four stages, shown in Figure 7. The first stage is image file segment identification, whose purpose is to identify the JPEG/Exif file structure. The image files used as research objects were acquired from two smartphone types, the Asus Z00UD and the Samsung Galaxy A5. The result of the first stage is the location index of each file part. This location index is used in the second stage as a parameter to identify the beginning and end of each file part.

Figure 7. Research stages
The second stage is segment content acquisition and hashing. Segment content acquisition is conducted by copying the segment content based on the location indexes of the file parts from the first stage. Hashing is conducted by executing SHA512 with the segment content as input.
The third stage is the modification experiments, which consist of six types of file modification: recoloring, image resizing, metadata manipulation, file format conversion, text addition, and image cropping. The purpose of this stage is to identify the segments that change as a result of image modification. The last stage is file fingerprint compilation. The JPEG/Exif file fingerprint is compiled from the hash values of selected segments.
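The first two stages can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the marker bytes are the standard JPEG marker codes, `bytes.find` stands in for the Boyer-Moore search, indexes are byte offsets, and the names `MARKERS`, `locate_segments`, `hash_segments`, and the sample data are hypothetical.

```python
import hashlib

# Standard JPEG marker codes for the segment names used in this paper.
MARKERS = {
    "SOI": b"\xff\xd8", "APP1": b"\xff\xe1", "DQT": b"\xff\xdb",
    "SOF0": b"\xff\xc0", "DHT": b"\xff\xc4", "SOS": b"\xff\xda",
    "EOI": b"\xff\xd9",
}


def locate_segments(data: bytes) -> dict:
    """Stage one: map each segment name to the first index of its marker."""
    found = {name: data.find(marker) for name, marker in MARKERS.items()}
    return {name: idx for name, idx in found.items() if idx >= 0}


def hash_segments(data: bytes) -> dict:
    """Stage two: hash each segment from its start index up to the next start index."""
    starts = sorted(locate_segments(data).items(), key=lambda item: item[1])
    digests = {}
    for (name, begin), (_, end) in zip(starts, starts[1:]):
        digests[name] = hashlib.sha512(data[begin:end]).hexdigest()
    return digests


# Hypothetical miniature file: each marker followed by placeholder content.
sample = (b"\xff\xd8" + b"\xff\xe1" + b"Exif-metadata" + b"\xff\xdb" + b"quant"
          + b"\xff\xc0" + b"frame" + b"\xff\xc4" + b"huff"
          + b"\xff\xda" + b"scan" + b"\xff\xd9")

print(locate_segments(sample))
for name, digest in hash_segments(sample).items():
    print(name, digest[:16], "...")
```

Taking the first occurrence of each marker is a simplification. In a real JPEG/Exif file the same byte pairs may also appear inside the APP1 segment, for example in an embedded thumbnail, so a fuller implementation would need to account for segment boundaries.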

RESULTS AND DISCUSSION
The JPEG/Exif file structure consists of seven segments. Each segment has a segment marker that identifies the beginning of the segment and the end of the previous segment. The JPEG/Exif file segments and segment markers are shown in Table 2. JPEG/Exif file segments are searched using the Boyer-Moore string matching algorithm with a segment marker as the input pattern. Figure 8 shows the Boyer-Moore string matching algorithm flowchart. The algorithm compares characters starting from the rightmost character of the input pattern and moves left until it reaches the leftmost character of the pattern. The search stops when all characters of the pattern have been found in sequence. The search results are given in Table 2 for the JPEG/Exif file from the Asus Z00UD smartphone and in Table 3 for the file from the Samsung Galaxy A5.
Segment content length is determined by the start indexes of two consecutive segments. The SOI content has a length of 4 index positions, from the SOI start index (0) to one index before the APP1 segment start index (4-1). The APP1 content has a length of 1997 index positions, from the APP1 start index (4) to one index before the DQT segment start index (2000-1). These two index values are used as parameters to generate the substring for the SHA512 hash function input in the third stage. Figure 9 shows the hashing results for the Asus Z00UD JPEG/Exif file.
Figure 9. Hash values from JPEG/Exif segments
The six hash values from the six JPEG/Exif segments in Figure 9 can be used as the file fingerprint in the fourth research stage. The purpose of the file fingerprint selection process is to determine which segments are affected when the image file is altered. The image file modification experiments comprise six experiments, as described in section 2 (Methods). Table 5 shows the affected segments for each experiment. The results in Table 5 fall into three groups. The first group shows that the metadata modification experiment affected only the APP1 segment.
The second group shows that the SOI segment was altered in the image file conversion and text/object addition experiments. The third group shows that all segments were altered by image display modification. Each result group is used as a file fingerprint component. The first component is the SOI hash value and the second component is the APP1 hash value. The third component is selected from four segments (DQT, SOF0, DHT, SOS) based on segment content and segment length. Table 6 shows the segment lengths of the JPEG/Exif image file, calculated using the segment indexes from the first research stage.
Table 6. JPEG/Exif segments length
Figure 10. JPEG/Exif file fingerprint
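Compiling the three-component fingerprint and using it to detect a modification can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the fingerprint is represented as a mapping from segment name to SHA512 hash value, which may differ from the layout in Figure 10, and the names `compile_fingerprint`, `changed_components`, and the sample segment contents are hypothetical.

```python
import hashlib

# The three segments whose hash values form the fingerprint.
FINGERPRINT_SEGMENTS = ("SOI", "APP1", "SOF0")


def compile_fingerprint(segments: dict) -> dict:
    """Hash the content of each fingerprint segment with SHA512."""
    return {name: hashlib.sha512(segments[name]).hexdigest()
            for name in FINGERPRINT_SEGMENTS}


def changed_components(original: dict, candidate: dict) -> list:
    """List the fingerprint components whose hash values differ."""
    return [name for name in FINGERPRINT_SEGMENTS
            if original[name] != candidate[name]]


# Hypothetical segment contents before and after a metadata edit.
before = {"SOI": b"\xff\xd8", "APP1": b"\xff\xe1Exif-metadata", "SOF0": b"\xff\xc0frame"}
after = dict(before, APP1=b"\xff\xe1Exif-edited")

print(changed_components(compile_fingerprint(before), compile_fingerprint(after)))
```

In this example only the APP1 component differs, which corresponds to the first result group, where metadata modification affects the APP1 segment only.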

CONCLUSION
The fingerprint of a JPEG/Exif file is compiled by identifying the locations of three segments (SOI, APP1, SOF0) using the Boyer-Moore string matching algorithm. The content of all three segments is used as input to the SHA512 hash function. The three hash values were selected based on the information inside the segments and the size of the segments. Segment size affects the time needed to identify the location of a segment and to compute its hash value.