Parallel Multi-core Implementation of the Optimized Speck Cipher

ABSTRACT


INTRODUCTION
Given the growing number of resource-limited devices, including those connected to the Internet of Things (IoT), lightweight cryptography has become an important research topic. On such low-end devices, with limited processing power and memory, the implementation of robust cryptosystems is often hampered by the computational cost that strong security entails [1].
Recently, considerable attention has focused on Speck, a lightweight block cipher that operates efficiently in this context and is easy to implement. In 2013, the United States National Security Agency (NSA) introduced the Speck cipher as a less costly alternative to AES. Speck is a family of block ciphers that operate on 32- or 64-bit words, depending on the chosen block and key sizes, and use a fixed number of rounds. For secure communication between devices with limited resources, Speck is an appealing choice because of its simplicity and efficiency [2].
However beneficial Speck may be, its operation still needs to be made more efficient to meet the growing demand for cryptographic algorithms. Parallel processing is a computational methodology that partitions the execution of a cryptographic algorithm into smaller, more manageable tasks. These tasks are then executed concurrently on several computing resources, such as the cores of a multi-core central processing unit (CPU).
In this work, we propose a newly configured Speck cipher with improved performance. The approach prioritizes reducing the round count and is executed on multi-core CPUs to enhance performance. In addition, the parallel implementation is designed to take advantage of the multi-core architecture of modern CPUs. By using the CTR mode of operation, the proposed parallel implementation of Speck achieves a high degree of parallelism and enables efficient data encryption and decryption. This work examines the outcomes of the security analysis and assesses the performance of the parallel implementation relative to the original Speck algorithm in terms of throughput and speedup.
The rest of this paper is organized as follows. Section 2 provides a brief overview of the related works. Section 3 describes the background of Speck encryption. Section 4 describes the proposed optimized Speck algorithm. Section 5 demonstrates the parallel implementation of the optimized Speck and its original version on a multi-core CPU. Section 6 validates and compares the security results of the proposed optimized Speck cipher. Section 7 presents the experimental results of the sequential and parallel implementations. Section 8 concludes the proposed work and summarizes the obtained results.

RELATED WORKS
The two primary categories of encryption methods are symmetric and asymmetric. The symmetric-key strategy is favoured in practical implementations because it requires less computational complexity, memory, and other resources than the asymmetric-key scheme. Stream ciphers and block ciphers are the two main types of symmetric ciphers. The Counter (CTR) mode of AES is one of the most common implementations of this widely used algorithm [3,4]. The block cipher functions as a stream cipher here because the ciphering process is decoupled from the plaintext. To achieve the appropriate degree of security, however, block ciphers often need numerous rounds. They may be built on Feistel Networks (FN) or Substitution-Permutation Networks (SPN), with the goal of keeping the confusion and diffusion properties intact by increasing the number of rounds.

Several notable research efforts have explored techniques and optimizations for reducing the number of rounds in Speck encryption, aiming to enhance its performance without compromising security. Ren and Chen [5] and Gohr [6] applied deep learning and cryptanalysis approaches to round-reduced Speck, reaching 11 rounds. In contrast, Sleem and Couturier [7] and Yue and Wu [8] focused on reducing the number of Speck rounds to 7 for lightweight cryptographic schemes, with the former proposing an ultra-lightweight cryptographic scheme for IoT and the latter presenting an improved neural differential distinguisher model for the lightweight cipher Speck.
State-of-the-art cryptanalysis studies have examined the resistance of the Speck cipher when its number of rounds is reduced. Liu et al. [9] present a new method for developing lightweight and universally applicable deep-learning-aided differential distinguishers. Their differential cryptanalysis of the NSA block cipher SPECK32/64 demonstrates how traditional cryptanalysis approaches have been advanced by the inclusion of deep learning techniques when the Speck rounds are reduced.
While demonstrating the effectiveness of deep learning in cryptanalysis, the work of Deng et al. [10] breaks new ground by incorporating attention mechanisms into differential cryptanalysis on SPECK. Their results outperform those of classic residual networks in terms of accuracy and interpretability, uncovering possible vulnerabilities in the process.
Zhang et al. [11] introduce an enhanced method of differential-neural cryptanalysis for round-reduced ciphers. After using a neural network to improve the accuracy of neural distinguishers, they outperform DDT-based approaches in certain rounds. They accomplish practical key recovery attacks by enhancing both classical differentials and neural distinguishers; this raises the attack threshold by two rounds, allowing attacks on up to 17 rounds.
Continuing this effort to reduce the number of rounds in Speck encryption, this work presents an approach that reduces the number of Speck128/128 rounds to 5. The proposed design is specifically tailored to the parallel architecture of multi-core CPUs.

SPECK CRYPTOGRAPHY
Speck, a symmetric-key block cipher, is an algorithm that provides secure and efficient methods of encoding and decoding data. The lightweight cipher Speck was introduced by the National Security Agency (NSA) of the United States in 2013. It belongs to the well-known ARX (Addition/Rotation/XOR) family of ciphers. Speck is a suitable encryption technique for many applications that require data integrity and confidentiality because it is very efficient and straightforward in its operation. Speck works on blocks of data of predetermined length, using a single key for both the encryption and decryption processes. The block and key sizes directly determine the maximum quantity of data that can be processed in a single operation and the level of security.
In this work, a Speck cipher with a block size of 128 bits and a key size of 128 bits is used, ensuring strong cryptographic capabilities. The Speck function in this configuration takes 32 rounds. The Speck round function can be broken down into three distinct operations: XOR, modular addition, and rotation, as shown in Eq. (1).

$$L_{i+1} = \big((L_i \ggg \alpha) \boxplus R_i\big) \oplus k_i, \qquad R_{i+1} = (R_i \lll \beta) \oplus L_{i+1} \tag{1}$$

Here, $L_i$ and $R_i$ denote the left and right halves of the data at round $i$, respectively, $k_i$ denotes the round key at round $i$, $\alpha$ and $\beta$ are the rotation constants, and $\boxplus$ is addition modulo $2^n$ for $n$-bit words. Figure 1 shows the CTR Speck round function.
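The round function described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the rotation constants α = 8 and β = 3 and the key schedule of the public Speck128/128 specification, and all function names are chosen here for illustration only.

```python
MASK64 = (1 << 64) - 1  # Speck128/128 operates on 64-bit words


def ror(x, r):
    """Rotate a 64-bit word right by r bits."""
    return ((x >> r) | (x << (64 - r))) & MASK64


def rol(x, r):
    """Rotate a 64-bit word left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def speck_round(left, right, k, alpha=8, beta=3):
    """One ARX round: rotation, modular addition, XOR with the round key."""
    left = ((ror(left, alpha) + right) & MASK64) ^ k
    right = rol(right, beta) ^ left
    return left, right


def speck_unround(left, right, k, alpha=8, beta=3):
    """Inverse of speck_round."""
    right = ror(right ^ left, beta)
    left = rol(((left ^ k) - right) & MASK64, alpha)
    return left, right


def key_schedule(key_hi, key_lo, rounds=32):
    """Expand a 128-bit key (two 64-bit words) by reusing the round function."""
    round_keys = []
    a, b = key_lo, key_hi
    for i in range(rounds - 1):
        round_keys.append(a)
        b, a = speck_round(b, a, i)  # the round index plays the role of the key
    round_keys.append(a)
    return round_keys


def encrypt_block(left, right, round_keys):
    for k in round_keys:
        left, right = speck_round(left, right, k)
    return left, right


def decrypt_block(left, right, round_keys):
    for k in reversed(round_keys):
        left, right = speck_unround(left, right, k)
    return left, right


# Published Speck128/128 test vector (key, plaintext) from the specification.
rks = key_schedule(0x0F0E0D0C0B0A0908, 0x0706050403020100)
ct = encrypt_block(0x6C61766975716520, 0x7469206564616D20, rks)
print(hex(ct[0]), hex(ct[1]))
```

With the published test vector used in the last lines, the output should be the ciphertext words a65d985179783265 and 7860fedf5c570d18, and `decrypt_block` recovers the plaintext.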

THE PROPOSED OPTIMIZED SPECK CIPHER
The proposed optimized SPECK algorithm employs a substitution table to reduce the number of SPECK rounds, thereby improving the efficiency of the encryption process.
We started with iterative testing and experimentation to see how various changes affected the cipher's efficiency and security. To assess the cryptographic properties, we fine-tuned settings such as the number of rounds and the size and structure of the S-box. Starting with a single Speck round, we applied randomness and statistical tests and increased the number of rounds until an acceptable security level was reached. Using this strategy, we were able to determine the optimization threshold at which the desired degree of security is achieved. The Speck-Encrypt algorithm (Algorithm 1) accomplishes this optimization by accepting the key schedule, input data, output buffer, initialization vector (IV), and substitution box (Sbox) as parameters. The optimized Speck encryption function, illustrated in Figure 2, incorporates the substitution box after every Speck round.

Algorithm 1 (Speck-Encrypt; only the inner loop of the listing is recoverable)
    ...
    for j = 0 → 5 do
        SpeckRound(cryptediv[1], cryptediv[0], keySchedule[j])

Algorithm 1 commences by initializing the cryptediv array with the current version of the IV, which is updated for every data block to guarantee the uniqueness of the encryption process. Each byte of the cryptediv array undergoes a substitution operation via the substitution box (Sbox). This operation improves the confusion and diffusion properties of the encryption, thereby enhancing its security. Within the optimization process, five rounds of SPECK encryption are executed in the inner loop, in contrast to the conventional 32 rounds. This significant reduction in computational expense allows encryption to execute considerably faster, with only a marginal compromise on security. Each iteration of the encryption process calls the SpeckRound function, which applies the basic Speck round function to the cryptediv array in order to update it. The details of the Speck round function are given in Algorithm 2.

Furthermore, the algorithm applies a bitwise left rotation to a single byte of the substitution box (Sbox), using the least significant bits of the corresponding byte in cryptediv as the rotation amount. This operation provides further diffusion and strengthens the encryption's cryptographic integrity.
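To make the data flow of the reduced-round CTR encryption more concrete, the sketch below shows one possible reading of the description above. Several details are assumptions made here, not facts from the paper: the counter is added to the first IV word, the S-box is applied once to every byte of cryptediv before the five rounds (the per-round substitution and the S-box rotation are omitted), words are packed little-endian, and the S-box and round keys in the demo are arbitrary placeholders.

```python
MASK64 = (1 << 64) - 1


def ror(x, r):
    return ((x >> r) | (x << (64 - r))) & MASK64


def rol(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK64


def speck_round(left, right, k):
    """Basic Speck round with the Speck128 rotation constants (8, 3)."""
    left = ((ror(left, 8) + right) & MASK64) ^ k
    right = rol(right, 3) ^ left
    return left, right


def substitute_word(word, sbox):
    """Apply the S-box to each of the 8 bytes of a 64-bit word."""
    data = word.to_bytes(8, "little")
    return int.from_bytes(bytes(sbox[b] for b in data), "little")


def keystream_block(iv, counter, round_keys, sbox, rounds=5):
    """Produce one 16-byte keystream block from the counter-updated IV."""
    # Assumption: the counter is added to the first IV word.
    cryptediv = [(iv[0] + counter) & MASK64, iv[1]]
    # Byte-wise substitution of cryptediv through the S-box.
    cryptediv = [substitute_word(w, sbox) for w in cryptediv]
    # Five Speck rounds instead of the conventional 32.
    for j in range(rounds):
        cryptediv[1], cryptediv[0] = speck_round(
            cryptediv[1], cryptediv[0], round_keys[j]
        )
    return cryptediv[0].to_bytes(8, "little") + cryptediv[1].to_bytes(8, "little")


def ctr_crypt(data, iv, round_keys, sbox):
    """CTR mode: XOR the data with the keystream (same call decrypts)."""
    out = bytearray()
    for offset in range(0, len(data), 16):
        ks = keystream_block(iv, offset // 16, round_keys, sbox)
        out.extend(c ^ k for c, k in zip(data[offset:offset + 16], ks))
    return bytes(out)


# Hypothetical demo values: a reversed-identity S-box and arbitrary round keys.
demo_sbox = list(range(255, -1, -1))
demo_keys = [0x0123456789ABCDEF + 0x1111 * j for j in range(5)]
iv = (0x0011223344556677, 0x8899AABBCCDDEEFF)
msg = b"forty bytes of example plaintext data!!!"
ct = ctr_crypt(msg, iv, demo_keys, demo_sbox)
print(ct.hex())
```

Because CTR mode only XORs the data with a keystream, calling `ctr_crypt` on the ciphertext with the same parameters returns the plaintext, and no inverse S-box is needed.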
Notably, the proposed optimization makes use of the substitution concept widely used in block ciphers, which enhances performance while maintaining a solid security foundation. Nevertheless, a comprehensive security analysis and performance evaluation of the proposed optimized SPECK algorithm is imperative to ascertain its robustness against prospective attacks and its suitability for particular use cases. The RC4 stream algorithm (Algorithm 3) is used to create the substitution Sbox. To create a single reliable Sbox, iterations are performed based on the previously generated dynamic key (DK). RC4 is employed because of its popularity and ease of implementation in both hardware and software. The dynamic Sbox is generated during the Key-Scheduling Algorithm (KSA) step of the RC4 setup.
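The S-box generation can be illustrated with the standard RC4 key-scheduling step, which turns a key into a permutation of the 256 byte values. The sketch below shows only that standard step; how the dynamic key is derived and how many iterations the authors perform are not specified here, so the key bytes in the demo are placeholders.

```python
def rc4_ksa(key: bytes) -> list:
    """Standard RC4 key-scheduling: returns a key-dependent byte permutation."""
    if not key:
        raise ValueError("key must not be empty")
    state = list(range(256))  # start from the identity permutation
    j = 0
    for i in range(256):
        # Mix the key into the swap index, then swap entries i and j.
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    return state


# Placeholder dynamic key; a real DK would come from the key derivation step.
dynamic_key = bytes.fromhex("00112233445566778899aabbccddeeff")
sbox = rc4_ksa(dynamic_key)
print(sbox[:8])
```

Since the procedure only swaps entries, the result is always a bijection on byte values, which is the property a substitution box needs.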

THE PARALLEL EXECUTION OF THE OPTIMIZED SPECK OVER MULTICORE PROCESSOR
This section demonstrates the parallel message-passing execution of both the optimized and the original Speck ciphers on a multi-core processor. Algorithm 4, the optimized Speck, takes as input the key, the plain message (in), the substitution Sbox, the initial vector (IV), the block size, and the process ID, and outputs the encrypted message.
The algorithm encrypts two blocks per iteration, and at the end of each iteration the Sbox is rotated. Algorithm 5 presents the steps of the original Speck algorithm on a multi-core platform. Both algorithms are called from the main function presented in Algorithm 6, which performs the MPI communication routines. Encryption with the Speck cipher is parallelized using MPI (Message Passing Interface) through scatter and gather operations. The code employs non-blocking scatter and gather operations (MPI_Iscatter and MPI_Igather), which overlap communication and computation. This allows the encryption to proceed on each process while data is being exchanged, maximizing computational efficiency.
Moreover, the unique index of the block, taskid * blocksize, is used as a counter (CTR) that updates the first block of the initial vector.
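The role of the per-process counter offset can be illustrated without an MPI runtime. In the sketch below, an ordinary loop over task IDs stands in for the MPI processes, list slicing stands in for the scatter step, and concatenation stands in for the gather step. The keystream generator is a SHA-256-based placeholder rather than the Speck cipher, and interpreting blocksize as the number of blocks handled by each process is an assumption.

```python
import hashlib

BLOCK = 16  # bytes per cipher block


def keystream_block(key, iv, counter):
    """Placeholder keystream (SHA-256 based); it is NOT the Speck cipher."""
    digest = hashlib.sha256(key + iv + counter.to_bytes(8, "little")).digest()
    return digest[:BLOCK]


def ctr_encrypt(chunk, key, iv, first_counter):
    """Encrypt a chunk in CTR mode, starting at the given counter value."""
    out = bytearray()
    for n, off in enumerate(range(0, len(chunk), BLOCK)):
        ks = keystream_block(key, iv, first_counter + n)
        out.extend(b ^ k for b, k in zip(chunk[off:off + BLOCK], ks))
    return bytes(out)


def scatter(message, nprocs):
    """Split the message into contiguous chunks of whole blocks per process."""
    nblocks = -(-len(message) // BLOCK)   # ceiling division
    blocksize = -(-nblocks // nprocs)     # blocks per process
    size = blocksize * BLOCK
    chunks = [message[p * size:(p + 1) * size] for p in range(nprocs)]
    return chunks, blocksize


def parallel_encrypt(message, key, iv, nprocs):
    chunks, blocksize = scatter(message, nprocs)
    # Each task starts its counter at taskid * blocksize, so no two tasks
    # ever reuse a counter value and the work needs no coordination.
    parts = [
        ctr_encrypt(chunk, key, iv, taskid * blocksize)
        for taskid, chunk in enumerate(chunks)
    ]
    return b"".join(parts)  # stands in for the gather step


key, iv = b"k" * 16, b"v" * 16
message = bytes(range(256)) * 5
sequential = ctr_encrypt(message, key, iv, 0)
print(all(parallel_encrypt(message, key, iv, p) == sequential for p in (2, 4, 6, 8)))
```

The design point is that the ciphertext does not depend on the number of processes: every block is tied to its global index, so the parallel result is identical to the sequential one.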

ANALYSIS OF THE SECURITY RESULTS
The security of a proposed encryption scheme is evaluated against well-established techniques, including statistical, linear, differential, and brute-force attacks [12,13]. Here, we conduct rigorous tests to demonstrate the security of the proposed optimized Speck encryption. While the proposed encryption system may be used for any kind of data, the results reported here are for multimedia content.

Statistical analysis tests
Randomness and uniformity are two features a cipher must have in order to be regarded as safe against statistical attacks [14,15]. The following statistical tests are carried out to evaluate the level of randomness: the entropy of the data, the histogram of the plain and encrypted messages, the correlation between the original and encrypted messages, and the Probability Density Function (PDF).

Uniform analysis test
The most important test is the uniformity of the encrypted message's probability density function (PDF). The probability of seeing any given symbol in the resulting ciphertext should be about 1/n, where n is the total number of symbols. Figure 3 shows the PDF of the original message and of the messages encrypted with both Speck versions. For all ciphertext symbols, the PDF values are close to 0.0039 (1/256 ≈ 3.9 × 10^-3), which is consistent with a uniform distribution.
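The uniformity check can be sketched as follows: count how often each byte value occurs and compare the relative frequencies with 1/256. The data below is a SHA-256-based placeholder for ciphertext, used only so that the example is self-contained; it is not output of the proposed cipher.

```python
import hashlib
from collections import Counter


def symbol_pdf(data: bytes):
    """Relative frequency of each of the 256 byte values."""
    counts = Counter(data)
    return [counts.get(symbol, 0) / len(data) for symbol in range(256)]


def placeholder_ciphertext(n):
    """Pseudo-random bytes standing in for real ciphertext."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:n])


pdf = symbol_pdf(placeholder_ciphertext(1 << 16))
max_dev = max(abs(p - 1 / 256) for p in pdf)
print(f"ideal = {1 / 256:.5f}, largest deviation = {max_dev:.5f}")
```

For a uniform source, every entry of `pdf` stays near 1/256 ≈ 0.0039, and the largest deviation shrinks as the message grows.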

Entropy analysis
The entropy of a message M, which measures the dispersion of its symbols [14], is defined for a random variable as

$$H(M) = -\sum_{i=1}^{N_S} pr(m_i)\,\log_2 pr(m_i)$$

Entropy is measured in bits, and pr(m_i) denotes the probability of observing the symbol m_i, for i between 1 and N_S, the number of distinct symbols. If the ciphertext's entropy is equal or close to log2(N_S), then the ciphertext behaves like a truly random source with a uniform distribution. Figure 5 compares the entropy of the encrypted messages at the 16×16 (256-element) sub-matrix level using a random dynamic key. The results show that the entropy of the resulting ciphertexts is close to the target value of 8. As a result, the proposed optimized Speck cipher is sufficiently safe against entropy attacks.
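The entropy measure described above is straightforward to compute from symbol counts. The following sketch evaluates the Shannon entropy of a byte string and of consecutive 256-byte blocks; the helper names and the block-wise splitting are illustrative choices, not the authors' code.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per symbol, computed from observed byte frequencies."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def block_entropies(data: bytes, block: int = 256):
    """Entropy of each consecutive block (e.g. a flattened 16x16 sub-matrix)."""
    return [
        shannon_entropy(data[i:i + block])
        for i in range(0, len(data) - block + 1, block)
    ]


uniform = bytes(range(256))   # every byte value exactly once
constant = b"\x00" * 256      # a single repeated symbol
print(shannon_entropy(uniform), abs(shannon_entropy(constant)))
```

The two extremes bracket the scale: a block in which all 256 byte values occur equally often reaches the maximum of 8 bits, while a constant block has zero entropy.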

Correlation analysis
To make sure that the proposed encryption system holds up in practice, it is crucial to remove any relationship between adjacent components of the message [15,16]. A correlation coefficient close to zero indicates that the ciphertext behaves as random data. To perform the correlation test, we randomly choose pairs of adjacent pixels from the original message and from the encrypted version. The correlation may be computed in any of three directions (horizontal, vertical, and diagonal). The correlation coefficient r_xy is determined using the standard definition:

$$r_{xy} = \frac{\mathrm{cov}(x, y)}{\sqrt{D(x)\,D(y)}}, \qquad \mathrm{cov}(x, y) = \frac{1}{N}\sum_{i=1}^{N}\big(x_i - E(x)\big)\big(y_i - E(y)\big)$$

$$E(x) = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad D(x) = \frac{1}{N}\sum_{i=1}^{N}\big(x_i - E(x)\big)^2$$

The correlation test between the original and encrypted messages of both Speck versions is shown in Figure 6 for one random key at a time and for a total of 1000 random keys. The results indicate that the correlation coefficient is extremely small, very near to 0, which substantiates the ciphertext's randomness and, by extension, its independence from the original message.
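A correlation test on adjacent pixels can be sketched as below. The sampling routine, its parameters, and the synthetic gradient image are assumptions made for illustration; the coefficient itself is the usual Pearson correlation.

```python
import math
import random

# Row and column offsets of the neighbour in each direction.
OFFSETS = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}


def correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    ex, ey = sum(x) / n, sum(y) / n
    cov = sum((a - ex) * (b - ey) for a, b in zip(x, y)) / n
    dx = sum((a - ex) ** 2 for a in x) / n
    dy = sum((b - ey) ** 2 for b in y) / n
    return cov / math.sqrt(dx * dy)


def sample_adjacent_pairs(pixels, width, height, direction, samples=1000, seed=0):
    """Randomly pick pixels and their neighbours in the given direction."""
    dr, dc = OFFSETS[direction]
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(samples):
        r = rng.randrange(height - dr)
        c = rng.randrange(width - dc)
        xs.append(pixels[r * width + c])
        ys.append(pixels[(r + dr) * width + (c + dc)])
    return xs, ys


# Synthetic 16x16 gradient image: neighbouring pixels are strongly related.
width = height = 16
gradient = [r + c for r in range(height) for c in range(width)]
xs, ys = sample_adjacent_pairs(gradient, width, height, "horizontal")
print(correlation(xs, ys))
```

On a smooth plain image such as this gradient the coefficient is close to 1, whereas for a well-encrypted image the same procedure should give a value near 0.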

PractRand randomness test
The PractRand testing suite uses statistical methods to assess the randomness of PRNG output [17]. PractRand evaluates a generated sequence and produces a report indicating whether or not the sequence passes the tests. The encrypted message was subjected to PractRand, which is one of the most demanding statistical test suites available. The proposed optimized Speck cipher was tested with 64 seeds under PractRand, and it passed in every case. This verification indicates that the ciphertext generated by the optimized Speck algorithm is sufficiently random and reliable.

Linear and differential analysis tests
When designing cryptographic algorithms, it is essential that they be resistant to linear and differential attacks. This is particularly true for symmetric-key block ciphers [18]. These attacks are two common ways in which cryptanalysts attack cryptographic algorithms to recover private keys or plaintext.

The linear analysis test
This test uses the idea of linear probability approximation (LPF), described in the study of Aly et al. [18] as a strategy for linear cryptanalysis of the DES block cipher. The main idea is to find a linear relation or approximation that links certain parts of the plaintext with their corresponding ciphertext counterparts. If such a linear relationship between the plaintext and the ciphertext can be established, the key becomes more susceptible to extraction.
We compute the linear probability value for both the optimized and original Speck ciphers for each discrete set of 16-byte plaintext and ciphertext blocks over 1000 iterations. See Figure 7 for the test results. For the optimized Speck cipher, the mean of the LPF is 0.517, whereas for the original Speck cipher it is 0.479. To compute the resistance to the linear attack, the study of Matsui [19] takes the absolute difference between the desired probability 0.5 and the computed linear probability P. The closer the resulting value is to zero, the more resistant the cipher is to linear attacks. According to this criterion, the optimized Speck gives a bias of 0.017, whereas the bias of the original Speck is 0.021. Thus, the optimized Speck is more resistant to the linear attack than its original version.

The differential attack test
An efficient way to evaluate the safety of cryptographic algorithms is differential cryptanalysis, which involves looking at how small changes in the input data affect the output data [20]. In this test, we compute the correlation coefficient between two sets of ciphertexts obtained from two plaintexts that differ by one bit in every block. A lower correlation coefficient indicates a higher resistance to differential attacks. In this test, 1000 sub-matrices of size 16 bytes are randomly selected from the two ciphertexts, and the average correlation coefficient is computed over all sub-matrices. For the optimized Speck cipher, the average correlation coefficient is 0.0003, and it is -0.0018 for the original one. Accordingly, the optimized Speck has more resistance to the differential attack than the original version. Figure 8 shows the computed correlation coefficients of all selected cipher blocks.
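The comparison in both tests reduces to simple arithmetic on the reported averages. The short sketch below reproduces it; the function name is illustrative, and the input values are the figures quoted in the text.

```python
def linear_bias(probability: float) -> float:
    """Distance of a measured linear probability from the ideal value 0.5."""
    return abs(0.5 - probability)


# Reported mean linear probabilities.
bias_optimized = linear_bias(0.517)
bias_original = linear_bias(0.479)

# Reported average correlation coefficients of the differential test;
# only the magnitude matters when judging closeness to zero.
diff_optimized = abs(0.0003)
diff_original = abs(-0.0018)

print(round(bias_optimized, 3), round(bias_original, 3))
print(bias_optimized < bias_original, diff_optimized < diff_original)
```

In both cases the optimized cipher's value is the one closer to zero, which is the basis of the resistance claims above.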

PERFORMANCE RESULTS AND COMPARISON
This section presents the performance of the proposed optimized Speck cipher under two scenarios. In the first scenario, the optimized and original Speck ciphers are executed sequentially on a single CPU core. In the second scenario, the message-passing algorithms for both ciphers are executed on a multi-core CPU. The section compares the speedup, encryption time, and throughput of both versions of the Speck algorithm.

Results of sequential execution
The performance comparison of the optimized and original Speck ciphers was carried out on one core of the Intel(R) i7-7700HQ processor. The CPU runs at a clock frequency of 2.80 GHz on all cores. The execution time and throughput of the single-threaded optimized and original ciphers on this processor are shown in Table 1.
The data show that the optimized Speck algorithm achieves considerable improvements in execution time and throughput compared with the original Speck algorithm. The average speedup ratio in both execution time and throughput is about 2.58, which means that the optimized Speck algorithm performs cryptographic operations faster than the original one. In other words, it processes a larger amount of data per unit of time, boosting the efficiency of data processing.
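The two metrics used in this comparison are related by simple formulas: throughput is the number of processed bits divided by the execution time, and the speedup is the ratio of the reference time to the improved time. The sketch below illustrates them with made-up timings chosen only to give a ratio of 2.58; they are not the measurements of Table 1.

```python
def throughput_gbps(num_bytes: int, seconds: float) -> float:
    """Throughput in gigabits per second."""
    return num_bytes * 8 / seconds / 1e9


def speedup(reference_seconds: float, improved_seconds: float) -> float:
    """How many times faster the improved run is than the reference run."""
    return reference_seconds / improved_seconds


# Hypothetical timings for a 64 MB message (illustrative values only).
message_bytes = 64 * 10**6
t_original, t_optimized = 0.258, 0.100

ratio = speedup(t_original, t_optimized)
tp_original = throughput_gbps(message_bytes, t_original)
tp_optimized = throughput_gbps(message_bytes, t_optimized)
print(f"speedup = {ratio:.2f}, throughput ratio = {tp_optimized / tp_original:.2f}")
```

For a fixed message size the throughput ratio equals the execution-time speedup, which is why a single ratio can summarize both metrics.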

Parallel execution results over multicore processor
The experiment in this subsection was performed on the multi-core Intel i7-7700HQ processor to determine the efficiency of the optimized Speck algorithm compared with the original one. The algorithms were developed using the parallel primitives of the Message Passing Interface (MPI). To show the improvement of the Speck algorithm, four thread counts (2, 4, 6, and 8) are used for the executions on the multi-core processor. Figure 9 shows the execution time, throughput, and speedup of both Speck algorithms relative to single-core execution. The mean throughput of the optimized Speck algorithm is 3.83 gigabits per second. To determine the speedup ratio, the sequential execution time of the optimized Speck cipher is compared with its parallel execution time; the mean speedup ratio is 2.63. The mean acceleration obtained by the two Speck parallel algorithms is 2.64. Table 2 shows the average throughput and execution time over all thread counts. The advantage of the optimized algorithm over the original algorithm is immediately apparent: across the tested message sizes and thread configurations, it completes cryptographic operations much faster.

CONCLUSION
This work presents a new version of the Speck cipher and examines its effectiveness on multi-core processors by exploiting the parallel processing capabilities of CPUs. The dynamic, randomized S-box layer of the optimized Speck cipher enhances its security and enables a reduction in the number of rounds. Statistical tests, randomness tests, and resistance to linear and differential attacks are evaluated and compared for both the original and the optimized algorithm. The security results indicate that the optimized Speck retains a comparable level of security and withstands these attacks. The experimental findings indicate that the optimized Speck algorithm has superior performance in terms of execution time, throughput, and speedup compared with its original version. The suggested optimization method integrates substitution-based adjustments with a decreased round count in order to improve encryption efficiency while maintaining security at an acceptable level. In sequential execution, the optimized Speck algorithm was on average 2.58 times faster than the original. When executing on a multi-core processor, the new Speck cipher handles larger data more efficiently, running 2.64 times faster than the original Speck.
In future work, we plan to investigate key sizes larger than 128 bits, such as 192 and 256 bits. The objective is to examine the effects of varying key sizes on the optimized Speck algorithm. To accommodate larger key sizes, we may consider modifying the security primitives or employing different round configurations. Moreover, it would be interesting to implement the proposed cipher on GPUs to evaluate its performance and efficiency.

Figure 2. The proposed optimized speck encryption function

Figure 3. The recurrence of the original message (a), the cipher using optimized speck (b), and the cipher using original speck (c). The related PDF of the original message (d), the produced cipher of the optimized speck (e), and the original speck (f)

Histogram analysis test
In this section, the optimized and original Speck ciphers are evaluated and compared using histogram analysis tests. When the histogram of the encrypted image is uniformly distributed, we can say that the encryption meets the uniformity criterion. This means that the number of occurrences of each symbol in the message should be close to the message size divided by the number of distinct symbols. Figure 4 shows the histograms of 512-by-512-pixel plain images alongside those of their cipher images. The encrypted images of both Speck versions have a histogram very close to the uniform distribution, whose expected count per symbol is (512 × 512)/256 = 1024.

Figure 5. Entropy analysis for the encrypted message using both optimized and original speck in (a) and (b) respectively

Figure 6. The PDF distribution of the correlation coefficient between the plain and encrypted message using (a) the optimized and (b) the original speck

Figure 7. The linear attack analysis for (a) the optimized and (b) the original speck

Figure 8. The analysis of differential attack for (a) the optimized and (b) the original speck

Table 1. The sequential execution results comparison of optimized and original Speck

Table 2. Comparison of the parallel average results of all thread configurations for optimized and original speck