Randomized Information Hiding in RGB Images Using Genetic Algorithm and Huffman Coding

ABSTRACT


INTRODUCTION
The increase in the amount of data exchanged over the Internet has greatly raised the importance of multimedia security. Methods of protecting information vary between encryption and steganography. Encryption is the conversion of confidential data into an unreadable format [1], a scramble of characters that carries no meaning; only those who possess the encryption keys can decipher the text and return it to an understandable format [2].
In the field of information protection, steganography is one of the prominent topics among researchers, who have enriched this field with a lot of research [3]. Information steganography is the science of delivering confidential [4] information to the concerned parties through public channels in a hidden manner that no one can notice or realize. This is achieved by concealing it within a digital medium such as sound, image, video, or text [5,6].
Information hiding is the technique of protecting information from direct change or tampering by unauthorized parties and delivering it confidentially and securely to the requested party. The field of information hiding is considered important and vital due to its many applications in protecting copyrights, personal information, and credit cards, and in exchanging confidential information [2].
The genetic algorithm is considered one of the best optimization methods used by computer scientists and engineers. It relies on natural selection to solve practical problems [7]. The genetic algorithm starts with a random initial population of individuals, where each individual represents a solution to the problem, and it updates the population iteratively by performing a set of operations: selection, crossover, and mutation [8].
Text compression is still one of the most important fields, despite companies trying to produce many devices dedicated to [9] dealing with huge amounts of information, because it has become impossible to deal with data without compression in light of the increasing use of Internet networks [10]. Compression is a vital technology that cannot be [11] dispensed with as a result of the development of the age of technology and information in which we live. Its goal is to reduce the amount of data that we need to store or transmit over the Internet [12,13]. There are two types of compression: the first type loses data when decompressing and is called lossy, while the second type is called lossless because the data is retrieved after decompression without any loss, as in Huffman coding and arithmetic coding [14,15].
One of the classic lossless compression algorithms was invented by David Huffman and is widely used in many fields [16]. The output of this method is a set of variable-length codes in which the shortest codes are allocated to the most frequent symbols and the longest codes to the least frequent ones, which reduces the size of the data to be compressed [17,18].
The challenges and limitations facing hiding systems center around capacity, secrecy, and efficiency. The security challenge is addressed using the GA, which is considered one of the optimization methods. It is characterized by proposing a set of solutions, improving these solutions, and then choosing the best one, which helps to strengthen efficiency and confidentiality. The capacity of the cover media used for embedding data is increased, and its quality improved, by reducing the size of the data using a compression method such as Huffman coding [14].
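The variable-length coding idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; the function names `huffman_codes`, `encode`, and `decode` are hypothetical and chosen for illustration only.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each symbol in `text` to its Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs a 1-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code below them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the codes of all symbols into one (0, 1) string."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Invert `encode`; works because Huffman codes are prefix-free."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


codes = huffman_codes("abracadabra")
bits = encode("abracadabra", codes)
print(len(bits), "bits instead of", 8 * len("abracadabra"))
```

Note that the receiver needs the code table (or the symbol frequencies) to decode, so in a complete system that table would have to be shared alongside the codeword.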

RELATED WORK
Abdulwahed [19] proposed a hiding system called key adaptive LSB (NSKA-LSB), which consists of four stages: pixel identification, secret data collection, embedding, and finally retrieval. The system integrates random functions with a chaotic map and achieves PSNR = 72.44.
In 2020, Al-Khateeb and Jader [20] built a hiding system that encrypts the text with DNA coding and uses a hyperchaotic system to choose the pixels of the cover image in which the encrypted text is hidden, PSNR = 65.7599. The work in [21] suggests a way to hide data in order to protect and preserve it. The system starts by compressing the data with RLE to reduce its size and increase the efficiency of the system, then applies the LSB method to hide the data at randomly chosen pixel sites, PSNR = 57.16. Barovih et al. [22] aim to combine the modified least significant bit method with the shape modification technique to determine the pixels of the cover image that will be used to hide the encrypted data. The message length for the test is 96 and PSNR = 54.88.
Sari et al. [23] proposed using the AES algorithm to encrypt the data and DWT to hide it. They noticed that the DWT method suffers from a lack of capacity, so to overcome this problem they used Huffman coding to reduce the size of the data and thus increase the capacity, PSNR = 46 dB. The method in [24] combines the MSB and LSB techniques on color images: the MSB is used as a check, and the LSB bits are then replaced with the secret message, PSNR = 72.023. Ali et al. [25] presented a data-hiding method based on a Pseudo Random Number Generator (PRNG) that randomly generates 7 locations for each band of the color image. An XOR operation is performed between the bit at the randomly generated location and the data bit, the output is embedded in the eighth bit, and the process is repeated for each band of the color cover image, PSNR = 90 when embedding the data "HelloWorld". Table 1 summarizes the related work.

PROPOSED SYSTEM
The problem that the proposed system addresses is protecting important information by hiding it randomly in RGB cover images. The size of the confidential data is reduced by compressing it with Huffman encoding, which converts it into a binary (0, 1) code. The compressed string is then embedded in two stages: stage one selects row locations randomly using a genetic algorithm, and stage two generates columns from a seed. The data is then embedded in the image based on the index of each site. The genetic algorithm was adopted for selecting rows because of the need to select a number of rows randomly without the intervention of the system designer, while making it difficult for manipulators and hackers to know which rows were chosen for hiding. The columns are identified using the seed to increase randomness and thus strengthen data protection. Figure 1 shows the block diagram of the sender side. The following steps represent the sender and recipient sides.

Sender side: "Generating hiding sites and the embedding process"
1. Read secret data and cover images.

A genetic algorithm is characterized by generating several generations, each containing a set of solutions. Each solution is called a chromosome and is represented by a vector of cells called genes. The initial population is generated randomly.

Step one: Genetic algorithm parameters
Specify the parameters needed by the genetic algorithm as follows:
1. Chromosome length = number of rows in the RGB cover image (each gene of the chromosome is allocated to one row of the image).

Step two: Initial population
Randomly generate several chromosomes of 0s and 1s with a length equal to the number of rows of the cover image, then interpret each chromosome as row numbers between 1 and (number of rows - 2), where each gene represents a row of the cover image except the last two rows. If the value of a gene is 0, the corresponding row is ignored and not used for hiding. If its value is 1, the corresponding row is used for hiding. For example, supposing the cover image has 512 rows, a chromosome decoded as 1, 3, 4, 5, 509, 510 represents the rows that are used to hide data.

Step three: Evaluation of chromosomes
The PSNR measure is used to evaluate the chromosomes (each representing the rows selected from the cover image) in each generation, where the chromosome with the highest PSNR is the strongest candidate to define the rows used for hiding.

Step four: Generate the number of generations
One of the advantages of the genetic algorithm is that it is not satisfied with one generation of solutions but continues for several generations, using its operators to look for the best solutions. In this paper, the following genetic operations were used:
a. Selection based on the binary set method.
b. Crossover using the uniform method.
c. (2m) mutation.
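The decoding of a binary chromosome into row numbers can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names are hypothetical, and forcing the genes of the last two rows to 0 is an assumption made here so that those rows stay reserved.

```python
import random


def random_chromosome(n_rows, rng):
    """One gene (0 or 1) per image row; the last two genes are forced to 0."""
    genes = [rng.randint(0, 1) for _ in range(n_rows)]
    genes[-2:] = [0, 0]  # assumption: the last two rows are never used for hiding
    return genes


def decode_rows(chromosome):
    """Return the 1-based numbers of rows whose gene is 1, skipping the last two rows."""
    usable = len(chromosome) - 2
    return [i + 1 for i, g in enumerate(chromosome[:usable]) if g == 1]


# Rebuild the 512-row example from the text: genes set at rows 1, 3, 4, 5, 509, 510.
example = [0] * 512
for row in (1, 3, 4, 5, 509, 510):
    example[row - 1] = 1
print(decode_rows(example))  # [1, 3, 4, 5, 509, 510]

# A random chromosome for the initial population.
print(len(decode_rows(random_chromosome(512, random.Random(0)))), "rows selected")
```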
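The generation loop can be sketched in Python as below. Several points are assumptions rather than details confirmed by the text: the "binary set method" is read here as binary tournament selection, the "(2m)" mutation is approximated by flipping two randomly chosen genes, the best chromosome is carried along between generations, and the fitness function is a toy stand-in for the PSNR evaluation of step three. The default parameter values are the ones listed for the proposed system (50 chromosomes, 30 generations, crossover probability 0.8, mutation probability 0.2).

```python
import random


def tournament(pop, fitness, rng):
    """Binary tournament: draw two chromosomes and keep the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def uniform_crossover(p1, p2, rng):
    """Each gene position is swapped between the parents with probability 0.5."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return c1, c2


def mutate(chrom, rng):
    """Flip two distinct, randomly chosen genes (assumed reading of '(2m)')."""
    child = chrom[:]
    for i in rng.sample(range(len(child)), 2):
        child[i] ^= 1
    return child


def run_ga(fitness, n_genes, pop_size=50, generations=30,
           p_cross=0.8, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if rng.random() < p_cross:
                c1, c2 = uniform_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            for c in (c1, c2):
                if rng.random() < p_mut:
                    c = mutate(c, rng)
                nxt.append(c)
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)  # remember the best seen so far
    return best


def toy_fitness(chrom):
    """Hypothetical stand-in for PSNR: prefers chromosomes with exactly 5 ones."""
    return -abs(sum(chrom) - 5)


print(run_ga(toy_fitness, n_genes=16, seed=1))
```

In the proposed system the fitness would instead embed the codeword in the rows selected by each chromosome and return the PSNR of the resulting stego image.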

Stage two: "Generate columns randomly based on seed":
Firstly, the number of columns is found from the length of the codeword and the number of selected rows according to the equation:

$$Columns = \frac{length(codeword)}{key1}$$

where key1 represents the number of rows provided by the genetic algorithm, codeword represents the text after compression, and Columns represents the number of columns needed for hiding in each row.
The seed is used with a simple shuffle function to specify which columns will be selected in each row to hide the data. The following example shows how to generate columns (y) from the cover image.
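A seeded shuffle of this kind might look like the sketch below. It is an illustration under stated assumptions: the column count per row is taken as the codeword length divided by the number of selected rows, rounded up so that every bit has a site; one generator seeded once is reused across rows; and the function name `generate_sites` is hypothetical.

```python
import math
import random


def generate_sites(rows, n_cols_image, codeword_len, seed):
    """Return (row, column) hiding sites, 1-based, one per codeword bit."""
    per_row = math.ceil(codeword_len / len(rows))  # assumed rounding rule
    rng = random.Random(seed)  # the same seed reproduces the same sites
    sites = []
    for r in rows:
        cols = list(range(1, n_cols_image + 1))
        rng.shuffle(cols)  # simple shuffle driven by the seed
        sites.extend((r, c) for c in cols[:per_row])
    return sites[:codeword_len]


rows = [1, 3, 4, 5, 509, 510]  # rows selected by the genetic algorithm
sites = generate_sites(rows, n_cols_image=512, codeword_len=408, seed=42)
print(len(sites), sites[:3])
```

Because the receiver can rerun the same shuffle from the shared seed, only the seed and the column count need to be transmitted, not the column list itself.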

Embedding
The distribution of confidential data over the randomly selected pixels is done as follows:
a. If both indices (i, j) of the selected pixel in the cover RGB image are even, the R channel is selected.
b. If both indices (i, j) of the selected pixel in the cover RGB image are odd, the G channel is selected.
c. If one of the indices (i, j) is odd and the other is even, the B channel is selected.
The pixel index is used as an indicator for choosing the band in order to distract hackers and attackers from knowing which bands contain hidden data, and to add some randomness to the embedding. The indices of a pixel are either both even, both odd, or one odd and one even, and the cover image has three bands (R, G, B). The system takes advantage of this correspondence by adopting the index as an indicator.
The final step is to store the information that needs to be exchanged with the recipient in the last two rows. This information consists of the rows key and the seed value with the required number of columns, in addition to the length of the secret text. The rows key (0, 1) occupies the penultimate row, where the recipient takes the rows that correspond to 1 and discards those that correspond to 0. The rest of the information is converted to binary and stored in the last row.
Assuming the following locations are randomly generated: [(4,8), (3,1), (2,2), (10,5)], the distribution of data bits to bands is shown in Table 2.

The first step that the recipient takes when receiving the stego image is to extract the important keys from the last two rows. After that, the data bits are extracted, collected as a vector, and then decompressed to recover the original data. The binary string is converted to decimal values and then to letters. Figure 2 illustrates the block diagram of the receiver side.
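The band selection rule and the sender/receiver round trip can be sketched for the four example locations as follows. The text does not state which bit of the selected channel carries the data, so replacing the least significant bit is an assumption of this sketch, as are the toy 12 x 12 image, the direct use of (i, j) as list indices, and the function names.

```python
def channel_for(i, j):
    """Index-parity rule: both even -> R (0), both odd -> G (1), mixed -> B (2)."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def embed(image, sites, bits):
    """Return a copy of `image` with one bit written at each site (LSB assumed)."""
    stego = [[px[:] for px in row] for row in image]
    for (i, j), bit in zip(sites, bits):
        ch = channel_for(i, j)
        stego[i][j][ch] = (stego[i][j][ch] & ~1) | int(bit)
    return stego


def extract(stego, sites):
    """Read the bits back from the same sites, in the same order."""
    return "".join(str(stego[i][j][channel_for(i, j)] & 1) for i, j in sites)


# Toy cover image: 12 x 12 pixels, each pixel a [R, G, B] list.
cover = [[[100, 150, 200] for _ in range(12)] for _ in range(12)]
sites = [(4, 8), (3, 1), (2, 2), (10, 5)]

print(["RGB"[channel_for(i, j)] for i, j in sites])  # ['R', 'G', 'R', 'B']
stego = embed(cover, sites, "1011")
print(extract(stego, sites))  # 1011
```

The receiver needs only the list of sites, which it regenerates from the rows key and the seed, to recover the bits in order.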

Information collection
The proposed system is tested using a set of secret texts of different sizes and standard cover images of dimensions 512 x 512 obtained from the Internet, as shown in Figure 3.

Quality measures
There are several ways to measure the efficiency of the proposed system, such as the Peak Signal-to-Noise Ratio (PSNR). It can be used to evaluate the level of distortion or noise introduced by the steganography technique, so a higher PSNR means less distortion and hence a more efficient method. The formula for calculating PSNR is as follows:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$

where MAX is the maximum possible pixel value of the image (typically 255 for 8-bit images) and MSE is the mean squared error between the cover image and the stego image.
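As a minimal sketch, the PSNR between a cover image and a stego image can be computed in plain Python as below. The nested-list image layout (rows of [R, G, B] pixels) and the function name `psnr` are assumptions made for illustration, and the mean squared error is averaged over all pixels and all three channels.

```python
import math


def psnr(cover, stego, max_value=255):
    """PSNR in dB between two images given as rows of [R, G, B] pixel lists."""
    squared_errors = [
        (a - b) ** 2
        for row_c, row_s in zip(cover, stego)
        for px_c, px_s in zip(row_c, row_s)
        for a, b in zip(px_c, px_s)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)


cover = [[[100, 150, 200], [100, 150, 200]],
         [[100, 150, 200], [100, 150, 200]]]
stego = [[px[:] for px in row] for row in cover]
stego[0][0][0] += 1  # change a single channel value by one level
print(round(psnr(cover, stego), 2), "dB")
```

Changing fewer values, or changing them by smaller amounts, lowers the MSE and raises the PSNR, which is why compressing the secret data before embedding helps image quality.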

Test proposed system
A set of experiments was conducted to check the proposed system using several texts and cover images of size 512 x 512. Table 3 and Table 4 present the results of the proposed system. The proposed system is characterized by randomness when choosing the locations of pixels in the hiding process. This makes it robust: even if the stego image falls into the hands of thieves and hackers, the data inside it remains protected because they still need to know the hidden locations in which the data is carried. The data is also placed in one of the bands according to the index of the pixel, as previously described.
The size of the data plays an important role in the embedding process: smaller data takes less space in the cover media, so the steganography system is stronger and it is more difficult for the human eye to notice a difference in the stego image. In the proposed system, the Huffman method was used to compress the data, and the resulting difference in size is shown in Figure 4, whereas Figure 5 shows the histogram of the cover image before and after embedding 1000 bits. Table 5 displays the best rows chosen by the genetic algorithm and the columns chosen using the seed when hiding data in the Lena cover image for two cycles of execution, where the first cycle hides data of size 408 bits and the second cycle 3608 bits.
A sample of the data (408 bits) was taken and hidden inside the Lena cover image. The values of the GA parameters were changed for each execution, as noted in Table 6, in order to study the effect of this change.

CONCLUSIONS
Exchanging information between people via the Internet is indispensable in various fields, so there must be a way to protect information from theft and manipulation by hiding it in a digital medium. The proposed system hides data after compressing it with the Huffman method to reduce its size and thus increase the efficiency of the system. The compressed data is hidden randomly by selecting the pixel locations in two stages: the stage of selecting the rows using a genetic algorithm, and the stage of selecting the columns using the seed and a shuffle function. The proposed system is relevant to every real-world field that specializes or is interested in encryption, information security, and data transfer via cover media. The system achieved good results over several experiments. Proposed future work is to strengthen the protection by adding a method of encrypting the data before embedding it in the cover image. It is also possible to scatter the data using one of the chaotic map methods.

2. Compress the data with Huffman encoding to convert the text into a codeword (0, 1).
3. Hiding process: this process includes the stage of generating hiding sites and the stage of embedding the data in the cover image.

Figure 1. The block diagram of the sender side

2. Number of chromosomes in each generation = 50.
3. Number of generations = 30.
4. Suggested crossover probability = 0.8.
5. Suggested mutation probability = 0.2.
The values of the above parameters are left to the designer of the algorithm to determine through experiment, except the chromosome length, which is fixed and equal to the number of rows of the cover image.

Figure 2. The block diagram of the receiver side

Figure 3. Cover images for the proposed system

Figure 4. Text size before and after compression
Figure 5.

Table 3. Experiment (1) for the proposed system

Table 4. Experiment (2) for the proposed system

Table 5. Examples of the best rows and columns

Table 6. Results for varying GA parameters