Enhanced Malware Detection for Mobile Operating Systems Using Machine Learning and Dynamic Analysis

ABSTRACT


INTRODUCTION
By the end of the decade, an array of technology-enabled cognition support tools had emerged, transforming both manual intelligence-dispatching activities and routine workflows in several industries, organizations, and government agencies [1]. This shift has been facilitated by innovations that have had a far-reaching, favorable impact on the acceptance of mobile technology at the global level. According to a UN report, the percentage of people who carry smartphones was expected to reach 82% in 2022 [2].
Although mobile technology has enhanced digital solutions for many workloads, it has also made consumers' data more vulnerable. Due to the lack of oversight in the Google Play Store, app developers are able to post Android programs with little to no filtering, increasing the likelihood of harmful apps being posted and endangering users' personal information and data [3]. Economic losses are another consequence of malware attacks on devices. The many hazards associated with mobile technology are not going unnoticed. Android is both the most popular and the most susceptible mobile operating system [4]. Figure 1 shows that, up to July 2023, Android OS accounted for more than 71.9% of the worldwide mobile market. Next on the list of mobile operating systems is iOS, which has around 27.3% of the worldwide market share, as shown in Figure 2.
Android OS smartphones are widely used, making them a potential target for malware attacks. Another reason Android OS is vulnerable is that it is open source [5,6]. According to the Kaspersky research, 196,476 banking Trojans and 10,543 ransomware Trojans were estimated to have been identified in 2022 [7]. Trojan mobile apps infiltrate the operating system by masquerading as genuine programs when in reality they are counterfeit. Banking Trojans trick customers into revealing their account information through phony banking applications, which is a problem since most consumers now use mobile Internet banking. In addition, users' health information and sensitive personal details are also disclosed. Numerous additional strategies, including blockchain technology and edge computing approaches, are now being used to safeguard such data [8]. Malware authors have also compromised mobile devices and turned them into bots. These bots are used to launch distributed denial-of-service (DDoS) attacks and to send spam emails with harmful links.
The development of this malicious software employs sophisticated techniques that make these attacks difficult to avoid [9]. These botnets seriously jeopardize the security of Android OS. In addition, the Security Institute [10] reported that Android packages are a significant vector for malware attacks.
Therefore, signature-based methods of detecting malware and malicious installation packages using attribute information may be used efficiently to improve Android mobile security. The malware industry is likely teeming with specialists who are always thinking of new ways to conceal their attacks and steal sensitive user data. Neutralizing such novel and intricate approaches is, however, becoming more difficult [11]. Many methods have been developed to identify malicious activity in Android applications. These methods detect malware attacks and use feature interaction to show which features are important [12]. For malware attack detection, the Owl binary optimization algorithm has been used to select features from the Drebin dataset [13]. Android malware detection greatly increases security against potential attacks [14]. Furthermore, in recent years, many deep learning-based methods have been proposed to detect malware. For example, S.C. Tan et al. used back propagation (BP) and particle swarm optimization (PSO) to find the best ensemble classifier for deep learning. The ultimate objective of employing deep learning is to choose the most suitable attributes to improve the accuracy of malware detection systems while minimizing computational cost. The goal of this paper is to detect malware on mobile phones with high accuracy and ease of use by combining parallel machine learning classifiers and supervised algorithms. Optimal feature selection is also incorporated into the framework.
Modern machine learning networks use correlation scores to choose features, so features need not be computed by hand before a deep learning classifier is applied.
The contributions of the paper are as follows:
a. A feature selection mechanism based on the correlation score is embedded in the machine learning network instead of calculating features manually before applying a deep learning classifier, which reduces the computational burden.
b. The accuracy of malware detection is enhanced without incurring additional processing time.
c. The approach offers a cost-effective alternative for detecting malicious or recompiled applications in smartphone operating systems.

LITERATURE REVIEW
Researchers in malware detection are still constrained to a small number of practical approaches. Strategies for Android malware detection analysis are usually static, dynamic, or hybrid. In this section, we review some analysis methods and briefly summarize the properties they use.
Alabrah [13] presented an automated technique for detecting Android malware based on artificial neural networks (ANN). To test this method, two well-known datasets were used: CICInvestAndMal2019 and Drebin/AMD. These datasets were preprocessed to convert their static features into binary values indicating the status of certain app permissions (enabled or disabled). The modified feature sets were fed into the ANN classifier for two experiments. In the first experiment, a basic input layer was used alongside five-fold cross-validation. For the second experiment, a novel feature selection layer was introduced in the ANN classifier, focusing on features correlated with benign or malware apps. The outcomes of Alabrah's ANN-based method were substantial and showed improvements in performance and resilience.
Tarwireyi et al. [14] introduced BarkDroid, a novel Android malware detection technique that uses low-level Bark Frequency Cepstral Coefficients audio features to detect malware. The initial results show that Bark Frequency Cepstral Coefficients have high discriminative capability and achieve accurate predictions.
Fan et al. [15] studied a method based on frequent subgraphs (fregraphs), in which frequent subgraphs represent the typical patterns of malware samples that belong to the same package. They also built FalDroid, a fregraph-based detection system. Studies across multiple trials have shown that FalDroid can classify up to 96.3% of malware samples into their own divisions in about 6.2 seconds per app.
Fatima et al. [16] presented another model that detects malware on a server-hosted basis. Through this approach, material costs can be reduced and resource constraints of more than 98% can be achieved, but the model needs high server-level specifications and features for an immediate response time. In addition, this work did not discuss the information security involved in the process.
Cai and Jenkins [17] proposed an Android malware detection approach that, once tested on different categories of data, can continue to detect new malware effectively without retesting. Droid-evolver is a fully automated system (requiring no human intervention) for detecting malicious apps on smartphone operating systems, and it updates itself automatically.
Fang et al. [18] used a feature fusion method that directly calls library functions to extract the permission and API features of the APK file, then decompiles the APK file to obtain the opcode features, and merges the three kinds of features into a single feature vector. Finally, the multi-model neural network HYDRA learns the fused feature vector so that it can identify and detect malware. The work also compared the method with single-feature machine learning algorithms to verify its effect. Experimental results show that the accuracy of the multi-model neural network detection method based on feature fusion reaches 98.92%, which is better than that of the single-feature methods.
The hybrid method was discussed in the study of Surendran et al. [19]. Their system detects malware using the Tree-Augmented Naive Bayes (TAN) model, which is based on dynamic and static features such as permissions and system calls, and it identifies harmful samples by combining the results obtained from these two kinds of features. The paper reports 95%, but it does not state the smartphone OS version used during the dynamic analysis. Although the hybrid analysis method has been shown to be more complex and more successful than dynamic or static analysis alone, feature selection ultimately remains the key to the detection ratio.
The method used in the study of Al Ali et al. [20] reached a detection ratio of 96. They compared the characteristics of dynamic analysis using integration, structure dimensions, and connectivity between components. They concluded that the specifications extracted using the hull dimensions were more significant than the other two.

PRINCIPAL COMPONENT ANALYSIS (PCA)
Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the data, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity between the observations and between the variables as points in maps [21,22].
The most important features of PCA are as follows. PCA aims to maximize the variance in the data by creating new axes called principal components. By selecting the dimensions that capture most of the data's variance, PCA retains important information while reducing dimensionality, which is crucial in malware detection to maintain feature distinctiveness for accurate classification. PCA ensures that the new axes are orthogonal, meaning that each component captures a unique aspect of the data's variability, resulting in a more concise representation of the original features. Unlike other dimensionality reduction methods, PCA offers consistency and minimizes redundancy or information loss. Although PCA assumes linear correlations among variables, which may not always hold, it is generally effective in capturing the underlying data structure without significant loss. Its computational efficiency and ease of use make PCA a preferred choice for handling large datasets in malware detection studies compared with methods such as t-SNE or Isomap.
PCA delivers a coherent interpretation of the condensed feature space via principal components, which represent linear combinations of the original features.This interpretive capability aids in analyzing feature significance and enhances comprehension of the intrinsic data structure.
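The variance-maximizing property described above can be illustrated with a small sketch. It is not the implementation used in this study: it finds only the first principal component, uses power iteration on the sample covariance matrix as a stand-in for a full eigendecomposition, and the function name and the toy data are assumptions made for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, explained-variance ratio) of the first component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the mean-centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication by cov converges to the
    # dominant eigenvector, provided the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance

# Toy data (hypothetical): most of the spread lies along the first axis.
toy = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
direction, ratio = first_principal_component(toy)
print(direction, ratio)
```

The returned ratio is the share of total variance retained when the data are projected onto that single axis, which is the quantity one would inspect when deciding how many components to keep.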

DYNAMIC AND STATIC ANALYSES
Malware identification relies almost exclusively on static and dynamic analysis methods [23,24]. Static analysis offers two detection approaches: heuristic analysis and signature-based detection. Both techniques are used in antivirus software, and their relative merits are much debated among programmers. Because signatures can only match the patterns of known malware, they are no longer sufficient on their own to achieve complete security. The two approaches differ as follows. A signature-based scanner identifies risks according to its specific purpose, which is to spot files that are malicious programs and to deliver warnings when they are found.
Heuristic analysis, by contrast, studies the traits of the code and/or the way it behaves, and code analysis is one applicable option. Code structure examination involves finding malicious code patterns by looking at the syntax of the code and at how it is arranged. String analysis, alternatively, inspects the source code for signs of malicious intent such as IP addresses, encryption keys, or hardcoded URLs. During file analysis, properties such as size, creation date, and digital signature are examined to find unauthorized alterations or signs of a particular type of damage. An additional method, which completes the analysis of the executable code, can uncover malicious and inappropriate behavior such as concealed functions and code obfuscation.
Sandboxing analysis, in turn, means executing the binary in a simulated environment in order to observe its behavior, its interactions with the system, and any suspicious network traffic or activity.
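As an illustration of the string analysis mentioned above, the following minimal sketch scans decompiled source text for hardcoded URLs and IPv4 addresses. The regular expressions, the function name, and the sample snippet are assumptions made for this example; a production scanner would use a broader set of patterns.

```python
import re

# Deliberately simple patterns (assumptions for this sketch).
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)

def extract_indicators(text):
    """Return the distinct URLs and IPv4 addresses found in the given text."""
    return {
        "urls": sorted(set(URL_RE.findall(text))),
        "ipv4": sorted(set(IPV4_RE.findall(text))),
    }

# Hypothetical decompiled snippet.
snippet = 'connect("http://evil.example/payload"); host = "10.0.0.5"; ver = "1.2.3";'
indicators = extract_indicators(snippet)
print(indicators)
```

Strings found this way are only indicators: a hardcoded address is not malicious in itself, so the output would normally feed a later classification step rather than trigger a verdict directly.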
The most prevalent dynamic analysis techniques, as with static analysis, include the following [25,26]. Runtime behavior analysis checks the activity of code or files to detect those that exhibit questionable or truly malicious behavior.
This includes, for instance, unauthorized modification of the system, modification of the file system, or monitoring of the network. API monitoring observes how the code calls application programming interfaces (APIs) in order to detect suspicious or dangerous calls that are likely to be attempts to break in and cause damage. Network traffic analysis reviews the network traffic related to the operation of the code or file. The objective is to trace and flag any communication with known fraudulent websites, abnormal data transfers, or abnormally high network activity. In contrast to static code analysis, which inspects the written code for malicious operations, dynamic code analysis extends the objective to identifying suspicious and hazardous behavior while the code is running. System call monitoring tracks the system calls that applications and files make to the operating system in order to identify malicious actions revealed by abnormal or illegal system calls. Sandboxing runs programs or files in a virtual environment (sandbox) to observe their behavior while keeping them separated from other entities, thereby preventing any possible harm to the main system.
Emulators and virtual machines reproduce the runtime environment of the target system in which the code or file is executed. With this approach, analysts can observe the behavior of the code and its interactions with other applications and components without affecting the host system. Using these tools together makes it possible to identify and categorize malware and plays an important role in the security of systems and networks.
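The system call monitoring described above can be sketched as a post-processing step over a recorded trace. The example below assumes strace-like text lines captured in a sandbox; the watchlist, the function name, and the trace itself are hypothetical and serve only to show the idea.

```python
from collections import Counter

# Hypothetical watchlist; in practice it would be derived from labelled data.
SUSPICIOUS_CALLS = {"ptrace", "mount", "setuid", "execve"}

def summarize_trace(trace_lines):
    """Count system calls in strace-like lines and pick out the watched ones."""
    counts = Counter()
    for line in trace_lines:
        # The call name is the text before the first opening parenthesis.
        name = line.split("(", 1)[0].strip()
        if name.isidentifier():
            counts[name] += 1
    flagged = {name: n for name, n in counts.items() if name in SUSPICIOUS_CALLS}
    return counts, flagged

# Hypothetical trace recorded while an app ran in a sandbox.
trace = [
    'openat(AT_FDCWD, "/data/app/base.apk", O_RDONLY) = 3',
    "read(3, buf, 4096) = 4096",
    'execve("/system/bin/su", ["su"], []) = 0',
    "ptrace(PTRACE_ATTACH, 1234) = 0",
    "read(3, buf, 4096) = 0",
]
counts, flagged = summarize_trace(trace)
print(flagged)
```

The per-call counts can also serve as a feature vector for a classifier, which is one common way of combining dynamic analysis with machine learning.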

PROBLEM FORMULATION
In the protection phase, developers provide a trained model for users to detect malware, where the software is able to reach a decision independently based on system predictions, as shown in Figure 3. Errors can lead to great risks for the user, such as removing the phone's operating system. It is therefore necessary for the developer to choose the model family correctly. The developer should use a robust training procedure to achieve an ideal model with a high detection rate and a low false positive rate.
User machines that apply machine learning models make decisions on their own. The quality of the machine learning model affects the functioning of the user's system. For this reason, machine learning-based malware detection has specific characteristics. Sample selection is based on the Zone-Alarm suite of applications (for security applications). Originally, a batch of 270 benign apps and 270 apps with malicious behavior was assembled in an attempt to ensure a degree of randomness.
The collection of benign apps was chosen to be diverse and to reflect the different types of apps on the Play Store. It is also proportional to the number of samples present in each type of application [28]. The following aspects were taken into account when collecting malicious samples:
a. Similar and different sized apps with the same name and malware variants.
b. Different classifications according to the behavior of those with the greatest impact: SMS Trojans, banks, root extortionists, extortionists and criminals, adware and malicious tools.
c. Different transactions within one malware package and more than one package within a classification by pattern.
For the analysis, 6227 samples were selected from the same repository, of which 4105 were infected and 2122 benign. In addition, for the set of samples with detrimental behavior, at least one variant from each bundle was detected in the system. Drebin contains 7,220 samples of infected software belonging to 319 malware packages. For the detection of recompiled infected programs, 1912 samples of infected programs were selected from the top 4 packages by number of samples in each package (Table 1) [29]. In addition, the specific software has been changed to have multiple features such as permissions and package names. Surprisingly, many duplicates were found among the package names of the applications after analysis. About 68.19% of the applications in the dataset share repeated package names, and the applications that share the same package names were therefore sorted.
Recompiling a smartphone application can change the application signature. Because of this, applications that share the same package name still have different hash values, and it was therefore necessary to create a more robust signature technique. The primary goal in this part of the study is to devise an efficient signature mechanism so that about 95% of the samples with shared package names have identical signatures. To this end, the hash of the classes.dex file is computed for all the source code obtained from the application, instead of using the hash value calculation method [30].
A detailed report of applications that share family names shows that 90% of them use the same source code with minor changes. Hash algorithms such as SHA-1 and MD5 take a file of arbitrary size as input and produce a fixed-length cryptographic hash as a result. Computing a SHA-1 or MD5 hash for two identical files always yields the same result. Antivirus software stores up-to-date databases of MD5 and SHA-1 hashes of malware. However, even a small modification to an infected application causes a very large change in its SHA-1 or MD5 hash. Therefore, a more suitable hashing technique called SSDeep was used, instead of hashing the source code of applications that share the same package names with the SHA-1 or MD5 algorithms. SSDeep is based on context-triggered piecewise hashing (CTPH), also known as fuzzy hashing. CTPH is a technique that improves the effectiveness of similar-file detection. Because the fuzzy hashes of two highly similar files, i.e. the original file and a file with some minor changes, are themselves similar, SSDeep can give the degree of similarity between two hashes. If repackaged software differs from known malware only by minor changes, a similarity score can be obtained by comparing its hash with that of the malware [31].

Algorithm 1: Detect repackaged malware using fuzzy hash
Input: FH = {h1, h2, h3, ..., hn} and APK
Output: Similarity-Score
1: hash ← SSDeepHash(APK)
2: for all i ∈ FH do
3:     Similarity ← SSDeepSim(i, hash)
4:     if Similarity > threshold then
5:         return Similarity
6:     end if
7: end for
8: return 0

Algorithm 1 introduces an approach based on fuzzy hashing to detect repackaged malware. We assume that F is the set of the top 4 bundles of the Drebin dataset. We reverse engineer all applications in F to obtain a set of distinct package names, DPN = {Pn1, Pn2, Pn3, ..., Pnn}. In addition, one application sample is selected at random for each package name in DPN; we then compute its fuzzy hash using SSDeep and add it to the set FH. In the end, the fuzzy hash set FH and an APK from F are given as input to Algorithm 1, and the similarity ratio is returned as the final result.
First: Compute the fuzzy hash value of the source code of the given APK (step 1).
Second: Compare the hash of the APK file with every hash in FH with the help of the SSDeep hash comparison tool (step 3).
Third: If the similarity ratio is greater than the threshold value at any point, the APK file is marked as a recompiled infected program, and the similarity score is returned.
Fourth: Zero is returned if no hash in FH yields a similarity ratio higher than the threshold. A similarity ratio of 85% was set as the standard cut-off for the trials [32].
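The control flow of Algorithm 1 can be sketched in a few lines. The sketch below is not the SSDeep implementation: as a loudly stated assumption, it replaces SSDeepHash and SSDeepSim with a direct byte-level comparison from Python's standard `difflib` module, so that the example runs without third-party bindings. The function names and sample byte strings are hypothetical; only the loop, the early return, and the 85% cut-off follow the text.

```python
from difflib import SequenceMatcher

def fuzzy_similarity(a: bytes, b: bytes) -> int:
    """Stand-in for SSDeepSim: similarity of two byte strings on a 0..100 scale."""
    return round(100 * SequenceMatcher(None, a, b, autojunk=False).ratio())

def detect_repackaged(apk_code: bytes, known_samples, threshold=85):
    """Return the first similarity score above the threshold, else 0."""
    for sample in known_samples:              # step 2: for all i in FH
        similarity = fuzzy_similarity(sample, apk_code)   # step 3
        if similarity > threshold:            # step 4
            return similarity                 # step 5
    return 0                                  # step 8

# Hypothetical data: one known malware body and two candidates.
known = [b"abcdefghij" * 10]
repackaged = known[0][:50] + b"Z" + known[0][51:]   # one byte changed
unrelated = b"0123456789" * 10

print(detect_repackaged(repackaged, known))
print(detect_repackaged(unrelated, known))
```

Comparing raw bytes pairwise scales poorly with file size, which is one reason a compact fuzzy hash such as SSDeep is used in practice: the comparison is then carried out on short digests rather than on whole files.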
As mentioned earlier, the techniques for detecting malicious APK files are divided according to dynamic and static features. Dynamic analysis examines the runtime behavior of programs during their execution across several specific experiments. Static analysis, by contrast, is performed at a non-running stage and involves verifying the source code and analyzing metadata and additional data about vulnerabilities. Dynamic analysis is an accurate detection method because it involves detailed analysis of applications, but it is therefore costly. The analysis is performed after the APK files are executed [33,34].
Static analysis consists of a very large set of techniques and methods that aim to learn about the patterns and runtime behavior of the system before executing it. The main goal, in terms of increasing security, is to identify applications recompiled from malware before they are installed and executed [35].

METHODOLOGY
In this section, we discuss our approach to developing a malware detection model based on the analysis of effective and early system calls coupled with evaluation by an application.
Our work builds on the model proposed in the study of Zhu et al. [33], which improves the detection percentage with a reduced set of permissions by enhancing permission features. The proposed Per-DRaML detection system uses the permissions requested by the applications themselves; rather than analyzing all requested permissions, Per-DRaML targets a specific set of permissions that improve the detection percentage for dangerous programs. Random Forest, Support Vector Machine (SVM), and Rotation Forest classifiers were used for classification. We select a set of permissions based on their perceived effect on the effectiveness of malware detection. We will discuss the following issues in this paper:
1. Packets of benign and malignant specimens.
2. Build/define the feature set.
3. Key Features (dataset) inference, filter and finalize.
4. Classification of Android malware using supervised machine learning algorithms.

Packets of benign and malignant specimens
A set of Android applications was selected from two different groups, benign and malicious. About 7,000 malicious apps were collected from the Virus-Share Android malware database (http://virusshare.com/, December 25, 2022). Virus-Share's database identifies application packages from different malware packages at different dates and is available to all as archived and compressed files. These files can be obtained using any torrent client. About 7,000 benign apps were also selected from the official app sites (Google Play and Apple Store) using a Python implementation. Benign APKs were selected from different Play Store app ratings to increase the diversity of the dataset. In total there are 14,000 APK samples, with 7,000 samples in each class. Training data is used to evaluate the effectiveness of the current model, while the remaining samples are used to perform validations.

Build/define the feature set
In the first stage, classifier schemas are built and used in the selection of key permissions based on the dataset. The permissions and features required by an app are obtained from the application package, i.e. the APK and its AndroidManifest.xml file. To obtain the required permissions, the Androguard tool is used to unpack the 14,000 application samples for the required data bundle. Different categories of attributes used to create the feature set, such as small application sizes and the permission ratio, were selected to perform a consistent analysis and to understand the style of each application in the selected packages [34].
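To make the permission extraction step concrete, the sketch below reads the requested permissions from a manifest that has already been decoded to plain-text XML. Manifests inside an APK are stored in a binary XML encoding, so a decoding step (performed by a tool such as Androguard in this study) is assumed to have taken place beforehand. The function name and the sample manifest are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by attributes such as android:name in Android manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

def requested_permissions(manifest_xml: str):
    """Return the sorted, de-duplicated permissions declared in a manifest."""
    root = ET.fromstring(manifest_xml)
    key = f"{{{ANDROID_NS}}}name"
    return sorted({el.get(key) for el in root.iter("uses-permission") if el.get(key)})

# Hypothetical, already decoded manifest.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <application android:label="Demo"/>
</manifest>"""

permissions = requested_permissions(SAMPLE_MANIFEST)
print(permissions)
```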

Key Features (dataset) inference, filter and finalize
This subsection presents the most important permissions that can be used to separate benign apps from malicious apps. The list of dangerous permissions was extracted from Google and from Zhu et al. [33]. To identify the main permissions important for malware detection, several permissions are presented as illustrative samples in Table 2. As the table shows, the permissions of Zhu et al. and Google are combined and evaluated using a dedicated feature called the Permission Ratio [35,36].
Figure 4 shows the proposed Per-DRaML model, which covers filtering of APKs, parameter specification for the dataset and packages, decompilation, and refactoring.

Enhanced permissions package
First, the permissions that have a weak impact on detection are identified in order to determine the minimum number of permissions required. To this end, we used a dataset from Google's permissions list (from Table 2) documenting the functionality and importance of each feature. Feature significance is the metric that leads to the creation of simpler, more efficient prediction models using less data. When the feature significance of the Random Forest model is used, some significant permissions are shown (Table 2). We set a threshold of 0.7 to choose the features that have the most impact, discarding permissions with a significance of less than 0.7. The most important permission models were identified on the basis of a set of different feature set samples, as shown in Table 3. Permission sets are converted into a binary dataset in which '1' denotes that the program requests the permission and '0' denotes that it does not. The permission vectors selected from the benign and malicious applications, represented in binary, are combined to form a single comprehensive dataset for analysis.
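The thresholding and binary encoding described above can be sketched as follows. The significance scores in the example are invented for illustration and are not the values of Table 2; the function names are likewise hypothetical. Only the 0.7 cut-off and the 1/0 encoding come from the text.

```python
def select_permissions(significance, threshold=0.7):
    """Keep permissions whose significance is at least the threshold."""
    return sorted(p for p, score in significance.items() if score >= threshold)

def encode(app_permissions, selected):
    """Binary vector: 1 if the app requests the permission, 0 otherwise."""
    return [1 if p in app_permissions else 0 for p in selected]

# Hypothetical significance scores (not taken from the paper's tables).
significance = {
    "SEND_SMS": 0.91,
    "READ_CONTACTS": 0.74,
    "INTERNET": 0.35,
    "VIBRATE": 0.12,
}
selected = select_permissions(significance)

# Hypothetical apps and the permissions they request.
apps = {
    "benign_app": {"INTERNET", "VIBRATE"},
    "suspicious_app": {"SEND_SMS", "INTERNET", "READ_CONTACTS"},
}
dataset = {name: encode(perms, selected) for name, perms in apps.items()}
print(selected, dataset)
```

Each row of `dataset` then has one column per retained permission, which is the form in which the samples can be passed to a supervised classifier.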

Classification of Android malware using supervised machine learning algorithms
Supervised machine learning models, which can detect dangerous programs with the lowest false positive rate, were used in this part. The general plan of the current model is divided into two categories: the first consists of training and validating supervised learners on the datasets with different machine learning algorithms, and the second is permission feature inference. As mentioned earlier, the dataset consists of 14,000 samples, with 7,000 samples of each type. The same training and testing procedure as that of Zhu et al. [33,35] was applied in the experiments.

PERFORMANCE EVALUATION
The standard evaluation criteria used are accuracy, sensitivity, and the Receiver Operating Characteristic (ROC) curve. Their formulas and definitions are reviewed below [36].
The confusion matrix consists of four quantities: true positive (TP), true negative (TN), false positive (FP), and false negative (FN).
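To analyze the results of the model framework used, we used the following criteria:
1) Accuracy: the percentage of correctly classified APKs.

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

2) Precision: the percentage of APKs flagged as malware that are actually malware.

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \]

3) Recall (sensitivity): the percentage of malware APKs that are correctly detected.

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \]

4) F scale: the harmonic mean of precision and recall. Its best value is 1 and its worst value is 0.

\[ F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4} \]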
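The criteria above can be computed directly from the four confusion matrix counts. The following sketch uses the standard definitions of accuracy, recall, and the F scale; precision, which the F scale needs as its second ingredient, is included here as an assumption. The counts in the example are hypothetical and are not results of this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F scale from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall; 0 when both are 0.
    denom = precision + recall
    f_scale = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_scale": f_scale}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)
```

The guards against zero denominators matter in practice: a classifier that never predicts the malware class has no defined precision, and returning 0 keeps the evaluation loop from failing.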
We trained our model in 4 phases and report the results for each phase in Table 4.
The performance of malware detection systems can be increased either by improving certain permissions and features or by improving data collection. The high performance of the proposed method stems from choosing important permissions, which are considered influential in the literature. An application needs to obtain the permission of the user to perform the necessary activities. The proposed model aims to evaluate performance through the careful selection of dataset samples. Additionally, this model was trained and validated on a large dataset: around 14,000 APK files were used as samples, obtained from places as diverse as Virus-Share and application web sites. The current model achieves a similar detection ratio with the Rotation Forest and SVM algorithms using the given permissions when compared with the results obtained from standard methods, as shown in Table 3. While achieving high detection accuracy, the classifiers help reduce the number of permissions to be analyzed and the computational overhead, and can become a cost-effective solution for malware detection [33]. Figure 5 shows that 90% of the malware samples were effectively identified as malware (6,132/6,648) in the fourth type (9≤score≤10), and the malicious type almost reached 1%, with 52 samples belonging to this type. Regarding the latter two types, only 1% of the samples were rated as almost reliable (52/6075) and another 1% as almost reliable (67/630). 582 files, i.e. 7% of the data packets (391/5,560), were not parsed. 1210 files of type IV (malware type, 9≤score≤10) were given a score equal to 10, which is the highest end of the malware classification according to the Andrubis study [32].

CONCLUSIONS
While mobile malware continues to pose a persistent threat to Android users, the increasing integration of smartphones into our daily lives underscores the critical need for robust security measures.Therefore, the development of novel and effective malware detection technologies should be prioritized.
In this study, we evaluated various metrics and criteria, including malware detection rates, resource utilization, machine learning schemes, and extracted models for analysis, to assess the efficacy of malware detection technologies.We compared and analyzed techniques and models from previous research, considering factors such as unknown malware detection, which was not part of the training set.
Our approach involved a multilevel model, where we initially identified and inferred significant features from a dataset comprising 14,000 application samples.We utilized various machine learning frameworks to classify applications as benign or harmful.Through a series of experiments, our proposed model demonstrated significant enhancements in predictive features and the identification of harmful applications.
Moreover, our model offers a cost-effective alternative for detecting malware in smartphone operating systems, particularly malicious or recompiled applications.However, it is essential to acknowledge the limitations of our research, such as the need for further investigation into addressing unknown malware detection and refining the feature selection process.

Figure 4. Diagram of specification generation, dataset generation, and filtering for malicious and benign APKs

Figure 2 shows the percentage of malware data packets that fall within the classifications shown.

Figure 5. Percentage ratings of malware samples

Table 1. Malware samples in the Drebin dataset from the top 4 packages

Table 2. Risky permission features and their significance by Google (G) and Zhu et al. (R) with exceptional standards

Table 3. Proposed models of the features of the specified parameters

Table 4. The results of the training phase over several time stages