Found 187
A Static Analysis Framework for Investigating Tainted Data Sources in Software Systems
Dai P., Ma X., Peng Z., Zhao C., Zhang Q.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2025, citations: 0, doi.org, Abstract
One of the most effective methods for detecting software security vulnerabilities is taint analysis. Some software defects originate from external input data, and statically analyzing the taint sources and the data-flow propagation from these sources to defect points can help us understand the causes of software defects and make them easier to debug. This paper combines intraprocedural and interprocedural analysis to obtain global taint source information. A novel propagation path calculation algorithm is proposed, incorporating predecessor node computation and alias analysis, which effectively reduces the negative impact of irrelevant code on the performance of taint analysis. The method not only helps detect errors that lead to vulnerabilities but also analyzes the impact of vulnerable input data on the system. Based on the global taint source analysis algorithm, we developed AWsTS, a static taint source analysis prototype tool for C programs. Experiments on five open-source projects show that AWsTS improves the accuracy of analysis results without increasing the required analysis time: the average precision and recall are 93.4% and 90.2% for intraprocedural taint source analysis, and 87.6% and 84.9% for interprocedural taint source analysis. Additionally, AWsTS can output taint propagation paths, providing valuable support for further taint analysis.
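For intuition, here is a minimal Python sketch of worklist-style taint propagation with predecessor tracking and a precomputed alias map. It illustrates the general idea only, not the authors' AWsTS algorithm; the def-use-graph representation is an assumption.

```python
# Hypothetical sketch of taint propagation with predecessor tracking;
# not the authors' AWsTS implementation. Assumes the program is given
# as a def-use graph: node -> set of successor nodes it flows into.
from collections import deque

def propagate_taint(def_use_graph, taint_sources, aliases=None):
    """Worklist propagation from taint sources; `aliases` maps a
    variable node to the nodes that alias it (assumed precomputed)."""
    aliases = aliases or {}
    predecessor = {}                      # tainted node -> node it was tainted from
    tainted = set(taint_sources)
    worklist = deque(taint_sources)
    while worklist:
        node = worklist.popleft()
        # Successors in the def-use graph plus any aliases of this node.
        for nxt in def_use_graph.get(node, set()) | aliases.get(node, set()):
            if nxt not in tainted:
                tainted.add(nxt)
                predecessor[nxt] = node
                worklist.append(nxt)
    return tainted, predecessor

def path_to(sink, predecessor):
    """Reconstruct one propagation path from a source to `sink`."""
    path = [sink]
    while path[-1] in predecessor:
        path.append(predecessor[path[-1]])
    return list(reversed(path))
```

The predecessor map is what makes path output cheap: once propagation finishes, any sink's path is recovered by walking backwards, without re-running the analysis.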
OSCSE: A Practical Security Assessment Model for General Open Source Components
Wang Z., Huang C., You Y.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2025, citations: 0, doi.org, Abstract
Open source components (OSCs) have become a vital part of developing modern applications, and their security can affect the overall security of the software that depends on them. Thus, the security of an OSC should be evaluated before integrating it into the software. However, existing models lack generality and cannot easily be applied automatically to OSCs developed in different programming languages. To this end, we propose a security assessment model for OSCs, called CRAM, which features generality and automation. The model is built on the hypothesis that an OSC with a larger and more active community is likely to disclose more vulnerabilities, and it evaluates the security of an OSC from the size and activity of its open source community and its vulnerability disclosures. In the experiment section, we present validation and application experiments. In the validation experiment, we find that the basic hypothesis holds: there is a positive correlation between community size and activity and the vulnerability risk of OSCs. In the application experiment, we further evaluate our approach on large-scale open source components; the hypothesis is validated again, with most OSCs in the ecosystem in line with it. Finally, we build a security baseline according to the hypothesis and analyze 5 OSCs classified as vulnerable by our model. The results demonstrate the effectiveness of our model in identifying vulnerable components within an open source ecosystem.
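To illustrate how the model's core hypothesis could be checked, the following hedged sketch correlates a crude community-activity proxy with disclosed vulnerability counts; the rows and the activity proxy are made-up placeholders, not the paper's data or metrics.

```python
# Illustrative check of the paper's hypothesis (all values are made up):
# a larger/more active community should correlate positively with the
# number of disclosed vulnerabilities.
from scipy.stats import spearmanr

components = [
    # (stars, commits_last_year, disclosed_cves) -- placeholder rows
    (120, 300, 1), (5400, 2100, 9), (800, 650, 3), (23000, 8800, 21),
]
activity = [s + c for s, c, _ in components]   # crude activity proxy
cves = [v for _, _, v in components]
rho, p = spearmanr(activity, cves)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")    # positive rho supports the hypothesis
```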
SPERT: Reinforcement Learning-Enhanced Transformer Model for Agile Story Point Estimation
Younas W., Chen R., Zhao J., Iqbal T., Sharaf M., Imran A.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2025, citations: 0, doi.org, Abstract
Story point estimation is a key practice in Agile project management that assigns effort values to user stories, helping teams manage workloads effectively. Inaccurate story point estimation can lead to project delays, resource misallocation and budget overruns. This study introduces Story Point Estimation using Reinforced Transformers (SPERT), a novel model that integrates transformer-based embeddings with reinforcement learning (RL) to improve the accuracy of story point estimation. SPERT utilizes Bidirectional Encoder Representations from Transformers (BERT) embeddings, which capture the deep semantic relationships within user stories, while the RL component refines predictions dynamically based on project feedback. We evaluate SPERT across multiple Agile projects and benchmark its performance against state-of-the-art models, including SBERT-XG, LHC-SE, Deep-SE and TF-IDF-SE. Results demonstrate that SPERT outperforms these models in terms of Mean Absolute Error (MAE), Median Absolute Error (MdAE) and Standardized Accuracy (SA). Statistical analysis using Wilcoxon tests and A12 effect size confirms the significance of SPERT’s performance, highlighting its ability to generalize across diverse projects and improve estimation accuracy in Agile environments.
HCIA: Hierarchical Change Impact Analysis Based on Hierarchy Program Slices
Chang J., Wang L., Zhang Z.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2025, citations: 0, doi.org, Abstract
Change impact analysis (CIA) is an essential method in software maintenance and evolution, and its accuracy and usability play a crucial role in its application. However, most CIA techniques are coarse-grained and limited to the class and method levels. Although fine-grained CIAs succeed in producing statement-level impact sets, they remain limited without sub-statement-level dependency analysis, leading to low precision. Additionally, their unstructured impact sets make it challenging for users to comprehend the impact content. This paper proposes Hierarchical Change Impact Analysis (HCIA), a hierarchical CIA technique based on a sub-statement-level dependence graph. HCIA performs forward hierarchy program slicing on the change set at five levels: sub-statement, statement, method, class, and package. Based on the program slices, HCIA calculates the impact factor of the impact sets at the five levels to generate the final impact set. In the experiment, we evaluate the relationship between the impact factor and the actually affected code and assess the most appropriate size of HCIA impact sets. Furthermore, we evaluate HCIA on 10 open-source projects by comparing our approach with popular CIAs at the five levels. The experimental results show that HCIA is more accurate than the popular CIAs.
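A minimal sketch of the underlying slicing step, assuming the program is already abstracted as a dependence graph with a level tag per element; this illustrates forward slicing and level grouping in general, not HCIA's actual implementation.

```python
# Minimal sketch of forward slicing over a dependence graph; not the
# authors' HCIA tool. `deps` maps an element to the elements that
# depend on it; `level_of` tags each element so one flat slice can be
# grouped into sub-statement/statement/method/class/package views.
def forward_slice(deps, change_set):
    """Return every element transitively affected by the change set."""
    affected, stack = set(change_set), list(change_set)
    while stack:
        elem = stack.pop()
        for dependent in deps.get(elem, ()):
            if dependent not in affected:
                affected.add(dependent)
                stack.append(dependent)
    return affected

def group_by_level(affected, level_of):
    """Group a flat slice into the five hierarchy levels."""
    groups = {}
    for elem in affected:
        groups.setdefault(level_of[elem], set()).add(elem)
    return groups
```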
GMRepair: Graph Mining Template-based Automated Software Repair
Cao H., Guo Y., Wang Y., Tian F., Wang Z., Chu Y., Deng M., Wang P., He Z., Wei S.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2025, citations: 0, doi.org, Abstract
As software has grown in scale and complexity, automated software bug repair has become increasingly important. However, current automated repair approaches suffer from coarse repair granularity and poor patch quality. To address these problems, we propose graph mining template-based automated software repair (GMRepair). First, the approach uses the Ochiai fault localization technique to generate a ranked list of suspicious defect statements, and uses the GumTree tool to parse the buggy and repaired program files into edit scripts, which are then transformed into a graph representation. Second, we run a frequent-graph miner to obtain graph mining templates, match the context of the suspicious statements against the context of these templates, and generate an initial population from the matches. The buggy program is then evolved with genetic programming through mutation and crossover operations, generating new individuals. Finally, we run the candidate patches (CPs) through the corresponding test cases, ordered by test case prioritization techniques; patches that fail the test cases are filtered out, and patches that pass are output. We conducted experiments on two datasets, QuixBugs and Defects4J. GMRepair successfully repaired 41 defects in Defects4J and 15 defects in QuixBugs. Compared with existing methods, GMRepair offers a higher success rate and efficiency in defect repair.
A Method to Evaluate the Credibility of Domain Knowledge Network using Validated Expert Knowledges
Li Y., Zhou Y., Li B.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2025, citations: 0, doi.org, Abstract
We live in an era of knowledge explosion, where all kinds of knowledge are emerging and becoming more complicated with the development of new techniques and ideas. When we study knowledge and apply it to understand and solve problems, its credibility becomes a main concern: highly credible domain knowledge can guide us to correctly understand all the concepts in a domain and the relationships between them. Owing to its layered structure and scalability, the domain knowledge network has been widely used in recent years to represent knowledge in knowledge engineering, artificial intelligence, and other fields. How can the credibility of a domain knowledge network be ensured? This is an important and interesting question. In this paper, we propose a method to evaluate the knowledge credibility of a domain knowledge network: starting from its layered structure, we evaluate the credibility of knowledge layer by layer using validated expert knowledge such as domain dictionaries, domain ontologies, and domain expert experience. We conduct experiments with six domain knowledge networks constructed from network data and six constructed manually from published books or domain dictionaries, which describe the same domain knowledge in pairs. Experimental results show that the knowledge credibility of the networks constructed from validated expert knowledge is significantly higher than that of the networks constructed directly from network data, which matches our expectation and demonstrates the effectiveness of our credibility evaluation method.
Mining Fine-grained Code Change Patterns Using Multiple Feature Analysis
Liu D., Feng Y.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Maintaining high code quality is a crucial concern in software development. Existing studies have demonstrated that developers frequently face recurrent bugs and adopt similar fixes, known as code change patterns. As an essential static analysis technique, code pattern mining supports various tasks, including code refactoring, automated program repair, and defect prediction, thus significantly improving software development processes. A prevalent approach to identifying code patterns translates code changes into edit actions and represents them with a Bag-of-Words (BoW) model. However, when applied to open-source projects, this method exhibits several limitations: for instance, it overlooks function call information and disregards feature word order. This study introduces MIFA, a novel technique for mining code change patterns using multiple feature analysis. MIFA extends existing BoW methods by also analyzing function calls and overall changes in the Abstract Syntax Tree (AST) structure. We selected 20 popular Python projects and evaluated MIFA in both intra-project and cross-project scenarios. The experimental results indicate that: (1) MIFA achieves higher silhouette coefficients and F1 scores than other state-of-the-art methods, demonstrating superior accuracy; (2) MIFA can help developers detect unique change patterns earlier, with an efficiency improvement of over 40% compared to random sampling. Additionally, we discuss the critical parameters for measuring the similarity of code changes, guiding users to apply our method effectively.
An Empirical Study of Fault Localization on Novice Programs and Addressing the Tie Problem
Liu Y., Zhong J., Hei Q., Zhou X., Xiao J.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Programming education is becoming increasingly popular in universities. However, due to a lack of debugging experience, novices often encounter numerous difficulties in the programming process. Automatic fault localization techniques have emerged as a promising solution to this problem. Among these techniques, Spectrum-Based Fault Localization (SBFL) and Mutation-Based Fault Localization (MBFL) have been widely used on industrial programs. However, industrial and novice programs differ significantly, and the performance of these methods on novice programs has not been extensively studied. To fill this gap, we conducted an empirical study evaluating the fault localization performance and execution overhead of SBFL and MBFL in a typical novice programming environment. Our study specifically examined how program characteristics, including code coverage and mutation score, affect the accuracy of these methods. During the study, we also identified the tie problem in both methods and investigated its impact on fault localization in novice programs. To remove this impact, we propose using PageRank scores as weights for suspiciousness, then sorting and locating faults by the weighted suspiciousness. The PageRank algorithm constructs a directed graph from statement coverage information; a transition matrix over this graph yields the weight score (PageRank score) for each statement. Our research demonstrates that both SBFL and MBFL are effective for fault localization in novice programs, with MBFL performing significantly better in our tests: in TOP-[Formula: see text], MBFL accurately locates 67, 96 and 114 faults, respectively. Additionally, calculating weighted suspiciousness significantly alleviates the tie problem.
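The following sketch illustrates the tie-breaking idea in Python, assuming a suspiciousness map and a coverage-derived directed graph are already available; how the paper actually builds the graph and combines the scores may differ.

```python
# Hedged sketch of PageRank-weighted suspiciousness for tie breaking;
# graph construction from coverage is assumed done elsewhere.
import networkx as nx

def weighted_suspiciousness(susp, coverage_edges):
    """susp: {stmt: suspiciousness}; coverage_edges: directed edges
    between statements derived from coverage information."""
    g = nx.DiGraph(coverage_edges)
    g.add_nodes_from(susp)                 # keep statements with no edges too
    pr = nx.pagerank(g, alpha=0.85)
    # Statements with equal suspiciousness get separated by PageRank weight.
    return sorted(susp, key=lambda s: susp[s] * pr.get(s, 0.0), reverse=True)

susp = {"s1": 0.8, "s2": 0.8, "s3": 0.3}           # s1 and s2 are tied
edges = [("s1", "s2"), ("s3", "s2"), ("s2", "s1")]
print(weighted_suspiciousness(susp, edges))        # PageRank breaks the s1/s2 tie
```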
The Trustworthiness Metric Model of Interface based on Defects
Ma Y., Gao X.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
The interface is a crucial element in component-based software, linking distinct components to enable interaction. Defects within an interface can significantly impact the overall trustworthiness of the system, so interface trustworthiness should be assessed with a defect-centric approach. This paper introduces a novel model for evaluating interface trustworthiness, anchored in defect analysis. First, the defect types are formalized based on interface specifications. Then, a comprehensive weight allocation method combining the G1 and CRITIC methods is established to characterize the importance of each interface defect type. Subsequently, the attributes of the interface are evaluated through defect value analysis, and a trustworthiness measurement model of the interface is proposed based on these attributes. Furthermore, to evaluate the trustworthiness of the whole system, trustworthiness measurement models are established for different combination structures of components. Finally, the model's applicability is demonstrated through an illustrative example. This interface-level trustworthiness evaluation can guide interface designers toward high-quality interfaces and improve the trustworthiness of the entire software.
Code Recommendation for Schema Evolution of Mimic Storage Systems
Kong X., Lv Z., Chen C., Chang H., Li N., Zhang F.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Schema evolution of mimic storage systems is a time-consuming and error-prone task due to the redundant development of heterogeneous executors. The ORM-based proxy requires an entire class to represent the structure of a data table, and domain-specific code recommendation techniques to boost storage development are lacking. To address this issue, we design a novel type of code context, the schema context, which combines features of code text, syntax, and structure. To meet the requirements of class-level granularity, we focus on behavior and attributes in code syntax, and use element position and structural metrics to mine hidden relationships. Based on the schema context and an existing inference mode, we propose SchemaRec, which recommends the ORM-related class for the database executors once one of them has been changed. We conduct experiments on 110 open-source projects; the results show that SchemaRec is more accurate than Lucene, DeepCS, QobCS and SEA in terms of Top-1, Top-10 and MRR accuracy, owing to its better context representation. We also find that code syntax is the most important information because it captures the behavior and attribute information of ORM-related classes.
IRaDT: LLVM IR as Target for Efficient Neural Decompilation
Li Y., Xu T., Wang C.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Decompilation is a widely used technique in reverse engineering, aimed at restoring binary code to human-readable high-level code. However, the output of traditional decompilers often has poor readability. With advances in language models, several learning-based decompilation methods have emerged. Nevertheless, the probabilistic nature of language models means the correctness of their outputs cannot be guaranteed, requiring further analysis by engineers to identify the functionality of the code. Inspired by compiler toolchains, we propose a novel approach that fuses traditional rule-based methods with learning-based techniques, drawing insights from both paradigms. Specifically, we present IRaDT, a pre-trained sequence-to-sequence model tailored to refine decompilation outputs at the intermediate representation level. Through this hybridization, we aim to address the limitations of existing methodologies and achieve more accurate and robust decompilation. We construct a diverse decompilation dataset targeting IR and evaluate IRaDT on it. The experimental results indicate that IRaDT improves the readability of IR while preserving its compilability, achieving a 74% improvement over RetDec and a 93% improvement over ChatGPT.
Boosting Commit Classification Based on Multivariate Mixed Features and Heterogeneous Classifier Selection
Wu Y., Li Y., Wang Z., Tan Q., Liu J., Jiang Y.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Commit classification plays a crucial role in software maintenance, as it lets developers make informed decisions about resource allocation and code review. Several approaches exist for automatic commit classification, yet they neither sufficiently explore commit-related features nor exploit the advantages of ensemble models over individual models, leaving room for improvement. In this paper, we propose MuheCC, a commit classification approach based on multivariate mixed features and heterogeneous classifier selection. It consists of three phases: (1) multivariate mixed feature extraction, which extracts features from commit messages, changed code, and handcrafted features to construct comprehensive mixed features; (2) genetic-algorithm-based hyperparameter tuning, which optimizes the candidate traditional and ensemble models; (3) heterogeneous classifier selection, which selects the optimal combination of traditional and ensemble models to build a heterogeneous classifier for commit classification. To evaluate the approach, we extend an existing dataset with the code changes for each commit and compare MuheCC with three baselines on this real-world dataset. The results show that MuheCC outperforms all baselines, improving over the best baseline by 7.25% in accuracy, 6.88% in precision, 7.25% in recall and 7.06% in [Formula: see text]-score. Furthermore, ablation experiments confirm that the performance advantage of MuheCC is mainly attributable to the multivariate features (e.g. 12.55% contribution to accuracy) and the heterogeneous classifier (e.g. 12.26% contribution to accuracy). We further discuss the impact of hyperparameter tuning and heterogeneous classifier selection on the performance of MuheCC. These results demonstrate the superiority and practical value of MuheCC.
A Formal Language for Performance Evaluation based on Reinforcement Learning
Wang F., Tan L., Cao Z., Ma Y., Zhang L.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Temporal logics are a rich family of logical systems for specifying properties over time, and about events and changes in the world over time. Traditional temporal logic, however, is limited to binary outcomes (true or false) and cannot specify performance properties of a system such as the maximum, minimum, or average cost between states. Current languages do not support quantifying such performance properties, especially over infinite execution paths, where properties like cumulative sums may fail to converge. To this end, this paper introduces a novel formal language for assessing system performance that captures not only temporal dynamics but also various performance-related properties, and uses reinforcement learning techniques to compute the values of performance property formulas. In the experimental part, a formal-language representation of system performance properties was implemented, the values of the performance property formulas were computed using reinforcement learning, and the effectiveness and feasibility of the proposed method were validated.
An Empirical Study of the Impact of Class Overlap on the Performance and Interpretability of Cross-Version Defect Prediction
Han H., Yu Q., Zhu Y., Cheng S., Zhang Y.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
The class overlap problem refers to instances from different categories heavily overlapping in the feature space, and it is one of the challenges in improving the performance of software defect prediction (SDP). Studies on the impact of class overlap on SDP have so far focused mainly on within-project and cross-project defect prediction, and existing class overlap instance cleaning methods are not suitable for cross-version defect prediction. In this paper, we propose a class overlap instance cleaning method based on the Ratio of K-nearest neighbors with the Same Label (RKSL), which removes instances with an abnormal neighbor ratio from the training set. Based on the RKSL method, we investigate the impact of class overlap on the performance and interpretability of cross-version defect prediction models. The experimental results show that class overlap can significantly affect the performance of cross-version defect prediction models. The RKSL method can handle the class overlap problem in defect datasets, but it may affect the interpretability of the models. Through an analysis of feature changes, we conclude that class overlap instance cleaning can help models identify more important features.
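A minimal sketch of the RKSL idea, assuming a plain feature matrix; the value of k and the ratio threshold here are illustrative, not the paper's tuned values.

```python
# Minimal sketch of the RKSL idea: drop training instances whose ratio
# of k-nearest neighbors sharing their label falls below a threshold.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def rksl_clean(X, y, k=5, min_ratio=0.5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)              # idx[:, 0] is the point itself
    same = (y[idx[:, 1:]] == y[:, None]).mean(axis=1)
    keep = same >= min_ratio               # likely-overlapping points removed
    return X[keep], y[keep]

X = np.random.rand(200, 10)
y = np.random.randint(0, 2, 200)
X_clean, y_clean = rksl_clean(X, y)
print(len(X), "->", len(X_clean), "instances after cleaning")
```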
Video Multimodal Entity Linking via Multi-Perspective Enhanced Subgraph Contrastive Network
Li H., Yue Y., Man X., Li H.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Video Multimodal Entity Linking (VMEL) is the task of linking entities mentioned in videos to entities in multimodal knowledge bases. However, current entity linking methods focus primarily on the text and image modalities, neglecting the video modality. To address this challenge, we propose a novel framework called the multi-perspective enhanced Subgraph Contrastive Network (SCMEL) and construct a VMEL dataset named SceneMEL, based on the tourism domain. We first integrate the textual, auditory, and visual contexts of a video to generate a comprehensive, high-recall candidate entity set. A semantic-enhanced video description subgraph generation module then converts videos into a multimodal feature graph structure and performs subgraph sampling on the domain-specific knowledge graph. Finally, we apply contrastive learning between the video subgraphs and the knowledge graph subgraphs from local perspectives (text, audio, visual) as well as a global perspective, to capture fine-grained semantic information about videos and entities. A series of experimental results on SceneMEL demonstrates the effectiveness of the proposed approach.
Multi-label Classification of Pure Code
Gao B., Qin H., Ma X.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Currently, a significant amount of public code exists in IT communities, programming forums, and code repositories. Much of this code lacks classification labels or carries imprecise ones, which hinders code management and retrieval. Several classification methods have been proposed to assign labels to code automatically, but they mainly rely on code comments or surrounding text, so their effectiveness is limited by the quality of that text. So far, few methods rely solely on the code itself to assign labels. In this paper, an encoder-only method is proposed to assign multiple labels to the code of an algorithmic problem: UniXcoder encodes the input code, and the encoding results are mapped to the output labels through classification heads. The proposed method relies only on the code itself. We construct a dataset to evaluate it, consisting of source code in three programming languages (C[Formula: see text], Java, Python) with a total size of approximately 120[Formula: see text]K. The results of the comparative experiment show that the proposed method outperforms encoder–decoder methods on the multi-label classification task for pure code.
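A hedged sketch of such an encoder-only setup using the public microsoft/unixcoder-base checkpoint; the label count, the [CLS]-style pooling, and the 0.5 decision threshold are assumptions for illustration, not details taken from the paper.

```python
# Sketch of an encoder-only multi-label classifier over code, in the
# spirit of the paper; pooling choice and label count are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("microsoft/unixcoder-base")
enc = AutoModel.from_pretrained("microsoft/unixcoder-base")
head = torch.nn.Linear(enc.config.hidden_size, 20)   # 20 hypothetical labels

code = "def gcd(a, b):\n    while b: a, b = b, a % b\n    return a"
batch = tok(code, return_tensors="pt", truncation=True)
hidden = enc(**batch).last_hidden_state[:, 0]        # first-token pooling
probs = torch.sigmoid(head(hidden))                  # independent label probabilities
labels = (probs > 0.5).nonzero()                     # multi-label decision
```

The sigmoid over independent logits (rather than a softmax) is what makes this multi-label: each label is accepted or rejected on its own.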
Semantic Code Clone Detection Based on Community Detection
Wan Z., Xie C., Lv Q., Fan Y.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Semantic code clone detection finds code snippets that are structurally or syntactically different but semantically identical. It plays an important role in software reuse and code compression. Many existing studies achieve good performance on non-semantic clones, but semantic clone detection remains challenging. Recently, several works have used tree or graph representations, such as the Abstract Syntax Tree (AST), Control Flow Graph (CFG) or Program Dependency Graph (PDG), to extract semantic information from source code. To reduce the complexity of trees and graphs, some studies transform them into node sequences, but this transformation loses semantic information. To address this issue, we propose a novel high-performance method that uses community detection to extract features of the AST while preserving its semantic information. First, based on the AST of the source code, we apply community detection to split the AST into subtrees that capture the underlying semantics of different code blocks, and use centrality analysis to quantify the semantic information as weights on AST nodes. Then, the AST is converted into a sequence of weighted tokens, and a Siamese neural network detects the similarity of token sequences for semantic code clone detection. Finally, we evaluate our approach on two standard benchmark datasets, Google Code Jam (GCJ) and BigCloneBench (BCB). Experimental results show that our model outperforms eight publicly available state-of-the-art methods in detecting code clones, and it is five times faster than the tree-based method ASTNN in terms of time complexity.
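The following sketch shows one way the pipeline's first steps could look, using greedy modularity communities and degree centrality from networkx on a Python AST; the paper's exact community detection and centrality choices are not specified here.

```python
# Hedged sketch: split an AST (as a graph) into communities and weight
# nodes by centrality, then emit a weighted token sequence for a
# downstream Siamese model. Algorithm choices are illustrative.
import ast
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def ast_graph(source):
    tree, g = ast.parse(source), nx.Graph()
    for parent in ast.walk(tree):
        g.add_node(id(parent), label=type(parent).__name__)
        for child in ast.iter_child_nodes(parent):
            g.add_edge(id(parent), id(child))
    return g

def weighted_tokens(source):
    g = ast_graph(source)
    communities = greedy_modularity_communities(g)   # subtrees ~ code blocks
    centrality = nx.degree_centrality(g)             # semantic weight proxy
    seq = []
    for i, comm in enumerate(communities):
        for node in comm:
            seq.append((g.nodes[node]["label"], centrality[node], i))
    return seq                                       # (token, weight, community)

print(weighted_tokens("def f(a):\n    return a + 1"))
```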
Pattern Mining-based Warning Prioritization by Refining Abstract Syntax Tree
Ge X., Li X., Sun Y., Qing M., Zheng H., Zhang H., Wu X.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Static code analysis tools (SATs) are widely used to detect potential defects in software projects, but their usability is seriously hindered by the large number of unactionable warnings they report. Many warning prioritization approaches have been proposed to improve the usability of SATs. These approaches mainly extract warning features that capture statistical or historical information and rank actionable warnings ahead of unactionable ones. Extracting such features relies heavily on domain knowledge, which is difficult to acquire precisely; moreover, domain knowledge obtained in one project cannot be directly applied to other projects because application scenarios differ. To address this problem, we propose a pattern mining-based warning prioritization approach built on the warning-related Abstract Syntax Tree (AST). To mine actionable warning patterns automatically, our approach leverages an advanced technique to collect actionable warnings, extracts the warning-related AST with a dedicated algorithm, and mines patterns from the ASTs of all actionable warnings. To prioritize newly reported warnings, it combines exact and fuzzy matching to calculate a similarity score between the patterns of the newly reported warnings and the mined actionable warning patterns. We compare our approach with four typical baselines on five large-scale open-source Java projects. The results show that our approach outperforms all four baselines, achieving the maximum MAP (0.76) and MRR (2.19). Besides, a case study on the Defects4J dataset demonstrates that our approach can discover 83% of true defects within the top 10 warnings.
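For intuition, a small sketch of combining exact and fuzzy matching over AST node-type sequences; the actual pattern representation and scoring function of the paper may differ.

```python
# Illustrative exact-plus-fuzzy scoring of a new warning's AST pattern
# against mined actionable patterns; not the paper's exact formula.
from difflib import SequenceMatcher

def match_score(new_pattern, mined_patterns):
    best = 0.0
    for mined in mined_patterns:
        if new_pattern == mined:          # exact match dominates
            return 1.0
        ratio = SequenceMatcher(None, new_pattern, mined).ratio()
        best = max(best, ratio)           # otherwise keep best fuzzy score
    return best

mined = [["If", "Call", "Return"], ["Assign", "Call"]]
print(match_score(["If", "Call", "Raise"], mined))   # high but not exact
```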
Program Segment Testing for Human–Machine Pair Programming
Rao L., Liu S., Liu A.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Human–Machine Pair Programming (HMPP) is a promising technique in the software development process: humans are responsible for developing the program while the computer monitors the program in real time and reports errors. Java runtime exceptions in the current version of the software under construction can only be detected effectively by executing it. Traditional software testing techniques are suitable for testing completed programs but struggle to build a suitable testing environment for the partial programs produced during HMPP. In this paper, we put forward a novel technique, called Program Segment Testing (PST), for automatically identifying errors caused by runtime exceptions to support HMPP. We first introduce the relevant concepts involved in detecting index-out-of-bounds exceptions, a representative runtime exception. We then discuss the methodology in detail and illustrate its workflow with a simple case study. Finally, we carry out an experiment to evaluate the technique and compare it with three existing fault detection techniques on several programs to demonstrate its effectiveness.
Optimizing Mutation-based Fault Localization through Contribution-based Test Case Reduction
Wang H., Yang K., Wu T.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Fault localization is an expensive phase of the software debugging process. Although Mutation-Based Fault Localization (MBFL) is a promising technique, its computational cost remains high due to the extensive mutant executions involved in mutation analysis. Previous studies have primarily reduced this cost by decreasing the number of mutants and optimizing their execution, with promising results; however, test case reduction has also proven effective in reducing MBFL costs. In this paper, we propose Contribution-Based Test Case Reduction (CBTCR) to improve MBFL efficiency. CBTCR assesses the contribution value of each test case and selects test cases accordingly; the reduced test suite is then used for mutant execution. We evaluate CBTCR on 543 real software faults from the Defects4J benchmark. Results show that CBTCR outperforms other MBFL test case reduction strategies (e.g. FTMES, IETCR) in terms of the Top-N and MAP metrics. Moreover, CBTCR achieves an average cost reduction of 87.06% while maintaining accuracy comparable to the original MBFL techniques. This paper thus presents an effective solution for optimizing MBFL that can significantly reduce the cost and time of software debugging.
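Since the paper's contribution formula is not reproduced in the abstract, the sketch below uses a simple proxy, rewarding tests that cover suspicious statements that few other tests cover, purely to illustrate contribution-based selection.

```python
# Illustrative contribution-based test selection; the contribution
# measure here is a proxy, not the paper's CBTCR formula.
def contribution(test, coverage, suspicious):
    """A test contributes more when it covers suspicious statements
    that few other tests cover (rarity-weighted coverage)."""
    return sum(1.0 / sum(s in coverage[t] for t in coverage)
               for s in coverage[test] if s in suspicious)

def reduce_suite(coverage, suspicious, budget):
    """Keep the `budget` tests with the highest contribution values."""
    ranked = sorted(coverage,
                    key=lambda t: contribution(t, coverage, suspicious),
                    reverse=True)
    return ranked[:budget]

coverage = {"t1": {1, 2, 3}, "t2": {2}, "t3": {7}}
print(reduce_suite(coverage, suspicious={2, 3, 7}, budget=2))
```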
Approach to detect Windows malware based on malicious tendency image and ResNet algorithm
Zhang B., Zhang H., Ren R., Wen Z., Wang Q.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Timely detection of self-replicating malware on the high-market-share Windows operating system can effectively prevent personal or corporate financial losses. The form and characteristics of malware constantly evolve, causing a concept drift that gradually erodes the effectiveness of traditional detection methods. We therefore propose WinMDet, a Windows malware detection method based on malicious tendency images and the ResNet algorithm. First, to tackle the complexity and difficulty of accurately characterizing malware features, WinMDet retains detailed malware features and encodes them into malicious tendency images that better describe malware across different periods. Second, WinMDet trains the initial detection model on the generated malicious tendency images. Then, to alleviate malware concept drift, WinMDet employs Local Maximum Mean Discrepancy (LMMD) as the criterion for model transfer, enhancing the initial model's ability to distinguish malware from benign software. We conducted a comprehensive evaluation of WinMDet using common metrics such as accuracy, precision and recall. The results indicate that WinMDet performs remarkably well, with accuracy exceeding 82% and precision and recall surpassing 82.42% and 82.06%, respectively. After applying our LMMD-based transfer method, the initial detection model improved the detection accuracy for malware from 2021 and 2022 by approximately 4.22% to 8.06%; the false negative rate decreased by up to 4.34%, and the false positive rate by up to 4.61%.
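As a rough illustration of the image-encoding step, the following sketch maps a binary's raw bytes to a grayscale image suitable for a ResNet-style classifier; the paper's malicious tendency encoding is richer than this common baseline.

```python
# Hedged sketch: encode a binary's bytes as a grayscale image, a
# standard baseline for CNN-based malware detection; not WinMDet's
# actual malicious-tendency encoding.
import numpy as np
from PIL import Image

def bytes_to_image(path, width=256):
    """Assumes the file is larger than `width` bytes; trailing bytes
    that do not fill a full row are truncated."""
    data = np.frombuffer(open(path, "rb").read(), dtype=np.uint8)
    rows = len(data) // width
    img = data[: rows * width].reshape(rows, width)
    return Image.fromarray(img, mode="L")            # grayscale image

# bytes_to_image("sample.exe").resize((224, 224))    # typical ResNet input
```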
Fine-Grained Entity Type Completion based on Neighborhood-Attention and Cartesian-Polar Coordinates Mapping
Zhang X., Li X., Wang H.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Entities are things that exist objectively, and entity types are concepts abstracted from entities that share the same features or properties. However, the entity types in a knowledge graph are often incomplete. The main current approach to predicting missing entity types learns structured representations of entities and types separately, ignoring the neighborhood semantic knowledge of the entity. This paper therefore proposes the aggregation neighborhood semantics model for type completion (ANSTC), which extracts neighborhood triple features of target entities with two attention mechanisms. A spatial mapping module in ANSTC maps entities from the Cartesian coordinate system to the polar coordinate system, which places similar vectors on a concentric circle and then rotates the angle according to fine-grained differences to achieve the entity-to-type transformation. Moreover, we add semantic features from text to the entity representations to enrich their semantics. In experimental comparisons on the FB15K and YAGO43K datasets, we obtain results similar to the baselines. We also construct a person dataset in the computer domain, on which MRR, Hit@1, Hit@3 and Hit@10 all improve over the ConnectE model. The experimental results demonstrate that our model can effectively predict fine-grained entity types in the domain dataset and achieves state-of-the-art performance.
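A minimal 2D illustration of the Cartesian-to-polar mapping and angle rotation described above; the model operates in higher dimensions and learns the rotation offset, which is assumed given here.

```python
# 2D illustration of the spatial mapping step: vectors with similar
# radius land on one concentric circle, and fine-grained type
# differences become angle rotations. The offset `delta` stands in for
# a learned parameter.
import numpy as np

def to_polar(v):
    r = np.linalg.norm(v)
    theta = np.arctan2(v[1], v[0])
    return r, theta

def rotate(r, theta, delta):
    """Rotate by a fine-grained offset `delta` (assumed learned)."""
    return np.array([r * np.cos(theta + delta), r * np.sin(theta + delta)])

entity = np.array([0.6, 0.8])
r, theta = to_polar(entity)          # r = 1.0: lies on the unit circle
print(rotate(r, theta, np.pi / 6))   # entity-to-type transformation
```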
Flaky Test Detection Based on Adaptive Latest Position Execution for Concurrent Android Applications
Zhang W., Wang W., Zhao R.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Tests that may pass or fail under the same conditions are commonly known as flaky tests. In Android applications, test flakiness is primarily attributed to the event-driven programming paradigm and the multi-threading concurrency mechanism: executing a test may activate an unexpected event order, causing flakiness, and the later an asynchronous event executes, the more likely it is to cause flakiness. Inspired by this observation, this paper puts forward a flaky test detection method for concurrent Android applications based on adaptive latest-position execution. In more detail, the latest execution position of each asynchronous event is identified by analyzing the sequential dependencies between events; the asynchronous event is then scheduled at that position, attempting to change the test result and thereby detect flaky tests. To validate the effectiveness and efficiency of our approach, we conducted experiments on 16 known flaky test cases across 7 Android applications. The experimental results show that, compared with the state-of-the-art tool FlakeScanner, the flaky test detection rate of our approach improves by 18.75%.
Security Development Lifecycle-based Adaptive Reward Mechanism for Reinforcement Learning in Continuous Integration Testing Optimization
Yang Y., Wang W., Li Z., Zhang L., Pan C.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Continuous automated testing throughout each cycle can ensure the security of the continuous integration (CI) development lifecycle. Test case prioritization (TCP) is a critical factor in optimizing automated testing: it prioritizes test cases likely to fail and improves testing efficiency. In CI automated testing, TCP is a continuous decision-making process that can be solved with reinforcement learning (RL). RL-based CITCP continuously generates a TCP strategy for each CI development lifecycle, with the reward mechanism, consisting of the reward function and the reward strategy, at its core. However, real-world industrial CI testing poses new challenges for RL-based CITCP. Under high-frequency iteration, the reward function is often calculated over a fixed length of historical information, ignoring the spatial characteristics of the current cycle. We therefore propose a dynamic time window (DTW)-based reward function, which adaptively adjusts the range of recent historical information based on the integration cycle. Moreover, under low-failure testing, the reward strategy usually rewards only failing test cases, creating a sparse reward problem in RL. To address this, we propose a similarity-based reward strategy, which also rewards passed test cases that are similar to the failing ones. Together, the DTW-based reward function and the similarity-based reward strategy constitute the proposed adaptive reward mechanism in RL-based CITCP. To validate its effectiveness, we carried out experiments on 13 industrial data sets. The results show that the adaptive reward mechanism improves TCP: the average NAPFD improves by up to 7.29%, the average Recall by up to 6.04%, and the average TTF improves by 6.81 positions, with a maximum of 63.77.
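A hedged sketch of the two reward ideas, with an assumed window policy and similarity bonus; the paper's exact formulas are not given in the abstract.

```python
# Sketch of an adaptive reward mechanism: a dynamic-time-window reward
# plus a similarity bonus for passed tests. Window policy and bonus
# form are assumptions, not the paper's exact design.
def dtw_reward(history, cycle_gap, base_window=10):
    """history: list of 0/1 flags (1 = failed) for one test case,
    newest last; cycle_gap: normalized time since the previous CI
    cycle, used to widen or shrink the window."""
    window = max(1, int(base_window * min(cycle_gap, 2.0)))  # adaptive range
    recent = history[-window:]
    return sum(recent) / len(recent)     # failure density in the window

def similarity_bonus(reward_failed, similarity):
    """Give a passed test a fraction of a failed test's reward,
    proportional to their similarity (mitigates sparse rewards)."""
    return reward_failed * similarity

print(dtw_reward([0, 1, 0, 0, 1, 1], cycle_gap=0.5))
print(similarity_bonus(1.0, similarity=0.8))
```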
Consistency Checking for Refactoring from Coarse-grained locks to Fine-grained locks
Zhang Y., Liu J., Qi L., Meredith G.
Q3
World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2024, citations: 0, doi.org, Abstract
Refactoring locks is widely used to improve the scalability and performance of concurrent programs. However, refactoring from coarse-grained locks to fine-grained locks may change the behavior of a concurrent program. To this end, we present LockCheck, a consistency-checking approach for fine-grained locks based on a parallel extended finite automaton. First, we model the critical sections of concurrent programs through control flow analysis and dependency analysis. Second, we sequentialize the concurrent programs to obtain all possible transition paths, and we reduce the exploration of redundant paths using partial order theory to obtain the transition paths to compare. Finally, we apply consistency rules to check the consistency of the program before and after refactoring. We evaluated LockCheck on five open-source projects, checking a total of 1528 refactoring operations and detecting 93 inconsistent ones. The results show that LockCheck can effectively detect inconsistent behavior when coarse-grained locks are refactored into fine-grained locks.