Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.
Pages
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
publications
JOWMDroid: Android malware detection based on feature weighting with joint optimization of weight-mapping and classifier parameters
Published in Computers & Security, 2021
Android malware detection is an important problem that must be urgently studied and solved. Machine learning-based methods first extract features from applications and then build a classifier using machine learning algorithms to distinguish malicious and benign applications. In most of the existing work, the difference in feature importance has been ignored, or the calculation of feature weights is irrelevant to the classification model. To address these issues, this paper proposes a novel Android malware detection scheme based on feature weighting with the joint optimization of weight-mapping and classifier parameters, called JOWMDroid. First, features of eight categories are extracted from the Android application package and then a certain number of the most important features are selected using information gain for malware detection. Next, an initial weight is calculated for each selected feature via three machine learning models and then five weight-mapping functions are designed to map the initial weights to the final weights. Finally, the parameters of the weight-mapping function and classifier are jointly optimized by the differential evolution algorithm. The experimental results reveal that the proposed method outperforms four state-of-the-art feature weighting methods and makes the weight-aware classifiers more competitive.
Recommended citation: Cai, Lingru & Li, Yao & Xiong, Zhi. (2021). JOWMDroid: Android malware detection based on feature weighting with joint optimization of weight-mapping and classifier parameters. Computers & Security.
Download Paper
StructureTester: Automatic Machine Translation Testing Based on Variation Feature Vector
Published in 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), 2023
In recent years, the performance of machine translation systems has made remarkable progress, primarily due to the rapid advancements in neural network language models. These state-of-the-art models enable the swift translation of vast amounts of text, leading to considerable time and cost savings. In pursuit of enhancing machine translation accuracy, researchers have devoted attention to developing automated translation testing tools. A prominent approach in this context involves comparing the translation results of “similar” source sentences, anticipating the correctness of translation by similarities in sentence structure. However, despite the potential of this approach, the current studies still face certain challenges. Notably, false negatives and false positives persist as issues. Moreover, achieving high detection accuracy for all types of translation errors remains an ongoing challenge. To address these challenges, we propose the StructureTester, a novel approach that not only leverages the differences between the structure trees of two sentences but also employs changes in sentence purpose as crucial judgmental features. Our proposed method yields significant improvements, elevating the overall detection accuracy to an impressive 98.17%. Furthermore, StructureTester effectively identifies various types of translation errors.
Recommended citation: W. Luo, Y. Luo, Y. Li and T. Zhang, "StructureTester: Automatic Machine Translation Testing Based on Variation Feature Vector," 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), Chiang Mai, Thailand, 2023, pp. 301-312.
Download Paper
Meta-Learning for Multi-Family Android Malware Classification
Published in ACM Transactions on Software Engineering and Methodology, 2024
With the emergence of smartphones, Android has become a widely used mobile operating system. However, it is vulnerable when encountering various types of attacks. Every day, new malware threatens the security of users’ devices and private data. Many methods have been proposed to classify malicious applications, utilizing static or dynamic analysis for classification. However, previous methods still suffer from unsatisfactory performance due to two challenges. First, they are unable to address the imbalanced data distribution problem, leading to poor performance for malware families with few members. Second, they are unable to address the zero-day malware (zero-day malware refers to malicious applications that exploit unknown vulnerabilities) classification problem. In this article, we introduce an innovative meta-learning approach for multi-family Android malware classification named Meta-MAMC, which uses meta-learning technology to learn meta-knowledge (i.e., the similarities and differences among different malware families) of few-family samples and combines new sampling algorithms to solve the above challenges. Meta-MAMC integrates (i) the meta-knowledge contained within the dataset to guide models in learning to identify unknown malware; and (ii) more accurate and diverse tasks based on novel sampling strategies, as well as directly adapting meta-learning to a new few-sample and zero-sample task to classify families. We have evaluated Meta-MAMC on two popular datasets and a corpus of real-world Android applications. The results demonstrate its efficacy in accurately classifying malicious applications belonging to certain malware families, even achieving 100% classification in some families.
Recommended citation: Yao Li, Dawei Yuan, Tao Zhang, Haipeng Cai, David Lo, Cuiyun Gao, Xiapu Luo, and He Jiang. 2024. Meta-Learning for Multi-Family Android Malware Classification. ACM Trans. Softw. Eng. Methodol. 33, 7, Article 174 (September 2024), 27 pages.
Download Paper
Do pre-trained language models indeed understand software engineering tasks?
Published in IEEE Transactions on Software Engineering, 2024
Artificial intelligence (AI) for software engineering (SE) tasks has recently achieved promising performance. In this article, we investigate to what extent the pre-trained language model truly understands those SE tasks such as code search, code summarization, etc. We conduct a comprehensive empirical study on a broad set of AI for SE (AI4SE) tasks by feeding them with variant inputs: 1) with various masking rates and 2) with sufficient input subset method. Then, the trained models are evaluated on different SE tasks, including code search, code summarization, and duplicate bug report detection. Our experimental results show that pre-trained language models are insensitive to the given input, thus they achieve similar performance in these three SE tasks. We refer to this phenomenon as
Recommended citation: Yao Li, Tao Zhang, Xiapu Luo, Haipeng Cai, Sen Fang, and Dawei Yuan. 2023. Do Pretrained Language Models Indeed Understand Software Engineering Tasks? IEEE Trans. Softw. Eng. 49, 10 (Oct. 2023), 4639–4655.
Download Paper
talks
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk; note the different field in type. You can put anything in this field.
teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.