Leveraging the enterprise logs shared by one of the key national stakeholders, the cyber security group develops ML solutions that integrate information from multiple log types (application, system, and network logs) to detect compromised user accounts. It also investigates security solutions for an era of pervasive encryption. Specifically, it develops techniques to detect suspicious emails in an enterprise setting when end-to-end email encryption is deployed, and designs mechanisms to detect Tor-based malware that disguises its traffic through anonymous communication channels. Another effort covers forensics and threat hunting. For forensics, the group has developed tools to recover incomplete or damaged multimedia evidence to facilitate cyber crime investigation. For threat hunting, it is investigating how advances in NLP can be used to automatically extract actionable APT threat intelligence from security incident reports, which can then be integrated into enterprise security solutions for threat hunting and monitoring.
Enterprises and organizations are increasingly at risk of attack despite extensive research effort, invested organizational resources, and growing cybersecurity awareness. Further, as edge-based systems find their way into operational technology (OT) environments and IT and OT systems converge, critical infrastructure organizations become more severely exposed to such threats. This problem is more prevalent in Qatar, where ICT systems are adopted more rapidly and extensively, reliance on cyber-physical systems is higher, and potential financial gains as well as regional political conflicts fuel more adversarial actions. Hence, the primary goal of this project is to develop novel solutions to prevent, detect, and respond to threats through analysis of large-scale multidimensional enterprise logs provided by major Qatari stakeholders. This project follows MITRE’s threat model (https://attack.mitre.org/) to identify advanced persistent threat (APT) tactics (e.g., privilege escalation, credential access, lateral movement, and command and control) and the techniques used to achieve them (e.g., entity compromise, account hijacking, spear phishing, and remote access). Given the current state and practice of an enterprise’s cyber infrastructure, the project also aims to predict future attack campaigns so that the necessary preventive measures can be deployed before attackers target and penetrate vulnerable assets. A medium-term objective of this project is to expand and adapt these capabilities to operational technology environments.
P. Dodia, M. AlSabah (joint first author), O. Alrawi, and T. Wang, “Exposing the Rat in the Tunnel: Using Traffic Analysis for Tor-based Malware Detection,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS ’22).
M. Abdallah, D. Woods, P. Naghizadeh, I. Khalil, T. Cason, S. Sundaram, and S. Bagchi, “TASHAROK: Using Mechanism Design for Enhancing Security Resource Allocation in Interdependent Systems,” IEEE Symposium on Security and Privacy, 2022, pp. 249-266.
M. Abdallah, D. Woods, P. Naghizadeh, I. Khalil, T. Cason, S. Sundaram, and S. Bagchi, “Morshed: Guiding Behavioral Decision-Makers towards Better Security Investment in Interdependent Systems,” AsiaCCS, 2021, pp. 378-392.
L. Yuan, E. Choo, T. Yu, I. Khalil, and S. Zhu, “Time-Window Based Group-Behavior Supported Method for Accurate Detection of Anomalous Users,” DSN, 2021, pp. 250-262.
M. Nabeel, E. Altinisik, H. Sun, I. Khalil, W. Wang, and T. Yu, “CADUE: Content-Agnostic Detection of Unwanted Emails for Enterprise Security,” RAID, 2021, pp. 205-219.
I. Khalil, E. Choo, T. Yu, Lun-Pin Yuan, and Sencun Zhu, “Anomalous user account detection systems and methods,” US Patent, Application number 17685687, 2022.
M. Nabeel, I. Khalil, T. Yu, Haipei Sun, and Hui Wang, “Systems and methods for encrypted message filtering,” US Patent, Application number 17490252, 2022.
This project was born out of the partnership between QCRI and various government and public stakeholders involved in the National Security Operations Center (NSoC) initiative led by the Ministry of Interior (MOI). As part of this partnership, QCRI established and manages the National Cyber Security Research Lab (NCSRL), which comprises three units: (i) The Secure Data Lab, an isolated lab that stores sensitive NSoC logs collected from various government agencies and public organizations. The lab has a direct connection to the NSoC datacenter via a point-to-point GNET link, giving it access to all NSoC logs and allowing it to provide security intelligence to participating stakeholders. (ii) The Cyber Range, which serves as a training ground for students, practitioners, and security operators on defensive and offensive cyber security techniques, as a testing ground for researchers to advance the state of the art in cyber security technologies, and as a way for stakeholders to better understand the security posture of their organizations by testing an emulated version of it. (iii) The Security Operations Center (SoC), which supports the cyber range adversary emulation and defense games and is also used to replay data received from the NSoC for ground-truth mining.
- Spam and phishing in logs: We have built a prototype of the compromised user account detection mechanism, named SIJIL, using autoencoders. We have also developed a novel approach for content-agnostic detection of spam and phishing emails.

Stakeholders across all sectors in Qatar can benefit from the outcomes of this project by: (1) having continuous assessment and monitoring of their critical cyber assets, (2) enhancing their preventive and detective cyber security posture with tailored and advanced agile technologies, and (3) gaining better visibility into their cyber space through sharing of data, intelligence, resources, and experience.

Studying the MOI logs, we observed thousands of Tor and VPN connections, some of which were related to malware and botnets. This isn’t surprising, as hundreds of malware variants have evolved to use VPNs and Tor to hide their presence and thwart active takedown operations. C&C servers are increasingly operated from hidden servers to protect their IPs and locations and to prevent network admins from observing coordinated botnet operations.
- Tor malware problem: Simply blocking Tor connections is not feasible, so we explore whether it is possible to differentiate between benign and malicious Tor connections. We rely on traffic analysis, which is the process of examining traffic patterns (packet sizes, directions, timings, etc.) to infer more (sensitive) information about traffic, thereby reducing the expected privacy provided by encryption or by proxy-based anonymization. Not only are we able to identify malicious encrypted Tor connections, we also show that this can be done for “zero-day” malware variants that our models have not seen. Our approach is summarized as follows.

First, we build and update a repository consisting of hundreds of thoroughly verified, fresh malicious Tor binaries. We inspect their traffic and ensure they are indeed malicious and use Tor for their operation. To create a benign dataset, we build various binaries that simulate different profiles of browsing traffic (using the Tor browser) with different settings and loads. We deploy our benign and malware binaries independently in a sandbox environment and collect thousands of traffic instances over months. We also incorporate Tor traffic instances generated by various applications that were independently collected and published.

Second, we characterize Tor-based malware binaries by analyzing different families and their network traffic exchanges. We identify several characteristics that differentiate malicious Tor communication from typical benign Tor browsing sessions. We craft and extract novel combinations of connection- and host-level traffic features from benign and malicious traffic. We then create (1) binary classifiers that can distinguish malware connections from benign ones, and (2) multi-label malware classifiers that can identify the malware class (ransomware, worm, trojan, etc.) based on its Tor communication.

Finally, we craft several experiments to demonstrate the usefulness of our approach, feature categories, and models. In testing our models, we build datasets that reflect realistic scenarios where malware connections comprise a minimal percentage of all Tor connections. Our experiments show that we can identify the following: (1) malware connections, (2) malware class, and (3) zero-day malware. Furthermore, we experiment on real-world enterprise network logs obtained from our partners, and we are able to flag suspicious connections despite missing some features due to the unavailability of raw PCAPs.
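
To make the notion of connection-level traffic features more concrete, the following is a minimal Python sketch of how packet sizes, directions, and timings might be summarized for one connection. It is an illustration only, not the feature set used in this project: the `Packet` record layout, the `connection_features` function, the feature names, and the toy values are all assumptions introduced here.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical packet record: (timestamp in seconds, direction, size in bytes).
# direction is +1 for outgoing packets and -1 for incoming packets.
Packet = Tuple[float, int, int]


def connection_features(packets: List[Packet]) -> Dict[str, float]:
    """Summarize one connection using only sizes, directions, and timings."""
    if not packets:
        raise ValueError("connection has no packets")
    packets = sorted(packets, key=lambda p: p[0])  # order by timestamp
    times = [t for t, _, _ in packets]
    out_sizes = [s for _, d, s in packets if d > 0]
    in_sizes = [s for _, d, s in packets if d < 0]
    # Inter-arrival gaps between consecutive packets.
    gaps = [b - a for a, b in zip(times, times[1:])]
    total_bytes = sum(out_sizes) + sum(in_sizes)
    return {
        "n_packets": float(len(packets)),
        "duration_s": times[-1] - times[0],
        "bytes_out": float(sum(out_sizes)),
        "bytes_in": float(sum(in_sizes)),
        "frac_bytes_out": sum(out_sizes) / total_bytes if total_bytes else 0.0,
        "mean_gap_s": mean(gaps) if gaps else 0.0,
        "std_gap_s": pstdev(gaps) if gaps else 0.0,
    }


# Toy connection with made-up values: two short exchanges one minute apart.
toy_connection = [(0.0, 1, 512), (0.1, -1, 512), (60.0, 1, 512), (60.1, -1, 512)]
features = connection_features(toy_connection)
```

A vector like `features` could then be fed to any off-the-shelf classifier. Because it uses no payload content, the same kind of summary remains available when traffic is encrypted, which is the property traffic analysis relies on.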
Our short-term goal, by the end of 2023, is to create a data management layer on top of the raw logs for more efficient parsing and extraction of pertinent data attributes. Additionally, we aim to develop a user-friendly interface that lets security analysts and stakeholders easily explore and summarize the data and lets researchers present their findings to stakeholders. In the medium term, by 2024, we plan to continue our research on identifying compromised accounts, detecting risky emails, and identifying vulnerable assets. In the long term, within the next 3 to 5 years, we aim to systematically align our log data with MITRE ATT&CK techniques and develop detection methods that use the data sources present in our logs.

Another direction is VPN security. Qatar is one of the top countries in VPN downloads. Installing a VPN app often causes all user traffic to be routed through VPNs or proxies, leaving users vulnerable to privacy leaks and attacks. Previous research has (1) explored passive leaks (e.g., DNS or IPv6 leaks) from centralized VPN services, (2) characterized residential IP services and their possible malicious use, and (3) shed light on active attacks such as TLS interception and password logging by open proxies. However, there is currently no way for a user to evaluate the privacy and security guarantees offered by VPN services (whether residential IPs or centralized servers). One goal here is to build an analysis framework and make its evaluations readily available to users. Further, with the widespread use of such proxies, another challenge facing ISPs and network admins is determining whether connections associated with a certain device originate from the device or are proxied by it. Another goal is to use traffic analysis to address this challenge and to determine usage (mining, spam or other malicious use, or benign browsing).