Voice User Interfaces (VUIs) are increasingly popular and built into smartphones, home assistants, and Internet of Things (IoT) devices. Despite offering an always-on, convenient user experience, VUIs raise new security and privacy concerns for their users. In this paper, we focus on attribute inference attacks in the speech domain, demonstrating the potential for an attacker to accurately infer a target user's sensitive and private attributes (e.g., their emotion, sex, or health status) from deep acoustic models. To defend against this class of attacks, we design, implement, and evaluate a user-configurable, privacy-aware framework for optimizing speech-related data sharing mechanisms. Our objective is to enable primary tasks such as speech recognition and user identification, while removing sensitive attributes in the raw speech data before sharing it with a cloud service provider. We leverage disentangled representation learning to explicitly learn independent factors in the raw data. Based on a user's preferences, a supervision signal guides the filtering out of invariant factors while retaining those reflected in the selected preference. Our experimental evaluation over five datasets shows that the proposed framework can effectively defend against attribute inference attacks by reducing their success rates to approximately that of guessing at random, while maintaining accuracy in excess of 99% for the tasks of interest. We conclude that negotiable privacy settings enabled by disentangled representations can bring new opportunities for privacy-preserving applications.
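As a rough illustration of the filtering idea only (the paper's architecture and training procedure are not reproduced, and the factor layout below is a made-up assumption), the following sketch zeroes out the latent factor blocks a user has not agreed to share before the representation leaves the device:

# Illustrative sketch: given a latent code disentangled into named factor blocks,
# keep only the factors the user consented to share and zero out the rest.
import numpy as np

# Hypothetical layout of a 12-dimensional disentangled latent code.
FACTOR_SLICES = {
    "content": slice(0, 6),    # linguistic content, needed for speech recognition
    "speaker": slice(6, 9),    # speaker identity, needed for user identification
    "emotion": slice(9, 11),   # sensitive attribute
    "health":  slice(11, 12),  # sensitive attribute
}

def filter_latent(z: np.ndarray, shared_factors: set) -> np.ndarray:
    """Zero out every factor block the user did not consent to share."""
    z_out = np.zeros_like(z)
    for name, sl in FACTOR_SLICES.items():
        if name in shared_factors:
            z_out[..., sl] = z[..., sl]
    return z_out

# Example: the user enables speech recognition and identification only.
z = np.random.randn(12)
z_shared = filter_latent(z, shared_factors={"content", "speaker"})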
Data holders are increasingly seeking to protect their users' privacy while still maximizing their ability to produce machine learning (ML) models with high-quality predictions. In this work, we empirically evaluate various implementations of differential privacy (DP), measuring both their ability to fend off real-world privacy attacks and their core goal of providing accurate classifications. We establish an evaluation framework to ensure that each of these implementations is evaluated fairly. The selected DP implementations add noise at different positions within the ML pipeline: at the point of data collection/release, during the update steps while training the model, or after training by perturbing the learned model parameters. We evaluate each implementation across a range of privacy budgets and datasets, with each implementation providing the same mathematical privacy guarantees. By measuring the models' resistance to real-world membership and attribute inference attacks, as well as their classification accuracy, we determine which implementations provide the most desirable tradeoff between privacy and utility. We find that the number of classes in a given dataset is unlikely to influence where the privacy-utility tradeoff occurs, a counter-intuitive result given the known relationship between increased privacy vulnerability and datasets with more classes. Additionally, when high privacy constraints are required, perturbing the input training data before applying ML modeling sacrifices less utility than noise added later in the ML process.
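To make the three noise-injection points concrete, here is a minimal, schematic sketch (not the paper's evaluation framework; the noise scales are uncalibrated placeholders rather than a proper epsilon accounting):

# Schematic sketch of the three DP noise positions compared above: input
# perturbation, gradient perturbation during training, and output/parameter
# perturbation after training.
import numpy as np

rng = np.random.default_rng(0)

def perturb_input(X, scale):
    """Input perturbation: add Laplace noise to each feature before training."""
    return X + rng.laplace(0.0, scale, size=X.shape)

def dp_sgd_step(w, grad_per_example, clip_norm, noise_std, lr):
    """Gradient perturbation (DP-SGD style): clip per-example gradients, add Gaussian noise."""
    norms = np.linalg.norm(grad_per_example, axis=1, keepdims=True)
    clipped = grad_per_example * np.minimum(1.0, clip_norm / (norms + 1e-12))
    noisy_mean = clipped.mean(axis=0) + rng.normal(0.0, noise_std, size=w.shape)
    return w - lr * noisy_mean

def perturb_output(w, scale):
    """Output perturbation: add noise to the learned parameters after training."""
    return w + rng.laplace(0.0, scale, size=w.shape)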
Neural networks are increasingly important in the development of Network Intrusion Detection Systems (NIDS), as they have the potential to achieve high detection accuracy while requiring limited feature engineering. Deep learning-based detectors can, however, be vulnerable to adversarial examples, by which attackers who may be oblivious to the precise mechanics of the targeted NIDS add subtle perturbations to malicious traffic features, with the aim of evading detection and disrupting critical systems in a cost-effective manner. Defending against such adversarial attacks is therefore of high importance, but requires addressing daunting challenges.
In this paper, we introduce Tiki-Taka, a general framework that (i) assesses the robustness of state-of-the-art deep learning-based NIDS against adversarial manipulations, and (ii) incorporates our proposed defense mechanisms to increase the NIDS' resistance to attacks employing such evasion techniques. Specifically, we select five different cutting-edge adversarial attack mechanisms to subvert three popular malicious traffic detectors that employ neural networks. We experiment with a publicly available dataset and consider both one-to-all and one-to-one classification scenarios, i.e., discriminating illicit from benign traffic and, respectively, identifying specific types of anomalous traffic among many observed. The results obtained reveal that, under realistic constraints, attackers can evade the NIDS with success rates of up to 35.7%, by only altering time-based features of the traffic generated. To counteract these weaknesses, we propose three defense mechanisms, namely: model voting ensembling, ensembling adversarial training, and query detection. To the best of our knowledge, our work is the first to propose defenses against adversarial attacks targeting NIDS. We demonstrate that, when employing the proposed methods, intrusion detection rates can be improved to nearly 100% against most types of malicious traffic, and attacks with potentially catastrophic consequences (e.g., botnet) can be thwarted. This confirms the effectiveness of our solutions and makes the case for their adoption when designing robust and reliable deep anomaly detectors.
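As a minimal sketch of the model-voting ensembling defense only (the ensembling adversarial training and query detection mechanisms are not shown, and sklearn-style detectors exposing predict_proba are an assumption):

# An adversarial flow crafted against one detector is less likely to fool several
# independently trained detectors at once, so the ensemble flags traffic as
# malicious when a majority of its members do.
import numpy as np

def ensemble_vote(models, x, threshold=0.5):
    """Return 1 (malicious) if the majority of detectors say so, else 0 (benign)."""
    votes = [int(m.predict_proba(x.reshape(1, -1))[0, 1] >= threshold) for m in models]
    return int(sum(votes) > len(models) / 2)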
In recent years, deep neural networks (DNN) have become an important type of intellectual property due to their high performance on various classification tasks. As a result, DNN stealing attacks have emerged. Many attack surfaces have been exploited, among which cache timing side-channel attacks are hugely problematic because they do not need physical probing or direct interaction with the victim to estimate the DNN model. However, existing cache-side-channel-based DNN reverse engineering attacks rely on analyzing the binary code of the DNN library that must be shared between the attacker and the victim in the main memory. In reality, the DNN library code is often inaccessible because 1) the code is proprietary, or 2) memory sharing has been disabled by the operating system. In our work, we propose GANRED, an attack approach based on the generative adversarial nets (GAN) framework which utilizes cache timing side-channel information to accurately recover the structure of DNNs without memory sharing or code access. The benefit of GANRED is four-fold. 1) There is no need for DNN library code analysis. 2) No shared main memory segment between the victim and the attacker is needed. 3) Our attack locates the exact structure of the victim model, unlike existing attacks which only narrow down the structure search space. 4) Our attack efficiently scales to deeper DNNs, exhibiting only linear growth in the number of layers in the victim DNN.
Public clouds are inherently multi-tenant: applications deployed by different parties (including malicious ones) may reside on the same physical machines and share various hardware resources. With the introduction of newer hypervisors, containerization frameworks like Docker, and managed/orchestrated clusters using systems like Kubernetes, cloud providers downplay the feasibility of co-tenant attacks by marketing a belief that applications do not operate on shared hardware. In this paper, we challenge the conventional wisdom that attackers cannot confirm co-residency with a victim application from inside state-of-the-art containers running on virtual machines. We analyze the degree of vulnerability present in containers running on various systems including within a broad range of commercially utilized orchestrators. Our results show that on commercial cloud environments including AWS and Azure, we can obtain over 90% success rates for co-residency detection using real-life workloads. Our investigation confirms that co-residency attacks are a significant concern on containers running on modern orchestration systems.
The recent years have brought an influx of privacy-conscious applications that enable strong security guarantees for end-users via end-to-end or client-side encryption. Unfortunately, this application paradigm is not easily transferable to web-based cloud applications. The reason lies in the adversary's enhanced control over client-side computing through application-provided JavaScript. In this paper, we propose CryptoMembranes - a set of native client-side components that allow the development of web applications that provide a robust isolation layer between the client-side encrypted user data and the potentially untrusted JavaScript, while maintaining full interoperability with current client-side development practices. In addition, to enable a realistic transition phase, we show how CryptoMembranes can be realized for currently existing web browsers via a standard browser extension.
Hardware architectural vulnerabilities, such as Spectre and Meltdown, are difficult or inefficient to mitigate in software. Although revised hardware designs may address some architectural vulnerabilities going forward, most current remedies increase execution time significantly. Techniques are needed to rapidly and efficiently detect these and other emerging threats. We present an anomaly detector, MARTINI, that analyzes traces of memory accesses in real time to detect attacks. Our experimental evaluation shows that anomalies in these traces are strongly correlated with unauthorized program execution, including architectural side-channel attacks of multiple types. MARTINI consists of a finite automaton that models normal program behavior in terms of memory addresses that are read from, and written to, at runtime. The model uses a compact representation of n-grams, i.e., short sequences of memory accesses, which can be stored and processed efficiently. Once the system is trained on authorized behavior, it rapidly detects a variety of low-level anomalous behaviors and attacks not otherwise easily discernible at the software level. MARTINI's implementation leverages recent advances in in-cache and in-memory automata for computation, and we present a hardware unit that repurposes a small portion of a last-level cache slice to monitor memory addresses. Our detector directly inspects the addresses of memory accesses, using the pre-constructed automaton to identify anomalies with high accuracy, negligible runtime overhead, and trivial increase in CPU chip area. We present analyses of expected hardware properties based on indicative cache and memory hierarchy simulations and empirical evaluations.
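As a simplified software model of the detection idea only (the in-cache automaton hardware and the paper's exact n-gram representation are not reproduced), one can learn the set of memory-access n-grams observed during authorized runs and score a new trace by how many of its n-grams were never seen in training:

# Train on traces of (address, operation) events from authorized executions,
# then flag traces containing a high fraction of previously unseen n-grams.
from collections import deque

def train_ngrams(trace, n=4):
    """trace: iterable of (address, op) events from authorized executions."""
    seen, window = set(), deque(maxlen=n)
    for event in trace:
        window.append(event)
        if len(window) == n:
            seen.add(tuple(window))
    return seen

def anomaly_score(trace, model, n=4):
    """Fraction of n-grams in the trace that were never observed during training."""
    window, total, unseen = deque(maxlen=n), 0, 0
    for event in trace:
        window.append(event)
        if len(window) == n:
            total += 1
            unseen += tuple(window) not in model
    return unseen / total if total else 0.0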
Process confinement is a key requirement for workloads in the cloud and in other contexts. Existing process confinement mechanisms on Linux, however, are complex and inflexible because they are implemented using a combination of primitive abstractions (e.g., namespaces, cgroups) and complex security mechanisms (e.g., SELinux, AppArmor) that were designed for purposes beyond basic process confinement. We argue that simple, efficient, and flexible confinement can be better implemented today using eBPF, an emerging technology for safely extending the Linux kernel. We present a proof-of-concept confinement application, bpfbox, that uses less than 2000 lines of kernelspace code and allows for confinement at the userspace function, system call, LSM hook, and kernelspace function boundaries---something that no existing process confinement mechanism can do. Further, it does so using a policy language simple enough to use for ad-hoc confinement purposes. This paper presents the motivation, design, implementation, and benchmarks of bpfbox, including a sample web server confinement policy.
String search finds occurrences of patterns in a larger text. This general problem occurs in various application scenarios, e.g., Internet search, text processing, and DNA analysis. Using somewhat homomorphic encryption with SIMD packing, we provide an efficient string search protocol that allows a private search to be performed in outsourced data with minimal preprocessing. At the base of the string search protocol lies a randomized homomorphic equality circuit whose depth is independent of the pattern length. This circuit not only improves performance but also increases the practicality of our protocol, as it requires the same set of encryption parameters for a wide range of patterns of different lengths. This constant-depth algorithm is about 12 times faster than the prior work. It takes about 5 minutes on an average laptop to find the positions of a string with at most 50 UTF-32 characters in a text with 1000 characters. In addition, we provide a method that compresses the search results, thus reducing the communication cost of the protocol. For example, the communication complexity for searching a string with 50 characters is about 347 KB for a text of length 10000 and 13.9 MB for a text with 1000000 characters.
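The following is a plaintext-arithmetic sketch of a randomized equality test whose multiplicative depth does not grow with the pattern length; it is not the paper's circuit and uses no homomorphic encryption library, but illustrates the kind of randomized check that the protocol evaluates under somewhat homomorphic encryption with SIMD packing:

# A random linear combination of the per-position differences is zero if the
# window equals the pattern, and a random nonzero value otherwise (except with
# probability about 1/P).
import secrets

P = 2**31 - 1  # illustrative plaintext modulus

def randomized_equal(pattern, text_window):
    """Return 0 when the window matches the pattern; otherwise a random nonzero value w.h.p."""
    acc = 0
    for a, b in zip(pattern, text_window):
        r = secrets.randbelow(P - 1) + 1
        acc = (acc + r * ((a - b) % P)) % P  # each position contributes a single product
    return acc

text = [ord(c) for c in "private string search"]
pattern = [ord(c) for c in "string"]
matches = [i for i in range(len(text) - len(pattern) + 1)
           if randomized_equal(pattern, text[i:i + len(pattern)]) == 0]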
On today's Internet, combining the end-to-end security of TLS with Content Delivery Networks (CDNs) while ensuring the authenticity of connections results in a challenging delegation problem. When CDN servers provide content, they have to authenticate themselves as the origin server to establish a valid end-to-end TLS connection with the client. In standard TLS, the latter requires access to the secret key of the server. To curb this problem, multiple workarounds exist to realize a delegation of the authentication.
In this paper, we present a solution that renders key sharing unnecessary and reduces the need for workarounds. By adapting identity-based signatures to this setting, our solution offers short-lived delegations. Additionally, thanks to forward security, existing delegations remain valid even if the server's secret key leaks. We provide an implementation of the scheme and discuss integration into a TLS stack. In our evaluation, we show that an efficient implementation incurs less overhead than a typical network round trip. Thereby, we propose an alternative approach to current delegation practices on the web.
Disinformation campaigns seek to influence and polarize political topics through massive coordinated efforts. In the process, these efforts leave behind artifacts, which researchers have leveraged to analyze the tactics employed by disinformation campaigns after they are taken down. Coordination network analysis has proven helpful for learning about how disinformation campaigns operate; however, the usefulness of these forensic tools as a detection mechanism is still an open question. In this paper, we explore the use of coordination network analysis to generate features for distinguishing the activity of a disinformation campaign from legitimate Twitter activity. Doing so would provide more evidence to human analysts as they consider takedowns. We create a time series of daily coordination networks for both Twitter disinformation campaigns and legitimate Twitter communities, and train a binary classifier based on statistical features extracted from these networks. Our results show that the classifier can predict future coordinated activity of known disinformation campaigns with high accuracy (F1 = 0.98). On the more challenging task of out-of-distribution activity classification, the performance drops yet is still promising (F1 = 0.71), mainly due to an increase in the false positive rate. By doing this analysis, we show that while coordination patterns could be useful for providing evidence of disinformation activity, further investigation is needed to improve upon this method before deployment at scale.
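As an illustrative sketch of the feature-extraction and classification step only (the paper's exact feature set is not reproduced; daily coordination networks are assumed to be available as networkx graphs whose edges connect accounts acting in a coordinated way on a given day):

# Extract simple statistical features from each daily coordination network and
# train a binary classifier on labeled days.
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def network_features(g):
    n = g.number_of_nodes()
    return [
        n,
        g.number_of_edges(),
        nx.density(g) if n > 1 else 0.0,
        nx.average_clustering(g) if n > 0 else 0.0,
        max((d for _, d in g.degree()), default=0),
        nx.number_connected_components(g) if n > 0 else 0,
    ]

def train_detector(daily_graphs, labels):
    """labels[i] = 1 if day i belongs to a known disinformation campaign, else 0."""
    X = np.array([network_features(g) for g in daily_graphs])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf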
Public development processes are a key characteristic of open source projects. However, fixes for vulnerabilities are usually discussed privately among a small group of trusted maintainers, and integrated without prior public involvement. This is supposed to prevent early disclosure, and cope with embargo and non-disclosure agreement (NDA) rules. While regular development activities leave publicly available traces, fixes for vulnerabilities that bypass the standard process do not.
We present a data-mining based approach to detect code fragments that arise from such infringements of the standard process. By systematically mapping public development artefacts to source code repositories, we can exclude regular process activities, and infer irregularities that stem from non-public integration channels. For the Linux kernel, the most crucial component of many systems, we apply our method to a period of seven months before the release of Linux 5.4. We find 29 commits that address 12 vulnerabilities. For these vulnerabilities, our approach provides a temporal advantage of 2 to 179 days to design exploits before public disclosure takes place, and fixes are rolled out.
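The core of the mapping step can be sketched as follows (a conceptual simplification, not the paper's pipeline; real matching via patch-ids, subject normalization, and similar heuristics is considerably more involved):

# Commits that appear in the released tree but cannot be matched to any publicly
# posted patch are candidates for fixes integrated through non-public channels.
def suspicious_commits(repo_commits, public_patches, matches):
    """
    repo_commits:   set of commit hashes in the release
    public_patches: set of identifiers for patches seen on public mailing lists
    matches:        dict commit_hash -> patch identifier (or None) produced by the
                    artefact-to-repository mapping
    """
    return {c for c in repo_commits
            if matches.get(c) is None or matches[c] not in public_patches}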
Established responsible disclosure approaches in open development processes are supposed to limit premature visibility of security vulnerabilities. However, our approach shows that, instead, they open additional possibilities for uncovering such changes, thwarting their very premise. We conclude by discussing implications and partial countermeasures.
Verifpal is a new automated modeling framework and verifier for cryptographic protocols, optimized with heuristics for common-case protocol specifications, that aims to work better for real-world practitioners, students and engineers without sacrificing comprehensive formal verification features. In order to achieve this, Verifpal introduces a new, intuitive language for modeling protocols that is easier to write and understand than the languages employed by existing tools. Its formal verification paradigm is also designed explicitly to provide protocol modeling that avoids user error.
Verifpal is able to model protocols under an active attacker with unbounded sessions and fresh values, and supports queries for advanced security properties such as forward secrecy or key compromise impersonation. Furthermore, Verifpal's semantics have been formalized within the Coq theorem prover, and Verifpal models can be automatically translated into Coq as well as into ProVerif models for further verification. Verifpal has already been used to verify security properties for Signal, Scuttlebutt, TLS 1.3, as well as the first formal model for the DP-3T pandemic-tracing protocol, which we present in this work. Through Verifpal, we show that advanced verification with formalized semantics and sound logic can exist without sacrificing the convenience it offers to real-world practitioners.
It is widely known that return-oriented programming (ROP) attacks can be mounted on the x86, ARM, and SPARC architectures. However, it remained an open question whether ROP is possible on RISC-V, a new and promising free and open instruction set architecture (ISA). In this paper, we present a novel ROP technique specific to the RISC-V architecture. Our method relies on the processor's saved registers and its function calling convention. We use functional gadgets (that perform primitive operations) ending in a jump instruction to an address held in a saved register. The order in which gadgets are chained is determined by a novel gadget, which we call the charger gadget, that loads the saved registers with the gadgets' addresses from the stack. We constructed a library of gadgets extracted from the standard Linux libraries. Finally, we evaluated our method by exploiting a buffer-overflow-vulnerable application.
Current adversarial attack methods in black-box settings mainly: (1) rely on the transferability approach, which requires a substitute model and is hence inefficient; or (2) employ a large number of queries for crafting their adversarial examples, and are hence very likely to be detected and responded to by the target system (e.g., an AI service provider) due to the high traffic volume. In this paper, we present a black-box adversarial attack based on Multi-Group Particle Swarm Optimization with Random Redistribution (MGRR-PSO), which yields a very high success rate while maintaining a low number of queries by launching the attack in a distributed manner. Attacks are executed from multiple nodes, disseminating queries among the nodes, hence reducing the possibility of being recognized by the target system while also increasing scalability. Furthermore, we propose to efficiently remove excessive perturbation (i.e., perturbation pruning) by again utilizing MGRR-PSO. Overall, we perform five different experiments: comparing our attack's performance with existing algorithms, testing in high-dimensional space using the ImageNet dataset, examining our hyperparameters, and testing on a real digital attack against Google Cloud Vision. Our attack achieves a 100% success rate for both untargeted and targeted attacks on the MNIST and CIFAR-10 datasets and is able to successfully fool Google Cloud Vision as a proof of the real digital attack with relatively few queries.
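For illustration, here is a minimal single-swarm sketch of a query-based particle swarm attack; the multi-group structure, random redistribution, distributed querying, and perturbation pruning of MGRR-PSO are omitted, and query_model is any black-box returning class probabilities (each call counting as one query):

# Particle swarm search for an L-infinity-bounded perturbation that minimizes the
# probability assigned to the true class (an untargeted attack).
import numpy as np

def pso_attack(query_model, x, true_label, n_particles=20, iters=50,
               eps=0.1, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    dim = x.size
    pos = rng.uniform(-eps, eps, size=(n_particles, dim))   # candidate perturbations
    vel = np.zeros_like(pos)

    def loss(delta):
        probs = query_model(np.clip(x + delta, 0.0, 1.0))   # one query per call
        return probs[true_label]                             # minimize true-class probability

    pbest, pbest_val = pos.copy(), np.array([loss(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -eps, eps)
        vals = np.array([loss(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return np.clip(x + gbest, 0.0, 1.0)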
Machine Learning as a Service (aka MLaaS) and Smart Grid as a Service (aka SGaaS) are expected to grow at a significant rate. Just like most cloud services, MLaaS and SGaaS can be subject to a number of attacks. In this paper, we focus on white-box attacks (informally defined as attacks that try to access some or all internal data or computation used by the service program) and black-box attacks (informally defined as attacks that only use input-output access to the attacked service program).
We consider a participant model including a setup manager, a cloud server, and one or many data producers. The cloud server runs a machine learning classifier trained on a dataset provided by the setup manager and classifies new input data provided by the data producers. Applications include analytics over data received by distributed sensors, such as in a typical SGaaS environment. We propose a new security notion of encrypted-input classifier obfuscation as a set of algorithms that, in the above participant and algorithm model, aims to protect the cloud server's classifier program from both white-box and black-box attacks. This notion builds on cryptographic obfuscation of programs [1], cryptographic obfuscation of classifiers [2], and encrypted-input obfuscation of programs [3]. We model classifiers as a pair of programs: a training program that, on input a dataset and secret data values, returns classification parameters; and a classification program that, on input classification parameters and a new input data value, returns a classification result. A typical execution goes as follows. During obfuscation generation, the setup manager randomly chooses a key k, sends a k-based obfuscation of the classifier to the cloud server, and sends the data producers either k or information to generate k-based input data encryptions. During obfuscation evaluation, the data producers send k-based input data encryptions to the cloud server, which evaluates the obfuscated classifier over the encrypted input data. Here, the goal is to protect the confidentiality of the dataset, the secret data, and the classification parameters.
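The execution flow just described can be sketched as a message-flow skeleton; the cryptographic primitives (the k-based obfuscation and the k-based input encryption from the cited constructions) are replaced here by named placeholders, so the sketch conveys who sends what to whom rather than any actual security guarantee:

# Message-flow skeleton with placeholder cryptography (no security is provided).
import secrets

class SetupManager:
    def __init__(self, train, classify, dataset, secret_values):
        self.k = secrets.token_bytes(32)            # obfuscation/encryption key
        params = train(dataset, secret_values)      # classification parameters
        # Placeholder for the k-based obfuscation of the classification program.
        self.obfuscated_classifier = ("OBF", self.k, classify, params)

class DataProducer:
    def __init__(self, k):
        self.k = k
    def encrypt(self, x):
        return ("ENC", self.k, x)                   # placeholder k-based input encryption

class CloudServer:
    def __init__(self, obfuscated_classifier):
        self.obf = obfuscated_classifier
    def evaluate(self, ciphertext):
        # In the real scheme the server evaluates the obfuscated program directly on
        # the encrypted input; here the placeholders are simply unpacked.
        _, k_obf, classify, params = self.obf
        _, k_enc, x = ciphertext
        assert k_obf == k_enc
        return classify(params, x)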
One can obtain a general-purpose encrypted-input classifier obfuscator in two steps: 1) transforming a suitable composition of the training and classification algorithms into a single boolean circuit; 2) applying to this circuit the result from [3], which says that a modification of Yao's protocol [4] is an encrypted-input obfuscation of gate values in any polynomial-size boolean circuit. This result is of only theoretical relevance. Towards finding a practically efficient obfuscation of specific classifiers, we note that techniques from [3] can be used to produce an obfuscator for decision trees. Moreover, in recent results we have produced an obfuscator for image matching (i.e., matching an input image to a secret image).