Research Overview
Research efforts in the laboratory revolve around these specific topics:
- Computer Architecture Design: Emerging micro-architecture design techniques, Network-on-chip design, Adaptive and self-reconfigurable architectures, Hardware techniques for approximate computing;
- Hardware Security: Secure Architecture Design, Hardware-based security primitives (PUFs, RNG, AES, etc.), Detection and prevention of hardware Trojans, Side-channel attacks, fault attacks and countermeasures, Security and privacy for the Internet of Things;
- Neural Network and Neuromorphic Computing: Techniques for learning on a chip, Neural network acceleration techniques and hardware designs.
Current Research Highlight
Odysseus: an open-interface IoT testbed system architecture.
Secure Architecture Design
Sphinx: Hardware-Software Co-design Framework for Binary Code Diversification Based Secure Execution
In the Sphinx project, we are developing a computing system that resists both hardware and software attacks and maintains the privacy and integrity of application executions. The design consists of two parts: a software obfuscation module and a dedicated hardware execution engine. At compile time, obfuscation instructions are added to the assembly code to produce a new program. The technique allows multiple versions of a program to be produced and provides moving-target security capabilities. For each version, an encrypted obfuscation mask code is produced to distinguish real instructions from obfuscation instructions. A copy of the obfuscated executable and its associated mask file is securely distributed to certified users. The dedicated secure hardware execution engine performs just-in-time decryption of the mask file and executes the obfuscated program. To demonstrate the feasibility of the Sphinx architectural vision, we are implementing a RISC-V ISA version of the architecture, called Sphinx-V.
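The mask idea described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Sphinx implementation: the function names (obfuscate, xor_mask, execute_view) are hypothetical, the decoys are simple RISC-V writes to x0, and the XOR keystream is a toy stand-in for real mask encryption that offers no actual security.

```python
import random


def obfuscate(program, n_decoys, seed):
    """Interleave decoy instructions with the real ones.

    Returns (obfuscated, mask), where mask[i] is 1 if obfuscated[i] is a
    real instruction and 0 if it is a decoy. Different seeds give
    different program versions (the moving-target aspect).
    """
    rng = random.Random(seed)
    # Assumption: decoys are writes to x0, which RISC-V discards.
    decoys = ["addi x0, x0, %d" % rng.randrange(1, 100) for _ in range(n_decoys)]
    total = len(program) + n_decoys
    # Pick the output slots that hold real instructions; order is preserved.
    real_slots = set(rng.sample(range(total), len(program)))
    real_iter, decoy_iter = iter(program), iter(decoys)
    obfuscated, mask = [], []
    for slot in range(total):
        if slot in real_slots:
            obfuscated.append(next(real_iter))
            mask.append(1)
        else:
            obfuscated.append(next(decoy_iter))
            mask.append(0)
    return obfuscated, mask


def xor_mask(mask, key):
    """Toy placeholder for mask encryption: XOR with a seeded bit stream.

    Applying it twice with the same key restores the original mask.
    """
    rng = random.Random(key)
    return [bit ^ rng.getrandbits(1) for bit in mask]


def execute_view(obfuscated, encrypted_mask, key):
    """Model of the execution engine: decrypt the mask, keep real instructions."""
    mask = xor_mask(encrypted_mask, key)
    return [ins for ins, bit in zip(obfuscated, mask) if bit]


program = ["addi a0, zero, 5", "addi a1, zero, 7", "add a2, a0, a1"]
obfuscated, mask = obfuscate(program, n_decoys=3, seed=1)
encrypted = xor_mask(mask, key=42)
recovered = execute_view(obfuscated, encrypted, key=42)
```

With the correct key, recovered equals the original program; without it, a reader of the obfuscated listing cannot tell from the mask file which instructions are real.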
Hermes: Secure Heterogeneous Multicore Architecture
With the emergence of general-purpose system-on-chip (SoC) architectures in an array of application domains, some key security challenges arise. In these systems, tenants, i.e., intellectual property (IP) cores or processing units, may come from different providers, and executable code may have varying levels of trust. It is therefore important to support multi-level, user-defined security protocols that can isolate hardware subsystems and code while enabling optimal sharing of computing resources and data among the tenants. In this work, we are developing security mechanisms for integrating multiple tenants, from secure to non-secure cores, into the same chip design, maintaining their individual security and preventing data leakage and corruption while promoting collaboration among the tenants.
Update: Initially, the process isolation hardware in the Hermes architecture sat behind the last-level cache, but due to the recently highlighted side-channel vulnerabilities associated with speculative execution, such as Spectre, we are investigating techniques to bring the isolation closer to the core.
Adaptive and Resilient Architecture Design
Helios Project: Adaptive-Approximate Computing Architecture
The Helios project is investigating approaches for designing computer systems that dynamically adapt and optimize their execution behavior according to a set of high-level application goals. The project will explore computer architectures that can automatically detect volume, variety, velocity or veracity variations in the application input streams and reconfigure themselves to meet user-directed performance-to-power ratios or real-time constraints. The goal of the project is to develop an adaptive-approximate computing system where accuracy can be traded off for compute time to withstand real-time changes in the input streams, or where precision-for-power trade-offs can be made in the presence of power availability variations.
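As a software analogy for trading accuracy against compute time, the sketch below uses loop perforation, a well-known approximate computing technique that skips a fraction of the iterations. It is an illustration only and is not taken from the Helios design; the function name perforated_mean and the notion of a per-call operation budget are assumptions made for the example.

```python
import math


def perforated_mean(values, budget):
    """Estimate the mean of `values` while reading at most `budget` elements.

    A smaller budget means less work and a coarser estimate; a budget of
    at least len(values) gives the exact mean.
    Returns (estimate, number_of_elements_read).
    """
    if not values or budget <= 0:
        raise ValueError("values must be non-empty and budget positive")
    # Skip elements at a fixed stride so the work fits the budget.
    stride = max(1, math.ceil(len(values) / budget))
    sampled = values[::stride]
    return sum(sampled) / len(sampled), len(sampled)


stream = list(range(100))
exact, exact_cost = perforated_mean(stream, budget=100)   # full accuracy
coarse, coarse_cost = perforated_mean(stream, budget=10)  # ten times less work
```

Here the exact call reads all 100 elements and returns 49.5, while the coarse call reads 10 elements and returns 45.0. An adaptive system could lower the budget when the input rate rises or when less power is available.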
Coeus Project: Reconfigurable Architecture Design for Deep Learning Acceleration
Deep learning based algorithms currently provide the best solution to many computing problems, from image recognition to health data analysis. Creating a domain-specific architecture template targeted mainly at deep learning based applications will lead to more effective execution, since their specific performance, communication, and programmability requirements can be better addressed. Four important research questions that we are addressing in this project are: (1) what is the appropriate granularity of the reconfigurable processing elements (ranging from fine-grained architectures at the granularity of field-programmable gate arrays to complex processor cores), (2) what degree of homogeneity or heterogeneity should the micro-architecture of the processing elements exhibit, (3) what is the effective memory organization and technology (from data placement to data movement), and (4) how to implement network intelligence to support and adapt to the communication and memory requirements between layers.
High-Performance Graph Processing Architecture Design
As data collection capabilities improve, both the amount of data available for analysis and the complexity of algorithms rapidly increase. In many applications, ranging from target identification and social network analysis to anomaly detection, the data of interest can be represented as a graph. A graph G = (V,E) is a pair of sets: a set of vertices, V, representing graph nodes, and a set of edges, E, representing relationships between the nodes. Graph-based algorithms and applications are primarily relation- and event-driven. They exhibit certain unique computational characteristics that often fall beyond the capabilities of current CPUs or GPUs. Our laboratory is investigating a new graph processing architecture that uses self-timed circuits to leverage the event-driven nature of graph-based applications.
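The event-driven character of graph algorithms can be seen in a small software model in which a vertex does work only when an event reaches it. The sketch below computes hop distances from a source vertex in a directed graph G = (V,E). It is a plain software illustration and does not model the self-timed hardware; the function name event_driven_levels is hypothetical.

```python
from collections import deque


def event_driven_levels(vertices, edges, source):
    """Compute the hop distance of every vertex from `source`.

    Work is triggered only by events of the form (vertex, proposed_level).
    A vertex that receives no event stays idle and keeps level None.
    """
    adjacency = {v: [] for v in vertices}
    for u, v in edges:
        adjacency[u].append(v)  # directed edge u -> v

    level = {v: None for v in vertices}
    events = deque([(source, 0)])
    while events:
        vertex, proposed = events.popleft()
        if level[vertex] is not None and level[vertex] <= proposed:
            continue  # the event carries no new information
        level[vertex] = proposed
        # The update fires one event per outgoing edge.
        for neighbour in adjacency[vertex]:
            events.append((neighbour, proposed + 1))
    return level


V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
levels = event_driven_levels(V, E, "a")
```

For this graph, levels maps a to 0, b and c to 1, d to 2, and the unreachable vertex e to None. The amount of work follows the number of events rather than a fixed schedule, which is the property an event-driven architecture aims to exploit.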
Resilient and Fault-Tolerant Interconnect Network-on-Chip Design
On-chip network (OCN) design has become increasingly challenging due to the high levels of integration and complexity of modern systems-on-chip (SoCs). As feature size shrinks, transistors become less reliable and component failures increase. Transistor scaling and integration result in reliability challenges, including interference from electric fields, shrinking of the maximum-minimum voltage window, thermo-mechanical limitations, and soft, transient and intermittent errors. Therefore, besides high-throughput and effective load-balancing routing algorithms, fault-aware design techniques are also required. Our research efforts focus on modeling and evaluating bandwidth-adaptive, fault-aware, and self-reconfigurable on-chip network designs.
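To make the notion of fault-aware routing concrete, the sketch below finds a shortest path in a 2D mesh while avoiding links marked as faulty. It is a centralized breadth-first search written for illustration, assuming a mesh topology and a known set of failed links; real on-chip routers use distributed routing logic and must also handle deadlock, which this example ignores. The function name route and its parameters are hypothetical.

```python
from collections import deque


def route(width, height, src, dst, faulty_links=frozenset()):
    """Return a shortest list of (x, y) nodes from src to dst, or None.

    `faulty_links` is a set of frozenset({node_a, node_b}) pairs naming
    bidirectional mesh links that must not be used.
    """
    def neighbours(node):
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and frozenset((node, nxt)) not in faulty_links:
                yield nxt

    parent = {src: None}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbours(node):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None  # destination is unreachable with the given faults


healthy = route(3, 3, (0, 0), (2, 0))
broken = {frozenset({(1, 0), (2, 0)})}
detour = route(3, 3, (0, 0), (2, 0), broken)
```

On the healthy 3x3 mesh the route takes two hops along the bottom row. With the link between (1, 0) and (2, 0) failed, the search returns a four-hop detour, and it returns None when faults disconnect the destination.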
Architecture Design and Exploration Tools
We are developing a versatile tool for architecture design space exploration targeting research and teaching environments. BRISC-V is an open-source, parameterized, synthesizable RISC-V based multi-core system. We provide highly parameterized HDL modules for RISC-V based processing elements (cores), the cache subsystem, main memory and the on-chip network. The system is designed with a high degree of modularity, which allows fast exploration of different topologies, routing schemes, processing elements, and memory system organizations. Hardware modules are implemented in synthesizable Verilog.
The Odysseus system is a complete, deployable platform for investigating, designing and validating large-scale connected distributed systems. It is a three-layer distributed system consisting of (1) a set of distributed edge nodes connected to data transmission bases using the Zigbee communication protocol, (2) a pool of data transmission bases that perform long-distance transmissions, and (3) a server-based backend compute infrastructure. Its key features are (i) an open interface to connect to most 3-pin or 4-pin sensors, (ii) I/O ports on the edge nodes to connect to FPGA boards for in-situ processing, and (iii) an API for programming, testing and monitoring nodes and communication bases in the system.
The ASCS Lab is taking over the development and maintenance of the Heracles tool previously housed at the MIT CSAIL Laboratory. Heracles presents designers with a global and complete cycle-by-cycle view of the inner workings of a multiprocessor machine, from instruction fetches at the microprocessor core at each node to the flit arbitration at the routers, with RTL-level correctness. A flit is the smallest unit of information recognized by the flow control method. This enables the designer to explore different implementation approaches (core microarchitecture, levels of caches, cache sizes, routing algorithm, router micro-architecture, distributed or shared memory, or network interface) and to quickly evaluate their impact on the overall system performance.
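The role of flits can be illustrated by splitting a packet into a head flit followed by body and tail flits, a common organization in on-chip networks. The sketch below is a generic software model and does not reflect the Heracles flit format; the names packetize and reassemble, the tuple layout, and the flit size are assumptions made for the example.

```python
def packetize(payload, flit_bytes, dest):
    """Split `payload` (bytes) into a list of (kind, content) flits.

    The HEAD flit carries the destination so routers can set up the path;
    BODY and TAIL flits carry data, and the TAIL marks the packet's end.
    """
    if flit_bytes <= 0:
        raise ValueError("flit_bytes must be positive")
    chunks = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
    flits = [("HEAD", dest)]
    for index, chunk in enumerate(chunks):
        kind = "TAIL" if index == len(chunks) - 1 else "BODY"
        flits.append((kind, chunk))
    return flits


def reassemble(flits):
    """Rebuild the payload at the destination from the data-carrying flits."""
    return b"".join(content for kind, content in flits if kind != "HEAD")


flits = packetize(b"abcdefghij", flit_bytes=4, dest=3)
payload = reassemble(flits)
```

For the 10-byte payload and 4-byte flits, this produces one HEAD flit, two BODY flits and one TAIL flit holding the last two bytes, and reassemble returns the original payload. A router arbitrates among flits of this kind, one per cycle per output port, which is the granularity a cycle-level view exposes.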