Found: 16
Low Register-Complexity Systolic Digit-Serial Multiplier Over $GF(2^m)$ Based on Trinomials
Xie J., Meher P.K., Zhou X., Lee C.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2018, citations: 9, doi.org, Abstract
Digit-serial systolic multipliers over $GF(2^m)$ based on the National Institute of Standards and Technology (NIST) recommended trinomials play a critical role in the real-time operations of cryptosystems. Systolic multipliers over $GF(2^m)$ involve a large number of registers, of size $O(m^2)$, which results in a significant increase in area complexity. In this paper, we propose a novel low register-complexity digit-serial trinomial-based finite field multiplier. The proposed architecture is derived through two novel coherent interdependent stages: (i) derivation of an efficient hardware-oriented algorithm based on a novel input-operand feeding scheme and (ii) appropriate design of a novel low register-complexity systolic structure based on the proposed algorithm. The extension of the proposed design to a Karatsuba algorithm (KA)-based structure is also presented. The proposed design is synthesized for FPGA implementation, and it is shown that the design based on the regular multiplication process could achieve more than 12.1 percent saving in area-delay product and nearly 2.8 percent saving in power-delay product. To the best of the authors' knowledge, the register complexity of the proposed structure is so far the least among competing designs for trinomial-based systolic multipliers (for the same type of multiplication algorithm).
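The field arithmetic such a multiplier implements can be sketched in software. Below is a minimal, hedged Python model of $GF(2^m)$ multiplication with trinomial reduction, using the NIST trinomial $x^{233} + x^{74} + 1$ as a concrete example; it models the arithmetic only, not the systolic digit-serial architecture of the paper.

```python
# GF(2^m) multiplication with reduction modulo a trinomial
# x^m + x^k + 1, shown for the NIST field GF(2^233) with
# trinomial x^233 + x^74 + 1. Polynomials are integers whose
# set bits are the nonzero coefficients.

M, K = 233, 74
MOD = (1 << M) | (1 << K) | 1  # bit pattern of x^233 + x^74 + 1

def gf_mul(a: int, b: int) -> int:
    """Carry-less multiply of a and b, reduced modulo the trinomial."""
    # Carry-less (polynomial) multiplication over GF(2): XOR
    # replaces addition, so there are no carries.
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        b >>= 1
    # Reduce from the top bit down using x^M = x^K + 1.
    for d in range(p.bit_length() - 1, M - 1, -1):
        if (p >> d) & 1:
            p ^= MOD << (d - M)
    return p

# Sanity check: x * x^(M-1) = x^M, which reduces to x^K + 1.
assert gf_mul(1 << 1, 1 << (M - 1)) == (1 << K) | 1
```

A digit-serial hardware multiplier processes several coefficients of one operand per clock cycle; the software loop above corresponds to the fully serial (one bit per step) special case.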
A New Fluid-Chip Co-Design for Digital Microfluidic Biochips Considering Cost Drivers and Design Convergence
Chakraborty A., Datta P., Pal R.K.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2018, citations: 6, doi.org, Abstract
The design process for digital microfluidic biochips (DMFBs) is becoming more complex due to the growing need for essential bio-protocols. A number of significant fluid- and chip-level synthesis tools have been offered previously for designing an efficient system. Several important cost drivers, such as bioassay schedule length, total pin count, congestion-free wiring, total wire length, and total layer count, together measure the efficiency of DMFBs. Besides, existing design gaps among the sub-tasks of the fluid and chip levels make the design process expensive, delaying time-to-market and increasing the overall cost. In this context, removing design cycles among the sub-tasks is a prerequisite for obtaining a low-cost and efficient platform. Hence, this paper proposes a fluid-chip co-design methodology that considers the fluid- and chip-level cost drivers while reducing the design cycles in between. A simulation study over a number of benchmarks is presented to evaluate the performance.
A Deep Structure of Person Re-Identification Using Multi-Level Gaussian Models
Vishwakarma D.K., Upadhyay S.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2018, citations: 6, doi.org, Abstract
Person re-identification is widely used in forensic, security, and surveillance systems these days. However, it is still a challenging task in real-life scenarios. Hence, in this work, a new feature descriptor model is proposed using a multilayer framework of the Gaussian distribution model on pixel features, which include color moments, color space values, gradient information, and Schmid filter responses. An image of a person usually consists of distinct body regions, usually with differentiable clothing, followed by local colors and texture patterns. Thus, the image is evaluated locally by dividing it into overlapping regions. Each region is further fragmented into a set of local Gaussians on small patches. A global Gaussian encodes these local Gaussians for each region, creating a multi-level structure. Hence, the global picture of a person is described by the local-level information present in it, which is often ignored. Also, we have analyzed the efficiency of some existing metric learning methods on this descriptor. The performance of the descriptor is evaluated on four publicly available challenging datasets, and the highest accuracy achieved on these datasets is compared with similar state-of-the-art works. This clearly demonstrates the superior performance of the proposed descriptor.
Analytical Modeling and Performance Benchmarking of On-Chip Interconnects with Rough Surfaces
Kumar S., Sharma R.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2018, citations: 17, doi.org, Abstract
In planar on-chip copper interconnects, conductor losses due to surface roughness demand explicit consideration for accurate modeling of their performance metrics. This is quite pertinent for high-performance manycore processors/servers, where on-chip interconnects are increasingly emerging as one of the key performance bottlenecks. This paper presents a novel analytical model for parameter extraction in current and future on-chip interconnects. Our proposed model aids in analyzing the impact of spatial and vertical surface roughness on their electrical performance. Our analysis clearly depicts that as technology nodes scale down, the effect of surface roughness becomes dominant and cannot be ignored. Based on AFM images of fabricated ultra-thin copper sheets, we have extracted roughness parameters to define realistic surface profiles using the well-known Mandelbrot-Weierstrass (MW) fractal function. For our analysis, we have considered four current and future interconnect technology nodes (i.e., 45, 22, 13, and 7 nm) and evaluated the impact of surface roughness on typical performance metrics, such as delay, energy, and bandwidth. Results obtained using our model are verified by comparison with the industry-standard field solver Ansys HFSS as well as available experimental data, exhibiting accuracy within 9 percent. We present a signal integrity analysis using the eye diagram at 1, 5, 10, and 18 Gbps bit rates to find the increase in frequency-dependent losses due to surface roughness. Finally, simulating a standard three-line on-chip interconnect structure, we also report the computational overhead incurred for different values of roughness and technology nodes.
Scalable and Performant Graph Processing on GPUs Using Approximate Computing
Singh S., Nasre R.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2018, citations: 8, doi.org, Abstract
Graph algorithms are being widely used in several application domains. It has been established that parallelizing graph algorithms is challenging. The parallelization issues get exacerbated when graphics processing units (GPUs) are used to execute graph algorithms. While the prior art has shown effective parallelization of several graph algorithms on GPUs, a few algorithms are still expensive. In this work, we address the scalability issues in graph parallelization. In particular, we aim to improve the execution time by tolerating a little approximation in the computation. We study the effects of four heuristic approximations on six graph algorithms with five graphs and show that if an application allows for small inaccuracy, this can be leveraged to achieve considerable performance benefits. We also study the effects of the approximations on GPU-based processing and provide interesting takeaways.
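The general idea of trading a small inaccuracy for performance can be illustrated with a toy sketch (not one of the paper's specific heuristics): an iterative graph algorithm, here a tiny PageRank loop, converges in fewer iterations when its tolerance is loosened, at the cost of a bounded error.

```python
# Approximate computing on a graph algorithm: loosening the
# convergence tolerance of an iterative computation trades a
# small accuracy loss for fewer iterations. Graph and numbers
# are illustrative.

def pagerank(adj, n, damping=0.85, tol=1e-10, drop=0.0, max_iter=1000):
    rank = [1.0 / n] * n
    for it in range(max_iter):
        new = [(1 - damping) / n] * n
        for u, outs in adj.items():
            share = damping * rank[u] / len(outs)
            for v in outs:
                new[v] += share
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < max(tol, drop):  # looser tolerance = approximation
            return rank, it + 1
    return rank, max_iter

adj = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # toy directed graph
exact, it_exact = pagerank(adj, 4)
approx, it_approx = pagerank(adj, 4, drop=1e-3)
err = max(abs(a - b) for a, b in zip(exact, approx))
print(f"iterations: {it_exact} -> {it_approx}, max error {err:.2e}")
```

If the application tolerates an error on the order of the loosened tolerance, the iteration count drops substantially, which is the same accuracy-for-speed trade the paper studies at GPU scale.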
Docker Container Scheduler for I/O Intensive Applications Running on NVMe SSDs
Bhimani J., Yang Z., Mi N., Yang J., Xu Q., Awasthi M., Pandurangan R., Balakrishnan V.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2018, citations: 29, doi.org, Abstract
By using fast back-end storage, the performance benefits of a lightweight container platform can be leveraged with quick I/O response. Nevertheless, the performance of simultaneously executing multiple instances of the same or different applications may vary significantly with the number of containers. The performance may also vary with the nature of the applications, because different applications can exhibit different behavior on SSDs in terms of I/O type (read/write), I/O access pattern (random/sequential), I/O size, etc. Therefore, this paper aims to investigate and analyze the performance characteristics of both homogeneous and heterogeneous mixtures of I/O-intensive containerized applications operating with high-performance NVMe SSDs, and to derive novel design guidelines for achieving optimal and fair operation of both kinds of mixtures. By leveraging these design guidelines, we further develop a new Docker controller for scheduling workload containers of different types of applications. Our controller decides the optimal batches of simultaneously operating containers in order to minimize total execution time and maximize resource utilization. Meanwhile, our controller also strives to balance the throughput among all simultaneously running applications. We develop this new Docker controller by solving an optimization problem using five different optimization solvers. We conduct our experiments on a platform of multiple Docker containers operating on an array of three enterprise NVMe drives. We further evaluate our controller using different applications with diverse I/O behaviors and compare it with the simultaneous operation of containers without the controller. Our evaluation results show that our new Docker workload controller helps speed up the overall execution of multiple applications on SSDs.
CHOAMP: Cost Based Hardware Optimization for Asymmetric Multicore Processors
Sreelatha J.K., Balachandran S., Nasre R.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2018, citations: 9, doi.org, Abstract
Heterogeneous Multiprocessors (HMPs) are popular due to their energy efficiency over Symmetric Multicore Processors (SMPs). Asymmetric Multicore Processors (AMPs) are a special case of HMPs in which different kinds of cores share the same instruction set but offer different power-performance trade-offs. Due to the computational-power difference between these cores, finding an optimal hardware configuration for executing a given parallel program is quite challenging. An inherent difficulty in this problem stems from the fact that the original program is written for SMPs. This challenge is exacerbated by the interplay of the several configuration parameters that are allowed to be changed in AMPs. In this work, we propose a probabilistic method named CHOAMP to choose the best available hardware configuration for a given parallel program. Selection of a configuration is guided by a user-provided run-time property, such as energy-delay product (EDP), which CHOAMP seeks to optimize in choosing a configuration. The core part of our probabilistic method relies on identifying the behavior of various program constructs on the different classes of CPU cores in the AMP, and how it influences the cost function of choice. We implement the proposed technique in a compiler that automatically transforms code optimized for an SMP to run efficiently on an AMP, without requiring any user annotations. CHOAMP transforms the same source program for different hardware configurations based on different user requirements. We evaluate the efficiency of our method for three different run-time properties: execution time, energy consumption, and EDP, on the NAS Parallel Benchmarks for OpenMP. Our experimental evaluation shows that CHOAMP achieves an average of 65, 28, and 57 percent improvement over baseline HMP scheduling while optimizing for energy, execution time, and EDP, respectively.
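The cost-driven selection step can be illustrated with a toy model; the configuration names and the energy/delay numbers below are made up for the sketch and are not from the paper.

```python
# Cost-function-driven configuration choice in the spirit of
# CHOAMP: given per-configuration (energy, delay) estimates,
# pick the configuration minimizing a user-chosen cost metric.
# All profile numbers are illustrative.

profiles = {
    # config: (energy in joules, delay in seconds) -- made up
    "4 big cores":      (12.0, 1.0),
    "4 LITTLE cores":   (5.0,  2.5),
    "2 big + 2 LITTLE": (8.0,  1.4),
}

cost_fns = {
    "energy": lambda e, d: e,
    "time":   lambda e, d: d,
    "edp":    lambda e, d: e * d,  # energy-delay product
}

def choose(objective: str) -> str:
    """Return the configuration with minimum cost for the objective."""
    fn = cost_fns[objective]
    return min(profiles, key=lambda c: fn(*profiles[c]))

print(choose("edp"))  # 2 big + 2 LITTLE (8.0 * 1.4 = 11.2, the minimum)
```

The paper's contribution is in predicting the profile entries from program constructs rather than measuring every configuration; the final selection, however, reduces to a minimization like the one above.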
Co-Scheduling Persistent Periodic and Dynamic Aperiodic Real-Time Tasks on Reconfigurable Platforms
Saha S., Sarkar A., Chakrabarti A., Ghosh R.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2018, citations: 14, doi.org, Abstract
As task preemption/relocation with acceptably low overheads becomes a reality in today's reconfigurable FPGAs, they are starting to show bright prospects as platforms for executing performance-critical task sets while allowing high resource utilization. Many performance-sensitive real-time systems, including those in automotive and avionics systems, chemical reactors, etc., often execute a set of persistent periodic safety-critical control tasks along with dynamic event-driven aperiodic tasks. This work presents a co-scheduling framework for the combined execution of such periodic and aperiodic real-time tasks on fully and run-time partially reconfigurable platforms. Specifically, we present an admission control strategy and a preemptive scheduling methodology for dynamic aperiodic tasks in the presence of a set of persistent periodic tasks, such that aperiodic task rejections are minimized, resulting in high resource utilization. We use the 2D slotted area model, where the floor of the FPGA is assumed to be statically equipartitioned into a set of tiles in which any arbitrary task may be feasibly mapped. The experimental results reveal that the proposed scheduling strategies are able to achieve high resource utilization with low task rejection rates over various simulation scenarios.
Dynamic Power Budgeting for Mobile Systems Running Graphics Workloads
Gupta U., Ayoub R., Kishinevsky M., Kadjo D., Soundararajan N., Tursun U., Ogras U.Y.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2018, citations: 17, doi.org, Abstract
Competitive graphics performance is crucial for the success of state-of-the-art mobile processors. High graphics performance comes at the cost of higher power consumption, which elevates the temperature due to limited cooling solutions. To avoid thermal violations, the system needs to operate within a power budget. Since the power budget is a shared resource, there is a strong demand for effective dynamic power budgeting techniques. This paper presents a novel technique to efficiently distribute the power budget among the CPU and GPU cores, while maximizing performance. The proposed technique is evaluated using a state-of-the-art mobile platform using industrial benchmarks, and an in-house simulator. The experiments on the mobile platform show up to 15% increase in average frame rate compared to default power allocation algorithms.
DISASTER: Dedicated Intelligent Security Attacks on Sensor-Triggered Emergency Responses
Mosenia A., Sur-Kolay S., Raghunathan A., Jha N.K.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2017, citations: 7, doi.org, Abstract
Rapid technological advances in microelectronics, networking, and computer science have resulted in an exponential increase in the number of cyber-physical systems (CPSs) that enable numerous services in various application domains, e.g., smart homes and smart grids. Moreover, the emergence of the Internet-of-Things (IoT) paradigm has led to the pervasive use of IoT-enabled CPSs in our everyday lives. Unfortunately, as a side effect, the number of potential threats and feasible security attacks against CPSs has grown significantly. In this paper, we introduce a new class of attacks against CPSs, called dedicated intelligent security attacks against sensor-triggered emergency responses (DISASTER). DISASTER targets safety mechanisms deployed in automation/monitoring CPSs and exploits design flaws and security weaknesses of such mechanisms to trigger emergency responses even in the absence of a real emergency. Launching DISASTER can lead to serious consequences for three main reasons. First, almost all CPSs offer specific emergency responses and, as a result, are potentially susceptible to such attacks. Second, DISASTER can be easily designed to target a large number of CPSs, e.g., the anti-theft systems of all buildings in a residential community. Third, the widespread deployment of insecure sensors in already-in-use safety mechanisms, along with the endless variety of CPS-based applications, magnifies the impact of launching DISASTER. In addition to introducing DISASTER, we describe the serious consequences of such attacks. We demonstrate the feasibility of launching DISASTER against the two most widely-used CPSs: residential and industrial automation/monitoring systems. Moreover, we suggest several countermeasures that can potentially prevent DISASTER and discuss their advantages and drawbacks.
Interference-Aware Wireless Network-on-Chip Architecture Using Directional Antennas
Mondal H.K., Gade S.H., Shamim M.S., Deb S., Ganguly A.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2017, citations: 31, doi.org, Abstract
Wireless Network-on-Chip (WiNoC) has recently been introduced to address the scalability limitations of conventional multi-hop NoC architectures. Existing WiNoC architectures generally use millimeter-wave antennas without significant directional gains, along with a token passing protocol to access the shared wireless medium. This limits the achievable performance benefits, since only one wireless pair can communicate at a time. It is also not practical in the immediate future to arbitrarily scale up the number of non-overlapping channels by designing transceivers operating in disjoint frequency bands in the millimeter-wave spectrum commonly adopted for on-chip wireless interconnects. Consequently, we explore the use of directional antennas whereby multiple wireless interconnect pairs can communicate simultaneously. However, concurrent wireless communications can result in interference. This can be minimized in a NoC by optimal placement of wireless interfaces (WIs) to maximize performance while minimizing interference. To address this, we propose an interference-aware WI placement algorithm with a routing strategy for a WiNoC architecture incorporating directional planar log-periodic antennas (PLPAs). This directional wireless network-on-chip (DWiNoC) architecture enables point-to-point links between transceivers, and hence multiple wireless links can operate at the same time without interference.
Fast Estimation of Area-Coverage for Wireless Sensor Networks Based on Digital Geometry
Saha D., Pal S., Das N., Bhattacharya B.B.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2017, citations: 12, doi.org, Abstract
In this paper, we propose a fast and nearly-accurate method of estimating the area covered by a set of n identical and active sensor nodes that are randomly scattered over a 2D region. Since the estimation of collective coverage-area turns out to be computationally complex in Euclidean geometry, we represent each Euclidean circle in $\mathbb{R}^2$ with a digital circle in $\mathbb{Z}^2$ to enable faster computation. Based on the underlying geometric properties of digital circles, we present a novel $O(n \log n)$ centralized algorithm and an $O(d^2 \log d)$ distributed algorithm for coverage estimation, where d denotes the maximum degree of a node. In order to further expedite the estimation procedure, we approximate each digital circle by the tightest square that encloses it as well as by the largest square inscribed within it. Such approximation allows us to estimate the coverage-area, in a much simpler way, based on the intersection geometry of a set of axis-parallel rectangles. Our experiments with random deployment of nodes demonstrate that the proposed algorithms estimate the area coverage with a maximum error of only 1.5 percent, while reducing the computational effort significantly compared to earlier work. The technique needs only simple data structures and requires a few primitive integer operations, in contrast to classical methods, which need extensive floating-point computations for exact estimation. Furthermore, for an over-deployed network, the estimation provides an almost-exact measure of the covered area.
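The square-approximation step can be sketched in a few lines. Below, a grid-counting estimator and made-up deployment parameters stand in for the paper's rectangle-intersection geometry: each sensing disk of radius r is bracketed between its inscribed square (half-side r/√2) and its circumscribed square (half-side r), so the true covered area of the disk union lies between the two square-union areas.

```python
# Bracketing the union area of sensing disks between the union
# areas of inscribed and circumscribed axis-parallel squares.
# Grid counting on integer points stands in for the paper's
# rectangle-intersection geometry; all parameters are illustrative.
import math
import random

def union_area_of_squares(centers, half_side, field=100):
    """Count grid points of a field x field region covered by at
    least one axis-parallel square of the given half-side."""
    covered = 0
    for x in range(field):
        for y in range(field):
            if any(abs(x - cx) <= half_side and abs(y - cy) <= half_side
                   for cx, cy in centers):
                covered += 1
    return covered

random.seed(1)
nodes = [(random.uniform(0, 100), random.uniform(0, 100))
         for _ in range(20)]
r = 10  # sensing radius
outer = union_area_of_squares(nodes, r)                 # circumscribed
inner = union_area_of_squares(nodes, r / math.sqrt(2))  # inscribed
# The true disk-union area lies between the two estimates.
print(inner, "<= covered area <=", outer)
```

Since each inscribed square is contained in its disk and each disk in its circumscribed square, the containment carries over to the unions, which is what makes the bracketing valid.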
Wearable Medical Sensor-Based System Design: A Survey
Mosenia A., Sur-Kolay S., Raghunathan A., Jha N.K.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2017, citations: 110, doi.org, Abstract
Wearable medical sensors (WMSs) are garnering ever-increasing attention from both the scientific community and the industry. Driven by technological advances in sensing, wireless communication, and machine learning, WMS-based systems have begun transforming our daily lives. Although WMSs were initially developed to enable low-cost solutions for continuous health monitoring, the applications of WMS-based systems now range far beyond health care. Several research efforts have proposed the use of such systems in diverse application domains, e.g., education, human-computer interaction, and security. Even though the number of such research studies has grown drastically in the last few years, the potential challenges associated with their design, development, and implementation are neither well-studied nor well-recognized. This article discusses various services, applications, and systems that have been developed based on WMSs and sheds light on their design goals and challenges. We first provide a brief history of WMSs and discuss how their market is growing. We then discuss the scope of applications of WMS-based systems. Next, we describe the architecture of a typical WMS-based system, the components that constitute such a system, and their limitations. Thereafter, we suggest a list of desirable design goals that WMS-based systems should satisfy. Finally, we discuss various research directions related to WMSs and how previous research studies have attempted to address the limitations of the components used in WMS-based systems and satisfy the desirable design goals.
Load Balanced Coverage with Graded Node Deployment in Wireless Sensor Networks
Chatterjee P., Ghosh S.C., Das N.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2017, citations: 27, doi.org, Abstract
In this paper, to gather streams of data in static wireless sensor networks, a novel graded node deployment strategy is proposed that generates minimum traffic, just sufficient for coverage. Based on this node distribution, a distributed, nearly load-balanced data gathering algorithm is developed to deliver packets to the sink node via minimum-hop paths, which in turn helps to limit the network traffic. An average-case probabilistic analysis based on perfect matching of random bipartite graphs establishes a theoretical lower bound on the number of nodes to be deployed. Analysis and simulation studies show that the proposed model results in a huge enhancement in network lifetime that significantly outweighs the cost of over-deployment. Hence, this technique offers an excellent cost-effective and energy-efficient solution for node deployment and routing in large wireless sensor networks operating with prolonged lifetime.
A PUF-Enabled Secure Architecture for FPGA-Based IoT Applications
Johnson A.P., Chakraborty R.S., Mukhopadhyay D.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2015, citations: 62, doi.org, Abstract
The Internet of Things (IoT) is a dynamic, ever-evolving “living” entity. Hence, modern Field Programmable Gate Array (FPGA) devices with Dynamic Partial Reconfiguration (DPR) capabilities, which allow in-field non-invasive modifications to the circuit implemented on the FPGA, are an ideal fit. Usually, the activation of DPR capabilities requires the procurement of additional licenses from the FPGA vendor. In this work, we describe how IoTs can take advantage of the DPR capabilities of FPGAs, using a modified DPR methodology that does not require any paid “add-on” utility, to implement a lightweight cryptographic security protocol. We analyze possible threats that can emanate from the availability of DPR at IoT nodes, and propose possible solution techniques based on Physically Unclonable Function (PUF) circuits to prevent such threats.
Energy-Efficient Long-term Continuous Personal Health Monitoring
Nia A.M., Mozaffari-Kermani M., Sur-Kolay S., Raghunathan A., Jha N.K.
Institute of Electrical and Electronics Engineers (IEEE)
IEEE Transactions on Multi-Scale Computing Systems, 2015, citations: 126, doi.org, Abstract
Continuous health monitoring using wireless body area networks of implantable and wearable medical devices (IWMDs) is envisioned as a transformative approach to healthcare. Rapid advances in biomedical sensors, low-power electronics, and wireless communications have brought this vision to the verge of reality. However, key challenges still remain to be addressed. The constrained sizes of IWMDs imply that they are designed with very limited processing, storage, and battery capacities. Therefore, there is a very strong need for efficiency in data collection, analysis, storage, and communication. In this paper, we first quantify the energy and storage requirements of a continuous personal health monitoring system that uses eight biomedical sensors: (1) heart rate, (2) blood pressure, (3) oxygen saturation, (4) body temperature, (5) blood glucose, (6) accelerometer, (7) electrocardiogram (ECG), and (8) electroencephalogram (EEG). Our analysis suggests that there exists a significant gap between the energy and storage requirements for long-term continuous monitoring and the capabilities of current devices. To enable energy-efficient continuous health monitoring, we propose schemes for sample aggregation, anomaly-driven transmission, and compressive sensing to reduce the overheads of wirelessly transmitting, storing, and encrypting/authenticating the data. We evaluate these techniques and demonstrate that they result in two to three orders-of-magnitude improvements in energy and storage requirements, and can help realize the potential of long-term continuous health monitoring.
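One of the overhead-reduction schemes mentioned above, anomaly-driven transmission, can be sketched as follows. The sensor trace and threshold below are illustrative, not from the paper: a sample is transmitted only when it deviates from the last transmitted value by more than a threshold, so a slowly varying signal generates very few transmissions.

```python
# Anomaly-driven transmission: transmit a sample only when it
# differs from the last transmitted value by more than a
# threshold; the receiver reuses the last value otherwise.
# The "body temperature" trace and threshold are made up.
import math

def anomaly_driven(samples, threshold):
    """Return the list of (index, value) pairs actually transmitted."""
    sent, last = [], None
    for i, s in enumerate(samples):
        if last is None or abs(s - last) > threshold:
            sent.append((i, s))
            last = s
    return sent

# A slowly varying trace with one abrupt anomaly at index 50.
trace = [36.6 + 0.01 * math.sin(i / 10) for i in range(100)]
trace[50] = 38.2
sent = anomaly_driven(trace, threshold=0.1)
print(f"transmitted {len(sent)} of {len(trace)} samples")
```

Because radio transmission typically dominates the energy budget of such devices, suppressing in-band samples in this way directly reduces the wireless, storage, and encryption overheads the paper quantifies.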