# Design and Energy Performance Simulation of Content Addressable Memory by Dual-Threshold CMOS Technology

D. Mukhopadhyay\* Niladri Narayan Mojumder\*

# **ABSTRACT**

The Content Addressable Memory (CAM) is a class of memory that is accessed by data content instead of by physical address. On a read access to a CAM, every stored word is compared with the requested data in parallel, so a search requires only one access. CAMs are thus gaining increasing importance due to their parallel pattern-matching property [1]. This property makes them useful in applications such as networking, where a quick search is needed for routing. CAM is also an important hardware primitive for applications in artificial intelligence, image processing and databases. The major drawbacks of a CAM compared to a Random Access Memory (RAM) are design complexity and energy consumption. Although the energy consumed by a single CAM cell during a search operation may not seem to be an issue, the power problem is compounded in a larger-capacity CAM. On every access all the elements of a CAM are accessed, whereas in a RAM only the portion in use is accessed. The energy consumption of CAM structures in a processor can therefore become a major concern.

The challenge in the design of a CAM cell is to reduce energy consumption in the compare circuitry. The compare operations are always active and are a major source of energy dissipation. This paper introduces the design and energy performance simulation of a CAM cell in dual-threshold CMOS technology. Dual-threshold CMOS (DTCMOS) uses MOS transistors of two different threshold voltages in the same chip. In this paper we report the use of MOS transistors of two different threshold voltages (high $V_T$ and low $V_T$) to achieve energy-performance optimization.

#### **INTRODUCTION**

Figure 1 (Schematic of a Basic CAM Cell)

Figure 1 shows the schematic diagram of a single-bit Content Addressable Memory, consisting of five MOS transistors, two cross-coupled inverters and a capacitor. The transistors M1 and M2 are controlled directly by the word line. When the word line is set high, transistors M1 and M2 conduct and the data bit enters the latch. The data bit to be stored in the memory comes from the bit / search line and its complement from the not(bit) / not(search) line. After the data has been written into the memory, the word line needs to be set low to prevent any modification of the stored data.

The three additional transistors M3, M4 and M5 are used for matching. Of the two pass transistors M3 and M4, only one is activated at a time, since their gates are connected to opposite sides of the memory cell. If the search line matches the value in the memory cell, M5 turns off, leaving no path to ground for the *matchline*. The matchline is precharged every cycle [2]. To read from the CAM, the data bit to be searched is placed on the search and not(search) lines. If no match is found, one of the two pass transistors M3 and M4 passes a high value to the gate of transistor M5. Otherwise the gate input of M5 is set low and the matchline remains at its precharged value. The search operation in the CAM is illustrated in Table 1.

# **Background**

In recent years, scaling methodology for low-voltage designs has been targeted towards constant-field scaling, primarily for two reasons:

1. As VLSI technology moves deeper into the submicron region, the use of constant-voltage scaling gives rise to high-field effects such as hot-electron degradation.
2. The dynamic power dissipation in CMOS circuits is approximately given by

   $$P = C_L V_{DD}^{2} f$$

   where $C_L$ is the load capacitance and $f$ is the operating frequency. Lowering $V_{DD}$ thus reduces the power dissipation according to the square law.

So the use of a low-voltage power supply becomes immensely attractive and almost mandatory for any low-power design. The present-day standard is a 1 volt power supply, which allows smaller and lighter devices. Time delay, however, is a different matter. The time delay $T_D$ is given by

$$T_{D} = \frac{C_{L} V_{DD}}{A (V_{DD} - V_{T})^{a}}$$

where $A$ is a constant and $a$ is another constant between 1.4 and 2, depending on the technology used. As a result, a $V_{DD}$ as low as 1 volt leads to drastic performance degradation. Fig. 2 shows the dependence of propagation delay on $V_{DD}$.

Figure 2

To recoup the performance, the threshold voltage has to be reduced to about $0.2 V_{DD}$, i.e. 0.2 volt. However, decreasing $V_T$ makes the transistors difficult to turn OFF. The sub-threshold leakage current $I_{SUB}$ is given by

$$I_{SUB} = I_{ON} \exp\left[\frac{C (V_{gs} - V_{T})\, q}{kT}\right]$$

where $I_{ON}$ is the current at $V_{gs} = V_{T}$ and $C$ is a constant.

The drain current characteristic is shown in Figure 3(a), and the sub-threshold region is shown magnified in Figure 3(b). In Figure 3(b) it is seen that, for the same $V_{gs}$, the sub-threshold leakage current $I_{D1}$ of a low $V_T$ transistor is greater than $I_{D2}$ of a high $V_T$ transistor. This causes the standby leakage current to increase significantly in low $V_T$ transistors.
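
The trade-off among the three expressions above can be explored numerically. The following is only a minimal sketch, not the authors' simulation setup: the values chosen for $C_L$, $f$, $A$, $a$, $C$, $I_{ON}$ and the high $V_T$ of 0.4 volt are assumptions picked to show the trends, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant (J/K)
Q_E = 1.602176634e-19    # electron charge (C)


def dynamic_power(c_load, v_dd, freq):
    """P = C_L * V_DD^2 * f."""
    return c_load * v_dd ** 2 * freq


def gate_delay(c_load, v_dd, v_t, a_const, alpha):
    """T_D = C_L * V_DD / (A * (V_DD - V_T)^a)."""
    if v_dd <= v_t:
        raise ValueError("V_DD must exceed V_T")
    return c_load * v_dd / (a_const * (v_dd - v_t) ** alpha)


def subthreshold_current(i_on, v_gs, v_t, c_const, temp_k=300.0):
    """I_SUB = I_ON * exp(C * (V_gs - V_T) * q / (k * T))."""
    return i_on * math.exp(c_const * (v_gs - v_t) * Q_E / (K_B * temp_k))


# All values below are illustrative assumptions, not taken from the paper.
C_LOAD = 10e-15    # load capacitance (F)
FREQ = 100e6       # operating frequency (Hz)
A_CONST = 1e-4     # constant A in the delay expression
ALPHA = 1.5        # exponent a, within the stated 1.4 to 2 range
C_CONST = 0.7      # constant C in the leakage expression
I_ON = 1e-7        # current at V_gs = V_T (A)
V_DD = 1.0         # supply voltage (V)
LOW_VT = 0.2       # low threshold voltage (V)
HIGH_VT = 0.4      # assumed high threshold voltage (V)

# Halving V_DD from 2 V to 1 V cuts dynamic power by the square law.
power_ratio = dynamic_power(C_LOAD, 1.0, FREQ) / dynamic_power(C_LOAD, 2.0, FREQ)

# Lower V_T gives a shorter delay but a larger leakage current with the gate off.
delay_low = gate_delay(C_LOAD, V_DD, LOW_VT, A_CONST, ALPHA)
delay_high = gate_delay(C_LOAD, V_DD, HIGH_VT, A_CONST, ALPHA)
leak_low = subthreshold_current(I_ON, 0.0, LOW_VT, C_CONST)
leak_high = subthreshold_current(I_ON, 0.0, HIGH_VT, C_CONST)

print("power ratio (1 V vs 2 V):", power_ratio)
print("delay low V_T / high V_T:", delay_low / delay_high)
print("leakage low V_T / high V_T:", leak_low / leak_high)
```

With these assumed constants the low $V_T$ device is faster but leaks more when its gate is off, which is the tension the dual-threshold approach is meant to resolve.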

Table 1 (Word Line Status: Low)

| BIT | not(BIT) | Node a | Node b | Node c | Result | Matchline status |
|-----|----------|--------|--------|--------|---------------------------|------|
| 0 | 1 | 1 | 0 | 1 | Capacitor discharges | Low |
| 0 | 1 | 0 | 1 | 0 | Capacitor remains charged | High |
| 1 | 0 | 1 | 0 | 0 | Capacitor remains charged | High |
| 1 | 0 | 0 | 1 | 1 | Capacitor discharges | Low |

As a result, in the standby or hibernating mode, the low $V_T$ transistors drain the battery more noticeably. Such rapid drain of the power supply cannot be afforded in devices like cell phones, which remain in standby mode most of the time. To avoid frequent recharging and to obtain better battery utilization and longer battery life, the leakage current in standby mode must be reduced.

Figure 4

Intuitively, power optimization can be achieved by combining high $V_T$ and low $V_T$ transistors in the same chip. The low $V_T$ transistors can be used for circuit operation to achieve high speed, and high $V_T$ transistors can be employed to reduce standby leakage power [3]. This method is known as dual-threshold voltage CMOS (DTCMOS) technology.

#### **Experimental Setup and Results**

The SPICE simulation of the single-bit CAM was performed in a 0.2 µm technology with a $V_{DD}$ of 1 volt. The simulations were performed in three different modes:

1. Using all low $V_T$ transistors
2. Using all high $V_T$ transistors
3. Using a number of combinations of low $V_T$ and high $V_T$ transistors.

We have determined the **Energy-Delay Product (EDP)** in the above three modes.

The leakage current in the steady state is calculated from the signals applied to the different control buses, as dictated by the functionality of the CAM circuit. The effect of different combinations of input vectors on the leakage current was studied for each of the three topologies (viz. high $V_T$ mode, low $V_T$ mode and dual $V_T$ mode). The circuit was simulated using the Cadence Spectre simulator. For the three modes mentioned above the netlists were almost identical; only the appropriate transistor models had to be selected.

Note that the leakage power has the same magnitude as the leakage current because $V_{DD} = 1$ volt; only the unit changes from ampere (A) to watt (W). To calculate the energy dissipated by the circuit, we take the definite integral of the power over the specified time interval. For the delay, we take $T_D$ as the time difference between the instant when the search operation is initiated and the instant when the matchline is discharged to 90% of its initial precharged value (1 volt) due to a mismatch.

Figure 5 shows the waveforms of the matchline (in blue) and the leakage current (in green) for a single-bit CAM after a data mismatch. The wave shapes in Figure 5 were found to be similar in the case of a mismatch for all the simulation modes described in the subsequent sections.
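
The energy and delay definitions above can be sketched on sampled waveforms. This is a hedged illustration only: the waveform is synthetic (a constant supply current and an exponential matchline discharge), the sample spacing, time constant and current are hypothetical, and the 90% criterion is read here as the matchline falling to 0.9 times its precharged value. The helper names are not from the paper or from any simulator.

```python
import math

V_DD = 1.0          # supply and precharge voltage (V), as stated in the text
DT = 1e-11          # assumed sample spacing (0.01 ns)
N_SAMPLES = 1501    # samples covering 0 to 15 ns
T_SEARCH = 5e-9     # search assumed to start at 5 ns
TAU = 2e-9          # hypothetical matchline discharge time constant (s)
I_SUPPLY = 1e-7     # hypothetical constant supply current (A)

# Synthetic waveforms standing in for simulator output.
times = [i * DT for i in range(N_SAMPLES)]
v_match = [
    V_DD if t < T_SEARCH else V_DD * math.exp(-(t - T_SEARCH) / TAU)
    for t in times
]
power = [V_DD * I_SUPPLY for _ in times]  # watts; equals the current since V_DD = 1 V


def interval_energy(ts, ps, t_start, t_stop, eps=1e-15):
    """Trapezoidal integral of power over [t_start, t_stop], in joules."""
    total = 0.0
    for i in range(len(ts) - 1):
        t0, t1 = ts[i], ts[i + 1]
        # eps absorbs floating-point rounding at the interval boundaries
        if t0 >= t_start - eps and t1 <= t_stop + eps:
            total += 0.5 * (ps[i] + ps[i + 1]) * (t1 - t0)
    return total


def discharge_delay(ts, vs, t_search, v_pre, fraction=0.9):
    """Time from t_search until the matchline first falls to fraction * v_pre."""
    threshold = fraction * v_pre
    for t, v in zip(ts, vs):
        if t >= t_search and v <= threshold:
            return t - t_search
    return None  # the matchline never crossed the threshold (a match)


e_write = interval_energy(times, power, 0.0, 5e-9)
e_total = interval_energy(times, power, 0.0, 15e-9)
delay = discharge_delay(times, v_match, T_SEARCH, V_DD)
edp = e_total * delay  # energy-delay product in joule-seconds

print("energy 0-5 ns (J):", e_write)
print("energy 0-15 ns (J):", e_total)
print("delay (s):", delay)
print("EDP (J s):", edp)
```

The delay resolution of this sketch is limited to one sample spacing; interpolating between the two samples that bracket the threshold would refine it.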

In simulation, we have considered 3 different time intervals, as defined below:

| Interval | Definition |
|----------|---------------------------------------------------------------------------------------------------------------|
| 0-5 ns | The time interval between writing a data bit into the memory cell and the commencement of the search operation |
| 10-15 ns | The time interval between the completion of the search operation and the simulation stop time |
| 0-15 ns | Simulation period of the transient response |

Figure 6 (Schematic of a NOT gate)

We have calculated the **Energy-Delay Product** of the single-bit CAM using 4 different topologies of the NOT gate (as part of the dual $V_T$ simulation mode), as presented in the following tables.

We use the high $V_T$ models for the transistors M1, M2 and M5, and low $V_T$ models for M3 and M4 (see Figure 1). Table 2 shows the simulation results using 4 different topologies of the inverter shown in Figure 6.

Table 2

| Inverter PMOS | Inverter NMOS | Energy dissipated over the time interval 0-5 ns (in 10<sup>-16</sup> joule) | Energy dissipated over the time interval 10-15 ns (in 10<sup>-16</sup> joule) | Energy dissipated over the time interval 0-15 ns (in 10<sup>-12</sup> joule) | Time delay (in ns) | Energy-Delay Product over the time interval 0-15 ns (in 10<sup>-21</sup> joule-s) |
|------------|------------|----------|----------|----------|----------|----------|
| Low $V_T$ | Low $V_T$ | 5.488708 | 2.007999 | 1.002999 | 2.420000 | 2.427258 |
| Low $V_T$ | High $V_T$ | 4.892251 | 1.992636 | 1.002987 | 2.420000 | 2.427229 |
| High $V_T$ | Low $V_T$ | 0.343227 | 2.003493 | 1.003052 | 2.420000 | 2.427386 |
| High $V_T$ | High $V_T$ | 0.784002 | 2.003493 | 1.003036 | 2.420000 | 2.427347 |

We use the high $V_T$ models for the transistors M1 and M2, and low $V_T$ models for M3, M4 and M5 (see Figure 1). Table 3 shows the simulation results using 4 different topologies of the inverter shown in Figure 6.

Table 3

| Inverter PMOS | Inverter NMOS | Energy dissipated over the time interval 0-5 ns (in 10<sup>-16</sup> joule) | Energy dissipated over the time interval 10-15 ns (in 10<sup>-16</sup> joule) | Energy dissipated over the time interval 0-15 ns (in 10<sup>-12</sup> joule) | Time delay (in ns) | Energy-Delay Product over the time interval 0-15 ns (in 10<sup>-21</sup> joule-s) |
|------------|------------|-----------|----------|----------|----------|----------|
| Low $V_T$ | Low $V_T$ | 12.172900 | 1.368956 | 1.003264 | 2.070000 | 2.076756 |
| Low $V_T$ | High $V_T$ | 13.488390 | 1.365603 | 1.003259 | 2.070000 | 2.076746 |
| High $V_T$ | Low $V_T$ | 2.263445 | 1.367655 | 1.003266 | 2.070000 | 2.076761 |
| High $V_T$ | High $V_T$ | 3.035074 | 1.366681 | 1.003251 | 2.070000 | 2.076730 |

Finally, we performed the simulation using low $V_T$ and high $V_T$ models for all the transistors of the circuit shown in Figure 1. The results are tabulated in Table 4.

Table 4

| Simulation mode | Energy dissipated over the time interval 0-5 ns (in 10<sup>-16</sup> joule) | Energy dissipated over the time interval 10-15 ns (in 10<sup>-16</sup> joule) | Energy dissipated over the time interval 0-15 ns (in 10<sup>-12</sup> joule) | Time delay (in ns) | Energy-Delay Product over the time interval 0-15 ns (in 10<sup>-21</sup> joule-s) |
|----------------|----------|----------|----------|----------|----------|
| All Low $V_T$ | 9.895090 | 1.370438 | 1.003293 | 2.070000 | 2.076816 |
| All High $V_T$ | 0.249578 | 5.678755 | 1.002806 | 2.850000 | 2.868025 |
| Dual $V_T$ | 3.035074 | 1.366681 | 1.003251 | 2.070000 | 2.076730 |

From Table 4 we see that the Energy-Delay Product of the single-bit CAM cell over an observation time of 15 ns (including the data write and data search operations) using dual $V_T$ MOS transistors is almost 28% lower than that of the same cell using all high $V_T$ MOS transistors.

#### **Proposed Design of Single Bit CAM Using Dual $V_T$ MOS Transistors**

Figure 7

In realizing the single-bit Content Addressable Memory in dual-threshold CMOS technology, the proper placement of the high and low threshold voltage transistors is an issue of utmost importance. Table 5 lists, with reasons, the threshold voltage (high or low) assigned to each transistor.

It should be mentioned that if low $V_T$ models are used for the two transistors M1 and M2, the Energy-Delay Product does not differ much from that of the scheme using high $V_T$ models for those two transistors. The reason for giving M1 and M2 high threshold voltages is that this reduces the susceptibility of the CAM circuit to randomly generated noise. When a low $V_T$ model is used for M1 and M2, in the worst case noise may pull the word line high, enabling the memory cell to write new, undesired data. Using high $V_T$ models for these two transistors therefore makes such an event less likely and increases the stability of the circuit.

Table 5

| Transistor Name | Threshold Voltage | Reasons |
|-----------------|-------------------|-----------------------------------------------------------------|
| M1 | High | to prevent any modification of the stored data bit due to noise |
| M2 | High | -do- |
| M3 | Low | to minimize the delay involved in the search operation |
| M4 | Low | -do- |
| M5 | Low | -do- |
| M6 | High | dictated by the simulation results |
| M7 | High | -do- |
| M8 | High | -do- |
| M9 | High | -do- |

#### CONCLUSION

In this paper we have presented a CAM designed for a low energy-delay product, achieved using DTCMOS technology. The simulation results presented in the results section support the effectiveness of the proposed CAM cell design. The average leakage energy of a single-bit CAM using DTCMOS technology is lower than that of the basic CAM realized using low $V_T$ transistors by **4.2 × 10<sup>-17</sup>** joule. It is further observed that the maximum delay involved in the search operation (in the case of a mismatch) using dual $V_T$ technology is almost 28% lower than in the scheme using all high $V_T$ transistors.
This illustrates the effectiveness of dual $V_T$ transistors in optimizing the energy-delay product in the design of a CAM cell. Such use of dual $V_T$ transistors makes the design energy-aware, in the sense that it retains the speed performance of the circuit without compromising on energy, and makes the circuit suitable for low-power (low-energy) operation.

### References

1. A. Natarajan, D. Jasinski, W. Burleson, R. Tessier. "A hybrid adiabatic content addressable memory for ultra low-power applications", in GLSVLSI 03, April 28-29, 2003, Washington, DC, USA.
2. G. Thirugnanam, N. Vijaykrishnan, M. J. Irwin. "A novel low power CAM design". Proc. of the 14th Annual IEEE Int'l ASIC/SOC Conf., pages 198-202, Sept 2001.
3. Mosin Mondal, "Power performance optimization by multi-threshold CMOS technique", B. Tech Dissertation, ETCE Dept., Jadavpur University, 2001.

<sup>\*</sup> VLSI Design Laboratory, Dept. of Electronics and Telecommunication Eng., Jadavpur Univ., Kolkata - 32