# **RECONFIGURABLE MODEM ARCHITECTURE FOR CDMA BASED 3G HANDSETS**

Ramesh Chembil Palat, \*Jina Kim, \*Jong Suk Lee, \*Dr. Dong S. Ha, \*\*Dr. Cameron Patterson, Dr. Jeffrey H. Reed

MPRG (Mobile and Portable Radio Research Group) \*VTVT (Virginia Tech VLSI for Telecommunications) \*\*CCM (Configurable Computing Lab) Bradley Dept of Electrical and Computer Engineering Virginia Tech Blacksburg, Virginia 24061 USA

### ABSTRACT

Third generation (3G) cellular standards have seen many changes in their design specifications during their evolution. There also exist multiple standards, CDMA2000 and WCDMA, which have different specifications for implementing the digital baseband (DBB) system. Changes in a standard during the development of a system-on-a-chip (SoC) can be difficult to accommodate, or can make the system less efficient when incorporated at the end of the SoC design cycle. The concept of software radios has gained widespread acceptance in basestations using reconfigurable hardware. However, reconfigurable hardware designs for mobile handsets still remain a challenge. In this paper we present a design methodology to develop a reconfigurable modem (RM) architecture for CDMA based 3G handsets. A performance comparison between ASIC, RM and DSP implementations of rake receiver processing is presented to demonstrate the relative advantages and disadvantages.

## 1. INTRODUCTION

Reconfigurable hardware for digital baseband (DBB) processing is rapidly gaining acceptance in multi-mode radios that support multiple standards. Standardization efforts in software radio architectures also have provisions to support reconfigurable platforms. Field programmable gate array (FPGA) based reconfigurable signal processing platforms are now widely accepted in basestation designs. However, low power and form factor requirements have prevented their use in handsets. Current approaches to multi-mode handset DBB include using multiple ASIC cores for specific data intensive functional blocks with a DSP for control. This is a compromise approach that has limited flexibility and is less efficient in resource utilization. Designing ASICs for functional blocks during the evolution of a standard may not be cost effective, or can even be detrimental. For example, changes in the standard specifications accommodated at the end of an ASIC design cycle can result in faulty chips or affect the time to market. Custom Computing Machines (CCMs), which are a compromise between flexibility and application specific capabilities, show great promise for future software radio handsets. In this paper we present the design methodology for a CCM based reconfigurable modem that supports DBB processing for the CDMA2000 and WCDMA standards.

CDMA2000 and WCDMA are third generation cellular standards that have evolved many times over the last five years. They have many common features from a system level yet differ in many aspects of implementation. The organization of the paper describing Virginia Tech's Reconfigurable Modem (VTRM) is as follows. Section 2 describes the design considerations. Section 3 describes the analysis of key baseband functional blocks considered for support by the reconfigurable modem (RM). Section 4 gives details on processing elements (PEs) chosen. Section 5 describes the VTRM architecture. Section 6 gives the WCDMA rake receiver implementation results on VTRM architecture using SystemC and its comparison to ASIC and DSP performance. We conclude in section 7.

## 2. DESIGN CONSIDERATIONS

The digital baseband (DBB) functional block diagram of a CDMA based 3G handset receiver is shown in Figure 1. The figure also shows the hardware/software partitioning that is commonly followed. Most of the data path intensive operations, like cell search and delay path estimation (DPE), despreading in a rake finger and channel decoding, are implemented in ASIC. Control operations and signal processing with moderate processing and flexibility requirements, such as MRC and channel estimation, are delegated to a low power DSP. For a handset, where DBB functions can consume a major portion of the total handset power, such partitioning is crucial to achieve the required efficiency. The implementation of functional blocks in ASIC cores makes the SoC lose out on flexibility, scalability to support additional functionality and resource utilization.

Figure 1 Digital Baseband functional block diagram of a mobile receiver

The tradeoff therefore is between power, area efficiency and flexibility. Reconfigurable modems in handsets that achieve flexibility with better power and area efficiency can be made possible by using CCMs. The performance of a CCM architecture depends on the type of application that it is intended to support. Some of the important factors that need to be considered for a CCM based architecture in a wireless modem design are listed as follows:

*Reconfiguration*: This is the most important factor that allows for flexibility in a CCM based architecture. We focus on two types of reconfiguration: datapath based and control based.

*Granularity*: A CCM architecture is made up of a combination of processing elements (PEs) connected to an interconnection network. A PE's output depends on its inputs and control signals. A PE is formed by a combination of functional units, each of which performs a fixed operation. The granularity and type of PEs selected can affect the area efficiency, interconnection and data flow for a reconfigurable modem.

*Interconnection network*: The type of connection between PEs decides how data is routed. Poor interconnection can cause the architecture to be area inefficient. Hence the choice of PE granularity and functionality affects interconnection complexity.

*Compiler design*: For successful acceptance, ease of programming is very important. System designers and DSP engineers are comfortable and productive with a C-like programming environment. Hence the success of a CCM based reconfigurable modem depends on efficient compiler development. An architecture which has a uniform structure for the PEs and their placement lends itself to better compiler design.

An optimal CCM architecture must consider the above factors carefully. Apart from these factors, the number of PEs required, the type of PEs and their placement depend on the application considered. Hence we present a top down design methodology where we look at the baseband receiver design from a communication system perspective and make decisions on architecture from this high level perspective. In the next section we analyze key functional blocks considered for CDMA2000 and WCDMA systems.

## 3. ANALYSIS OF KEY DBB FUNCTIONAL BLOCKS IN WCDMA AND CDMA2000

The primary modem blocks identified for WCDMA/CDMA2000 are the cell searcher, the rake receiver and the channel decoder. The cell searcher performs initial acquisition and cell monitoring. The rake receiver is a diversity receiver designed specifically for CDMA systems, where frequency diversity is provided by the fact that the multipath components are practically uncorrelated from one another when their relative propagation delay exceeds a chip period. Channel decoding enables the receiver to correct errors by using the redundancy provided by convolutional or turbo coders. This provides the required FEC capability. The focus of the RM design is on these three functions, which constitute the basic building blocks of any CDMA baseband design.

*Cell searcher*: The basic functions performed by a cell searcher are pilot acquisition and synchronization, cell search during handoff, and reacquisition after sleep mode. We look at the initial acquisition process for CDMA2000 and WCDMA. CDMA2000 is a synchronous system while WCDMA is an asynchronous system. Hence the cell search operation in CDMA2000 is simpler than in WCDMA.

In a CDMA2000 system the pilot channel is used to identify the PN sequence from which the timing offset associated with the cell is identified. This is done by hypothesis testing of the PN sequence generated by the local PN generator inside the mobile. The length of the PN sequence is 2<sup>15</sup>. There are two parts: an early searcher part and a late searcher part. The late part is delayed by half a chip. The higher of the two correlation results is used for identifying the correct PN offset. To make the acquisition faster, more searcher blocks can be used, for example 8, 16 or 32 parallel searcher blocks operating simultaneously and testing different time offsets. Figure 2 shows the hypothesis testing carried out by a single searcher block.

The analysis of the block diagram suggests the following hardware requirements: linear feedback shift registers (LFSRs), XOR logic for one-bit correlators, multipliers, an adder, an accumulator and a comparator.

Figure 2 CDMA2000 cell searcher block
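The hypothesis test carried out by a single searcher block can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the VTRM implementation: a 5-stage LFSR (period 31) stands in for the 15-stage PN generator, only whole-chip offsets are tested (the half-chip early/late split is omitted), and the function names `lfsr_sequence` and `search_offset`, the tap positions and the threshold are hypothetical.

```python
def lfsr_sequence(taps, state, length):
    """Fibonacci LFSR: emit the last stage, feed back the XOR of the tapped stages."""
    state = list(state)
    out = []
    for _ in range(length):
        out.append(state[-1])
        feedback = 0
        for t in taps:                      # taps are 1-based stage numbers
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]     # shift by one stage
    return out


def search_offset(received, pn, threshold):
    """Test every PN offset and keep the one with the largest correlation energy."""
    n = len(pn)
    best_offset, best_energy = None, -1
    for offset in range(n):
        # One-bit correlator: XOR is 0 where the chips agree.
        agree = sum(1 for k, r in enumerate(received)
                    if (r ^ pn[(k + offset) % n]) == 0)
        corr = 2 * agree - len(received)    # accumulator output in +/-1 arithmetic
        energy = corr * corr                # squaring (multiplier)
        if energy > best_energy:            # comparator
            best_offset, best_energy = offset, energy
    if best_energy < threshold:
        return None, best_energy
    return best_offset, best_energy


pn = lfsr_sequence([5, 2], [1, 0, 0, 0, 0], 31)
received = [pn[(k + 7) % 31] for k in range(31)]   # noise-free pilot, offset 7
print(search_offset(received, pn, threshold=400))
```

Parallel searcher blocks, as described above, would simply partition the `offset` loop among themselves.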

The WCDMA cell searcher is more complicated than that of CDMA2000. It has three stages for synchronization and searching. The first stage is slot synchronization, where the slot boundary is acquired by correlating the received signal with the primary synchronization code (PSC) in the downlink SCH [1]. The second stage involves frame synchronization and scrambling code group identification using the secondary synchronization codes (SSC) in the SCH. This again involves correlating the received signal with all possible SSC sequences and identifying the maximum correlation value. These operations are similar to the operations shown in Figure 2 for CDMA2000. The third stage involves scrambling code identification, which requires symbol by symbol correlation over the CPICH with all the possible scrambling codes (8 primary scrambling codes per group, with 64 groups in total) within the scrambling code group identified in stage two. More details on cell searcher implementation can be found in [2], [3] and [4].

Analysis of the hardware profile for the WCDMA cell searcher shows that it requires LFSRs to implement scrambling code generators, XOR logic for one-bit correlation, multipliers and adders for square and add operations, comparators, registers and memory to store intermediate results. It should also be noted that the cell searcher results are also used to estimate the delay paths, which can be used for finger allocation and management. Next we look at the rake functional block.
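The first stage (slot synchronization) can be illustrated with a short sketch: the received signal is correlated with the synchronization code at every candidate offset within a slot, the squared results are accumulated over several slots, and the peak marks the slot boundary. The code below is only an illustration; the 7-chip code and 20-chip slot are hypothetical stand-ins for the real PSC and slot lengths, and `slot_sync` is not a name from the paper.

```python
def slot_sync(received, sync_code, slot_len, n_slots):
    """Accumulate squared correlations per slot offset and pick the peak."""
    profile = [0.0] * slot_len
    total = len(received)
    for slot in range(n_slots):
        for offset in range(slot_len):
            corr = 0
            for k, c in enumerate(sync_code):
                # Indices wrap around the buffer to keep the sketch simple.
                corr += received[(slot * slot_len + offset + k) % total] * c
            profile[offset] += corr * corr       # square and add
    boundary = max(range(slot_len), key=lambda o: profile[o])  # comparator
    return boundary, profile


sync_code = [1, 1, 1, -1, -1, 1, -1]             # stand-in code, not the PSC
received = [0] * 60
for slot in range(3):
    for k, c in enumerate(sync_code):
        received[slot * 20 + 6 + k] = c          # code starts 6 chips into each slot

boundary, profile = slot_sync(received, sync_code, slot_len=20, n_slots=3)
print(boundary)
```

The same correlate, square, accumulate and compare pattern applies to the second and third stages, with the SSC sequences or the candidate scrambling codes in place of the single synchronization code.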

*Rake*: The rake receivers for WCDMA and CDMA2000 perform the same functions, except for the code generators used for despreading. In the case of WCDMA, descrambling is followed by despreading using OVSF codes [1]. In CDMA2000 the received signal is first despread with the PN sequence and then with Walsh codes. The functions in a rake receiver finger, shown in Figure 3, include despreading, channel estimation, channel compensation and time tracking with early and late samples. It should be noted that there are two rates of processing. The operations up to the despreader require chip rate processing. The operations after despreading require symbol rate processing, which has a much lower rate that depends on the spreading factor used.

Figure 3 Rake finger functions

Analysis of the hardware profile for the rake receiver indicates the use of different types of LFSRs, of different lengths and taps, for code generators; XOR logic for despread operations; MACs for moving average and channel estimation; multipliers for power estimation; and registers, MUXs and adders for deskew operations. The time tracker requires an accumulator, MACs for the low pass filter and a comparator.
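The finger functions after the code generators can be sketched as below: despreading at chip rate, then channel estimation by a moving average over despread pilot symbols, channel compensation by multiplication with the conjugate estimate, and maximal ratio combining (MRC) across fingers at symbol rate. This is a simplified illustration with assumptions that are not from the paper: fingers are already time aligned (deskewed), the pilot is a constant symbol on its own orthogonal code, time tracking is omitted, and all function names are hypothetical.

```python
def despread(chips, code):
    """Chip-rate step: correlate each block of len(code) chips with the code."""
    sf = len(code)                                # spreading factor
    return [sum(chips[i * sf + k] * code[k] for k in range(sf)) / sf
            for i in range(len(chips) // sf)]


def moving_average(values, window):
    """Symbol-rate step: average the current and previous window-1 values."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return out


def rake_combine(finger_chips, data_code, pilot_code, window=2):
    """Per finger: despread, estimate channel, compensate; then sum fingers (MRC)."""
    combined = None
    for chips in finger_chips:
        data = despread(chips, data_code)
        h_est = moving_average(despread(chips, pilot_code), window)
        compensated = [d * h.conjugate() for d, h in zip(data, h_est)]
        if combined is None:
            combined = compensated
        else:
            combined = [a + b for a, b in zip(combined, compensated)]
    return combined


data_code, pilot_code = [1, -1, 1, -1], [1, 1, 1, 1]   # orthogonal, SF = 4
symbols = [1, -1, -1, 1]
tx = [d * c + p for d in symbols for c, p in zip(data_code, pilot_code)]
fingers = [[h * x for x in tx] for h in (0.5 + 0.5j, -0.25 + 1j)]
print([1 if z.real > 0 else -1 for z in rake_combine(fingers, data_code, pilot_code)])
```

Only `despread` runs at chip rate; the remaining steps operate once per symbol, which is the rate split noted above.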

*Channel decoder*: CDMA2000 uses rate $\frac{1}{2}$ and $\frac{1}{4}$ convolutional coding with constraint length 9. WCDMA uses rate $\frac{1}{3}$ convolutional coding with constraint length 9. This is typically used in fundamental channels. Convolutional codes are used for voice and data. Turbo codes are also used for high data rates on supplemental channels; for large block sizes their E<sub>b</sub>/N<sub>o</sub> requirement is lower than that of traditional convolutional codes. For the VTRM architecture we decided to analyze a Viterbi decoder with rate $\frac{1}{2}$ and constraint length 9. Both CDMA2000 and WCDMA have similar hardware profiles. The data processing is on received symbols, but the operational frequency required is much higher. The hardware components required include adders and logic implementing AND and XOR for computing the branch metric; adders, comparators, RAM and MUXs for add compare and select (ACS) operations; and comparators, MUXs and registers for lowest pick. Trace-back requires more control based operations, which may be implemented sequentially.
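The decoder steps named above (branch metric, ACS, lowest pick and trace-back) can be seen in a compact hard-decision Viterbi sketch. To keep it short, the example uses a toy rate 1/2 code with constraint length 3 and generators 7 and 5 (octal) instead of the constraint length 9 codes of the standards; the function names and the test message are illustrative assumptions.

```python
def parity(x):
    return bin(x).count("1") & 1


def conv_encode(bits, gens, k):
    """Rate 1/len(gens) convolutional encoder with constraint length k."""
    state, out = 0, []
    for b in bits:
        reg = (b << (k - 1)) | state             # newest bit in the MSB
        out.extend(parity(reg & g) for g in gens)
        state = reg >> 1
    return out


def viterbi_decode(received, gens, k):
    n_states, n_out = 1 << (k - 1), len(gens)
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)       # encoder starts in state 0
    history = []
    for t in range(0, len(received), n_out):
        symbol = received[t:t + n_out]
        new_metrics = [inf] * n_states
        decisions = [None] * n_states
        for state in range(n_states):
            if metrics[state] == inf:
                continue
            for b in (0, 1):
                reg = (b << (k - 1)) | state
                # Branch metric: Hamming distance via XOR.
                branch = sum(parity(reg & g) ^ r for g, r in zip(gens, symbol))
                nxt = reg >> 1
                cand = metrics[state] + branch   # add
                if cand < new_metrics[nxt]:      # compare
                    new_metrics[nxt] = cand      # select
                    decisions[nxt] = (state, b)
        metrics = new_metrics
        history.append(decisions)
    state = min(range(n_states), key=lambda s: metrics[s])   # lowest pick
    bits = []
    for decisions in reversed(history):          # trace-back
        state, b = decisions[state]
        bits.append(b)
    return bits[::-1]


gens, k = (0b111, 0b101), 3
message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]         # last two zeros flush the encoder
coded = conv_encode(message, gens, k)
coded[3] ^= 1                                    # inject a single channel error
print(viterbi_decode(coded, gens, k) == message)
```

The inner loop over states is the part that maps naturally onto parallel ACS units, while the trace-back loop is sequential, consistent with the remark above.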

In this section we have identified hardware components based on hardware profiling of the key functional blocks which are most frequently used in a mobile handset operation. Based on the available information we construct processing elements flexible enough to support functional blocks of both CDMA2000 and WCDMA. The structure of PEs and the resulting architecture is described in the next section.

## 4. CHOICE OF PE

Based on the analysis of key functional blocks, four types of PE structures were defined, each with one major function to support. They are: bit manipulation, one-bit correlation, multiplication and accumulation (MAC), and add compare select (ACS). A brief description and justification of each PE structure is given below.

*PE\_A*: This PE is proposed to support bit manipulation operations, such as code generation. Code generation is one of the core operations and is used most frequently in the cell searcher and rake receiver. Its minimum operational clock frequency is the chipping rate. The PE\_A structure is shown in Figure 4. Since WCDMA and CDMA2000 use a quadrature modulation (QPSK) scheme, PE\_A consists of two parts to generate the real and imaginary portions of the code at the same time. Each part has a register, a counter, two configurable LFSRs (CLFSRs) and five 2-input logic gates. OVSF (and Walsh) code generation needs a 9-bit register, a 9-bit counter and nine 2-input logic gates, which can be implemented by cascading two PE\_As. Similarly, the scrambling code, which uses 18-bit LFSRs, can be generated by cascading four PE\_As, and the PN sequence for CDMA2000, which uses 15-bit LFSRs, can be generated by cascading three PE\_As. The variable length operations are managed by generating the required control signals. The data flow inside PE\_A is from the register and counter to the logic gates, and then to one of the CLFSRs. Note that PE\_A is optimized for the code generators, but can still be used for other bit manipulation operations.
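The cascading idea can be sketched in a few lines. The sketch assumes that each PE\_A contributes a 5-bit segment; this width is not stated in the text and is inferred only from the five logic gates per part and from the cascade counts quoted above (two, three and four PE\_As for 9, 15 and 18 bits). The function names and tap positions are hypothetical.

```python
SEG_BITS = 5   # assumed bits contributed by one PE_A (inferred, not from the paper)


def pe_a_needed(length):
    """Number of cascaded PE_As for an LFSR of the given length."""
    return -(-length // SEG_BITS)                # ceiling division


def flat_lfsr(length, taps, seed, n_out):
    """Reference model: one monolithic shift register."""
    state, out = list(seed[:length]), []
    for _ in range(n_out):
        out.append(state[-1])
        feedback = 0
        for t in taps:                           # 1-based stage numbers
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return out


def cascaded_lfsr(length, taps, seed, n_out):
    """Same LFSR built from short segments chained through a carry bit."""
    segs = [list(seed[i:i + SEG_BITS]) for i in range(0, length, SEG_BITS)]
    out = []
    for _ in range(n_out):
        flat = [b for seg in segs for b in seg]
        out.append(flat[-1])
        feedback = 0
        for t in taps:
            feedback ^= flat[t - 1]
        carry, shifted = feedback, []
        for seg in segs:                         # direct link between neighbours
            shifted.append([carry] + seg[:-1])
            carry = seg[-1]
        segs = shifted
    return out


print([pe_a_needed(n) for n in (9, 15, 18)])
```

Changing `length` and `taps` at run time plays the role of the control signals that manage the variable length operations.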

*PE\_B*: PE\_B is designed to support despreading. Since despreading is accomplished by correlating the input data with the 1-bit code sequence, it is called 1-bit correlation. The minimum operational clock frequency for 1-bit correlation is the chipping rate. The functional units which make up PE\_B are shown in Figure 5. Similar to PE\_A, PE\_B also has two sets internally to obtain the correlation results for the real and imaginary parts simultaneously. Each set consists of two XOR gates, an adder (or subtractor), a register and a shifter.







Figure 5 PE\_B structure
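Because the code is one bit wide, 1-bit correlation needs no multiplier: each code bit only decides whether the input sample is added to or subtracted from the accumulator. The sketch below models that behaviour and checks it against an ordinary complex multiplication. It is an illustration only; the mapping of code bit 0 to +1 and 1 to -1, and the function names, are assumptions.

```python
def one_bit_correlate(samples, code_bits):
    """Accumulate samples, adding for code bit 0 (+1) and subtracting for 1 (-1)."""
    acc = 0
    for x, c in zip(samples, code_bits):
        acc = acc - x if c else acc + x
    return acc


def complex_despread(i_samples, q_samples, ci_bits, cq_bits):
    """Correlate I + jQ with the conjugate of a complex 1-bit code.

    (I + jQ)(cI - j cQ) = (I cI + Q cQ) + j (Q cI - I cQ)
    """
    real = one_bit_correlate(i_samples, ci_bits) + one_bit_correlate(q_samples, cq_bits)
    imag = one_bit_correlate(q_samples, ci_bits) - one_bit_correlate(i_samples, cq_bits)
    return real, imag


i_s, q_s = [3, -2, 5, 1], [-1, 4, 2, -3]
ci, cq = [0, 1, 1, 0], [1, 0, 1, 0]

# Reference: explicit multiplication by the conjugate +/-1 code.
ref = sum(complex(i, q) * complex(1 - 2 * a, -(1 - 2 * b))
          for i, q, a, b in zip(i_s, q_s, ci, cq))
print(complex_despread(i_s, q_s, ci, cq) == (ref.real, ref.imag))
```

The real and imaginary results are computed independently, which matches the two internal sets described for PE\_B.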

*PE\_C*: PE\_C is created to support the MAC operations which are used in filter and channel estimation blocks. The peak detection of the cell searcher, and the channel compensation and power measurement in the rake receiver, use multipliers, which can be provided by PE\_Cs. Hardware multipliers are costly to implement; hence the VTRM architecture tries to limit their number to the minimum possible. Figure 6 shows a basic block diagram of PE\_C. Since PE\_C is more like a general purpose PE compared with PE\_A and PE\_B, some additional functional units found in microprocessors are introduced to improve flexibility and scalability. There are four internal registers to store the intermediate values, a status register to process the conditional statements, and a 1's population counter.

*PE\_D*: PE\_D performs the ACS operations in a Viterbi decoder. PE\_D can also be used as a comparator for the time tracker in the rake receiver and for the peak detector in the cell searcher. Though PE\_D is optimized for the Viterbi decoder, it is closer to a general-purpose PE. Therefore, similar to PE\_C, some additional general purpose FUs are included to provide extra flexibility. The FUs include an adder, a comparator, a selection multiplexer, four internal registers to store the intermediate values, a status register to process the conditional statements, logic gates, and a 1's population counter. The block diagram of PE\_D is shown in Figure 7.



Figure 6 PE\_C block diagram



Figure 7 PE\_D block diagram

## 5. VTRM ARCHITECTURE

The PE structures developed in Section 4 are used as building blocks for the VTRM architecture. The model of the architecture developed is hierarchical. The data flow in the receiver is from chip rate processing to symbol rate processing. Therefore, we form two types of blocks: one consisting of PE\_As and PE\_Bs, and the other a combination of PE\_Cs and PE\_Ds. These blocks are named AB and CD blocks respectively. Each block is again broken down to a lower level consisting of modules. A module is a combination of 4 PEs of the same type. The same type was chosen because concatenation of CLFSRs for code generation, parallel correlators in the rake receiver, parallel MACs and parallel ACS units were seen to improve performance in receiver functions. The AB block was then formed using 4 PE\_B modules and 1 PE\_A module. This combination was chosen to maximize resource utilization for chip rate processing. Similarly, for the CD block a combination of 4 PE\_D modules and 1 PE\_C module was chosen. One local memory bank is also included per PE block. Figure 8 shows AB and CD block placement.

Figure 8 AB and CD blocks in VTRM architecture

The interconnection and control network for reconfiguration of the architecture is described as follows. For each level of hierarchy, there are controllers to generate the necessary control information. The PE controllers generate routing control, I/O control, enable control, operation control, and local memory access control signals for PEs. The PE module controllers generate PE selection control and local memory access control signals for PE modules. Each global memory and local memory bus also has its own controller to generate the memory access and bus access control signals. The routing inside PE\_A and PE\_B is fixed, but is reconfigurable for PE\_C and PE\_D. Within a PE module, data paths exist only in the horizontal and vertical directions. The data paths between the PEs inside a PE module are bidirectional and can be reconfigured according to the input control and output control information. In the case of the PE\_A module, there exist additional direct connections between the CLFSRs to support concatenation of PE\_As for long LFSR implementation. At the block level, in the case of the AB block, one PE\_A module can only communicate with one specific PE inside a PE\_B module. Similarly, in the CD block one PE\_D module can only communicate with a specific PE inside the PE\_C module. In PE blocks, local memory is the gateway to transfer data between PEs under different PE blocks and between PEs and global memories.
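The block hierarchy described in this section (PEs grouped into modules of four, modules grouped into AB and CD blocks) can be captured in a small data model, which is convenient for counting resources. The class and function names are hypothetical, and the block counts used in the example (12 AB blocks and 6 CD blocks) are taken from the rake mapping in Table 2 purely as an illustration, not as the size of the full architecture.

```python
from dataclasses import dataclass

PES_PER_MODULE = 4                     # a module is 4 PEs of the same type


@dataclass
class Block:
    name: str
    modules: dict                      # PE type -> number of modules in the block

    def pe_count(self):
        """Number of PEs of each type inside one block."""
        return {pe: n * PES_PER_MODULE for pe, n in self.modules.items()}


AB_BLOCK = Block("AB", {"PE_A": 1, "PE_B": 4})   # chip rate processing
CD_BLOCK = Block("CD", {"PE_C": 1, "PE_D": 4})   # symbol rate processing


def total_pes(n_ab, n_cd):
    """Total PEs of each type for a given number of AB and CD blocks."""
    totals = {}
    for block, n_blocks in ((AB_BLOCK, n_ab), (CD_BLOCK, n_cd)):
        for pe, count in block.pe_count().items():
            totals[pe] = totals.get(pe, 0) + n_blocks * count
    return totals


print(AB_BLOCK.pe_count())
print(total_pes(n_ab=12, n_cd=6))
```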

A global memory called *transfer memory* is inserted to provide a data path from AB blocks to CD blocks. Instead of connecting all PE blocks on one large bus, using a memory (synchronous SRAM) reduces power dissipation significantly and provides temporary storage for AB block outputs, which can then be accessed by CD blocks multiple times. Another global memory is the *rake-Viterbi memory*, which is accessed by CD blocks. The rake receiver outputs for a frame are stored in this memory to be used in the Viterbi decoder operation. The cell searcher operation may also require storage for codes used in correlation; thus one more global memory is added for access from AB blocks. This completes the RM architecture design. Figure 9 shows the complete architecture for VTRM.

Figure 9 VTRM architecture overview

## 6. RAKE RECEIVER IMPLEMENTATION

The VTRM architecture was implemented and synthesized using SystemC, more details of which can be found in [6]. For the measurement of the power dissipation, circuit complexity, and the critical path delay, we used Synopsys Design Compiler. The cell library used in our experiments was TSMC 0.18 µm technology with 1.8 V supply voltage, obtained from Artisan. To make a comparison between ASIC, VTRM and DSP, a 4 channel 6 finger rake was implemented as an ASIC and on the VTRM architecture. For the DSP, a 1 channel 2 finger rake was implemented on TI's TMS320C5416 and the results were extrapolated to the 4 channel 6 finger case. Table 1 and Table 2 show the comparison between the ASIC and VTRM implementations and between the VTRM and DSP implementations, respectively. The ASIC implementation has lower circuit complexity and critical path delay compared to VTRM. However, the VTRM architecture still meets the delay specification for the target application (16.3 ns) and offers higher flexibility. The low power (160 mW) 160 MIPS DSP is highly programmable but cannot meet the 3G real time data rate requirement. Hence the RM easily outperforms the DSP. It is also clear from the above results that the RM architecture needs to be refined in order to achieve ASIC-like performance.
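The figures quoted in this section can be cross-checked with a little arithmetic, sketched below using the values from Table 1 and Table 2. Several assumptions are made that the text does not state explicitly: the 16x chip rate clock is taken relative to the WCDMA chip rate of 3.84 Mcps, the DSP is assumed to execute one instruction per cycle at 160 MHz, and the VTRM cycle counts are read as cumulative latencies so that the last stage (MRC) gives the total.

```python
CHIP_RATE_HZ = 3.84e6                  # WCDMA chip rate (assumed reference)
CLOCK_MULTIPLE = 16                    # "16x chip rate" from Table 2

clock_hz = CLOCK_MULTIPLE * CHIP_RATE_HZ
clock_period_ns = 1e9 / clock_hz       # about 16.28 ns, i.e. the 16.3 ns target

# Table 1: critical path delay (ns) and circuit complexity (NAND2 gates).
critical_path_ns = {"ASIC": 7.53, "RM": 13.02}
gates = {"ASIC": 5_531_333, "RM": 6_892_836}

meets_timing = {name: delay <= clock_period_ns
                for name, delay in critical_path_ns.items()}
area_overhead = gates["RM"] / gates["ASIC"] - 1      # RM gate count relative to ASIC

# Table 2: DSP cycles per functional block.
dsp_cycles = 299_220 + 4_266 + 164_616 + 10_434
dsp_time_ms = dsp_cycles / 160e6 * 1e3               # assumes 1 instruction per cycle
vtrm_time_ms = 16_478 / clock_hz * 1e3               # assumes cumulative cycle counts

print(f"clock period: {clock_period_ns:.2f} ns, timing met: {meets_timing}")
print(f"RM area overhead over ASIC: {area_overhead:.1%}")
print(f"DSP: {dsp_time_ms:.2f} ms, VTRM: {vtrm_time_ms:.3f} ms for the same workload")
```

Under these assumptions the RM critical path fits within the clock period with some margin, at the cost of roughly a quarter more gates than the ASIC, while the DSP needs about an order of magnitude longer than VTRM for the same rake workload.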

## 7. CONCLUSIONS

In this paper we discussed the importance of introducing CCM based design for mobile software radio handsets. Analysis of baseband receivers for CDMA2000 and WCDMA was presented to bring out salient signal processing features for choosing PEs in the reconfigurable modem architecture. VTRM is a modular reconfigurable architecture that can support the cell search, rake and Viterbi functional blocks of a CDMA system. The methodology presented for the VTRM architecture can be extended to create new PE structures, and hence additional functionality like Turbo decoders, in the future. A comparison of RM performance to ASIC and DSP implementations was shown for the rake receiver implementation. Future evolution of the architecture will require focus on refining the architecture to extend support for other DBB functionalities, and on the compiler design.

Table 1 Comparison of ASIC and RM

|      | Circuit complexity (# NAND2 gates) | Critical path delay (ns) |
|------|------------------------------------|--------------------------|
| ASIC | 5,531,333 (5.5 M)                  | 7.53                     |
| RM   | 6,892,836 (6.9 M)                  | 13.02                    |

Table 2 Comparison of RM and DSP

| Functional block                                | No. of cycles, VTRM implementation (16x chip rate) | No. of cycles, TI C54 DSP implementation (160 MIPS) |
|-------------------------------------------------|----------------------------------------------------|-----------------------------------------------------|
| Descramble & despread                           | 16,395 (12 AB blocks)                              | 299,220                                             |
| Channel estimation (moving average included)    | 16,402 (reuse above AB blocks)                     | 4,266                                               |
| Channel compensation                            | 16,468 (6 CD blocks)                               | 164,616                                             |
| MRC                                             | 16,478 (1 CD block)                                | 10,434                                              |
| Total                                           | N.A.                                               | 478,536                                             |

## 8. REFERENCES

[1] "3GPP TS 25.201 V3.4.0 (2002-06)," 3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Physical layer - General description, Release 1999.

[2] Bahl, S.K., Plusquellic, J., Thomas, J., "Comparison of initial cell search algorithms for W-CDMA systems using cyclic and comma free codes," *Circuits and Systems, 2002. MWSCAS-2002. The 2002 45th Midwest Symposium on*, vol. 3, pp. III-192 - III-195, 4-7 Aug. 2002.

[3] Wang, Y.-P.E., Ottosson, T., "Cell search in W-CDMA," *Selected Areas in Communications, IEEE Journal on*, vol. 18, no. 8, pp. 1470 - 1482, Aug. 2000.

[4] Hanada, Y., Higuchi, K., Sawashi, M., "Three-step cell search algorithm for broadband multi-carrier CDMA packet wireless access," *Personal, Indoor and Mobile Radio Communications, 2001 12th IEEE International Symposium on*, vol. 2, pp. G-32 - G-37, 30 Sept.-3 Oct. 2001.

[5] Alsolaim, A., Becker, J., Glesner, M., Starzyk, J., "Architecture and application of a dynamically reconfigurable hardware array for future mobile communication systems," *Field-Programmable Custom Computing Machines, 2000 IEEE Symposium on*, pp. 205 - 214, 17-19 April 2000.

[6] Jina Kim, Dong Sam Ha, J.H. Reed, "A New Reconfigurable Modem Architecture for 3G Multi-Standard Wireless Communication Systems," *Circuits and Systems, ISCAS IEEE International Symposium*, pp. 1051 - 1054, May 2005.