# **Prototype Implementation of a Real-Time 8x8 MIMO LTE-Advanced Link**

# Amal Ekbal, Ph.D. (amal.ekbal@ni.com) Wireless Innovation Forum 2011



# Outline

- Overview of National Instruments
  - ~5 slides, ~5 minutes
- A crash course on LTE-Advanced
  - ~25 slides, ~45 minutes
- Prototyping LTE-Advanced Downlink using Graphical System Design Tools
  - ~30 slides and code review, ~1 hour



# **National Instruments Overview**



# **National Instruments: Key Stats**

- Founded in 1976 and HQ in Austin, TX
- 30+ years of growth and profitability
  - \$873M revenue in 2010 (+29% YOY) and 17% operating income
  - \$255M revenue in Q3 2011 (+16% YOY)
- 6,000+ employees
- Operations in 50+ countries
- FORTUNE's 100 Best Companies to Work For list for 12 consecutive years
- FORTUNE's 25 Best Multinational Companies to Work For 2011
- Strong investment in R&D for new product development
- Over 30,000 customers
- Over 7000 universities







#### **The National Instruments Vision**

#### **Graphical System Design**

#### **Test and Measurement**

- Automated Test
- Data Acquisition
- Reconfigurable Instruments
- **Real-Time Systems**
- Software-Defined Radio
- Embedded Monitoring
- Hardware-in-the-loop

#### **Industrial and Embedded**

- Industrial Control (PAC)
- Machine Control
- Electronic Devices

"To do for test and measurement what the spreadsheet did for financial analysis."

"To do for embedded what the PC did for the desktop."



#### From Concept to Prototype ... Rapidly!







#### A Highly Productive Graphical Development Environment for Engineers and Scientists





#### **System Design to Deployment**





#### **Solving the Toughest Problems on Earth**





# **3GPP Long Term Evolution (LTE) Basics**



#### **Mobile Data Demand: Large Growth Expected**



Source: "Mobile Traffic Growth 2010-2020," UMTS Forum report 44, prepared for UMTS Forum by IDATE C&R, May 2011.

- Subscriber numbers are growing fast
  - More than 5.7 billion wireless subscriptions worldwide
  - 1.4 billion of those are 3G-capable subscriptions
  - 3.2 billion 3G subscriptions expected by 2015
- Data-intensive services are growing fast
  - More than 33x growth predicted from 2010 to 2020



# **Objectives of LTE**

- Ensure continuing competitiveness of 3GPP technologies for the future
- LTE design goals
  - Efficiently deliver higher data rates
    - Multiple antenna systems
    - Carrier aggregation
  - Efficiently leverage different types of spectrum resources
    - Spectrum chunks of different sizes including wider bands
    - Spectrum with specific regulatory requirements (e.g. TDD)
  - Enable more advanced cell network and receiver architectures
    - Heterogeneous networks
    - Relaying functionality
    - Advanced interference cancellation and management
  - Complement existing 3G systems



# **IMT Advanced and LTE**

|                             |                      | IMT Advanced<br>Requirements | LTE<br>Release 8 | LTE Advanced<br>Release 10         |
|-----------------------------|----------------------|------------------------------|------------------|------------------------------------|
| Transmission Bandwidth      |                      | at least 40 MHz              | up to 20 MHz     | up to 100 MHz                      |
| Peak Data Rate              | Downlink             | 1 Gbps                       | 300 Mbps         | 1 Gbps                             |
|                             | Uplink               |                              | 75 Mbps          | 500 Mbps                           |
| Peak Spectral<br>Efficiency | Downlink             | 15 b/s/Hz                    | 16 b/s/Hz        | 16 b/s/Hz [4x4]<br>30 b/s/Hz [8x8] |
|                             | Uplink               | 6.75 b/s/Hz                  | 4 b/s/Hz         | 8 b/s/Hz [2x2]<br>16 b/s/Hz [4x4]  |
| Latency                     | Control Plane        | < 100 ms                     | 50 ms            | 50 ms                              |
|                             | User Plane           | < 10 ms                      | 5 ms             | 5 ms                               |



# **Commercialization Timeline of 3GPP Evolution**



DL: Downlink; UL: Uplink; MC: Multi-carrier

- Strong evolution path for existing networks (HSPA/EV-DO)
- LTE as the global standard of the future
  - 248 commitments, 35 launches by operators as of October 2011
  - 103 launches predicted by end of year 2012
  - LTE public safety network solutions launched in US

Operator data source: The Global Mobile Suppliers Association, http://www.gsacom.com



#### **LTE Air Interface: Slot Structure**



- LTE follows a frame structure for all transmissions
  - 10 ms radio frame
  - One radio frame consists of 10 subframes and 20 slots

Figure source: 3GPP TS 36.211 v10.3.0, "TSGRAN E-UTRA Physical Channels and Modulation," September 2011.



# **LTE Air Interface: Duplexing Options**



- Does not require duplexer in terminal



Figure source: D. Astely, et al., "LTE: The Evolution of Mobile Broadband," IEEE Communications Magazine, April 2009.



ni.com

USA, Europe, Japan

### **LTE Air Interface: Downlink Modulation**



- Orthogonal Frequency Division Multiple Access (OFDMA)
  - FFT-based modulation and demodulation
  - Converts a broadband channel into multiple narrowband sub-channels
  - Resilient to wireless multipath fading
  - Low-complexity frequency-domain equalization
    - Easier implementation of advanced receiver architectures
  - Scalability with spectrum
  - Some design requirements are more stringent than in single-carrier schemes
    - Frequency offset and phase noise requirements
    - Linearity of transmit power amplifier
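The FFT-based modulation and one-tap frequency-domain equalization listed above can be sketched in a few lines of Python. This is an illustration only (the testbed itself is built in LabVIEW). The 8-subcarrier symbol, 2-sample cyclic prefix, 3-tap channel, and all function names are assumptions chosen for the example rather than LTE parameters, and a naive DFT stands in for the FFT to keep the sketch dependency-free.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT (forward) or inverse DFT; a stand-in for the FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1.0 / n if inverse else 1.0
    return [scale * sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                        for m in range(n))
            for k in range(n)]

def ofdm_modulate(symbols, cp_len):
    """One symbol per subcarrier -> time domain, with a cyclic prefix prepended."""
    time = dft(symbols, inverse=True)
    return time[-cp_len:] + time

def ofdm_demodulate(rx, cp_len):
    """Drop the cyclic prefix and return the per-subcarrier values."""
    return dft(rx[cp_len:])

def apply_channel(tx, taps):
    """Linear convolution with a multipath channel, truncated to the input length."""
    return [sum(taps[l] * tx[n - l] for l in range(len(taps)) if n - l >= 0)
            for n in range(len(tx))]

N, CP = 8, 2                      # hypothetical sizes, not LTE numerology
qpsk = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
taps = [1.0, 0.5, 0.25]           # channel memory (2) must not exceed CP

rx = apply_channel(ofdm_modulate(qpsk, CP), taps)
H = dft(taps + [0.0] * (N - len(taps)))    # per-subcarrier channel gains
equalized = [y / h for y, h in zip(ofdm_demodulate(rx, CP), H)]
```

Because the cyclic prefix is at least as long as the channel memory, each subcarrier sees a single complex gain, so one division per subcarrier recovers the transmitted symbols in this noise-free example.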





Right figure source: 3GPP TS 36.211 v10.3.0, "TSGRAN E-UTRA Physical Channels and Modulation," September 2011.



# **LTE Air Interface: Downlink Bandwidths**

| Transmission BW (MHz)                                    | 1.4                      | 3    | 5    | 10    | 15    | 20    |
|----------------------------------------------------------|--------------------------|------|------|-------|-------|-------|
| Subframe duration                                        | 1.0 ms (all bandwidths)  |      |      |       |       |       |
| Subcarrier spacing                                       | 15 kHz (all bandwidths)  |      |      |       |       |       |
| Sampling frequency (MHz)                                 | 1.92                     | 3.84 | 7.68 | 15.36 | 23.04 | 30.72 |
| Number of occupied subcarriers                           | 73                       | 181  | 301  | 601   | 901   | 1201  |
| Number of OFDM symbols per subframe (normal/extended CP) | 14/12 (all bandwidths)   |      |      |       |       |       |
| CP length (µs), normal                                   | 4.69 × 6, 5.21 × 1       |      |      |       |       |       |
| CP length (µs), extended                                 | 16.67                    |      |      |       |       |       |
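The entries in this table are tied together by simple arithmetic, which the Python sketch below reproduces. The resource-block counts and the cyclic-prefix sample counts (160 and 144 samples at a 2048-point FFT) are not on the slide; they are taken from the 3GPP LTE specifications and should be read as assumptions of this example, as should the function names.

```python
SUBCARRIER_SPACING_HZ = 15_000

# From the table above: transmission bandwidth (MHz) -> sampling frequency (MHz).
SAMPLE_RATE_MHZ = {1.4: 1.92, 3: 3.84, 5: 7.68, 10: 15.36, 15: 23.04, 20: 30.72}

# Assumption (3GPP numerology, not on the slide): resource blocks per bandwidth.
RESOURCE_BLOCKS = {1.4: 6, 3: 15, 5: 25, 10: 50, 15: 75, 20: 100}

def fft_size(bw_mhz):
    """FFT length = sampling frequency / subcarrier spacing."""
    return round(SAMPLE_RATE_MHZ[bw_mhz] * 1e6 / SUBCARRIER_SPACING_HZ)

def occupied_subcarriers(bw_mhz):
    """12 subcarriers per resource block, plus the DC subcarrier."""
    return 12 * RESOURCE_BLOCKS[bw_mhz] + 1

def slot_samples_normal_cp(bw_mhz):
    """Samples in one slot: 7 symbols, one long CP and six short CPs."""
    n = fft_size(bw_mhz)
    long_cp = 160 * n // 2048     # assumption: 160 samples at 2048-point FFT
    short_cp = 144 * n // 2048    # assumption: 144 samples at 2048-point FFT
    return 7 * n + long_cp + 6 * short_cp

for bw in SAMPLE_RATE_MHZ:
    slot_ms = slot_samples_normal_cp(bw) / (SAMPLE_RATE_MHZ[bw] * 1e6) * 1e3
    print(bw, fft_size(bw), occupied_subcarriers(bw), slot_ms)
```

Under these assumptions every bandwidth yields a 0.5 ms slot, consistent with two slots per 1.0 ms subframe, and the short CP of 144 samples at 30.72 MHz corresponds to the 4.69 µs entry.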



# **LTE Air Interface: Uplink Modulation**



- Single-Carrier Frequency Division Multiple Access (SC-FDMA)
  - DFT-Spread OFDM (DFTS-OFDM)
  - Lower peak-to-average power ratio compared to OFDM
    - Lower linearity requirements for user equipment (UE) transmitters
  - Less sensitivity to carrier frequency offset and phase noise
  - Same parameters as DL
    - The subcarrier spacing is 15 kHz
    - Resource Block (RB): 12 consecutive subcarriers in one slot



## **LTE Access Procedure Step 1: DL Sync Signals**



- Starts with energy detection in the allotted bands to obtain rough carrier estimation
- Primary Synchronization Signals (PSS)
  - 3 sequences, one of which is repeated every 5 ms
- Secondary Synchronization Signals (SSS)
  - SSS part 1 in slot 0, SSS part 2 in slot 10
  - 168 possible sequences
- PSS and SSS together determine the cell identity
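As a small illustration of how the two synchronization signals combine, the sketch below maps a detected PSS index and SSS group to a physical cell identity. The combining rule (three identities per SSS group, giving 3 × 168 = 504 in total) comes from the 3GPP physical-layer specification rather than from the slide, and the function and constant names are hypothetical.

```python
NUM_PSS_SEQUENCES = 3     # from the slide: 3 PSS sequences
NUM_SSS_GROUPS = 168      # from the slide: 168 possible SSS sequences

def physical_cell_id(sss_group, pss_index):
    """Combine the SSS group (0..167) and PSS index (0..2) into one cell ID."""
    if not 0 <= sss_group < NUM_SSS_GROUPS:
        raise ValueError("sss_group out of range")
    if not 0 <= pss_index < NUM_PSS_SEQUENCES:
        raise ValueError("pss_index out of range")
    return NUM_PSS_SEQUENCES * sss_group + pss_index

all_ids = {physical_cell_id(g, p)
           for g in range(NUM_SSS_GROUPS)
           for p in range(NUM_PSS_SEQUENCES)}
```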



# LTE Access Proc. Step 2: DL Reference Signals



- LTE provides Reference Signals (RS) for each antenna port
  - Channel feedback, radio resource management and demodulation
  - The figure shows the LTE Release 8 DL RS for the 4-antenna case
- LTE-Advanced improves RS overhead efficiency
  - LTE Release 8 RS does not scale efficiently for 8x8, beamforming etc.
  - CSI-RS: Low overhead RS for channel feedback
  - User specific DM-RS: For more accurate channel estimation for demod

Figure source: 3GPP TS 36.211 v10.3.0, "TSGRAN E-UTRA Physical Channels and Modulation," September 2011.



## **LTE Access Procedure Step 3: System Info**



- Physical Broadcast Channel (PBCH)
  - Contains Master Information Block (MIB)
- MIB contains critical cell info required for cell acquisition
  - Downlink bandwidth, Downlink antenna ports, HARQ channel config etc.
- 40ms periodicity
- Read other System Information Blocks (SIB)
  - Multiplexed with data transmissions



## **LTE Access Procedure: DL Operation of UE**

- Read Physical Control Format Indicator Channel (PCFICH)
  - Gives parameters required to access control channel (PDCCH)
- Access the information in PDCCH to know what to do
  - User specific control information
    - DL and UL channel scheduling information for UE
    - Transmit power control information for UE
    - MIMO precoding
    - HARQ information
  - Common control information
    - System information, paging information etc.
- Receive on Downlink Shared Channel (PDSCH)
- Put HARQ information and channel feedback in Uplink Control Channel (PUCCH)



#### **LTE Access Procedure: UL Operation of UE**

- Random Access Channel (PRACH) for initial access
  - Used when the UE is disconnected, has lost synchronization, is in handover, etc.
- Read schedule assignment from the downlink control channel (PDCCH)
- Transmit data on Uplink Shared Channel (PUSCH)
- Read downlink HARQ Indicator Channel (PHICH)
  - HARQ ACK/NACK info for uplink transmissions
- Further requests can go on Uplink Control Channel (PUCCH)
  - If data is to be transmitted soon, can send this info on PUSCH



# LTE Release 10 and Beyond Key Features



# **LTE Rel 10 Features: Carrier Aggregation**



- LTE Release 10 provides for aggregation of up to 5 carriers
  - Provides a solution for the spectrum fragmentation
  - Operators can combine 5 MHz, 10 MHz and 20 MHz pieces of spectrum to boost user experience
- Up to 100MHz spectrum by aggregating 20MHz carriers
  - Most deployments expected to be limited to 2x in the beginning
- Each carrier will be backward (Release 8) compatible
  - R11+ may remove this requirement



# LTE Rel 10 Features: MIMO

- Enhanced Single User MIMO (SU-MIMO)
  - Downlink MIMO enhanced
    - 8x8 MIMO support
    - Enhanced from 4x4 MIMO in Rel 8
    - Better reference signal structure enables techniques such as non-codebook-based precoding (beamforming etc.)
  - Uplink MIMO supported
    - 4x4 MIMO support
    - Enhanced from 1x2 Rx diversity in Rel 8
- Enhanced DL Multi User MIMO (MU-MIMO)
  - 4 layers and up to 2 layers/user
  - SU/MU MIMO dynamic switching
  - Better channel feedback methods to support MU-MIMO



SU-MIMO Scheduled in Resource B



MU-MIMO Scheduled in Resource A



#### **Heterogeneous Network Architecture**



- HetNets may be how the networks look in the future
  - LTE Rel 8 and onwards have provided features to optimize such networks

Figure source: 3GPP TS 36.300 v10.5.0, "TSGRAN E-UTRA and E-UTRAN Overall Description Stage 2," September 2011.



## LTE Rel 10 Features: Relay



- Relay Nodes (RN) controlled by Donor eNodeB (DeNB)
  - RNs are full-fledged eNBs with lower power, but with wireless backhaul
    - Outdoor: 250 mW to 2 W, Indoor: 100 mW
  - Deployment in areas where wired backhaul is not available or expensive
- Different types of RNs supported
  - In-band vs. out-band; transparent vs. non-transparent

Figure source: 3GPP TS 36.300 v10.5.0, "TSGRAN E-UTRA and E-UTRAN Overall Description Stage 2," September 2011.



# Macro, Pico and Femto

- Macro cell eNodeB
  - The "big" cell transmitting 5 W to 40 W of power
- Pico cell eNodeB
  - Full-featured eNodeBs used to solve
    - Macro coverage holes
    - Better capacity in hot-spots
  - Lower power
    - Outdoor: 250mW to 2W, Indoor: 100mW
- Femto cells: Home eNodeB (HeNB)
  - ~100mW or lower
  - Typical deployment is for in-building coverage
    - Individual users and small business/office
- Interference management is needed

Figure source: 3GPP TS 36.300 v10.5.0, "TSGRAN E-UTRA and E-UTRAN Overall Description Stage 2," September 2011.



Macro and Pico Cells Femto Cells



# LTE Rel 10 Features: eICIC

- Interference can limit effectiveness of HetNets
  - Macro can create a lot of interference to the pico and femto cells
  - The pico and femto cells can create interference to UE connected to Macro
    - A significant issue since femtocells usually only allow registered UEs to connect (Closed Subscriber Group CSG)
- Hence, LTE Rel 8 introduced limited Inter Cell Interference Coordination (ICIC)
- LTE Rel 10 made this much more powerful: Enhanced ICIC (eICIC)



# **LTE Rel 10 Features: eICIC**



- Concept of eICIC
  - Macro and pico can communicate via backhaul (X2 interface)
  - The goal is to minimize the interference between them
- Almost Blank Subframe (ABS)
  - Dynamic resource partitioning on sub-frame level
  - Can change partitioning every 40 ms
  - Range expansion
    - Use advanced receivers at UE to enable larger coverage for pico



## LTE Rel 11 and Beyond: Focus Areas

- A key area of focus in Rel 11 and beyond will be to refine existing features, such as
  - Carrier aggregation enhancements
    - E.g., improved channel feedback
  - Further enhancements of ICIC
    - More capacity required in control channels
    - Better Femto cell support: No current X2 backhaul support for Femto
- Advanced UE receivers
  - Interference cancellation/rejection receivers
- Upper layer improvements
  - Relays, self-organizing networks (SON), minimization of drive tests (MDT), machine-type communications (MTC), etc.



#### LTE Rel 11 and Beyond: CoMP



Joint Processing



**Coordinated Scheduling / Beamforming** 

- Coordinated Multi Point (CoMP)
  - Distributed/cooperative MIMO
- Challenging to enable this feature and get significant gains
  - Trade-off between control channel and backhaul overhead and performance gains



# LTE Advanced 8x8 MIMO Prototype System Design Details


## **LTE Advanced Downlink Testbed**

#### **Traditional Test Bed**

- 8x8 MIMO
- 80 Mbps Data Rate
- 5 MHz Bandwidth



Six 18-inch Racks

#### **NI PXI Approach**

- 8x8 MIMO
- ~1000 Mbps Data Rate
- 2 x 20 MHz Carrier Aggregation
- More than 10x performance at 1/10 the cost



Two 18-inch PXI Chassis (TX Chassis and RX Chassis)



# **Demo Video**

YouTube link: http://www.youtube.com/watch?v=wklxfXGQ\_7s



## **PXIe Based RF and Communication Solutions**







PC/Mac/Linux

- Instrumentation quality RF
- Multiple Computational Resources
  - Host PC
  - High performance multi-core processor with Windows or RTOS
  - High performance FPGA
- Highly Modular solution
- A unified software platform with LabVIEW
  - LabVIEW graphical code, C code, .m files, VHDL, etc.



## **PXI Express Embedded Controller NI PXIe-8133**



- High-performance controller capable of supporting complex system requirements
  - 1.73GHz Intel Core i7-820QM quad-core processor
    - Can run in Windows or Real-Time OS or mixed mode
  - Up to 8GB/s system bandwidth and up to 2GB/s slot bandwidth
  - 2GB dual-channel 1333MHz DDR3 standard (max 8 GB)
  - Two 10/100/1000Base-TX Ethernet ports and 4 Hi-Speed USB ports
  - Dual-monitor support



## FlexRIO FPGA Module NI PXIe-7965R



- FlexRIO family provides a variety of FPGA targets
  - FPGA programmable using NI LabVIEW FPGA module software
    - Can also bring in existing VHDL code
  - I/O adapter modules provide customizable digital and analog I/O
  - Peer-to-peer streaming capability
  - PXIe-7965R board contains a Virtex-5 SX95T
    - 14,720 FPGA slices, 640 DSP slices
    - Memory: 8,784 kb FPGA block RAM, 512 MB onboard DRAM



## **ADC and DAC as FlexRIO Adapter Modules**



- NI 5781 baseband transceiver module used for DAC feature
  - 16-bit I and Q differential output at 100MS/s each
  - 40MHz baseband bandwidth (3dB)
  - 14-bit dual input; single ended I/O features not used in test bed
- NI 5762 dual ADC module
  - Two 16-bit channels at 250MS/s each
  - Unfiltered version is used in testbed
  - 100MHz filtered version is available



## **PXI Express Chassis NI PXIe-1075**



- PXI/PXIe chassis with high-bandwidth backplane
  - 17 slots
    - 8 PXIe slots, 8 PXI/PXIe slots, 1 PXIe system timing slot
  - Up to 1 GB/s per-slot per-direction dedicated bandwidth (x4 PCIe)
  - Backplane enables peer-to-peer data streaming between FPGA modules and other select NI modular instruments



## **RF Up and Down Conversion**

- Instrument class PXI/PXIe modules available for up/down conversion and LO generation
- NI PXIe-5611 upconverter module
  - 85MHz to 6.6GHz frequency range
  - Real-time RF bandwidth = 50MHz
- NI PXIe-5601 downconverter module
  - 10MHz to 6.6GHz frequency range
  - Testbed uses IF = 187.5MHz setting.
  - Real-time RF Bandwidth = 100MHz
- NI PXI-5652 LO source module
  - 500kHz to 6.6GHz



#### LTE-A 8x8 MIMO Tx Block Diagram: Hardware







#### LTE-A 8x8 MIMO Rx Block Diagram: Hardware





#### LTE-A 8x8 MIMO Rx Block Diagram: Functional



## LTE Advanced 8x8 MIMO Prototype Case Study: LMMSE Detector Design



## **LabVIEW Design Flow**





#### **Code Partitioning on Hardware Targets**





#### **Linear MMSE Detector**



- MIMO model for each OFDM subcarrier  $\mathbf{y} = \mathbf{Hs} + \mathbf{n}; \quad \mathbf{n} \sim CN(0, \sigma^2 \mathbf{I})$
- Linear MMSE Detector

$$\mathbf{s}_{est} = \mathbf{W}_{mmse}\mathbf{y} = \left(\mathbf{H}^*\mathbf{H} + \sigma^2\mathbf{I}\right)^{-1}\mathbf{H}^*\mathbf{y}$$
$$\operatorname{Var}[\mathbf{n}_{mmse,i}] = \sigma^2 \left[\left(\mathbf{H}^*\mathbf{H} + \sigma^2\mathbf{I}\right)^{-1}\right]_{ii}$$
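As a sanity check on the two expressions above, consider the single-antenna special case in which $\mathbf{H}$ reduces to a complex scalar $h$. This special case is chosen here for illustration and is not part of the original slides:

$$s_{est} = \frac{h^*}{|h|^2 + \sigma^2}\, y, \qquad \operatorname{Var}[n_{mmse}] = \frac{\sigma^2}{|h|^2 + \sigma^2}$$

When $\sigma^2 \ll |h|^2$ the weight approaches the zero-forcing value $1/h$, and when $\sigma^2 \gg |h|^2$ it approaches the scaled matched filter $h^*/\sigma^2$.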



## **QR-Based Algorithm 1: Squared MMSE**

- QR decomposition can simplify the matrix inversion
- Squared MMSE formulation

$$\mathbf{A} = \mathbf{H}^* \mathbf{H} + \sigma^2 \mathbf{I} = \mathbf{Q} \mathbf{R}$$
$$\mathbf{A}^{-1} = \mathbf{R}^{-1} \mathbf{Q}^*$$

- Linear MMSE Detector

$$\mathbf{W}_{\text{mmse}} = \mathbf{R}^{-1}\mathbf{Q}^{*}\mathbf{H}^{*}$$
$$\operatorname{Var}[\mathbf{n}_{\text{mmse},i}] = \sigma^{2}[\mathbf{R}^{-1}\mathbf{Q}^{*}]_{ii}$$

- Inverting R is low complexity since it is upper triangular



#### **QR-Based Algorithm 2: Square-Root MMSE**

- QR conducted on the "square-root" matrix

$$\mathbf{B}^*\mathbf{B} = \mathbf{H}^*\mathbf{H} + \sigma^2\mathbf{I}$$
$$\mathbf{B} = \begin{bmatrix} \mathbf{H} \\ \sigma \mathbf{I} \end{bmatrix} = \mathbf{Q}\mathbf{R} = \begin{bmatrix} \mathbf{Q}_1 \\ \mathbf{Q}_2 \end{bmatrix} \mathbf{R}$$
$$\sigma \mathbf{I} = \mathbf{Q}_2 \mathbf{R}; \quad \mathbf{R}^{-1} = \frac{\mathbf{Q}_2}{\sigma}$$

- Only Q matrix needs to be explicitly computed

$$\mathbf{W}_{\text{mmse}} = \frac{\mathbf{Q}_2 \mathbf{Q}_1^*}{\sigma}$$
$$\operatorname{Var}[\mathbf{n}_{\text{mmse},i}] = \left[\mathbf{Q}_{2}\mathbf{Q}_{2}^{*}\right]_{ii}$$
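The square-root MMSE identities can be checked in a few lines. The sketch below assumes a thin QR decomposition of the stacked matrix $\mathbf{B}$ (with $\mathbf{H}$ on top of $\sigma\mathbf{I}$), so that $\mathbf{Q}^*\mathbf{Q} = \mathbf{I}$, $\mathbf{H} = \mathbf{Q}_1\mathbf{R}$ and $\sigma\mathbf{I} = \mathbf{Q}_2\mathbf{R}$; the notation $\mathbf{R}^{-*}$ for $(\mathbf{R}^{-1})^*$ is introduced here for brevity.

$$\begin{aligned}
\mathbf{H}^*\mathbf{H} + \sigma^2\mathbf{I} &= \mathbf{B}^*\mathbf{B} = \mathbf{R}^*\mathbf{Q}^*\mathbf{Q}\mathbf{R} = \mathbf{R}^*\mathbf{R} \\
\left(\mathbf{H}^*\mathbf{H} + \sigma^2\mathbf{I}\right)^{-1} &= \mathbf{R}^{-1}\mathbf{R}^{-*} \\
\mathbf{W}_{\text{mmse}} &= \mathbf{R}^{-1}\mathbf{R}^{-*}\mathbf{H}^* = \mathbf{R}^{-1}\mathbf{R}^{-*}\mathbf{R}^*\mathbf{Q}_1^* = \mathbf{R}^{-1}\mathbf{Q}_1^* = \frac{\mathbf{Q}_2\mathbf{Q}_1^*}{\sigma} \\
\operatorname{Var}[\mathbf{n}_{\text{mmse},i}] &= \sigma^2\left[\mathbf{R}^{-1}\mathbf{R}^{-*}\right]_{ii} = \sigma^2\left[\frac{\mathbf{Q}_2}{\sigma}\frac{\mathbf{Q}_2^*}{\sigma}\right]_{ii} = \left[\mathbf{Q}_2\mathbf{Q}_2^*\right]_{ii}
\end{aligned}$$

Neither $\mathbf{R}$ nor its inverse appears in the final expressions, which is why only $\mathbf{Q}$ has to be computed explicitly.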



#### **QR: Modified Gram-Schmidt Algorithm (MGS)**

$$\begin{bmatrix} \mathbf{H} \\ \sigma \mathbf{I} \end{bmatrix} = \begin{bmatrix} \mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_N \end{bmatrix}$$

$$\begin{aligned}
&\textbf{for } i = 1 \text{ to } N \\
&\quad r_{ii} := \|\mathbf{v}_i\|, \quad \mathbf{u}_i := \frac{\mathbf{v}_i}{\|\mathbf{v}_i\|} \\
&\quad \textbf{for } j = i + 1 \text{ to } N \\
&\qquad r_{ij} := \mathbf{u}_i^* \cdot \mathbf{v}_j \\
&\qquad \mathbf{v}_j := \mathbf{v}_j - r_{ij} \cdot \mathbf{u}_i \\
&\quad \textbf{end} \\
&\textbf{end}
\end{aligned}$$
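A floating-point reference model of the MGS loop above, combined with the square-root MMSE result $\mathbf{W}_{\text{mmse}} = \mathbf{Q}_2\mathbf{Q}_1^*/\sigma$, might look like the following Python sketch. It is an illustration under stated assumptions, not the testbed's LabVIEW FPGA code: the function names, the list-of-columns data layout, and the 2x2 example channel are all hypothetical.

```python
import math

def conj_dot(a, b):
    """Inner product a^* b for complex vectors stored as Python lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def mgs_qr(columns):
    """Modified Gram-Schmidt on a list of column vectors v_1..v_N.

    Returns (q, r): q is the list of orthonormal columns u_i and r is the
    upper-triangular factor as a list of rows.
    """
    v = [list(col) for col in columns]
    n = len(v)
    r = [[0j] * n for _ in range(n)]
    q = []
    for i in range(n):
        norm = math.sqrt(conj_dot(v[i], v[i]).real)
        r[i][i] = norm
        u = [x / norm for x in v[i]]
        q.append(u)
        for j in range(i + 1, n):
            r[i][j] = conj_dot(u, v[j])                        # r_ij = u_i^* v_j
            v[j] = [x - r[i][j] * y for x, y in zip(v[j], u)]  # remove u_i component
    return q, r

def sqrt_mmse_weights(h, sigma):
    """W = Q2 Q1^* / sigma from the QR of [H; sigma*I]; h is a list of rows."""
    m, n = len(h), len(h[0])
    # Column j of the stacked matrix: column j of H, then column j of sigma*I.
    cols = [[h[i][j] for i in range(m)] + [sigma if k == j else 0.0 for k in range(n)]
            for j in range(n)]
    q, _ = mgs_qr(cols)
    # q[c][b] is Q1[b][c] for b < m, and q[c][m + a] is Q2[a][c].
    return [[sum(q[c][m + a] * q[c][b].conjugate() for c in range(n)) / sigma
             for b in range(m)]
            for a in range(n)]

H = [[1 + 1j, 0.5 - 0.2j],
     [0.3j, 2 - 1j]]          # hypothetical 2x2 channel for one subcarrier
sigma = 0.1
W = sqrt_mmse_weights(H, sigma)
```

One way to validate such a model is to check that $(\mathbf{H}^*\mathbf{H} + \sigma^2\mathbf{I})\,\mathbf{W} = \mathbf{H}^*$ holds to within floating-point error, which avoids forming the matrix inverse in the check itself.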





## **Other Algorithms for QR Decomposition**

- Givens rotation can be used to obtain QR decomposition
- Main advantage of Givens rotation
  - Lower bit precision requirement compared to MGS
  - However, squared MMSE-based MGS can perform almost as well if dynamic scaling is implemented
- Main disadvantage of Givens rotation
  - Givens is CORDIC-heavy, and can consume a lot of resources
  - In FlexRIO FPGA target used in the testbed, dedicated multipliers are available and this suits Gram-Schmidt algorithm

Reference: Hun Seok Kim, Weijun Zhu, Jatin Bhatia, Karim Mohammed, Anish Shah, and Babak Daneshrad, "A Practical, Hardware Friendly MMSE Detector for MIMO-OFDM-Based Systems," EURASIP Journal on Advances in Signal Processing, vol. 2008



#### **LMMSE Detector: High-Level Architecture**





## **System-Level Floating Point Simulations**



- Once the system blocks are understood, the next step is system-level floating point simulations
  - Use graphical LabVIEW host code or text-based MathScript (shown above)



## **Sub-System Mapping and FPGA Considerations**

- Sub-system mapping
  - Consider throughput, latency, complexity, etc.
  - For LMMSE, for the timing requirements to be satisfied, an FPGA implementation is needed
- FPGA development considerations (very brief)
  - Conversion to fixed-point
    - Use fixed-point data type to simulate and verify performance
    - Decision based on complexity, performance and required dynamic range
  - Comparison with "golden" floating-point simulator
  - Bit and cycle-accurate simulations
    - ModelSim and ISim support



#### LMMSE Detector: Top-Level LabVIEW FPGA





## **QR Decomposition: Top-Level LabVIEW FPGA**



- In this design, matrix elements are processed serially
  - 128 clock cycles required for QR decomposition of a 16x8 matrix
- Two functional blocks required
  - Normalize a vector, orthogonalize w.r.t. a vector



## **QR Decomposition: Another Version**



- Once the basic functional blocks are coded, the modular design enables exploration of design trade-offs
- In the above design, the QR decomposition runs 8x faster
  - 16 clock cycles per 16x8 matrix
  - The trade-off is increased resource usage
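The two cycle counts quoted for the serial and faster designs are consistent with a simple streaming model, sketched below in Python. The assumption that the faster design processes 8 matrix elements per clock cycle is an inference from the 8x speed-up, not something stated on the slides, and the function name is hypothetical.

```python
def qr_cycles(rows, cols, elements_per_cycle):
    """Clock cycles needed to stream one rows x cols matrix through the block."""
    elements = rows * cols
    # Ceiling division, in case the size is not a multiple of the parallelism.
    return -(-elements // elements_per_cycle)

serial_cycles = qr_cycles(16, 8, 1)     # one element per cycle
parallel_cycles = qr_cycles(16, 8, 8)   # assumed: eight elements per cycle
speedup = serial_cycles / parallel_cycles
```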



## **Levels of Abstraction in GSD**



- LabVIEW FPGA provides various levels of abstraction
  - Can use inbuilt signal processing libraries, Xilinx COREGEN IP blocks, etc.
  - Experienced programmers can access very low-level details
  - For example, in the above code segment used as part of vector normalization, the designer configures the Xilinx DSP48E blocks directly



## **FPGA Resource Utilization for Testbed**

|                | TX            | RX OFDM       | RX MIMO       |
|----------------|---------------|---------------|---------------|
| Total Slice    | 13300 (90.4%) | 10433 (70.9%) | 12431 (84.4%) |
| Slice Register | 35482 (60.3%) | 27409 (46.6%) | 38361 (65.2%) |
| Slice LUT      | 30008 (51.0%) | 22902 (38.9%) | 31683 (53.8%) |
| DSP48E         | 60 (9.4%)     | 164 (25.6%)   | 188 (29.4%)   |
| Block RAM      | 215 (88.1%)   | 118 (48.4%)   | 208 (85.2%)   |

- Tx FPGA: generation of data for one carrier (8 Tx antennas) and baseband for one Tx antenna
- Rx OFDM FPGA: synchronization and FFT for data from 2 Rx antennas
- Rx MIMO FPGA: channel estimation and LMMSE detection for one carrier (8 Rx antennas)
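The percentages in the table can be cross-checked against the Virtex-5 SX95T resources quoted earlier in the deck (14720 slices, 640 DSP slices, 8784 kb of block RAM). The short Python sketch below does this under two assumptions that are not on the slides: each slice holds 4 registers and 4 LUTs, and block RAM is counted in 36 kb blocks.

```python
SLICES = 14720
DEVICE_TOTALS = {
    "Total Slice": SLICES,
    "Slice Register": 4 * SLICES,   # assumption: 4 registers per slice
    "Slice LUT": 4 * SLICES,        # assumption: 4 LUTs per slice
    "DSP48E": 640,
    "Block RAM": 8784 // 36,        # assumption: 36 kb per block RAM
}

USED = {
    "TX":      {"Total Slice": 13300, "Slice Register": 35482,
                "Slice LUT": 30008, "DSP48E": 60, "Block RAM": 215},
    "RX OFDM": {"Total Slice": 10433, "Slice Register": 27409,
                "Slice LUT": 22902, "DSP48E": 164, "Block RAM": 118},
    "RX MIMO": {"Total Slice": 12431, "Slice Register": 38361,
                "Slice LUT": 31683, "DSP48E": 188, "Block RAM": 208},
}

def utilization(design):
    """Percentage of each device resource used by one FPGA design."""
    return {name: 100.0 * USED[design][name] / total
            for name, total in DEVICE_TOTALS.items()}

for design in USED:
    print(design, {k: round(v, 1) for k, v in utilization(design).items()})
```

With these totals the computed figures agree with the table to within rounding, and they highlight that slices and block RAM, rather than DSP48E blocks, are the resources closest to exhaustion.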



# Conclusion



## **GSD for RF and Communications**





- Common tool flow for system design
- Reduced learning curve, faster time to result
- Abstract without restricting access
- Enables domain expert to access hardware design



#### From Concept to Prototype ... Rapidly!





#### **GSD for RF and Communications**



## **Summary**

- LabVIEW Graphical System Design Platform
  - Combines scalable software and tightly integrated hardware
  - Ideal for rapid prototyping of communication systems
  - Prove out theory and algorithms in real-time with live wireless signals
- Interested?
  - amal.ekbal@ni.com



# **Backup on QR Decomposition**



## **QR Decomposition: Normalize Operation**



- Input: length-16 complex vector v, one element per cycle
- Output: normalized vector u, one element per cycle

$$\mathbf{u} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$


- Latency = 69 clock cycles
- Throughput = 1 vector per 16 clock cycles



## **QR Decomposition: Orthogonalize Operation**



- Input: length-16 complex vectors u and v
- Circular buffer: stores length-16 complex vector u
- Output: length-16 complex vector $\mathbf{v} - (\mathbf{u} \cdot \mathbf{v})\mathbf{u}$
- Latency = 24 clock cycles
- Throughput = 1 vector per 16 clock cycles
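A behavioral model of the normalize and orthogonalize building blocks, written in Python as a sketch, is shown below. It mirrors the element-per-cycle streaming only loosely: the loops stand in for clock cycles and a plain list stands in for the circular buffer, while the pipeline latencies quoted on the slides are hardware properties that this model does not reproduce. The function names and test vectors are hypothetical, and the projection coefficient is taken as $\mathbf{u}^*\mathbf{v}$, matching $r_{ij}$ in the MGS loop.

```python
import math

def normalize(v):
    """Accumulate |v|^2 one element at a time, then scale every element."""
    energy = 0.0
    for x in v:                       # one element per clock cycle in hardware
        energy += (x * x.conjugate()).real
    inv_norm = 1.0 / math.sqrt(energy)
    return [x * inv_norm for x in v]

def orthogonalize(u, v):
    """Return v - (u^* v) u, with u held in a buffer while v streams in."""
    buffer = list(u)                  # stands in for the circular buffer
    projection = 0j
    for a, b in zip(buffer, v):
        projection += a.conjugate() * b
    return [b - projection * a for a, b in zip(buffer, v)]

u = normalize([complex(k + 1, -k) for k in range(16)])
v = [complex(16 - k, 2 * k) for k in range(16)]
w = orthogonalize(u, v)
```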


## Simplified Look at QR Block Timing Diagram

- For understanding the timing, assume the input is a 16x3 matrix

Input: $\mathbf{H} = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 \end{bmatrix} \in \mathbf{C}^{16 \times 3}$; Output: $\mathbf{Q} = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \end{bmatrix} \in \mathbf{C}^{16 \times 3}$; $\mathbf{H} = \mathbf{Q}\mathbf{R}$

[Timing diagram: Norm and Orth blocks for rows 1 to 3, showing input data, output data, input-disabled intervals and circular buffer data against clock-cycle markers 0, 16, 53, 69, 93, 146, 162, 176 and 245. The $\mathbf{u}_1$ and $\mathbf{u}_2$ outputs pass through delays of 2x93 and 1x93 cycles respectively.]

$$\mathbf{u}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|}, \quad \mathbf{u}_2 = \frac{\mathbf{v}_{2,1}}{\|\mathbf{v}_{2,1}\|}, \quad \mathbf{u}_3 = \frac{\mathbf{v}_{3,2}}{\|\mathbf{v}_{3,2}\|}$$
$$\mathbf{v}_{2,1} = \mathbf{v}_2 - (\mathbf{v}_2 \cdot \mathbf{u}_1)\mathbf{u}_1, \quad \mathbf{v}_{3,1} = \mathbf{v}_3 - (\mathbf{v}_3 \cdot \mathbf{u}_1)\mathbf{u}_1, \quad \mathbf{v}_{3,2} = \mathbf{v}_{3,1} - (\mathbf{v}_{3,1} \cdot \mathbf{u}_2)\mathbf{u}_2$$


