PROCEEDINGS OF SPIE
SPIEDigitalLibrary.org/conference-proceedings-of-spie

Johan Bauwelinck, M. Verbeke, J. Lambrecht, M. Vanhoecke, L. Breyne, G. Coudyzer, J. Declercq, B. Moeneclaey, R. Ahmed, N. Singh, C. Bruynsteen, C. Wang, S. Niu, X. Wang, T. Pannier, Y. Gu, A. Vandierendonck, B. Van Lombergen, J. Van Kerrebrouck, H. Ramon, M. Verplaetse, G. Torfs, X. Yin, P. Ossieur, "Transceiver circuits for high-baudrate optical interconnects," Proc. SPIE 12007, Optical Interconnects XXII, 120070H (5 March 2022); doi: 10.1117/12.2607210

Event: SPIE OPTO, 2022, San Francisco, California, United States

Downloaded From: https://www.spiedigitallibrary.org/conference-proceedings-of-spie on 11 Mar 2022 Terms of Use: https://www.spiedigitallibrary.org/terms-of-use

Transceiver circuits for high-baudrate optical interconnects

Johan Bauwelinck*^a, M. Verbeke^a, J. Lambrecht^a, M. Vanhoecke^a, L. Breyne^a, G. Coudyzer^a, J. Declercq^a, B. Moeneclaey^a, R. Ahmed^a, N. Singh^a, C. Bruynsteen^a, C. Wang^a, S. Niu^a, X. Wang, T. Pannier^a, Y. Gu^a, A. Vandierendonck^a, B. Van Lombergen^a, J. Van Kerrebrouck^a, H. Ramon^a,b, M. Verplaetse^a,c, G. Torfs^a, X. Yin^a, P. Ossieur^a

^a IDLab, Dep. INTEC, Ghent University - imec, 9052 Ghent, Belgium;
^b Now with Bricsys, 9050 Ghent, Belgium;
^c Now with Nokia Bell Labs, 2018 Antwerp, Belgium

ABSTRACT

New circuit architectures and technologies for high-speed electronic and photonic integrated circuits are essential to realize optical interconnects with higher symbol rates. As a consequence of the increasing speeds, close integration and co-design of photonic and electronic chips have become a necessity to realize high-performance transceivers with novel packaging approaches. Extensive co-design also enables the design of new electro-optic architectures to create and process optical signals more efficiently.
This paper and the accompanying presentation illustrate a number of recent developments in application-specific high-speed electro-optic transceiver circuits, including broadband driver amplifiers, transimpedance amplifiers, analog equalizers and multiplexer circuits for signal generation and reception at 100 Gbaud and beyond. The basic concepts and architectures, technological aspects, design challenges and trade-offs are discussed.

Keywords: optical transceiver, optical interconnect, electronic-photonic co-design, driver, equalizer, transimpedance amplifier, (de-)multiplexer, clock generation

1. INTRODUCTION

High-speed transceivers need to scale up to follow the increasing demand of data-intensive applications such as cloud services, virtual reality, high-performance computing and 5G/6G, which push the speeds of all sorts of interconnects and interfaces. Progress on transceiver throughput typically comes from technology evolutions and smarter integrated circuit (IC) and system design. Smarter design concerns the invention and realization of new circuits and architectures, more efficient multiplexing or modulation schemes, digital signal processing, etc. Due to the increasing speeds and multi-channel operation, the design and verification process becomes increasingly complex and challenging considering signal integrity, power integrity, process variations, design rules, thermal and packaging aspects, etc.

Besides creativity and accumulated design expertise, advancements in technology are crucial as an enabling factor. This involves for example new electro-optic materials and devices (e.g. based on graphene or plasmonics), hybrid integration through new assembly techniques such as transfer printing of chiplets, and advancement of CMOS and SiGe BiCMOS technologies. Before presenting some recent 100-120 Gbaud IC designs, some more background on CMOS versus SiGe BiCMOS is provided from a design perspective.
A graph that nicely illustrates the difference between advanced CMOS and SiGe BiCMOS can be found in reference [1]. The transistor cut-off frequency fT is a standard benchmark to compare the high-speed performance of IC technologies. The fT is the frequency at which the transistor current gain (with an incrementally shorted output) drops to 1. The fT is usually derived from S-parameter measurements. One should be careful, though, not to overestimate the technology capabilities based on the fT, as it does not include the effect of the transistor output capacitance or the wiring parasitics, for example. Moreover, fT depends on the transistor sizing, biasing conditions and process corner, whereas the specified fT for a technology is usually the “peak” fT. On the other hand, the fT gives an easy approximation for the gain-bandwidth product of analog circuits, and in digital circuits, the time constant for the charging and discharging of identical cascaded inverters (or simple gates) can be approximated as 1/(2π·fT).

*johan.bauwelinck@ugent.be; phone 32 264 3340; https://www.ugent.be/ea/idlab/en

Invited Paper. Optical Interconnects XXII, edited by Ray T. Chen, Henning Schröder, Proc. of SPIE Vol. 12007, 120070H · © 2022 SPIE · 0277-786X · doi: 10.1117/12.2607210

From a high-speed analog design perspective, SiGe BiCMOS [2] provides several advantages compared to CMOS, such as higher fT, higher breakdown voltage, higher drive current/voltage, higher transconductance and intrinsic gain, lower noise, better metal stacks for high-Q passives and interconnects, and more intuitive/traditional circuit design. However, the integration density is limited by the relatively large CMOS node (55nm, 90nm,…), which limits mixed-signal and digital signal processing (DSP) capabilities.
Today’s mainstream CMOS technologies, on the other hand, allow the integration of huge numbers of transistors, with densities on the order of 100 million per square mm. Enormous investments, technology development and new transistor concepts are fueling the continued scaling in next-generation CMOS processes. Given this economic reality, a circuit that can be implemented in CMOS will most likely be implemented in CMOS, although high product volumes are needed. The ongoing CMOS scaling increases the digital circuit performance, enabling the extensive usage of DSP, calibration, predistortion and equalization.

For analog circuits, the CMOS sweet spot is roughly around 28nm, beyond which the fT drops with each new generation due to increased parasitic capacitance in the transistor structure. As such, pushing bandwidths beyond 40 GHz in scaled CMOS is challenging, and the difficulty is compounded by a larger impact of interconnect parasitics (resistance, capacitance, inductance). Other design challenges in CMOS come from the thousands of layout rules (increasing design time), low supply voltage, higher device variability, and aging. Reliability and electromigration are increasing concerns in FinFET circuit designs, which makes it much harder to increase the current in a transistor. To meet all CMOS design rules, more metal may be required, which again adds parasitics. A bipolar transistor, on the other hand, can handle a lot of current in a single transistor, with little penalty on its fT. Moreover, high-quality passives and transmission lines are difficult to realize in CMOS, whereas circuits to extend bandwidth typically rely on passives (e.g. traveling wave topology, inductive peaking). These passives also require area, which is relatively more expensive in CMOS.

Although this introductory discussion focused on technological aspects, economic aspects will mainly drive the technology selection for product development as long as the technical requirements can be met.
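The fT rules of thumb given earlier in this section (gain-bandwidth product roughly equal to fT, and an inverter time constant of roughly 1/(2π·fT)) can be made concrete with a small numerical sketch. The fT value and the gain used below are hypothetical illustration values, not figures from this paper or from any specific technology, and the function names are introduced here only for the example. As noted above, such first-order estimates ignore output capacitance and wiring parasitics and should be read as optimistic bounds.

```python
import math


def inverter_time_constant(f_t_hz):
    """First-order time constant of identical cascaded inverters: 1 / (2*pi*fT)."""
    return 1.0 / (2.0 * math.pi * f_t_hz)


def bandwidth_for_gain(f_t_hz, gain):
    """First-order bandwidth estimate, assuming gain * bandwidth is roughly fT."""
    return f_t_hz / gain


# Hypothetical peak fT chosen only for illustration (not taken from the paper).
F_T_EXAMPLE_HZ = 300e9
# Hypothetical amplifier gain (linear, not dB), also an assumption.
GAIN_EXAMPLE = 5.0

tau_s = inverter_time_constant(F_T_EXAMPLE_HZ)
bw_hz = bandwidth_for_gain(F_T_EXAMPLE_HZ, GAIN_EXAMPLE)
print(f"time constant ~ {tau_s * 1e12:.2f} ps")
print(f"bandwidth at gain {GAIN_EXAMPLE:g} ~ {bw_hz / 1e9:.0f} GHz")
```

Because the real fT depends on sizing, biasing and process corner, a designer would typically repeat such an estimate with an fT well below the quoted peak value.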
The remainder of this paper will focus on our recent developments at 100-120 Gbaud in SiGe BiCMOS. However, our ongoing research and development is increasingly using FinFET CMOS for feasibility studies and proof-of-concept designs for e.g. 112 Gbaud DSP-based clock-and-data recovery (CDR) and high-sampling-rate digital-to-analog converters (DACs), complementing imec’s research on analog-to-digital converters (ADCs). Sections 2 and 3 present proven silicon, whereas Sections 4 and 5 discuss chips under development or in fabrication. Section 6 will conclude this paper with some closing remarks.

2. 100 GBAUD ANALOG INTERLEAVER

Today, DSP and DACs are intensively used to create waveforms and to apply transmit-side equalization or pulse shaping, for example. When scaling to 100 Gbaud or higher, analog bandwidths in excess of 60 GHz are required. Such high bandwidths are very challenging to realize in CMOS DACs. For this reason, an analog interleaver was proposed in the H2020 project Qameleon, to combine two or four sub-rate DACs into a single 100 Gbaud output [3,4]. Such a circuit can be considered as a very fast selector or switch, taking samples one after the other from the different DACs. The analog interleaver architecture consists of three 2-to-1 sub-interleavers as shown in Figure 1. The circuit requires a 50 GHz clock for the final stage; the 25 GHz clocks for the first stage are created on-chip using quadrature clock dividers.

Figure 1. Analog interleaver architecture based on three 2-to-1 sub-interleavers

Conventional high-speed current-mode logic (CML) implementations suffer from clock feedthrough. In the proposed topology, the interleaving is done in two steps.
First, the incoming data is translated explicitly into (non-overlapping) return-to-zero (RZ) pulses by suppressing the signal during half the symbol period. These RZ pulses are then combined through summation. In this way, the clock feedthrough in both RZ paths has opposite phase (as the RZ generators are driven by anti-phase clocks to make the alternating non-overlapping pulses) and cancels in the summer. Moreover, the explicit RZ generation (unlike conventional analog multiplexing) allows for pre- and de-emphasis, as the signal suppression is realized by summing the incoming signal with the same signal multiplied by the clock {-1, +1}. Introducing a programmable gain in both branches of the RZ generator gives a 1-tap feedforward equalizer. More details on the concept and implementation can be found in references [3,4].

Figure 2. Die micrograph and 100GBd PAM-4 output

The die micrograph (55nm SiGe BiCMOS) is depicted in Figure 2. The die size is around 2 square mm and the main high-speed interfaces and building blocks are indicated. The 100 Gbaud PAM-4 eye diagram was measured with a 70 GHz sampling oscilloscope. The interleaver can also be used to generate arbitrary waveforms. The resulting effective number of bits (ENOB) is more than 4 bits up to 40 GHz.

In these experiments, the chip was mounted in a cavity of a dedicated Megtron 6 test board and wirebonded directly to the PCB transmission lines. Ardent TR70 connectors were used to provide the input data from a Keysight Arbitrary Waveform Generator, whereas 1.85 mm connectors were used at the output to connect with the remote sampling heads of the scope via very short cables. The equalizer embedded in the interleaver chip was configured to overcome the losses introduced by the bond wires, transmission lines, connectors and cables, yielding the wide open eye diagram depicted in Figure 2.
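As a rough illustration of the two-step interleaving principle described earlier in this section (explicit RZ generation followed by summation), the following behavioral sketch models one 2-to-1 sub-interleaver in discrete time. It is an idealized toy model under stated assumptions, not the circuit implementation of [3,4]: each sub-rate sample is held for two output slots, the clock is an ideal {-1, +1} square wave, and clock feedthrough is modeled as a hypothetical additive term `leak * clock`. The function names and the parameters `g_direct`, `g_clocked` and `leak` are introduced here for illustration only; the two gains stand in for the programmable branch gains, and their mapping to the equalizer tap of the actual chip is not modeled.

```python
def rz_generate(samples, clock, g_direct=0.5, g_clocked=0.5, leak=0.0):
    """Sum the signal with a clock-multiplied copy of itself.

    With equal gains the signal passes when clock = +1 and is suppressed
    when clock = -1. `leak` is an assumed additive clock-feedthrough term.
    """
    return [g_direct * s + g_clocked * c * s + leak * c
            for s, c in zip(samples, clock)]


def interleave(dac_a, dac_b, leak=0.0):
    """Idealized 2-to-1 interleaver: two RZ generators on anti-phase clocks, then a sum."""
    # Hold every sub-rate sample for two output slots.
    held_a = [s for s in dac_a for _ in range(2)]
    held_b = [s for s in dac_b for _ in range(2)]
    clock = [1 if i % 2 == 0 else -1 for i in range(len(held_a))]
    anti_clock = [-c for c in clock]
    rz_a = rz_generate(held_a, clock, leak=leak)
    rz_b = rz_generate(held_b, anti_clock, leak=leak)
    # The feedthrough terms have opposite sign in the two paths and cancel here.
    return [a + b for a, b in zip(rz_a, rz_b)]


# Two hypothetical sub-rate PAM-4 streams (levels -3, -1, +1, +3).
stream_a = [1, -1, 3]
stream_b = [-3, 1, -1]
print(interleave(stream_a, stream_b))            # ideal clock paths
print(interleave(stream_a, stream_b, leak=0.1))  # with modeled feedthrough
```

In this model the output alternates between the samples of the two streams, and the result with nonzero `leak` matches the ideal one up to floating-point rounding, which mirrors the feedthrough cancellation argument in the text.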
The measured bandwidth is 73 GHz and a 400 mVpp output swing is obtained, sufficient to act as predriver for a dedicated high-swing modulator driver. The power consumption is 700 mW in 4-to-1 mode. This chip is currently being integrated in a demonstrator for the H2020 Qameleon project together with a high-swing III-V driver chip, developed by 3-5 labs, and an InP modulator, developed by Fraunhofer HHI.

3. 120 GBAUD PAM-4 MULTIPLEXER WITH 7-TAP FEED-FORWARD EQUALIZER

The 120 Gbaud PAM-4 transmitter IC shown in Figure 3 first decodes the incoming data to remove any signal distortion in amplitude or phase [5]. The resulting most and least significant bits (MSB, LSB) of the incoming PAM-4 signals are then retimed using two 90 degree out of phase quarter-rate clocks (CLK/2 and CLK/2 90º), and the 4 MSB and 4 LSB signals are separately multiplexed and filtered using two independent 7-tap feed-forward equalizers (FFEs), one for the MSB and one for the LSB. The two quarter-rate clocks are derived from an externally supplied half-rate clock (CLK) which is divided on-chip. Using phase interpolators (PI), the phases of both quarter-rate clocks can be fine-tuned. The FFE-filtered MSB and LSB signals are combined in the output stage to obtain a full-rate PAM-4 signal. An output driver amplifies the PAM-4 signal to drive a 100 Ω differential load. The four PAM-4 levels are created by scaling the overall gain in the two separate FFEs. The advantage of the FFE architecture is the efficient use of both digital and analog delay structures to obtain >100 Gbaud operation with a large number of filter taps in a compact configuration. More details on the concept and implementation can be found in reference [5].

Figure 3.
PAM-4 multiplexer with 7-tap feed-forward equalizer

The die micrograph (55nm SiGe BiCMOS) is depicted in Figure 4. The die size is around 3 square mm and the main high-speed interfaces and building blocks are indicated. The 100-120 Gbaud PAM-4 eye diagrams were measured with a 100 GHz sampling oscilloscope.

Figure 4. Die micrograph and NRZ/PAM-4 outputs at 100 Gbaud, 112 Gbaud and 120 Gbaud

In this experiment, the chip was flip-chip assembled directly on a dedicated Megtron 6 test board. Ardent TR70 connectors were used to provide the input data from a Keysight Arbitrary Waveform Generator, whereas 1.85 mm connectors were used at the output to connect with the remote sampling heads of the scope via very short cables. The FFE was configured to overcome the losses introduced by the transmission lines, connectors and cables, yielding the open eye diagrams up to 120 Gbaud as shown in Figure 4. A 0.5 Vpp output swing was obtained after a channel with a loss of 8 dB at 60 GHz. This is sufficient to act as predriver for a dedicated high-swing modulator driver. Without a channel (i.e. 0 dB loss), the output swing could be up to 1.2 Vpp. The total power consumption is 2 W when the input decoders are configured for NRZ and 2.16 W when configured to decode PAM-4. Of this power, approximately 680 mW is consumed in the analog part (VGAs, delay circuits and output stage), 300 mW in the MUX and retiming, and 560 mW in the clock distribution.

4. QUAD-CHANNEL 106 GBAUD 2:1 LINEAR MULTIPLEXER WITH INTEGRATED MODULATOR DRIVER AND INTEGRATED PLL/VCO

To scale up throughput to 800 Gb/s, a fully integrated quad-channel SiGe BiCMOS transmitter front-end is being developed in H2020 Poetics, shown in Figure 5.
The main functionality in each channel of the transmit path consists of a 2:1 analog signal multiplexer (AMUX) and a high-bandwidth linear driver (DRV), referred to together as AMUX-DRV. The AMUX interleaves two incoming PAM-4 signals to generate a PAM-4 signal at twice the symbol rate. The AMUX operation requires a 53 GHz clock synchronized with the incoming 53 Gbaud PAM-4 data to alternate between the two data inputs, with very low jitter, as jitter reduces the SNR. For this purpose, an on-chip voltage-controlled oscillator (VCO) and phase-locked loop (PLL) are integrated, which can be locked to an external frequency reference and which can provide a phase-controlled clock for each channel. The 2:1 106 Gbaud AMUX will be based on return-to-zero pulse shaping and interleaving, similar to the analog interleaver of Section 2. The output stage can be reconfigured for single-ended or differential operation, and the 50 Ohm back-termination resistors can be disconnected with a Focused Ion Beam (FIB).

Figure 5. Quad channel 106 Gbaud 2-to-1 MUX with modulator driver, VCO/PLL clock generator and clock tree. In this configuration a single-ended electro-absorption modulator (EAM) is considered.

Figure 6 gives a view on the floorplan of the quad AMUX-DRV chip in 90 nm SiGe BiCMOS and the current status of the top-level layout. Post-layout verification is currently ongoing.

Figure 6. Quad channel 106 Gbaud 2-to-1 MUX with modulator driver and clock generator

5. 106 GBAUD QUAD-CHANNEL TIA AND MZM DRIVER

Figure 7 shows a quad-channel TIA and MZM driver chip, both designed in 55nm SiGe BiCMOS and currently in fabrication.

Figure 7.
Quad channel 106 Gbaud TIA (left) and MZM driver (right)

The quad-channel linear TIA array is designed for a bandwidth of 50 to 60 GHz and a low total harmonic distortion (THD) below 5% for an output swing of 0.5 Vpp. The optical modulation amplitude (OMA) sensitivity is estimated to be around -6 dBm (without intersymbol interference penalty and considering a target BER of 10^-4). The TIA input stage is suitable for both single and balanced photodiodes. The power consumption is expected to be around 500 mW per channel.

The quad-channel linear MZM driver is co-designed with a Silicon Photonic dual-polarization IQ modulator and targets a bandwidth of 80-90 GHz, with a differential output swing of 2 to 2.5 Vpp. The MZM driver power consumption will be somewhat below 700 mW per channel.

6. CLOSING REMARKS AND CONCLUSIONS

This invited paper presented a number of (ongoing) 100-120 Gbaud circuits for next-generation high-speed optical transceivers. To further scale the capacity of optical transceivers to the next level, ongoing research is exploring various ways forward, for which custom high-speed electronics remains a crucial part of the development. Technology, circuit topologies, design techniques and co-integration methods will certainly progress in the coming years. In this respect, high-speed optical interconnects will benefit from several other applications pushing transceiver operation towards higher frequencies, such as 6G, radar, sub-THz, etc. Novel high-mobility semiconductor materials and transistor concepts are being developed to enhance performance and sustain the scaling of transistor technology. As an alternative, hybrid integration through advanced packaging such as transfer printing (H2020 Caladan) allows mixing and matching small electronic and/or photonic chiplets with much lower parasitics than conventional flip-chip assembly.
Furthermore, progress can be expected from the evolution of circuit architecture and design tools, and new transceiver concepts combining electronic and photonic ICs (electro-optical DACs, optical equalization, optical time division multiplexing, optical sampling…). However, the R&D on this road is a huge challenge and investment, with an increasing list of issues and risks to be tackled. For example, power consumption is certainly a big concern for signal generation, and for the clock generation and distribution in particular [6]. Nevertheless, a very exciting decade lies ahead for R&D on high-capacity links and transceivers with symbol rates beyond 100 Gbaud in both direct-detect and coherent systems.

ACKNOWLEDGEMENTS

This work was supported in part by the EU-funded H2020 Projects Poetics under Grant 871769, Qameleon under Grant 780354, Plasmoniac under Grant 871391, and Nebula under Grant 871658. This work was also supported by the Special Research Fund (BOF) of Ghent University and the Research Foundation Flanders (FWO).

REFERENCES

[1] Chen, Y.-K., [DARPA, T-Music Program], “Technologies for Mixed-mode Ultra Scaled Integrated Circuits”, Jan. 8 2019.
[2] Zimmer, T., et al., “SiGe HBTs and BiCMOS Technology for Present and Future Millimeter-Wave Systems”, IEEE Journal of Microwaves 1(1), 288-298 (2021).
[3] Ramon, H., et al., “A 700mW 4-to-1 SiGe BiCMOS 100GS/s analog time-interleaver”, IEEE International Solid-State Circuits Conference, ISSCC, 214-216 (2020).
[4] Ramon, H., et al., “A 100 GS/s Four-to-One Analog Time Interleaver in 55 nm SiGe BiCMOS”, IEEE Journal of Solid-State Circuits 56(8), 2539-2549 (2021).
[5] Verplaetse, M., et al., “A 4-to-1 240 Gb/s PAM-4 MUX with a 7-tap mixed-signal FFE in 55nm BiCMOS”, 2021 IEEE Custom Integrated Circuits Conference, CICC (2021).
[6] Razavi, B., “Jitter-Power Trade-Offs in PLLs”, IEEE Transactions on Circuits and Systems I: Regular Papers 68(4), 1381-1387 (2021).