# TOBB UNIVERSITY OF ECONOMICS AND TECHNOLOGY INSTITUTE OF NATURAL AND APPLIED SCIENCES

# LOW POWER HIGH SPEED COMPUTING USING RAPID SINGLE FLUX QUANTUM CIRCUITS

**Doctor of Philosophy** 

Sasan RAZMKHAH

**Electrical and Electronics Engineering** 

Advisor: Assoc. Prof. Dr. Ali BOZBEY



Approval of the Graduate School of Science and Technology

| Prof. Dr. Osman EROĞUL |
|------------------------|
| Director               |

I certify that this thesis satisfies all the requirements as a thesis for the degree of Doctor of Philosophy.

Assoc. Prof. Dr. Tolga GİRİCİ
Head of Department

The thesis titled "LOW POWER HIGH SPEED COMPUTING USING RAPID SINGLE FLUX QUANTUM CIRCUITS", by Sasan Razmkhah (student number 121217713), a student for the degree of Doctor of Philosophy at the Graduate School of Natural and Applied Sciences, TOBB ETU, which was prepared after fulfilling all the necessary conditions determined by the related regulations, was accepted on 22<sup>nd</sup> of February 2018 by the jury whose signatures are below.

| Thesis Advisor: | <b>Assoc. Prof. Dr. Ali BOZBEY</b><br>TOBB University of Economics and Technology |
|-----------------|------------------------------------------------------------------------|
| Jury Members:   | <b>Prof. Dr. Iman ASKERBEYLİ</b><br>Ankara University |
|                 | <b>Prof. Dr. Oğuz ERGİN</b><br>TOBB University of Economics and Technology |
|                 | Assoc. Prof. Dr. Haluk KORALAY<br>Gazi University |
|                 | Assist. Prof. Dr. Rohat MELİK<br>TOBB University of Economics and Technology |



## THESIS DECLARATION

I hereby declare that all information in this thesis has been obtained and presented in accordance with the rules of ethical and academic conduct, and that this thesis has been prepared in accordance with the thesis writing rules of the TOBB ETU Institute of Natural and Applied Sciences. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Sasan RAZMKHAH



#### **ABSTRACT**

#### Doctor of Philosophy

#### LOW POWER HIGH SPEED COMPUTING USING RAPID SINGLE FLUX QUANTUM CIRCUITS

#### Sasan RAZMKHAH

TOBB University of Economics and Technology Institute of Natural and Applied Sciences Electrical and Electronics Engineering Program

Supervisor: Assoc. Prof. Dr. Ali BOZBEY

Date: February 2018

The demand for faster computers that consume less power has motivated the search for alternatives to CMOS logic. Recent advances in superconductor logic technology and in superconducting very-large-scale integration (VLSI) circuit fabrication allow us to design complex rapid single flux quantum (RSFQ) circuits and structures with a large number of Josephson junctions on a single chip. These advances enable logic circuits that consume orders of magnitude less power than MOSFET-based circuits while operating at higher frequencies. In this work, we designed a 4-bit arithmetic logic unit (ALU) with a bit-parallel architecture in RSFQ logic. The parallel architecture allows a simpler structure than Kogge-Stone while maintaining good latency. The ALU was designed with a standard cell library for fabrication in the STP2 standard process (2.5 kA/cm<sup>2</sup>) and has a latency of 620 ps on its most critical path at a 2.5 mV bias and a 25 GHz clock frequency. The ALU consists of more than 9000 junctions, supports 8 different operations including multiplication, addition, and subtraction, and consumes about 2.4 mW of power. This logic unit was designed to be used as a coprocessor with external CMOS processors and to function with CMOS memories.

To confirm the operation of the ALU, all of its parts were first fabricated separately and tested in a custom-designed 4 K pulse-tube cryocooler. The cooler and the package were modified to measure high-bias circuits.

**Keywords:** Arithmetic logic unit, Superconductivity, RSFQ, Cryocooler.

#### ÖZET

#### Doctoral Thesis

#### LOW POWER HIGH SPEED COMPUTING USING RAPID SINGLE FLUX QUANTUM CIRCUITS

#### Sasan RAZMKHAH

TOBB University of Economics and Technology Institute of Natural and Applied Sciences Department of Electrical and Electronics Engineering

Supervisor: Assoc. Prof. Dr. Ali BOZBEY

Date: February 2018

Today, the need for computers that are faster and consume less power is driving the search for logic alternatives to CMOS technology. Recent advances in superconductor logic technology and superconducting very-large-scale integration (VLSI) circuit fabrication make it possible to design complex rapid single flux quantum (RSFQ) circuits and structures containing a large number of Josephson junctions on a single chip. These advances lead to logic circuits that consume far less power than MOSFETs and operate at relatively high frequencies. In this work, we designed a 4-bit arithmetic logic unit (ALU) with a parallel architecture using superconducting single flux quantum logic circuits. The parallel architecture permits a simpler structure than Kogge-Stone while preserving good latency. The ALU was designed for the STP2 standard process (2.5 kA/cm<sup>2</sup>) and has a latency of 620 ps at a 2.5 mV bias and a 25 GHz clock frequency. The ALU consists of more than 9000 junctions, has 8 different operations including multiplication, addition, and subtraction, and consumes about 2.4 mW of power. This logic unit is designed to be used as a coprocessor together with external CMOS processors and CMOS memories.

To confirm the operation of the ALU, all parts were first fabricated separately and tested in a custom-designed 4 K pulse-tube cryocooler. The cooler and the chip package were modified for high-current biasing.

**Keywords:** Arithmetic logic unit, Superconductivity, Rapid Single Flux Quantum, Cryocooler.

#### **ACKNOWLEDGEMENTS**

First of all, I want to thank my advisor, Dr. Ali Bozbey, for guiding and supporting me over the years with his valuable advice and knowledge, and for being a role model for me as a mentor and a researcher. I would also like to thank TOBB University of Economics and Technology (ETU) for supporting and funding my research, and the TOBB ETU Electrical and Electronics Engineering Department and all of its professors for their valuable knowledge. I would like to thank my fellow laboratory colleagues, Mustafa Eren ÇELİK, Eren Can AYDOĞAN, Kübra ÜŞENMEZ, and Mustafa Altay KARAMÜFTÜOĞLU, who helped me greatly with my research. I would especially like to thank my amazing family for the love, support, and constant encouragement I have received over the years. In particular, I would like to thank my parents and my sister. I undoubtedly could not have done this without you.

The circuits were fabricated in the clean room for analog-digital superconductivity (CRAVITY) of the National Institute of Advanced Industrial Science and Technology (AIST) with the standard process 2 (STP2). The AIST-STP2 process is based on the Nb circuit fabrication process developed at the International Superconductivity Technology Center (ISTEC). I would like to thank Prof. A. Fujimaki (Nagoya Univ., Japan) and his associates for kindly providing the CONNECT cells.

This work was supported by TUBITAK under project number 111E191. I would like to thank TUBITAK for its financial support.



# TABLE OF CONTENTS

| No.      | Title                                          |
|----------|------------------------------------------------|
|          | ABSTRACT                                       |
|          | ÖZET                                           |
|          | ACKNOWLEDGEMENTS                               |
|          | TABLE OF CONTENTS                              |
|          | LIST OF FIGURES                                |
|          | LIST OF TABLES                                 |
|          | ABBREVIATIONS                                  |
|          | LIST OF SYMBOLS                                |
| 1.       | INTRODUCTION                                   |
| 1.1.     | Theory of Superconductivity                    |
| 1.1.1.   | Josephson junctions                            |
| 1.1.2.   | SQUIDs                                         |
| 1.2.     | Outline of the Thesis                          |
| 2.       | SUPERCONDUCTOR CIRCUIT THEORY                  |
| 2.1.     | Rapid Single Flux Quantum (RSFQ)               |
| 2.2.     | Adiabatic Quantum Flux Parametron (AQFP)       |
| 2.3.     | RSFQ Cells                                     |
| 2.3.1.   | Digital cells                                  |
| 2.3.1.1. | Wiring cells                                   |
| 2.3.1.2. | Logic cells                                    |
| 2.3.1.3. | Flip-flops                                     |
| 2.3.1.4. | DC/SFQ and SFQ/DC                              |
| 2.3.2.   | Analog design                                  |
| 2.3.2.1. | Passive transmission lines (PTL)               |
| 2.3.2.2. | Driver and receiver circuits                   |
| 2.4.     | Fabrication Process                            |
| 3.       | ARITHMETIC LOGIC UNIT                          |
| 3.1.     | Superconductor Arithmetic Logic Unit (ALU)     |
| 3.2.     | ALU Architectures                              |
| 3.2.1.   | Serial                                         |
| 3.2.2.   | Parallel                                       |
| 3.2.3.   | Bit Slice                                      |
| 3.2.4.   | Kogge Stone                                    |
| 4.       | DEVELOPED ARITHMETIC LOGIC UNIT SUB-CIRCUITS   |
| 4.1.     | Logic Unit                                     |
| 4.2.     | Adder                                          |
| 4.3.     | Multiplier                                     |
| 4.4.     | Multiplexer                                    |
| 4.5.     | Passive Transmission Lines (PTLs)              |
| 4.6.     | AQFP Cells                                     |
| 4.7.     | Interface Circuits                             |
| 4.7.1.   | Input register stage                           |
| 4.7.2.   | Output register stage                          |
| 5.       | IMPLEMENTATION OF TEST SETUP                   |
| 5.1.     | System Integration                             |
| 5.1.1.   | Cryocooler                                     |
| 5.1.2.   | Wiring and connections                         |
| 5.1.3.   | Electronics                                    |
| 5.1.4.   | Packaging                                      |
| 5.1.5.   | Shielding                                      |
| 5.2.     | Testing the Noise and Stability of the System  |
| 5.2.1.   | Josephson junction                             |
| 5.2.2.   | Connect JAND cell                              |
| 5.3.     | System Automation                              |
| 5.3.1.   | Methodology                                    |
| 6.       | RESULTS AND CONCLUSION                         |
| 6.1.     | Parallel Arithmetic Logic Unit                 |
| 6.1.1.   | Fabricated circuit                             |
| 6.1.2.   | Results                                        |
| 6.2.     | Serial Arithmetic Logic Unit                   |
| 6.2.1.   | Fabricated circuit                             |
| 6.2.2.   | Results                                        |
| 6.3.     | Conclusion                                     |
|          | REFERENCES                                     |
|          | RESUME                                         |

### LIST OF FIGURES

| Figure      | Caption |
|-------------|---------|
| Figure 1.1  | The power consumption of different supercomputers around the world. The blue line shows the power consumption of superconductor projects. |
| Figure 1.2  | The critical temperature of different superconductors versus the year of their discovery. |
| Figure 1.3  | Zero resistivity in mercury as shown by Onnes in 1911. |
| Figure 1.4  | Floating of a bulk superconductor over a magnet due to quantum locking. This example demonstrates the Meissner effect. |
| Figure 1.5  | Presentation of the Meissner effect in a bulk superconductor as the temperature drops below the critical point. |
| Figure 1.6  | Josephson current versus the magnetic field for two parallel junctions. |
| Figure 1.7  | Circuit model of a Josephson junction. |
| Figure 1.8  | Normalized current $I/I_c$ versus normalized voltage $GV/I_c$. |
| Figure 1.9  | The characteristic of a junction for $\beta_c = 4$. |
| Figure 1.10 | The quantization of the magnetic field inside a superconductor loop: a) not quantized, b) quantized. |
| Figure 1.11 | Schematic of a DC-SQUID. |
| Figure 1.12 | The screening current and penetrated flux of a SQUID loop as an external magnetic flux is applied. |
| Figure 1.13 | I-V characteristic of a DC-SQUID and the output voltage at the terminals. |
| Figure 2.1  | I-V characteristic of a Josephson junction as it is biased. |
| Figure 2.2  | The conversion of a DC signal to SFQ pulses with a DC/SFQ cell. |
| Figure 2.3  | Single data rate and double data rate data. |
| Figure 2.4  | SFQ pulse to DC conversion in an SFQ/DC cell. Note that each pulse causes a state change in the output. |
| Figure 2.5  | Adiabatic switching versus conventional switching. |
| Figure 2.6  | The basic operation of an AQFP gate designed by Goto. |
| Figure 2.7  | Schematic of a JTL cell used for RSFQ logic circuits. |
| Figure 2.8  | Schematic of an RSFQ splitter cell. |
| Figure 2.9  | Merger circuit schematic for the RSFQ process. |
| Figure 2.10 | Moore diagram of the Josephson AND gate. |
| Figure 2.11 | Schematic of a Josephson AND cell. |
| Figure 2.12 | Moore diagram of an OR gate. |
| Figure 2.13 | Schematic of a Josephson OR gate. |
| Figure 2.14 | Moore diagram of an XOR gate. |
| Figure 2.15 | Schematic of a Josephson XOR gate. |
| Figure 2.16 | Schematic of a Josephson NOT gate. |
| Figure 2.17 | Moore diagram of the clocked Josephson DFF gate. |
| Figure 2.18 | Schematic of a Josephson DFF gate designed in |
| Figure 2.19 | Moore diagram of the TFF cell. |
| Figure 2.20 | Schematic of a Josephson TFF gate. |
| Figure 2.21 | Moore diagram of the T1 cell. |
| Figure 2.22 | Schematic of a Josephson T1 gate. |
| Figure 2.23 | Schematic of a Josephson DC/SFQ gate. |
| Figure 2.24 | Schematic of a Josephson SFQ/DC gate. |
| Figure 2.25 | The ladder model of a PTL. |
| Figure 2.26 | Driver circuit for PTL lines used in the standard library. |
| Figure 2.27 | Receiver circuit for PTL lines used in the standard library. |
| Figure 2.28 | The layers of the STP2 fabrication process. |
| Figure 2.29 | Layer properties of the STP2 process. |
| Figure 3.1  | Block diagram of the FLUX processor. |
| Figure 3.2  | Block diagram of the FLUX-1 processor. |
| Figure 3.3  | FLUX-1R chip layout designed with 63107 Josephson junctions. The power dissipation of the chip is about 10 mW. |
| Figure 3.4  | (a) The block diagram for a single cell of the ALU, which consists of three switches and a half-adder. (b) 2-bit ALU using the single block cells. |
| Figure 3.5  | The block diagram for a 4-bit ALU based on HA and switches. |
| Figure 3.6  | Processor designs at Nagoya University known as the CORE e series. |
| Figure 3.7  | Block diagram of the processor architecture in the CORE e design. |
| Figure 3.8  | The register file block in the CORE e4 processor. |
| Figure 3.9  | 4-bit ALU block with bit-sliced architecture to incorporate in the 32-bit ALU. |
| Figure 3.10 | Serial architecture for the arithmetic logic unit. |
| Figure 3.11 | Bit-parallel architecture for the ALU. |
| Figure 3.12 | Logic circuits of the 74181 CMOS ALU with bit-sliced architecture. |
| Figure 3.13 | Tree diagram of a Kogge-Stone adder for the bit routes. |
| Figure 3.14 | 8-bit Kogge-Stone adder designed at TOBB ETU and fabricated by the STP2 process. |
| Figure 4.1  | 4-bit JAND logic gates with clock tree. |
| Figure 4.2  | 4-bit JOR logic gates with clock tree. |
| Figure 4.3  | 4-bit JXOR logic gates with clock tree. |
| Figure 4.4  | 4-bit JNOT logic gates with clock tree. |
| Figure 4.5  | Fabricated logic cells with their clock trees: a) JOR, b) JAND, c) JNOT, d) JXOR. |
| Figure 4.6  | Reported JOR gate output waveform. |
| Figure 4.7  | Schematic of a 4-bit Kogge-Stone architecture adder. |
| Figure 4.8  | Layout of a 4-bit Kogge-Stone architecture adder. |
| Figure 4.9  | Block diagram and layout of a 4-bit carry look ahead adder designed for the ALU structure. |
| Figure 4.10 | Verilog simulation result of the adder stage. |
| Figure 4.11 | Two bit input multiplier circuit. |
| Figure 4.12 | Fabricated circuit for a 2-bit multiplier circuit. The size of the circuit is about 500 µm without considering the DC/SFQ and SFQ/DC cells. |
| Figure 4.13 | Schematic of the designed 4-bit multiplier cell for use in the parallel ALU. |
| Figure 4.14 | Layout of the designed 4-bit multiplier cell for use in the parallel ALU. |
| Figure 4.15 | Fabrication result of the designed 4-bit multiplier cell for use in the parallel ALU. The circuit is fabricated with standard process STP2. |
| Figure 4.16 | Result of the measurements made on the multiplier circuit. The figure shows the result for the $1\times3$ operation; as we see, the output is 3. |
| Figure 4.17 | The result for the $2\times2$ operation; as we see, the output is 4. |
| Figure 4.18 | The result for the $3\times5$ operation; as we see, the output is 15. |
| Figure 4.19 | Layout and schematic of a 2 to 1 multiplexer circuit using a toggle flip-flop and a NOT gate. |
| Figure 4.20 | Layout and schematic of a 4-bit 2 to 1 multiplexer circuit using only toggle flip-flops and a D-type flip-flop. |
| Figure 4.21 | Layout and schematic of a 4-bit 2 to 1 multiplexer circuit using T1 flip-flops and JAND gates. |
| Figure 4.22 | ... flops. The circuit is fabricated in AIST CRAVITY with standard process STP2. |
| Figure 4.23 | Waveform of inputs and output, experimental result of a single cell from the 2 to 1 4-bit multiplexer. |
| Figure 4.24 | Ladder $\pi$-model for the strip-line in the standard process. For a 20 $\mu m$ PTL, the values are $L_\pi$ = 0.25 pH and $C_\pi$ = 0.037 pF. |
| Figure 4.25 | Block diagram of the test setup for the designed cells. Figure numbers show the designed parts. |
| Figure 4.26 | The receiver circuits with the JTLs and the SFQ/DC converters. |
| Figure 4.27 | The driver circuits with their respective JTLs and the DC/SFQ converters. |
| Figure 4.28 | Input/output of the PTL line of $20\,\mu m$ width at 4.2 K. The expected output signal is generated externally to compare with the output of the PTL. |
| Figure 4.29 | Bit error rate (BER) measurement vs bias. |
| Figure 4.30 | The buffer gate schematic and layout designed in Cadence Virtuoso software. The coupling of the inductances is not shown in the picture. |
| Figure 4.31 | JSIM simulation results for the buffer gate. |
| Figure 4.32 | Fabricated buffer circuit: a) without shield, b) with superconductor shield. |
| Figure 4.33 | Majority gate in AQFP technology; left is the schematic and right is the layout. The coupling of the inductances is not shown in the picture. |
| Figure 4.34 | The output result of the majority gate as two inputs with the same logic value are applied. The output is inverted. |
| Figure 4.35 | The a) unshielded and b) shielded majority gate fabricated by the standard process. |
| Figure 4.36 | Layout and schematic of the 4-bit 4 to 1 input register circuit. |
| Figure 4.37 | Fabricated circuit of the 4-bit 4 to 1 input register circuit. The circuit is fabricated in AIST CRAVITY with standard process STP2. |
| Figure 4.38 | Waveform of inputs and output, experimental result of a single cell from the 4 to 1 4-bit input stage. |
| Figure 4.39 | Layout and schematic of 4-bit 1 to 3 outputs register circuit. |
| Figure 4.40 | Fabricated circuit of 4-bit 1 to 4 output register circuit. The circuit is fabricated with standard process STP2. |
| Figure 5.1  | Schematic of the pulse-tube cryocooler. |
| Figure 5.2  | Temperature oscillation in the a) first stage, b) second stage of the system under load. |
| Figure 5.3  | LabVIEW program for temperature and vacuum control of the system during measurements. |
| Figure 5.4  | The wiring configuration of the cryocooler between stages. |
| Figure 5.5  | Test setup hardware for low noise and low frequency RSFQ circuit measurements. |
| Figure 5.6  | The cryostat and the test setup for low and high frequency tests. |
| Figure 5.7  | Chip packaging of the test circuit used for power measurements. |
| Figure 5.8  | Measuring the temperature on the chip surface using the series SQUID I-V curve. |
| Figure 5.9  | a) Thermal resistance from the circuit on the chip surface to the environment. b) Electrical resistance for one line of the bias current path. |
| Figure 5.10 | The power graph for Table 5.1. |
| Figure 5.11 | The power loss of each pin versus the current the pin carries (total current fed through 8 pins). |
| Figure 5.12 | The chip before and after applying epoxy on it. |
| Figure 5.13 | The power graph for Table 5.2. |
| Figure 5.14 | The power consumption graphs per current of each pin for different coverings of the chip. |
| Figure 5.15 | The copper dust medium for insertion of the pipes. |
| Figure 5.16 | Top is the schematic of the J-AND cell and bottom is the picture of the fabricated cell. |
| Figure 5.17 | Un-shunted Josephson junction I-V curve. |
| Figure 5.18 | 300 series DC-SQUID I-V characteristics. |
| Figure 5.19 | Input and output results of the single J-AND cell. |
| Figure 5.20 | Bit error rate of the AND cell at different bias values. |
| Figure 5.21 | (a) Hardware setup of the test bench and (b) forward solution for determining the best working point. |
| Figure 5.22 | DFF and JOR output probability. |
| Figure 5.23 | Bias margin percentage for each stage of the circuit. |
| Figure 6.1  | Superconductor ALU with interface circuits in relation to the CMOS circuits. |
| Figure 6.2 : The ALU used inside the coprocessor.                                                                                                      | . 130 |
| Figure 6.3 : Block diagram of parallel ALU.                                                                                                            | . 131 |
| Figure 6.4 : The fabricated ALU with STP2 process.                                                                                                     | . 133 |
| Figure 6.5 : The JSIM analog simulation of the ALU circuit on its most critical path (The clock tree).                                                 |       |
| Figure 6.6 : Simulation results for the ALU in various input conditions                                                                                | . 135 |
| Figure 6.7 : Experimental results from 4-bit parallel ALU                                                                                              | . 136 |
| Figure 6.8 : Block diagram of serial ALU.                                                                                                              | . 137 |
| Figure 6.9: A single cell of a serial ALU fabricated with standard process STP2                                                                        | 138   |

| Figure 6.10: 4-bit serial ALU fabricated with standard process STP2.            | 139 |
|---------------------------------------------------------------------------------|-----|
| Figure 6.11 : Clock in and clock out from a serial ALU tested in our cryocooler system. | 140 |
| Figure 6.12: Inputs and output signals of the serial ALU circuit.               | 141 |

# LIST OF TABLES

|                                                                                                                                                       | <u>Page</u> |
|-------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|
| Table 1.1 : Switching energy of different conventional logic circuits                                                                                 | 3           |
| Table 4.1 : $\pi$ -model parameters extracted for the striplines with no sky-plane                                                                    | 82          |
| Table 5.1 : Power and thermal gradient characteristic of cooler while applying different bias current via 4 wires. Second stage temperature is at 4.22K | 110         |
| Table 5.2 : Power and thermal gradient characteristic of cooler while applying different bias current via 8 Be-Cu bias pins                           | 114         |
| Table 6.1: The operations of the parallel ALU and the select bits for them                                                                            | 132         |
| Table 6.2: The operations of the serial ALU and the select bits for them                                                                              | 137         |



# ABBREVIATIONS

**AC** : Alternating Current

**ALU** : Arithmetic Logic Unit

**AQFP** : Adiabatic Quantum Flux Parametron

**BCS** : Bardeen Cooper Schrieffer

**BER** : Bit Error Rate

**CLK** : Clock

**DC** : Direct Current

**DFF** : D Flip Flop

**H<sub>c</sub>** : Critical Magnetic Field

**I<sub>B</sub>** : Bias Current

**I<sub>C</sub>** : Critical Current

**JJ** : Josephson Junction

**JTL** : Josephson Transmission Line

**LPF** : Low Pass Filter

**LTS** : Low Temperature Superconductor

**PTL** : Passive Transmission Line

**QOS** : Quasi One Junction SQUID

**RSFQ** : Rapid Single Flux Quantum

**SFQ** : Single Flux Quantum

**SQUID** : Superconducting Quantum Interference Device

**STJ** : Superconducting Tunnel Junctions

**T<sub>C</sub>** : Critical Temperature



# LIST OF SYMBOLS

The symbols used in this work are presented below.

| Symbols  | Explanation                     |
|----------|---------------------------------|
| A        | Current Dimension (Ampere)      |
| e        | Electron charge                 |
| f        | Frequency                       |
| L        | Inductance                      |
| n        | Nano                            |
| p        | Pico                            |
| μ        | Micro                           |
| m        | Milli                           |
| Φ        | Flux                            |
| $\Phi_0$ | Flux Quantum                    |
| ħ        | Reduced Planck Constant         |
| δ        | Superconductor Phase Difference |
| Ψ        | Wave Function                   |
| I        | Current                         |
| s        | Second                          |
| t        | Time                            |
| T        | Temperature                     |
| V        | Voltage                         |
| τ        | Time Constant                   |



### 1. INTRODUCTION

The demand for high-speed, low-power computers has led to a search for logic families that can serve as alternatives to complementary metal-oxide-semiconductor (CMOS) and silicon technologies. Recent advances in rapid single flux quantum (RSFQ) technology and superconducting large-scale integration allow us to fabricate complex RSFQ circuits and structures with a high number of Josephson junctions on one chip [1], [2]. These advances have led to logic circuits that consume orders of magnitude less power than MOSFET-based circuits while working at higher frequencies [3]–[5].

Power consumption in large-scale computers is a significant problem that limits the computing power and scalability of CMOS circuits. Data centers and large computing facilities keep growing for several reasons, including growth in internet traffic, support centers for cellular and mobile devices, cloud computing, and the dependency of more applications on precise simulations. Bronk et al. predicted that the energy consumption of United States data centers would rise from 72 TWh to 176 TWh by the year 2020 [6]. Reducing the energy consumption of computational data centers could save up to \$15 billion each year. This estimate only accounts for energy savings and does not include the reduction in size, the reduced need for cooling equipment, or the environmental benefits.

Supercomputers are among the high-performance systems that require a large amount of power. These systems are in high demand, and their power requirements are expected to increase over time. Data centers also consume a large amount of energy for computing. The DoE and DARPA are working on optimizing and reducing this power. The goal set by these organizations is shown in Figure 1.1 [7]. There are more than 500,000 data centers worldwide, and their estimated power consumption is over 40 GW [8]–[10]. Information about data centers is limited, however, since most of them are run by private companies and are not open to the public.



Figure 1.1: The power consumption of different supercomputers around the world. The blue line shows the superconductor projects power consumption [7].

Reducing power consumption and increasing speed in conventional CMOS logic is possible only by reducing the size of the chip or by changing the semiconductor material. Size reduction causes various problems and is limited by the fabrication technology, the endurance of the metal-oxide insulator in high electric fields, and heat transfer from the surface of the chip. Changing the material can help to reduce the CMOS operating voltage, but it brings its own challenges because current technology is mostly based on silicon. Therefore, CMOS-based computing units with normal-metal interconnects will not be able to keep up with the demand and reach this goal.

The energy consumption in logic devices comes from various sources. It can be caused by the resistors used for biasing the circuit, energy dissipation in the interconnects, charging of the gate capacitors, or simply by switching the circuit and the resulting bit loss. The theoretical limit for energy dissipation in logic gates comes from the bit loss in the gate as it switches. A simple NAND gate has two input bits and one output bit, so one bit is lost in every operation. The energy of the lost bit is given by the Shannon-von Neumann-Landauer limit [11]–[13]: the energy dissipated for each lost bit is $k_B T \ln 2 \approx 3 \times 10^{-21}$ J at room temperature. The switching energies for different logic families are given in Table 1.1.

Table 1.1 : Switching energy of different conventional logic circuits.

| Technology                                      | Switching energy      | Frequency      |
|-------------------------------------------------|-----------------------|----------------|
| CMOS or any other charge-transfer based devices | $10^6\,k_B T \ln 2$   | 6 GHz @ 77 K   |
| SFQ circuits (switching energy per JJ)          | $10^3\,k_B T \ln 2$   | 50 GHz @ 4.2 K |
| Reversible Josephson junction circuits          | Below $k_B T \ln 2$   | 5 GHz @ 4.2 K  |
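As a rough numerical illustration of the limit quoted above, the following sketch evaluates $k_B T \ln 2$ at two temperatures. The function name and the chosen temperatures (300 K and 4.2 K) are assumptions made for this example; they are not part of the original analysis, and the table does not state at which temperature its multipliers are referenced.

```python
import math

# Boltzmann constant in J/K (exact value in the 2019 SI definition).
K_B = 1.380649e-23


def landauer_limit(temperature_k):
    """Minimum energy (in joules) dissipated per erased bit at a given temperature."""
    return K_B * temperature_k * math.log(2)


# Illustrative temperatures (assumed for this example): room temperature
# and the liquid-helium temperature used for RSFQ circuits.
e_room = landauer_limit(300.0)
e_helium = landauer_limit(4.2)

print(f"k_B T ln2 at 300 K : {e_room:.3e} J")
print(f"k_B T ln2 at 4.2 K : {e_helium:.3e} J")

# Orders of magnitude from Table 1.1, applied here to the 4.2 K limit
# purely as an illustration of the scale involved.
print(f"10^3 x limit at 4.2 K: {1e3 * e_helium:.3e} J")
```

The limit scales linearly with temperature, so the absolute value at 4.2 K is roughly 70 times smaller than at room temperature; any comparison between technologies operating at different temperatures should keep this in mind.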

Superconductor logic circuits based on the Josephson effect can switch very fast and consume three orders of magnitude less energy than conventional MOSFETs. They produce a small single flux quantum (SFQ) pulse that travels at about one third of the speed of light with very low loss. Superconductors need cooling systems operating below 10 K to function; however, data centers already have coolers and cooling gases, so this requirement would not be a major inconvenience.

RSFQ circuits were first introduced in the 1980s by K. K. Likharev [4], [14]. They are current biased and use resistors to regulate the bias current of the cells. RSFQ circuits demonstrate very high speed, and some have been shown to operate at up to 770 GHz [15]. More than 99% of the power consumed by RSFQ circuits is dissipated in these bias resistors. In some alternative RSFQ technologies, such as efficient RSFQ (ERSFQ), the bias resistor is omitted and the circuit consumes 100 times less power [16], [17]. However, cells designed with these technologies are larger and therefore somewhat slower, so they are best suited to small circuits where low power consumption is a must.

The main problems that limit the use of RSFQ technology are the lack of a compact and scalable cryogenic memory, the difficulty of connecting circuits in the cryogenic environment to room temperature, and a fabrication process that is not yet as advanced as CMOS processes [18], [19]. Another problem that limits commercial use is the need for a cryostat that is robust over very long periods and needs minimal maintenance. Most RSFQ circuits are tested in liquid-helium-based cryostats, which are costly, cannot sustain long operating hours, and are suitable mainly for research purposes.

To make RSFQ technology viable for applications outside the laboratory and for everyday commercial use, we need to develop a very robust system that can keep operating without constant intervention. This system should be easy to use, so that a novice user with a small amount of training can operate it without deep knowledge of cryogenics and superconductivity.

The user interface is also important. To communicate with the user, the computing part, which consists of superconductor circuits, should be linked to the CMOS circuits on the outside. The computing part consists of different circuits at cryogenic temperature, ranging from registers and memory to the interface circuits. However, the main part of any computing system is the processor, or more specifically the arithmetic logic unit. Our goal is to design the superconducting circuits needed for such a complex system, but first of all we needed to design and test the ALU.

The arithmetic logic unit (ALU) is the main block of any processor; it performs arithmetic, bitwise and logic operations on its input registers. Every central processing unit (CPU), graphics processing unit (GPU) or floating-point unit (FPU) has one or more ALU blocks. Each ALU has at least two input registers, which hold the operands. These registers are loaded from a random access memory (RAM) or a cache memory. The ALU then performs an operation on these registers. The operation is determined by the operation set register, which gets its data from the program flash memory. After the operation, the output of the ALU is stored in the output register, which then transfers the data to the RAM or cache memory.

The ALU may have other outputs that indicate the status of the unit. These status bits are also referred to as flags. The parity flag indicates whether the number of set bits in the output is even or odd. This flag helps to verify the data as it is sent over the data bus and to detect errors in the data package. The carry-out flag shows the carry of an operation such as add or subtract, and may also indicate the overflow of a shift operation. The zero flag indicates whether the output is zero, and finally the overflow flag shows whether an operation such as add or multiply has caused an overflow of the output.
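The behavior of the operand registers, the operation select and the status flags described above can be sketched in software as follows. This is only a behavioral illustration: the operation names, the 4-bit default width and the dictionary of flags are assumptions made for this example and do not represent the select-bit encoding or the architecture of the RSFQ ALU designed in this work.

```python
def alu(op, a, b, width=4):
    """Behavioral model of a small ALU returning (result, flags).

    op    : one of "ADD", "SUB", "AND", "OR", "XOR" (hypothetical names)
    a, b  : operands, truncated to `width` bits
    flags : carry, zero, parity_even and overflow (two's complement)
    """
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    carry = False
    overflow = False

    if op == "ADD":
        full = a + b
        carry = full > mask
        result = full & mask
        # Signed overflow: operands share a sign that differs from the result.
        overflow = bool(~(a ^ b) & (a ^ result) & sign)
    elif op == "SUB":
        # Subtraction as addition of the two's complement of b.
        full = a + ((~b) & mask) + 1
        carry = full > mask  # carry set means no borrow occurred
        result = full & mask
        overflow = bool((a ^ b) & (a ^ result) & sign)
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "XOR":
        result = a ^ b
    else:
        raise ValueError(f"unknown operation: {op}")

    flags = {
        "carry": carry,
        "zero": result == 0,
        "parity_even": bin(result).count("1") % 2 == 0,
        "overflow": overflow,
    }
    return result, flags


# 7 + 1 does not fit in a signed 4-bit value, so the overflow flag is set.
print(alu("ADD", 0b0111, 0b0001))
# 15 + 1 wraps around to zero and produces a carry-out.
print(alu("ADD", 0b1111, 0b0001))
```

In this sketch the carry and overflow flags are only meaningful for the arithmetic operations, while the zero and parity flags are computed for every result.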

The goal of this work is to design an arithmetic logic unit with an efficient architecture in RSFQ logic that works at high speed with low power consumption. A test system is designed for robust operation of the designed circuits and for automatically measuring the bias margins and bit error rate (BER) of each circuit. The arithmetic logic unit is packaged appropriately for the cryocooler system, and the intermediate circuits for communication between the unit and CMOS circuits are also designed and tested.

To understand RSFQ logic and the challenges faced in the design and test of these circuits, we first have to understand superconductivity and superconductor devices. The basis of many superconductor devices is the Josephson junction. For storing flux in superconductor circuits, we need SQUIDs. These basics are discussed in the following sections.

#### 1.1. Theory of Superconductivity

In 1911, the Dutch scientist Heike Kamerlingh Onnes found that when the temperature of mercury (Hg) drops below 4.2 K (the temperature of liquid helium), its DC resistance suddenly drops to zero. He called this phenomenon superconductivity [20]. Since then, there has been an ongoing effort to find this phenomenon in different materials with higher critical temperature ( $T_C$ ). In 1913, superconductivity was found in lead (Pb) at  $T_C = 7.2$ K, and in 1930 it was found in niobium (Nb) at  $T_C = 9.2$ K.

In January 1986, K. Alex Müller and J. Georg Bednorz, two researchers from the IBM laboratory, found superconductivity in copper-oxide-based ceramics. These ceramics showed superconducting behavior well above the temperature that was considered the theoretical limit of superconductivity. After this discovery, many groups around the world started to work on superconducting materials and to search for materials with much higher critical temperatures. In 1993, researchers found the HgBaCaCuO ceramic with a critical temperature of 136 K. Figure 1.2 shows the critical temperature of superconducting materials versus the year in which these materials were discovered [20].

There are many theories of superconducting behavior. In 1934, a simple model called the "two-fluid model" was introduced by F. and H. London. This model can describe some superconducting behavior, such as the Meissner effect. In 1950, Lev Landau and Vitaly Ginzburg introduced a phenomenological theory that could explain many macroscopic superconducting behaviors. Some years later, Alexei Abrikosov categorized superconductors into two groups, Type I and Type II. However, the most complete model of superconductivity was developed in 1957 by three physicists, John Bardeen, Leon Cooper and Robert Schrieffer. They received the Nobel Prize for this work in 1972. This theory, known as BCS, is a microscopic theory that is still in use today [20]–[22].



Figure 1.2: The critical temperature of different superconductors versus the year of their discovery [20].

The superconducting state is identified by two main behaviors: zero resistivity and the Meissner effect. When the temperature of a superconducting material drops below the critical value, the DC resistance of the material suddenly drops to zero. This phenomenon is known as zero resistivity, and it is different from the behavior of a perfect conductor. Figure 1.3 shows the zero resistivity of mercury as measured by Onnes, the first observed case of superconductivity.



Figure 1.3: Zero resistivity in mercury as shown by Onnes in 1911 [20].

The Meissner effect describes the behavior of a superconductor in a magnetic field. A superconducting material does not allow the magnetic field to pass through it. In type I superconductors, as we apply a magnetic field to the material, currents form in the material to oppose the applied magnetic field. This may be confused with Lenz's law, in which a conductor opposes changes in the magnetic field by creating an opposing field via circulating currents in the material. In a perfect conductor, these currents would persist forever because there is no resistive loss. Figure 1.4 shows a superconductor bulk in the presence of a magnetic field. The bulk is locked in place by the opposing circulating currents in the material, so its position is stable. This property is used for stable levitation in maglev trains, and in motors, generators and many other technologies.



Figure 1.4: Levitation of a superconductor bulk over a magnet due to quantum locking. This example illustrates the Meissner effect [21].

What separates a superconductor from a perfect conductor is the following. If a magnetic field is already present when a material becomes a perfect conductor, the field continues to pass through the material, because a perfect conductor only opposes changes in the magnetic field. However, when a material becomes a superconductor, it expels any magnetic field inside it, as shown in Figure 1.5. The bulk resists the magnetic field until the field becomes so large that superconductivity in the whole bulk collapses. This phenomenon is known as the Meissner effect.



Figure 1.5: Presentation of the Meissner effect in a superconducting bulk as the temperature drops below critical point [21].

There are two classes of superconducting materials, type I and type II. The type of a superconductor is determined by the behavior of the material in a magnetic field. In type I, as we apply a magnetic field to the superconductor, the material resists the field and acts as a perfect diamagnet until the field reaches the critical value and superconductivity collapses in the whole material. In type II, there are two different critical field values ( $H_{c1}$  and  $H_{c2}$ ). Like a type I superconductor, the material resists the magnetic field until  $H_{c1}$  is reached. When we apply a larger magnetic field, the material does not collapse completely and still acts as a superconductor. However, the magnetic field can penetrate the material in quantized amounts. This flux quantum is given by equation (1.1) [21], [23].

$$\Phi_0 = \frac{h}{2e} = 2.07 \times 10^{-15}\ \mathrm{Wb} \tag{1.1}$$

The year 1962 can be considered a turning point in the history of superconductivity. In that year, the Josephson effect was discovered by Brian D. Josephson, who received the Nobel Prize in Physics in 1973. Later, Jaklevic discovered the quantum interference between two Josephson junctions placed in parallel on a superconducting loop. He presented the dependence of the critical current on the applied magnetic field, as shown in Figure 1.6 [24].



Figure 1.6 : Josephson current versus the magnetic field for two parallel junctions [24].

The high-frequency oscillations seen in Figure 1.6 are due to quantum interference between the two junctions. These oscillations are analogous to the interference of light from a single source as it passes through two parallel slits. Because of this phenomenon, these devices are called direct current superconducting quantum interference devices (DC-SQUIDs). SQUIDs can be used as magnetic field sensors and can sense magnetic flux as small as  $10^{-6}\Phi_0$ . These devices are in effect flux-to-voltage converters [24].

#### 1.1.1. Josephson junctions

The term tunneling is used when an electron passes through a potential barrier that it could not cross according to the laws of classical physics. There are various types of tunneling junctions, but here we discuss superconductor-insulator-superconductor (SIS) junctions. The idea that Cooper pairs (or super-electrons) can tunnel between two superconductors separated by a small distance without an applied voltage was first proposed by Josephson in 1962 [24].

According to Ginzburg-Landau theory, Cooper pairs are described by an order parameter, or wave function, as in equation (1.2).

$$\Psi(r) = |\Psi(r)| e^{i\theta(r)} \tag{1.2}$$

In this equation,  $|\Psi(r)|^2$  gives the Cooper pair density at location r, and the phase  $\theta(r)$  is related to the supercurrent in that region [21], [23]. As two superconductors are brought close to each other, their wave functions penetrate the barrier between them and couple, which reduces the energy of the system. Under these conditions, Cooper pairs can tunnel through the insulating barrier without dissipating any energy. The Josephson junction can be described by the two main equations (1.3) and (1.4).

$$I = I_c \sin(\phi) \tag{1.3}$$

$$\frac{\partial \phi}{\partial t} = \frac{2e}{\hbar}V \tag{1.4}$$

Equation (1.3) states that the current through a junction is a function of the critical current  $I_C$  and the phase difference  $\phi$  between the wave functions on the two sides of the junction. Equation (1.4) states that the rate of change of this phase difference is proportional to the voltage across the junction. By applying a DC voltage to the junction and integrating (1.4) we get:

$$\phi = \phi_0 + \left(\frac{2e}{\hbar}\right) \cdot V \cdot t \tag{1.5}$$

Substituting this expression into (1.3), with  $\omega_j = 2eV/\hbar$ , gives:

$$I = I_c \sin(\omega_j t + \phi_0) \tag{1.6}$$

Therefore, an AC current passes through the Josephson junction with frequency  $f_j$ :

$$f_{j} = \frac{\omega_{j}}{2\pi} = \left(\frac{1}{2\pi}\right) \cdot \frac{2e}{\hbar} \cdot V = \frac{2e}{h} \cdot V \tag{1.7}$$

Equation (1.7) shows that the frequency is proportional to the voltage with a fixed coefficient. Since frequency can be measured with very high precision, the US National Bureau of Standards adopted this relation as a standard for the voltage. The coefficient of the voltage is:

$$\frac{2e}{h} = 483593.420 \frac{GHz}{V} \tag{1.8}$$
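The flux quantum of equation (1.1) and the voltage-to-frequency coefficient of equation (1.8) can be checked numerically with the short sketch below. It uses the exact SI values of $h$ and $e$ (2019 redefinition), which is an assumption of this example; the resulting coefficient differs slightly, in the sixth significant digit, from the conventional value quoted in (1.8). The function name and the example voltage of 1 µV are illustrative only.

```python
# Exact SI values (2019 redefinition) of the Planck constant and the
# elementary charge; using these is an assumption of this example.
H_PLANCK = 6.62607015e-34    # J s
E_CHARGE = 1.602176634e-19   # C

# Magnetic flux quantum, equation (1.1), in weber.
PHI_0 = H_PLANCK / (2.0 * E_CHARGE)

# Voltage-to-frequency coefficient 2e/h, in Hz per volt.
K_J = 2.0 * E_CHARGE / H_PLANCK


def josephson_frequency(voltage):
    """Josephson oscillation frequency in Hz for a DC voltage in volts."""
    return K_J * voltage


print(f"Phi_0 = {PHI_0:.4e} Wb")
print(f"2e/h  = {K_J / 1e9:.1f} GHz/V")
# Illustrative operating point: 1 microvolt across the junction.
print(f"f_j at 1 uV = {josephson_frequency(1e-6) / 1e6:.1f} MHz")
```

Note that $2e/h$ is simply the reciprocal of $\Phi_0$, so a junction held at a constant voltage emits one flux quantum per oscillation period.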

It is noteworthy that equations (1.3) and (1.4) only describe the current carried by electron pairs (Cooper pairs). When the voltage is not zero, a quasiparticle current caused by normal electrons in the material also flows in parallel with the Cooper pair current. There can also be some leakage current due to imperfections in the insulator. To describe all these currents, the equivalent circuit shown in Figure 1.7 is used.



Figure 1.7: Circuit model of a Josephson junction [21].

In this circuit, a capacitor for the displacement current and a voltage-dependent resistor for the leakage and normal-electron currents are placed in parallel with the Josephson element [21], [22].

To investigate the I-V characteristics of a Josephson junction, a differential equation based on the circuit in Figure 1.7 can be written as:

$$I = I_c \sin(\phi) + GV + C\frac{dV}{dt} \tag{1.9}$$

Using the Josephson equation (1.4), the right-hand side of equation (1.9) can be written as a function of  $\phi$  only:

$$I = \frac{\hbar C}{2e} \frac{d^2 \phi}{dt^2} + \frac{\hbar G}{2e} \frac{d\phi}{dt} + I_c \sin(\phi) \tag{1.10}$$

Dividing this equation by  $I_c$  and replacing t with the normalized time variable  $\theta = \omega_c t$ , where  $\omega_c = (2e/\hbar)(I_c/G)$ , we derive equation (1.11),

$$\frac{I}{I_c} = \beta_c \frac{d^2 \phi}{d\theta^2} + \frac{d\phi}{d\theta} + \sin(\phi) \tag{1.11}$$

which contains the  $\beta_c$  parameter. This is the main parameter of a Josephson junction and is called the Stewart-McCumber parameter.  $\beta_c$  is the ratio of the capacitive susceptance at the characteristic frequency  $\omega_c$  to the junction conductance.

$$\beta_c = \frac{\omega_c C}{G} = \left(\frac{2e}{\hbar}\right) \left(\frac{I_c}{G}\right) \frac{C}{G} \tag{1.12}$$

To find the I-V characteristics for different  $\beta_c$  values, we should find the average voltage  $V = \langle (\hbar/2e) \frac{d\phi}{dt} \rangle$  at a constant current for each value of  $\beta_c$ . Figure 1.8 shows the normalized current I/I<sub>c</sub> versus the normalized voltage GV/I<sub>c</sub> for  $\beta_c$=0 and  $\beta_c$= $\infty$ .



Figure 1.8: Normalized current I/I<sub>c</sub> versus normalized voltage GV/I<sub>c</sub> [22].

For  $\beta_c$=0, each current value corresponds to a single average voltage. Also, at V=0 a significant current can flow through the junction. To find the exact relation between the average voltage and the applied current, we should numerically integrate equation (1.11). Figure 1.9 shows the result of solving this equation for  $\beta_c$=4.



Figure 1.9 : The characteristic of a junction for  $\beta_c$ =4 [22].
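The numerical integration of equation (1.11) mentioned above can be sketched as follows. This is a minimal illustration and not the solver used to produce Figure 1.9: it applies a fixed-step fourth-order Runge-Kutta scheme to the normalized equation, and the function name, step size, integration length and bias values are all assumptions chosen for this example. The normalized average voltage is obtained as the mean of $d\phi/d\theta$ after an initial transient.

```python
import math


def average_voltage(i_bias, beta_c, phi0, v0,
                    t_total=400.0, t_skip=100.0, dt=0.01):
    """Integrate beta_c*phi'' + phi' + sin(phi) = i_bias (equation 1.11).

    Returns the normalized average voltage <d(phi)/d(theta)>, measured
    after discarding the first t_skip units of normalized time.
    """
    def deriv(phi, v):
        # First-order form: phi' = v, v' = (i - v - sin(phi)) / beta_c
        return v, (i_bias - v - math.sin(phi)) / beta_c

    phi, v = phi0, v0
    steps = int(round(t_total / dt))
    skip = int(round(t_skip / dt))
    phi_start = phi
    for n in range(steps):
        if n == skip:
            phi_start = phi
        k1p, k1v = deriv(phi, v)
        k2p, k2v = deriv(phi + 0.5 * dt * k1p, v + 0.5 * dt * k1v)
        k3p, k3v = deriv(phi + 0.5 * dt * k2p, v + 0.5 * dt * k2v)
        k4p, k4v = deriv(phi + dt * k3p, v + dt * k3v)
        phi += dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return (phi - phi_start) / ((steps - skip) * dt)


BETA_C = 4.0

# Same bias current (0.9 I_c), two different histories:
# starting at the static solution sin(phi) = I/I_c with no voltage ...
v_zero = average_voltage(0.9, BETA_C, math.asin(0.9), 0.0)
# ... and starting from a state in which the phase is already running.
v_run = average_voltage(0.9, BETA_C, 0.0, 2.0)
# Above the critical current only the voltage state exists.
v_above = average_voltage(1.5, BETA_C, 0.0, 0.0)

print(f"I = 0.9 Ic, from rest     : <v> = {v_zero:.4f}")
print(f"I = 0.9 Ic, from running  : <v> = {v_run:.4f}")
print(f"I = 1.5 Ic                : <v> = {v_above:.4f}")
```

For the same bias below $I_c$, the junction stays at zero voltage when it starts from rest but keeps a finite average voltage when it starts from the running state, which is the hysteresis discussed for Figure 1.9.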

If a junction with  $\beta_c$=4 is connected to a DC current supply and the current is raised from zero, the I-V characteristic follows the arrows shown in Figure 1.9. When the current reaches I<sub>c</sub>, the voltage jumps from zero to a non-zero value. If we now reduce the current, the voltage decreases along a different path until it reaches zero at I<sub>min</sub>. In this case, the loss associated with the hysteresis of this I-V curve corresponds to one  $\Phi_0$ . This creates a voltage pulse in the junction that is known as a single flux quantum (SFQ) pulse.

#### 1.1.2. SQUIDs

The SQUID was first introduced in 1964 at Ford Research Labs, two years after the invention of Josephson junctions. As stated before, SQUIDs can detect magnetic flux as small as  $10^{-6}\Phi_0$ , and a SQUID is basically a device that converts magnetic flux to electric voltage [25]. There are different kinds of SQUID sensors, including the DC-SQUID, the RF-SQUID and the Quasi One Junction SQUID (QOS).

The DC-SQUID consists of two similar Josephson junctions connected in parallel in a superconducting loop. DC-SQUIDs are used for detecting very small magnetic fields and are one of the basic elements of superconductor circuits, since they can store a  $\Phi_0$  in their loop.

The magnetic flux that passes through the SQUID ring must be an integer multiple of the magnetic flux quantum  $\Phi_0$ . Figure 1.10 shows that if the magnetic flux is not quantized, the wave function of the superconductor cannot close on itself around the superconducting ring. This can be compared to standing waves on a rope or cable with both ends fixed.



Figure 1.10: The quantization of magnetic field inside superconductor loop. a) Not quantized. b) Quantized [24].

Figure 1.11 (a) shows the general view of a DC-SQUID sensor, and part (b) shows the equivalent circuit of the DC-SQUID using the RCSJ model. In part (a), the gray color shows the superconducting material and the black part is the insulator between two superconducting parts, which forms the Josephson junction. Keep in mind that junctions 1 and 2 should have the same characteristics for the SQUID to function correctly. In part (b), the junctions are replaced with the RCSJ model and the superconducting loop is replaced with inductances.  $I_{N,1}$  and  $I_{N,2}$  are the currents caused mostly by thermal noise in the JJs.

If the external magnetic flux is larger than  $n\Phi_0$ , a current forms in the loop to compensate the excess magnetic flux. This current affects the I-V characteristic of the Josephson junctions on the superconducting ring. The resulting changes in the interference of the junctions can then be detected easily.



Figure 1.11: Schematic of a DC-SQUID [26].

If there is an external magnetic field B as in Figure 1.11 (a), a current J forms in the superconducting loop. This current is called the screening current.

As the external magnetic flux increases, the screening current also increases until the external flux reaches  $0.5\Phi_0$ ; then one  $\Phi_0$  penetrates the SQUID loop to reduce the total energy of the system. Figure 1.12 shows the screening current and the magnetic flux inside the loop as we apply an external magnetic field. The axes of the graphs are normalized for better understanding.



Figure 1.12: The screening current and penetrated flux of a SQUID loop as we apply external magnetic flux.
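The sawtooth-like behavior of the screening current and the staircase of the trapped flux can be illustrated with the idealized sketch below. It assumes that the flux in the loop is always exactly the integer multiple of $\Phi_0$ nearest to the external flux, and it expresses the screening contribution directly as a flux in units of $\Phi_0$; the function name, the normalization and the sign convention are assumptions of this example and are not taken from the measured or simulated curves in Figure 1.12.

```python
import math


def squid_loop_state(phi_ext):
    """Idealized flux state of a superconducting loop.

    phi_ext : external flux in units of Phi_0
    Returns (n, screening), where n is the number of flux quanta held in
    the loop and screening is the flux generated by the screening current
    (in units of Phi_0), so that phi_ext + screening = n.
    """
    # Nearest integer, with the jump placed at half-integer external flux.
    n = math.floor(phi_ext + 0.5)
    screening = n - phi_ext
    return n, screening


# Sweep the external flux from 0 to 2 Phi_0 in steps of 0.25 Phi_0.
for k in range(9):
    phi_ext = 0.25 * k
    n, screening = squid_loop_state(phi_ext)
    print(f"phi_ext = {phi_ext:4.2f}  n = {n}  screening = {screening:+.2f}")
```

In this simplified picture the magnitude of the screening flux never exceeds $0.5\Phi_0$, and its sign reverses each time a new flux quantum enters the loop.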

To read the flux changes more easily, we apply a bias current equal to  $I=2I_C$  to the SQUID loop. Because the junctions have the same parameters, the current is divided equally between the two branches of the loop. If the magnetic flux is not an integer multiple of  $\Phi_0$ , the currents in the junctions are:

$$\begin{cases} I_1 = \frac{I}{2} + I_{ind} \\ I_2 = \frac{I}{2} - I_{ind} \end{cases} \tag{1.13}$$

Since  $I_1$  exceeds the critical current, the first junction switches to the normal state while the second junction remains superconducting. Therefore, the I-V characteristic of the DC-SQUID is as shown in Figure 1.13. If we set the flux of the DC-SQUID to  $(n+0.5)\Phi_0$  via a compensation coil, the output voltage at the measurement terminals is at its maximum. This way, we obtain the highest voltage-to-flux ratio and the SQUID is in its most sensitive state. The output voltage of the DC-SQUID changes with the applied external flux as shown in Figure 1.13.



Figure 1.13: I-V characteristic of DC-SQUID and the output voltage at the terminals [26].
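The flux dependence of the DC-SQUID output can be sketched with a commonly used idealized model. The sketch assumes two identical, strongly damped junctions ( $\beta_c \approx 0$ ) and a negligible loop inductance, for which the maximum supercurrent is $2I_C|\cos(\pi\Phi/\Phi_0)|$ and the average voltage follows the overdamped junction relation. These assumptions, the function names and the normalized units are introduced for this example only and do not describe the fabricated devices or the curves in Figure 1.13.

```python
import math


def max_supercurrent(phi, i_c=1.0):
    """Maximum zero-voltage current of an ideal symmetric DC-SQUID.

    phi is the applied flux in units of Phi_0; loop inductance is neglected.
    """
    return 2.0 * i_c * abs(math.cos(math.pi * phi))


def squid_voltage(i_bias, phi, i_c=1.0, r=1.0):
    """Average voltage of the ideal SQUID for overdamped junctions.

    r is the shunt resistance of one junction, so the two junctions in
    parallel present r / 2.
    """
    i_max = max_supercurrent(phi, i_c)
    if abs(i_bias) <= i_max:
        return 0.0
    return 0.5 * r * math.sqrt(i_bias ** 2 - i_max ** 2)


# Bias at I = 2 I_c, as in the text, and sweep the flux over one period.
for k in range(5):
    phi = 0.25 * k
    print(f"flux = {phi:4.2f} Phi_0  V = {squid_voltage(2.0, phi):.3f} (I_c R)")
```

With the bias fixed at $2I_C$, this model gives zero voltage at integer flux and the largest voltage at $(n+0.5)\Phi_0$, and the output is periodic in the applied flux with period $\Phi_0$.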

#### 1.2. Outline of the Thesis

The thesis is organized as follows. Chapter 1, the introduction, briefly presents the computational power problem and the historical background of superconductors and superconducting devices. The theory of Josephson junctions and SQUIDs is also discussed. Finally, the purpose of this work and the motivation for the presented problem are discussed in this chapter.

In Chapter 2, superconductor logic theory is discussed. Different superconductor logic families, namely RSFQ and the adiabatic quantum flux parametron (AQFP), are introduced in Section 2.1 and Section 2.2, together with the benefits and issues of each technology. The technology we chose for our project and the reasons behind the choice are explained in this chapter. Then, some of the main cells and building blocks of complex logic circuits are demonstrated. The digital and analog cells of RSFQ logic and the design principles of these circuits are discussed in Section 2.3. This chapter also presents the method of designing RSFQ logic and the impedance matching for interfacing circuits. Finally, the fabrication process and the layers available for the design of the cells are discussed.

In Chapter 3, we take a look at arithmetic logic units and the different structures and architectures they come in. The architectures used in CMOS processes are shown first. Each architecture is then described for RSFQ circuits, and its pros and cons in the RSFQ regime are discussed.

In Chapter 4, the different parts of an arithmetic logic unit are shown. These parts were designed, simulated and fabricated separately, and then tested in our system to confirm the function of every circuit. The parts discussed belong both to the main circuit of the arithmetic logic unit and to the interface registers used for handshaking between the ALU and lower-frequency CMOS logic. All the circuits are presented with their experimental results.

Chapter 5 describes the test system. This chapter presents our efforts to build a robust system for testing analog and digital circuits in a very low temperature, low noise environment. The test system consists of several parts. The mechanical parts are responsible for cooling down the structure and evacuating the chamber. The control electronics stabilize the temperature at the stages and control the vacuum level for the safety of the system. The measurement parts supply the required bias and voltage signals and record data at high and low frequencies. All these parts and the automation of the test setup are discussed in this chapter.

Finally, Chapter 6 presents the achievements of the thesis and the conclusion of the work. In this chapter, two ALUs with different structures are shown, and their simulation and test results are discussed.

#### 2. SUPERCONDUCTOR CIRCUIT THEORY

After the discovery of superconductors, they were used in many different applications and fields. Most of these applications used the superconductor as a perfect conductor. They include power lines [27], fault current limiter switches [28], bolometers [29], narrow band filters [30] and the cryotron [31]. After the discovery of the Josephson junction, many new applications emerged. These include magnetometers and gradiometers using RF and DC SQUIDs, the Josephson voltage standard [32], analog-to-digital converters such as the quasi-one-junction SQUID (QOS), and digital logic circuits.

The biggest impact of semiconductor technology in recent decades has been in the field of computing. Josephson junction based logic circuits can contribute significantly to this field. Although superconductor logic circuits have drawbacks in cooling and fabrication, they have an advantage over semiconductors in speed and power consumption. Semiconductor circuits have been optimized and their fabrication processes refined for years, and they are approaching the limits of integration and speed. Superconductor logic circuits are comparatively new, but they show very reliable operation at orders of magnitude lower power consumption and an order of magnitude higher speed.

There are different approaches to superconductor circuits, including voltage-state and flux-state logic [33], [34]. Superconductor circuits have been demonstrated with both low and high temperature materials. However, high temperature superconductor (HTS) materials are not easily processed, and junctions fabricated with them do not have the desired characteristics. Low temperature materials are easier to work with, and there are commercial multi-layer processes for Nb based superconductors [2].

For every logic design, we need a switch to change the signal path and a memory to store the data bit. In semiconductors the transistor is the switch and the capacitor stores data. In superconductor circuits, the switch is the Josephson junction or SQUID and the memory is the inductance.

Before RSFQ technology and even before the invention of Josephson junctions, cryotron superconductor digital circuits were introduced. These circuits used two superconductor materials with different critical magnetic fields ( $H_c$ ). When the magnetic field reached the critical value, the material switched and became resistive. The memory in this technology was a simple superconductor loop that could store magnetic flux and therefore current. However, this technology was abandoned because the speed of the gates was limited by the L/R time constant, while semiconductors could match this speed and had better margins.

In this chapter, we investigate two Josephson junction based logic families, Rapid Single Flux Quantum (RSFQ) and Adiabatic Quantum Flux Parametron (AQFP). We then demonstrate some of the cells and basic circuits that we fabricated in these logics. The advantages and disadvantages of these methods are investigated, and finally the fabrication process that we used for the circuits is shown.

## 2.1. Rapid Single Flux Quantum (RSFQ)

RSFQ was first introduced as an alternative superconductor logic by K. K. Likharev and his coworkers in the late 1980s [4], [14]. In this technology, the Josephson junction acts as a switch and the data is stored as magnetic flux in SQUID loops. As mentioned before, RSFQ logic circuits have very low power consumption and are therefore considered as an alternative logic for computing centers [35], [36]. The integration capabilities and the high speed of RSFQ circuits also depend on the fabrication process. Various groups are currently working on processors based on RSFQ technology [37]–[43].

In RSFQ circuits, data is transferred as voltage pulses known as single flux quantum (SFQ) pulses. The time integral of the voltage of each SFQ pulse equals one flux quantum ( $\Phi_0$ ). The pulses have a width of a few picoseconds, and therefore the circuits can in theory operate at hundreds of gigahertz; for example, a toggle flip-flop cell has been reported to work at 770 GHz [44], [45]. Since data in RSFQ circuits is stored as magnetic flux, and hence as current, the circuits are also biased with current. The junctions in an RSFQ circuit are current biased up to 90% of their critical value. Figure 2.1 shows the I-V characteristic of the Josephson junction as it is biased.



Figure 2.1: I-V characteristic of a Josephson junction as it is biased.

Figure 2.1 demonstrates the mechanism of SFQ pulse generation in RSFQ circuits. When a junction is biased with a current $I_B$ close to its critical current $I_C$, a small additional current excitation pushes the junction current past the critical value and a voltage is generated across the junction. The transition marked 2 takes about 1 ps, and the junction returns to the starting point along path 3. During this transition the phase of the junction changes by $2\pi$, and the junction returns to its initial state with no memory of the change. In this respect, the junction behaves like a pendulum.

The pulses generated in RSFQ circuits last on the order of picoseconds, and their amplitude depends on the fabrication process. In the standard process that we used for our circuits, the pulse amplitude was about 400 µV. The short duration and small amplitude make these pulses impossible to observe with our equipment. However, to interact with RSFQ circuits, we need to generate SFQ pulses and read the output of the circuits. Therefore, two circuits are introduced that convert a DC signal (called DC because it changes very slowly compared to SFQ pulses) to SFQ pulses and vice versa. The DC/SFQ cell is like a quasi-one-junction SQUID (QOS) that detects a threshold: when the input current passes the threshold (in our case 1 mA), the circuit generates an SFQ pulse. Figure 2.2 shows the input and output of the circuit. As seen in the figure, the DC current signal is converted to SFQ pulses, and the duration of each pulse is on the order of picoseconds.



Figure 2.2: The conversion of DC signal to SFQ pulses with DC/SFQ cell.
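The picosecond time scale quoted above follows from the defining property of an SFQ pulse: the time integral of its voltage is one flux quantum, $\Phi_0 = h/2e$. As a rough check, the sketch below treats the pulse as a rectangle with the 400 µV amplitude mentioned in the text. The rectangular shape and the function name are simplifying assumptions; a real pulse is not rectangular.

```python
PLANCK = 6.62607015e-34        # Planck constant in J*s (exact SI value)
E_CHARGE = 1.602176634e-19     # elementary charge in C (exact SI value)

PHI_0 = PLANCK / (2 * E_CHARGE)   # magnetic flux quantum, about 2.07e-15 Wb

def sfq_pulse_width(amplitude_volts):
    """Width of an idealized rectangular pulse whose area is one flux quantum."""
    return PHI_0 / amplitude_volts

width = sfq_pulse_width(400e-6)   # 400 uV amplitude quoted in the text
print(f"Phi_0 = {PHI_0:.4e} Wb, pulse width ~ {width * 1e12:.1f} ps")
```

Under these assumptions the width comes out at a few picoseconds, consistent with the order of magnitude stated in the text.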

The counterpart of the DC/SFQ circuit is the SFQ/DC circuit, which converts incoming SFQ pulses to a DC signal. The SFQ/DC circuit is like a toggle flip-flop that changes state when it detects an SFQ pulse. On an incoming pulse, if the state is zero, the circuit changes to state one and starts oscillating. Since these oscillations are very fast, at the output we only see the average or RMS value of the pulses, which is a DC signal. Note that the state of the SFQ/DC cell itself is not important; it is the transitions between states that indicate data. This is similar to transition-edge double data rate (DDR) signaling in semiconductor logic. Figure 2.3 shows the difference between single data rate and DDR. Figure 2.4 shows the input pulses and the resulting output data [46].



Figure 2.3: Single data rate and double data rate data.



Figure 2.4: SFQ pulse to DC conversion in an SFQ/DC cell. Note that each pulse causes a state change in the output.
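The transition-based encoding of the SFQ/DC output can be illustrated with a short behavioral sketch. Time is divided into slots, and a 1 marks a slot in which an SFQ pulse arrives. The slot representation, the function names and the initial level of 0 are assumptions made for illustration only.

```python
def sfq_to_dc(pulse_slots, initial_level=0):
    """Toggle the output level on every incoming SFQ pulse."""
    level = initial_level
    levels = []
    for pulse in pulse_slots:
        if pulse:
            level ^= 1          # each pulse flips the DC output level
        levels.append(level)
    return levels

def recover_pulses(levels, initial_level=0):
    """Recover the pulses from the level changes, not from the level itself."""
    previous = initial_level
    pulses = []
    for level in levels:
        pulses.append(1 if level != previous else 0)
        previous = level
    return pulses

pulses_in = [1, 0, 1, 1, 0]
dc_levels = sfq_to_dc(pulses_in)
print(dc_levels)                  # output level in each slot
print(recover_pulses(dc_levels))  # the original pulses, read from the transitions
```

Reading the data from the transitions is what makes the absolute state of the cell irrelevant, as noted in the text.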

As discussed before, because of the nature of RSFQ circuits, the Josephson junctions are biased with current rather than voltage. Since there is no resistance in the superconducting circuits themselves, their static energy consumption is zero. However, in large integrated circuits with thousands of junctions, it is not practical to have a separate bias line for each individual junction. To maintain a correct bias distribution among the junctions, a resistance is added at each bias input of a cell. These resistances enable us to bias the circuits from a 2.5 mV voltage. As mentioned in the introduction, the main source of energy consumption in RSFQ is the static power dissipation in these resistances, which accounts for about 99% of the power consumption.

In some other variations of RSFQ circuits, such as e-RSFQ (efficient RSFQ) or LRSFQ, these resistances are replaced by a combination of Josephson junctions and inductances or by very large inductances, respectively. These circuits can consume two orders of magnitude less power, but they become much bulkier or compromise the bias margin of a normal RSFQ circuit [47]. In the following sections we discuss the RSFQ cells used in our circuits in more detail.
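To give a feeling for the numbers, the following sketch estimates the bias resistor and its static dissipation for a single junction. The 2.5 mV bias voltage and the 90% bias point come from the text. The critical current of 100 µA is a hypothetical value chosen only for illustration, and the voltage across the junction is neglected because the junction is superconducting between switching events.

```python
V_BIAS = 2.5e-3          # bias voltage from the text, in volts
BIAS_FRACTION = 0.9      # junctions are biased to 90% of I_c (from the text)
I_C = 100e-6             # hypothetical critical current, in amperes

def bias_resistor(i_c, v_bias=V_BIAS, fraction=BIAS_FRACTION):
    """Return (resistance in ohms, static power in watts) for one junction."""
    i_bias = fraction * i_c      # required bias current
    resistance = v_bias / i_bias # Ohm's law, junction voltage neglected
    power = v_bias * i_bias      # dissipated continuously in the resistor
    return resistance, power

r, p = bias_resistor(I_C)
print(f"R_bias ~ {r:.1f} ohm, static power ~ {p * 1e9:.0f} nW per junction")
```

With these assumed values each junction dissipates a few hundred nanowatts continuously, regardless of whether it switches, which is why the bias network dominates the power budget.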

## 2.2. Adiabatic Quantum Flux Parametron (AQFP)

The figures of merit of any computing system are its power consumption and speed. For many years, power consumption was not the main issue and the focus was on increasing speed. In recent years, due to the huge increase in computation needs, power consumption has drawn attention and is now considered the major metric of computer design [48]–[50]. As mentioned in Chapter 1, Landauer's principle states that the loss of one bit of information costs an energy of $k_B T \ln 2$ to compensate for the change in the entropy of the system [51]. This sets the theoretical limit for the energy consumption of logic such as two-input gates with a one-bit output, for example the OR gate. However, this limit applies only to irreversible logic operations, which include CMOS and RSFQ circuits [12], [13].
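The size of this limit can be evaluated directly. The sketch below computes $k_B T \ln 2$ at two temperatures. The choice of 300 K for room temperature and 4.2 K for liquid helium is an assumption made for illustration, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)

def landauer_limit(temperature_kelvin):
    """Minimum energy in joules dissipated when one bit is erased."""
    return K_B * temperature_kelvin * math.log(2)

for label, temperature in (("room temperature", 300.0), ("liquid helium", 4.2)):
    energy = landauer_limit(temperature)
    print(f"{label:16s} T = {temperature:5.1f} K  E_min = {energy:.3e} J")
```

Because the limit scales linearly with temperature, operating at cryogenic temperature lowers the bound itself by the ratio of the two temperatures, although the cooling overhead is not included in this simple estimate.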

Edward Fredkin showed that with reversible computers we can go below this theoretical limit and reduce the energy consumption of the system [52], [53]. The Fredkin gate avoids the power consumption caused by bit loss by conserving the entropy of the system. The Fredkin gate has 3 inputs and 3 outputs and is reversible, since the inputs can be recovered from the outputs. Many different models and physical devices have been investigated for reversible computation [54]–[56].

Reversible computing is one of the most efficient computing methods, as it allows the required calculations to be performed without losing any bits. The adiabatic quantum flux parametron (AQFP) is a newer superconducting logic that performs reversible computing and uses 2 to 3 orders of magnitude less power than RSFQ circuits. Because no data bit is lost in reversible computing, its power consumption can be smaller than the theoretical Shannon-von Neumann-Landauer limit [57]–[59].

Figure 2.5 contrasts adiabatic switching with classical switching, in which the bit has to pass over an energy barrier and needs energy to return to its previous state.
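The reversibility of the Fredkin gate can be checked exhaustively. The gate is a controlled swap: the control bit passes through unchanged, and the two data bits are exchanged when the control bit is 1. The sketch below is a purely logical model with hypothetical function and variable names; it says nothing about a physical implementation.

```python
from itertools import product

def fredkin(c, a, b):
    """Controlled swap: if the control bit c is 1, swap a and b."""
    return (c, b, a) if c else (c, a, b)

inputs = list(product((0, 1), repeat=3))
outputs = [fredkin(*bits) for bits in inputs]

# Reversible: the map is a one-to-one mapping on the 8 possible inputs.
is_bijection = sorted(outputs) == inputs
# Applying the gate twice restores the original bits, so it is its own inverse.
is_self_inverse = all(fredkin(*fredkin(*bits)) == bits for bits in inputs)
# The number of 1s is the same at the input and the output.
conserves_ones = all(sum(i) == sum(o) for i, o in zip(inputs, outputs))

print(is_bijection, is_self_inverse, conserves_ones)
```

Since every output pattern corresponds to exactly one input pattern, no bit is erased, which is the property that allows such a gate to avoid the bit-loss cost discussed above.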



Figure 2.5: Adiabatic switching versus conventional switching [60].

In adiabatic switching, the transition does not involve any barrier and therefore the bit is not lost. The operation principle of AQFP is based on the gate invented by Goto et al. [61]. Figure 2.6 shows the basic quantum flux parametron gate, which operates in the adiabatic regime. Because of the adiabatic nature of these cells, their clock frequency cannot go higher than about 5 GHz.



Figure 2.6: The basic operation of an AQFP gate designed by Goto [61].

In Figure 2.6, $I_x$ is the bias current. There are two superconducting loops, each containing a junction. As $I_x$ is coupled to these loops via the mutual inductances, a flux forms in one of the loops depending on the polarity of the $I_{in}$ current. The loop that holds the flux determines the state of the output current. Since no flux is generated and the flux merely moves from one loop to the other when the input polarity changes, the system works adiabatically and only a small amount of power is dissipated at switching. The bias current is also AC, and therefore there is no static power waste.

## 2.3. RSFQ Cells

After close examination of the available superconductor logic technologies, we decided to use RSFQ technology for this project. The factors that we considered include the practicality of design, the interface with room temperature systems and CMOS compatibility. RSFQ circuits have been around for 20 years, and the basic cells are optimized and have a large bias margin. The fabrication processes are available commercially for superconductor VLSI. The interface between RSFQ and equipment such as oscilloscopes, amplifiers and pattern generators is possible thanks to the DC/SFQ and SFQ/DC cells. RSFQ circuits are compatible with CMOS memory after an amplifier stage. Therefore, we decided to use RSFQ technology for designing the core of the processor.

There are different circuits and cells available in RSFQ technology. Many of these cells were designed by K. K. Likharev et al. when the technology was first introduced [4]. Other cells were gradually designed and optimized by various groups over the years. The circuits in RSFQ technology can be divided into two main groups. The first group consists of the digital cells, which perform digital logic computation on the pulses. The second group consists of the analog cells, which convert analog signals to digital or perform filtering, wave guiding and sensing. The analog cells include analog-to-digital converters such as the QOS, CMOS-to-RSFQ logic converters such as the DC/SFQ cell, and passive transmission lines (PTL) with their drivers and receivers, which act as waveguides for SFQ pulses.

### 2.3.1. Digital cells

There are various types of digital cells available in RSFQ circuits. The first kind is wiring cells. One of the problems with RSFQ circuits is the fan-out, which determines how many inputs the output of a cell can drive. In RSFQ circuits the fan-out is one. This limited fan-out requires many cells in the wiring between gates and complicates the architecture and the signal distribution tree. Many different cells have been introduced to address this. These cells help to overcome the fan-out problem and distribute the signal wherever it is needed without losing data.

The other type of digital cells is the logic cells, which allow us to perform logic operations on the incoming input data. The logic cells include the AND, OR, XOR and NOT gates. It is noteworthy that all these cells are clocked in RSFQ logic. Since RSFQ logic works with pulses rather than voltage levels, the clock is needed to control the state of the gate, and it helps to synchronize the signals by resetting the state of the gate to zero. The importance of the clock can be seen clearly in the Moore diagrams in the upcoming sections.

Finally, the last type of digital cells that we discuss here is the flip-flop. Flip-flops are the memory cells in RSFQ and are used for the registers of the ALU. The D-type flip-flop is used at the output stages of the sub-circuits to ensure the synchronization of the outputs with the clock signals. Toggle flip-flops are used in circuits such as the adder, multiplier and multiplexer. These cells are discussed in detail in the coming sections.

#### 2.3.1.1. Wiring cells

One of the most used basic wiring cells in RSFQ technology is the Josephson transmission line (JTL). The JTL cell is a transmission line that also acts as a signal router. If the signal is corrupted by noise, the JTL cell restores the shape of the SFQ signal. Figure 2.7 shows the schematic of a JTL cell.



Figure 2.7: Schematic of a JTL cell used for RSFQ logic circuits.

In Figure 2.7 we see that the JTL has two Josephson junctions, $J_1$ and $J_2$. These two junctions are biased to 90% of their critical currents via the bias line. When a pulse arrives at the input port Din, it passes through the inductances until it reaches the superconductor loop containing the junctions. This loop does not store the SFQ pulse because $I_cL < \Phi_0$, and therefore $\Phi_0$ cannot be stored inside it. As the external excitation reaches the loop, the junctions generate an SFQ pulse that travels to the Dout port. The bias resistance ensures that at 2.5 mV the junctions receive the right amount of bias current. The inductances at the input and output of the circuit are for impedance matching. The impedance of the inputs and outputs is set to $2\Omega$ for the circuits designed in the standard process. There are many variations of the JTL cell. The schematics of these cells are almost the same, but the layout is altered in order to conserve space in big circuits.
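The storage condition mentioned above, $I_cL < \Phi_0$ for a loop that cannot hold a flux quantum, can be turned into a quick design check. The sketch below applies the criterion as stated in the text and treats $I_cL \geq \Phi_0$ as the storing case. The critical current and both inductance values are hypothetical numbers chosen for illustration, not parameters of the cells in this thesis.

```python
PHI_0 = 2.067833848e-15   # magnetic flux quantum in Wb

def can_store_flux(i_c, inductance):
    """Criterion from the text: a loop with I_c * L < Phi_0 cannot store a quantum."""
    return i_c * inductance >= PHI_0

I_C = 100e-6   # hypothetical junction critical current in amperes

for name, inductance in (("transmission loop", 4e-12), ("storage loop", 30e-12)):
    product = I_C * inductance
    print(f"{name:18s} I_c*L = {product / PHI_0:.2f} Phi_0  "
          f"stores flux: {can_store_flux(I_C, inductance)}")
```

With these assumed values the small inductance behaves like a JTL loop that passes the pulse on, while the large inductance behaves like the storing loops used in the gates and flip-flops of the following sections.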

As mentioned before, the fan-out in RSFQ circuits is one. Therefore, to distribute a signal pulse to multiple inputs, we have to use a circuit that splits one signal into several signals. This circuit is known as a splitter. A splitter can turn one incoming pulse into two or three pulses.

Figure 2.8 shows the schematic of a splitter cell used in RSFQ circuits. The first part of the circuit, up to the fork, is a buffer stage similar to half a JTL cell. This stage ensures that the SFQ pulse has the right shape and that the input impedance is matched to the other circuits. Since there is no resistance in RSFQ circuits, the signal divides between the two branches according to their inductances. The inductances of both branches are the same in the splitter, so half of the signal enters each branch. The critical currents of the $J_2$ and $J_3$ junctions are small, so they respond to the smaller excitation and each create an SFQ pulse. As the inductances and the junction values are the same in both branches, the output pulses exit the circuit via ports B and C at the same time.



Figure 2.8: Schematic of an RSFQ splitter cell.

The other wiring cell that we used in this work is the merger. The merger cell acts as the opposite of the splitter cell. When two SFQ pulses arrive at the merger at the same time, the cell combines them into one pulse. However, if a pulse arrives at only one of the inputs, the cell also produces an output. This cell is used when we want to combine two or more data lines into one line.

The schematic of a merger cell with two inputs is given in Figure 2.9. As seen in this figure, the input ports A and B are connected to the $J_3$ and $J_4$ junctions via a JTL buffer line. As in the splitter, this JTL at the inputs guarantees the shape of the SFQ pulse and the impedance match to other circuits. When an SFQ pulse arrives from one of the inputs, the $J_3$ or $J_4$ junction passes a portion of that pulse, which is enough to excite $J_5$ to generate a pulse. The $J_3$ and $J_4$ junctions also prevent the pulse from reflecting and going back to the inputs.



Figure 2.9: Merger circuit schematic for RSFQ process.

When two SFQ pulses arrive at both inputs simultaneously or with a very small time difference, the signal passing through the $J_3$ and $J_4$ junctions is not strong enough to excite $J_5$ to generate two SFQ pulses; instead, only one pulse is generated at the output.
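The merging behavior can be summarized in a simple timing model. In the sketch below, pulses are represented by their arrival times, and a pulse that arrives within a short window after the last output pulse is absorbed. The 5 ps window, the function name and the time-list representation are assumptions for illustration; the real window depends on the circuit parameters.

```python
def merger(times_a, times_b, window=5e-12):
    """Merge two pulse streams; near-simultaneous pulses yield a single output.

    times_a, times_b: arrival times in seconds at inputs A and B.
    window: hypothetical interval in which a second pulse is absorbed.
    """
    outputs = []
    for t in sorted(list(times_a) + list(times_b)):
        if outputs and t - outputs[-1] < window:
            continue            # too close to the previous output: absorbed
        outputs.append(t)
    return outputs

# Two pulses 1 ps apart merge into one; a pulse 100 ps later passes on its own.
print(merger([0.0, 100e-12], [1e-12]))
```

In this model a pulse on only one input always produces an output, and two well separated pulses produce two outputs, which matches the behavior described for the merger cell.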

#### 2.3.1.2. Logic cells

Logic cells are the basic blocks of digital circuits and digital computing. In this section we discuss the four main gates that exist in RSFQ technology: AND, OR, XOR and NOT. As mentioned before, because of the pulsed nature of RSFQ logic, all the gates in this technology are clocked.

The first gate that we discuss here is the AND gate. The AND logic is true when all of the inputs are true. Figure 2.10 is the Moore diagram of a Josephson AND gate as presented in [62], [63].



Figure 2.10: Moore diagram of the Josephson AND gate [62].

The state of the cell is undefined when we first apply the bias to activate it. Therefore, we should first apply a clock signal to reset the cell to state zero. After that, depending on the incoming input, the circuit switches to state 1 or 2. At this point, if the clock signal arrives, the cell is reset to state zero. However, if the other input arrives while the cell is in state 1 or 2, the cell goes to state 3, and the next clock signal produces an SFQ pulse at the output.

The main difference between the RSFQ AND gate and the CMOS AND gate is that the CMOS AND gate has no clock signal. Because of the pulsed nature of RSFQ logic, the incoming signal must be stored inside the gate and the operation performed on the clock signal to prevent data loss.

The function of the AND cell can be better understood from Figure 2.11. As shown in the schematic of the cell, the first stages after the inputs are similar to a JTL. These stages ensure that the signal is reshaped and has the right timing. After the inputs pass through the JTL stage, they reach the main loop of the AND gate. There are two values of inductance in the loop. The loop with the junctions and 0.2pH cannot store flux, while the other loop with the 2.41pH inductance stores flux. When the clock pulse arrives, if there is flux in both loops, an SFQ pulse is generated and sent to the output. If there is no flux, or flux is stored in only one of the loops, there is no output, and the stored flux and the clock pulse go to ground.
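The state transitions described above can be sketched as a small event-driven model. This is a behavioral illustration only, not a circuit simulation. The event names `a`, `b` and `clk`, the function name, and the choice that a repeated pulse on the same input leaves the state unchanged are assumptions that are not stated in the text.

```python
def clocked_and(events):
    """Behavioral model of the clocked AND gate; returns one output bit per clock."""
    state = 0                   # 0: empty, 1: a stored, 2: b stored, 3: both stored
    outputs = []
    for event in events:
        if event == "clk":
            outputs.append(1 if state == 3 else 0)   # pulse only if both arrived
            state = 0                                # the clock always resets the cell
        elif event == "a":
            state = {0: 1, 2: 3}.get(state, state)
        elif event == "b":
            state = {0: 2, 1: 3}.get(state, state)
        else:
            raise ValueError(f"unknown event: {event!r}")
    return outputs

# Initial reset clock, then a alone, then a and b together, then b alone.
print(clocked_and(["clk", "a", "clk", "a", "b", "clk", "b", "clk"]))
```

The first clock in the sequence plays the role of the initial reset mentioned in the text, so its output bit carries no data.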



Figure 2.11: Schematic of a Josephson AND cell.

The next gate that we used in the design of the ALU is the Josephson OR gate. The OR logic is true when either or both of the inputs are true. Figure 2.12 shows the Moore diagram of the clocked Josephson OR gate. In this diagram, when either or both of the inputs become true, the state of the cell changes to 1. When the clock signal arrives in state 1, an output is generated and the circuit returns to state 0. In state 0, the clock pulse does not affect the cell.



Figure 2.12: Moore diagram of an OR gate [62].
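The two-state behavior of the clocked OR gate can be written down in a few lines. The sketch is a behavioral model with assumed event names `a`, `b` and `clk` and a hypothetical function name; it does not model junction dynamics or timing constraints.

```python
def clocked_or(events):
    """Behavioral model of the clocked OR gate; returns one output bit per clock."""
    state = 0                       # 1 once any input pulse has been stored
    outputs = []
    for event in events:
        if event in ("a", "b"):
            state = 1               # either input, or both, sets the state
        elif event == "clk":
            outputs.append(state)   # output pulse only if the state is 1
            state = 0               # reading the cell empties it
        else:
            raise ValueError(f"unknown event: {event!r}")
    return outputs

print(clocked_or(["clk", "a", "clk", "a", "b", "clk", "b", "clk", "clk"]))
```

A clock pulse that arrives in state 0 produces no output, as stated for the Moore diagram.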

The schematic of the Josephson OR gate is shown in Figure 2.13.



Figure 2.13: Schematic of a Josephson OR gate.

As seen in Figure 2.13, the first stage in each input and clock signal path is a buffer stage similar to the JTL circuit. The input signals then pass through junctions $J_3$ and $J_4$ and are stored inside the SQUID loop formed by junctions $J_8$ and $J_9$. When the clock pulse arrives, if there is flux stored inside the SQUID loop, the loop is emptied to the output port C. The other junctions, such as $J_5$ and $J_{12}$, prevent the signal from taking the wrong path.

The next gate that we used in the design of the ALU is the Josephson XOR gate. This gate is used inside the logic blocks and also to make a subtractor from an adder. The XOR logic is true when exactly one of the inputs is true. Figure 2.14 shows the Moore diagram of the clocked Josephson XOR gate. In this diagram, when an input arrives, the cell goes to state 1 or 2. If a clock pulse arrives while the cell is in one of these states, the cell returns to state 0 and a signal is generated at output C. However, if input b arrives in state 1, or input a arrives in state 2, the cell changes to state 3. In this state the clock pulse resets the cell to state 0 without generating any signal.



Figure 2.14: Moore diagram of an XOR gate [62].
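A Moore diagram such as the one for the XOR gate maps naturally onto a transition table. The sketch below encodes the transitions described in the text as a dictionary. Treating any transition that is not listed, such as a repeated pulse on the same input, as leaving the state unchanged is an assumption, and the names used are hypothetical.

```python
# (state, event) -> (next_state, output_pulse)
XOR_TABLE = {
    (0, "a"): (1, 0), (0, "b"): (2, 0),      # first input is stored
    (1, "b"): (3, 0), (2, "a"): (3, 0),      # the other input cancels it
    (0, "clk"): (0, 0),
    (1, "clk"): (0, 1), (2, "clk"): (0, 1),  # exactly one input: output pulse
    (3, "clk"): (0, 0),                      # both inputs: reset without output
}

def clocked_xor(events):
    """Run the transition table; returns one output bit per clock pulse."""
    state, outputs = 0, []
    for event in events:
        # Unlisted transitions keep the current state (an assumption).
        state, out = XOR_TABLE.get((state, event), (state, 0))
        if event == "clk":
            outputs.append(out)
    return outputs

print(clocked_xor(["clk", "a", "clk", "b", "clk", "a", "b", "clk"]))
```

The same table-driven structure can represent the other clocked gates by changing only the entries of the dictionary.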

The schematic of the Josephson XOR gate is shown in Figure 2.15. This schematic is used for the gates in the CONNECT library. As seen in Figure 2.15, the first stage in each input and clock signal path is a buffer stage similar to the JTL circuit. If an SFQ pulse arrives from either the a or the b port, the flux is stored inside the loop containing junctions $J_8$ and $J_{11}$. An incoming clock pulse then sends the SFQ pulse to the output. However, if flux is already stored inside the loop and the other input arrives, a path opens, the SFQ pulse goes to ground and the loop is emptied. In this case, the clock pulse does not generate any SFQ pulse at the output.



Figure 2.15: Schematic of a Josephson XOR gate.

The next gate that we used in the design of the ALU is the Josephson NOT gate. This gate is used inside the logic blocks. The NOT logic is true when the input is false. Figure 2.16 shows the schematic of the Josephson NOT gate.



Figure 2.16: Schematic of a Josephson NOT gate.

In this circuit, an incoming input opens a path to the ground plane, and the incoming clock SFQ pulse goes to ground. If there is no input, the path to ground is closed, and the clock pulse therefore goes to the output port, generating an SFQ pulse.

#### 2.3.1.3. Flip-flops

The flip-flops are basic memory cells in RSFQ technology. Flip-flops can be used to store data or the state of the circuit to perform various tasks such as multiplexing, demultiplexing and synchronization. There are different types of flip-flop cells available but we only discuss the ones that we used in our circuits.

The first type of flip-flop used in the ALU architecture is the D-type flip-flop, also known as the delay flip-flop (DFF). The DFF output is true when an input signal arrives before the clock signal. Figure 2.17 shows the Moore diagram of the clocked Josephson DFF gate. In this diagram, when an input arrives, the cell goes to state 1. In this state, the clock signal generates a pulse at the output and resets the DFF to state zero.

It is important to note that there are various types of DFF circuits available in RSFQ technology. Some of them have two outputs, one of which is the inverted version of the other. RDFF cells also have a reset port: if the DFF is in state 1, the reset signal returns it to state zero without generating any pulse at the output. Another type of DFF is the EDFF. In DFF cells, if more than one input arrives without any clock pulse in between, the DFF does not work correctly and needs to be reset. EDFF cells do not have this problem. Therefore, at input stages where the signals are not under the control of the designer, it is better to use EDFF cells.



Figure 2.17: Moore diagram of clocked Josephson DFF gate [62].

Figure 2.17 shows the Moore diagram of a D-type flip-flop. As seen in the figure, an input pulse would change the state of the cell to 1. When a clock pulse comes at this state, there would be an output.

Figure 2.18 shows the schematic of a Josephson DFF gate designed in the standard cell library.



Figure 2.18: Schematic of a Josephson DFF gate designed in.

This is one of the less complex circuits; it consists only of a SQUID loop to trap the input flux and a clock trigger to release it to the output. The loop containing junctions $J_4$ and $J_6$ is the SQUID loop, and $J_1$ supplies the trigger when a clock pulse arrives.

The next cell is the toggle flip-flop (TFF). As the name indicates, the output of the cell toggles on each incoming pulse. This cell is very useful for designing sequential logic circuits and frequency dividers. Figure 2.19 shows the Moore diagram of a toggle flip-flop. As seen in the figure, an input pulse changes the state of the cell to 1 while generating an output at the first port; when the next pulse arrives in this state, an output is generated at the other port and the state returns to 0.



Figure 2.19: Moore diagram of TFF cell [62].
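The frequency-dividing property of the TFF follows directly from its Moore diagram. In the sketch below, input pulses are numbered in order of arrival and each pulse is routed to one of the two output ports depending on the current state. The port names and the function name are hypothetical, and the model is purely behavioral.

```python
def toggle_flip_flop(n_pulses):
    """Route n_pulses input pulses alternately to the two output ports."""
    state = 0
    first_port, second_port = [], []
    for k in range(n_pulses):
        if state == 0:
            state = 1
            first_port.append(k)    # transition 0 -> 1: pulse at the first port
        else:
            state = 0
            second_port.append(k)   # transition 1 -> 0: pulse at the other port
    return first_port, second_port

first, second = toggle_flip_flop(8)
print(first)    # indices of the input pulses seen at the first port
print(second)   # indices of the input pulses seen at the other port
```

Each port receives every second input pulse, so the pulse rate at either output is half the input rate.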

Figure 2.20 shows the schematic of a Josephson TFF gate designed in the standard library. This cell is also not very complex. The circuit has a main loop containing the junctions $J_2$ and $J_3$, which keeps the state of the TFF cell. Each pulse changes the state of the loop and, depending on that state, junction $J_4$ or $J_5$ is triggered and sends an SFQ pulse to output port Dout0 or Dout1, respectively.

A variation of this cell is used in the SFQ/DC converter. In the SFQ/DC converter the output is generated continuously and the circuit is in oscillation mode. Therefore, when the state of the circuit changes from zero to one, Dout1, which is connected to the $50\Omega$ resistor, continuously generates pulses.



Figure 2.20: Schematic of a Josephson TFF gate.

The next cell is the clocked toggle flip-flop, or T1 cell. As the name indicates, the cell acts like an ordinary TFF, but the clock pulse can reset the state of the cell or initiate the output. This cell also acts as a one-bit full adder. Figure 2.21 shows the Moore diagram of a T1 cell. As seen in the figure, a data pulse sends the cell to state 1. Then, depending on whether the next incoming signal is from the clock or the data line, there is an output at sum or carry, respectively. The clock always sets the state to zero and can be used to reset the cell.



Figure 2.21: Moore diagram of T1 cell [62].

Figure 2.22 shows the schematic of a Josephson T1 gate designed in the standard library. Like the TFF cell, the circuit has a main loop that keeps the state of the circuit. Each pulse from the Data port changes the state of the loop, and if the loop already holds a flux, which means the state is 1, an SFQ pulse is sent to the Carry output port. However, unlike in the TFF, in this cell a path for the clock pulse is connected to the main loop; the clock can trigger the loop and, if there is a flux inside it, send an SFQ pulse to the Sum output port.



Figure 2.22: Schematic of a Josephson T1 gate.
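The statement that the T1 cell acts as a one-bit full adder can be verified with a behavioral model. In the sketch below, the three operand bits are applied one after another as pulses on the data line, followed by a clock pulse. The assumption that the three inputs are merged onto the data line with enough time separation, and all class and function names, are illustrative and not taken from the thesis.

```python
from itertools import product

class T1Cell:
    """Behavioral model of the T1 cell described in the text."""

    def __init__(self):
        self.state = 0

    def data(self):
        """Apply a data pulse; returns 1 if a carry pulse is emitted."""
        if self.state == 1:
            self.state = 0
            return 1            # loop already held a flux: pulse at Carry
        self.state = 1
        return 0

    def clock(self):
        """Apply a clock pulse; returns 1 if a sum pulse is emitted."""
        sum_out = self.state    # stored flux is released to Sum
        self.state = 0          # the clock always resets the cell
        return sum_out

def full_add(a, b, c_in):
    cell = T1Cell()
    carry = 0
    for bit in (a, b, c_in):
        if bit:
            carry |= cell.data()   # at most one carry pulse for three inputs
    return cell.clock(), carry

for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_add(a, b, c_in)
    print(a, b, c_in, "->", "sum", s, "carry", c_out)
```

The carry pulse appears as soon as the second data pulse arrives, while the sum pulse appears only on the clock, which reflects the asynchronous carry and clocked sum outputs of the cell.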

#### 2.3.1.4. DC/SFQ and SFQ/DC

Earlier in this chapter, we looked at the input and output signals of the DC/SFQ and SFQ/DC cells. Here we look at the circuits that perform these operations. Figure 2.23 shows the schematic of a Josephson DC/SFQ gate designed in the standard library. This cell is in effect a threshold detector: when the input passes the threshold of about 1 mA, the junctions $J_1$ and $J_2$ oscillate and generate an SFQ pulse. The rest of the circuit is like a JTL cell; it only reshapes the pulse and matches the circuit to the rest of the cells. The function of the DC/SFQ converter cell can be better understood by studying clockless quasi-one-junction SQUIDs (QOS), which act in a similar manner.



Figure 2.23: Schematic of a Josephson DC/SFQ gate.

Figure 2.24 shows the schematic of a Josephson SFQ/DC gate designed in the CONNECT library. As seen in this figure, the circuit has a loop similar to that of the TFF cell. However, this loop is biased so that when an incoming pulse changes its state from 0 to 1, the loop starts to oscillate and produces a train of SFQ pulses instead of a single pulse. When the next input pulse arrives, the cell resets to the 0 state and the pulse train stops. The output pulse train has a nonzero average voltage and is therefore treated as a DC signal.



Figure 2.24: Schematic of a Josephson SFQ/DC gate.
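The two converters can be illustrated with a sampled-signal sketch in Python. This is a simplified model under stated assumptions: the DC/SFQ cell is taken to emit one pulse each time the input rises through the threshold (about 1 mA in the text), and the SFQ/DC cell is taken to toggle its average output level on every incoming pulse. The function names `dc_to_sfq` and `sfq_to_dc` are hypothetical, and the real cells are analog circuits that this sketch does not simulate.

```python
def dc_to_sfq(current_ma, threshold_ma=1.0):
    """Return 1 at each sample where the input rises through the threshold."""
    pulses = []
    above = False
    for i in current_ma:
        now_above = i >= threshold_ma
        pulses.append(1 if (now_above and not above) else 0)
        above = now_above
    return pulses


def sfq_to_dc(pulses):
    """Toggle the averaged output level (0 or 1) on every incoming pulse."""
    level = 0
    levels = []
    for p in pulses:
        if p:
            level ^= 1  # first pulse starts the pulse train, the next stops it
        levels.append(level)
    return levels
```

Chaining the two functions, `sfq_to_dc(dc_to_sfq(samples))`, mimics a test setup in which a slow input is converted to SFQ pulses and the result is read back as a level change.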

## 2.3.2. Analog design

Some of the cells used in RSFQ logic design are essentially analog circuits and require design considerations such as impedance matching, output buffering and filtering. In this section we discuss passive transmission lines (PTL) with their respective driver and receiver circuits. In Chapter 4, we present our designs for the PTL, driver and receiver circuits.

#### **2.3.2.1.** Passive transmission lines (PTL)

One of the problems that arise in large-scale RSFQ logic circuits is the wiring of the cells and clock paths. Due to the high bandwidth of the SFQ pulses, about 200 GHz, and the limited fan-out of the RSFQ cells, most of the wiring is done with active lines known as Josephson transmission lines (JTL) [64], [5]. Excessive use of JTLs for wiring dramatically increases the power consumption and delay of the logic circuits. One solution to this problem is to use passive transmission lines (PTL) [65]. The main drawback of PTLs is that they occupy a large area, especially in a 4-layer process such as the AIST standard process (STP2) [66].

Studies of SFQ pulse propagation on a strip line show that the superconducting line should be characterized at about 165 GHz. To design a PTL whose impedance matches the rest of the cells, we therefore have to consider its model at this frequency. The PTL can be modeled with a ladder $\pi$-model. Figure 2.25 shows a model of a PTL used in RSFQ circuits. The parameters depend on the fabrication process and can be found via high-frequency simulations.



Figure 2.25: The ladder model of a PTL.
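For orientation, if each ladder section is assumed lossless, with series inductance $L$ and shunt capacitance $C$ per unit length (symbols introduced here for illustration, not taken from Figure 2.25), the standard transmission line relations give the characteristic impedance and the propagation velocity:

$$Z_0 = \sqrt{\frac{L}{C}}, \qquad v = \frac{1}{\sqrt{LC}}$$

Under this assumption, matching the PTL to the cells amounts to choosing the line geometry so that $Z_0$, evaluated with the values of $L$ and $C$ extracted near 165 GHz, equals the impedance presented by the driver and receiver.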

### 2.3.2.2. Driver and receiver circuits

The driver and receiver circuits were also modeled using the RCSJ model for the Josephson junctions. Figure 2.26 and Figure 2.27 show the models of the driver and receiver circuits matched to the unshielded 38 µm transmission lines used in the CONNECT cell library. Each PTL needs a driver so that the line is matched in impedance to the rest of the cells and the pulse has enough power to travel along the line. The receiver circuit at the end of the transmission line is necessary to match the impedance and to convert the transmitted pulse back into an SFQ pulse. The driver and receiver cells were designed so that they also match the $2\,\Omega$ impedance of the logic cells in the CONNECT library.



Figure 2.26: Driver circuit for PTL lines used in standard library.

As seen in Figure 2.26, $L_{IN}$ matches the driver to the $2\,\Omega$ impedance of the rest of the circuit, $J_{out}$ provides the power, and $L_S$ and $R_S$ match the impedance of the line. Figure 2.27 shows the receiver, in which $L_{IN}$ and $R_{IN}$ provide the line impedance matching and the rest of the circuit is a buffer that restores the shape of the SFQ pulse.



Figure 2.27: Receiver circuit for PTL lines used in standard library.

#### 2.4. Fabrication Process

The circuits in this project were fabricated at CRAVITY, a micro-fabrication facility of the National Institute of Advanced Industrial Science and Technology (AIST) in Tokyo, Japan [67], [68]. Several niobium-based processes with different numbers of layers and critical current densities are available at this facility, such as the standard process and the advanced process.

The process that we used is known as the standard process (STP2) [66]. This process has been developed and optimized for more than 10 years, and its parameter margins are very reliable. It has 4 niobium superconductor layers and a molybdenum layer for resistors. The substrate is silicon and the insulator between the layers is silicon oxide. Figure 2.28 shows the cross section of the STP2 fabrication process with the layers and their thicknesses.



Figure 2.28: The layers of the STP2 fabrication process.

As seen in Figure 2.28, the bottom layer is the ground plane. In general, the ground plane is not used for circuit wiring or inductances. It acts as a magnetic shield and prevents external magnetic flux from penetrating the circuit and disturbing the operation of the cells. However, if the external field is strong, some flux penetrates the weak links of the superconducting plane, which are the junctions. To prevent this, we place holes known as moats in the ground plane, away from the junctions. These holes become the new weak points and trap the external magnetic flux.

As seen in Figure 2.28, the Josephson junctions are placed between the counter (M3) and base (M2) niobium layers. The insulating barrier between these two layers that forms the Josephson junction is aluminum oxide. The top layer, which is the thickest of all, is the control layer and is usually used for interconnects or magnetic shielding. Figure 2.29 lists the layers and their functions in this process.

| Layer name | Alias | Layer No. for<br>STR data | Clear/<br>Dark | Function                               | Material | Thickness (nm) |
|------------|-------|---------------------------|----------------|----------------------------------------|----------|----------------|
| GP         | M1    | 1                         | clear          | Ground plane                           | Nb       | 300            |
|            | I1    |                           |                | Inter layer insulator                  | SiO2     | 200            |
| RES        | RES   | 3                         | dark           | Resistor                               | Mo       | 80             |
|            | I2    |                           |                | Inter layer insulator                  | SiO2     | 100            |
| RC         | RC    | 9                         | clear          | Contact hole between RES and BAS       |          |                |
| GC         |       | 2                         | clear          | Contact hole between GP and BAS        |          |                |
| BAS        | M2    | 4                         | dark           | Lower electrode of JJ and lower wiring | Nb       | 300            |
| JP         | JP    | 5                         | dark           | Protection for JJ                      | Al, AlOx |                |
| JJ         | JJ    | 6                         | dark           | Josephson junction (JJ)                | Nb       | 150            |
|            | I3    |                           |                | Inter layer insulator                  | SiO2     | 400            |
| BC         |       | 7                         | clear          | Contact hole between BAS and COU       |          |                |
| JC         |       | 10                        | clear          | Contact hole between JJ and COU        |          |                |
| COU        | M3    | 8                         | dark           | Upper wiring                           | Nb       | 400            |
|            | I4    |                           |                | Inter layer insulator                  | SiO2     | 500            |
| CC         |       | 11                        | clear          | Contact hole between COU and CTL       |          |                |
| CTL        | M4    | 12                        | dark           | Top wiring or shield layer             | Nb       | 500            |

Figure 2.29: Layer properties of the STP2 process.

#### 3. ARITHMETIC LOGIC UNIT

The demand for faster and lower-power circuits has been rising in recent years. CMOS technology has reached its limits in lowering circuit power and increasing clock frequency because of the nature of MOSFETs. One alternative to CMOS technology is superconducting circuitry such as RSFQ logic, which has no dissipation in its interconnects and whose switching mechanism differs from that of MOSFETs.

In this chapter we review different ALU architectures designed with RSFQ logic. In the later chapters, we discuss the architecture chosen for our design, the test system prepared for the experimental measurements, and the final experimental and simulation results for the cells.

# 3.1. Superconductor Arithmetic Logic Unit (ALU)

The arithmetic logic unit (ALU) is the main part of every processor or microprocessor architecture. ALU designs based on superconducting logic have been reported, and their operation has been demonstrated in liquid helium cryostats. The first work toward an ALU using RSFQ logic was done by Semenov et al. in the 1990s [69]. Their aim was to design a bit-serial architecture for the fabrication process available at that time. The schematic was designed for about 5000 junctions with a clock frequency of up to 20 GHz. The proposed design was a collection of small functional circuits that all share similar data and port standards. These circuits were loosely coupled to form a whole processor. Each unit is activated by an input signal and deactivated once it delivers its output together with a handshake signal. The instruction set could therefore be implemented as a decoder that sends activation signals to the units.

The two main units of this design are the accumulator and the ALU. The accumulator circuits connect different circuit blocks to the main bus when they are activated by the instruction set and disconnect them when they finish their job. The accumulator has two handshake bits, shown in the schematic as AccEmpty and AccFull. These bits indicate whether the accumulator is ready for the next command or has finished the previous one. Apart from wiring cells such as mergers and splitters, the circuit consists of Muller C-elements and DFF cells.

The main part of the processor is the pipelined ALU. The ALU shown in the schematic in [69] has three logic operators and two arithmetic operators, add and subtract. The subtractor is built by placing XOR gates at the input register of one operand of the adder to calculate the two's complement of that register.

Based on the instruction fed to the ALU block via the accumulator, the ALU block produces an output, puts it on the data bus and sends the Done handshake bit. This bit indicates that the data is ready; the accumulator then collects the data and puts it on the output register of the processor. It is then decided whether the data should go to the memory block on the processor or be transferred to the random access memory (RAM) attached to the processor. This is a simple pipelined serial architecture, designed in the same way as first-generation processors such as the Intel 4004 or Intel 8008. The schematic was designed but the chip was not fabricated, since at that time the LTS fabrication process was not stable enough for the number of junctions required by this design.

The next notable attempt at designing an RSFQ processor was also made at SUNY Stony Brook, and the processor was called FLUX. This work was done some years later, in 2001, by Bunyk et al. in an attempt to fabricate a 16-bit ALU with registers and an instruction memory for over 30 commands [70]. The designed chip had over 90000 junctions and an area of $10\times15~\text{mm}^2$. However, no experimental results are available in the literature for either of these works.

Figure 3.1 shows the block diagram of the designed FLUX processor. The block diagram is intended to mimic the placement of the different blocks in the final layout. The instruction (flash) memory is placed at the top of the chip. This unit contains the instruction set and the program. The ALU and registers are placed side by side to minimize wiring and the delay it adds. The input stage controls the flow of data into the chip by sending a receive flag bit, and the output stage controls the flow out of the chip with a send bit.

The full layout of the FLUX chip was not completed. Only some parts were laid out and fabricated as a proof of concept. These parts include some of the basic gates, a 512-bit shift register and some of the memory blocks. The group then moved on to designs with fewer junctions and lower bias requirements.



Figure 3.1: Block diagram of the FLUX processor [71].

Bunyk and his colleagues continued their work on the ALU architecture and in 2003 designed the FLUX-1 microprocessor [71]. The FLUX-1 chip was designed as an 8-bit microprocessor with a 20 GHz clock frequency and a parallel partitioned architecture. The design contains 63000 Josephson junctions on a chip of $10\times10~\text{mm}^2$. Although this chip was more promising than its predecessor and had lower power dissipation, no experimental results have been reported to our knowledge.

Figure 3.2 shows the block diagram of the FLUX-1 processor. As seen in Figure 3.1 and Figure 3.2, the architectures of the two chips are very similar. However, in the FLUX-1 chip the number of junctions is greatly reduced by applying some optimization methods.



Figure 3.2: Block diagram of the FLUX-1 processor [71].

FLUX-1 had several distinctive architectural characteristics: a two-way Long Instruction Word (LIW) architecture that shortens the interconnections and interactions, and hence reduces the connectivity between the blocks of the processor and the storage memory; bit-stream data processing in eight 8-bit ALUs and registers; pipelining that allows data and instructions to be in flight in one clock cycle; a module-based design that makes stacking of the blocks easier; and a small instruction set that makes programming the chip easier.

The optimized chip was called FLUX-1R. It was fabricated using a 1.75 µm Nb process with 4 metal layers. Figure 3.3 shows a photo of the fabricated FLUX-1R chip. The chip still consumes about 10 mW of power, which corresponds to a bias current of about 4 A at 2.5 mV. This bias current is still very high, and therefore no experimental measurement of the chip was made.



Figure 3.3: FLUX-1R chip layout designed with 63107 Josephson junctions. The power dissipation of the chip is about 10 mW [71].
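The bias current quoted for FLUX-1R follows directly from the power and the bias voltage, assuming the whole chip is fed from a single 2.5 mV bias (the symbols $I_{bias}$ and $V_{bias}$ are introduced here for illustration):

$$I_{bias} = \frac{P}{V_{bias}} = \frac{10~\text{mW}}{2.5~\text{mV}} = 4~\text{A}$$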

Work on RSFQ processors continued, and in 2004 Kang et al. from Incheon University in Korea designed an RSFQ ALU built from half adders in a serial pipeline architecture [72]. They used a $1~\text{kA/cm}^2$ process and measured the chip in a liquid helium cryostat. Figure 3.4 shows the block diagram of the ALU architecture designed with only half-adder (HA) cells.

Part (a) shows the single cell of the ALU design. Different logic functions are generated by changing the state of each switch. If a and b are on, the output is the OR function; a and c give the ADD operation; b alone gives the AND function; and a alone gives the XOR function.



Figure 3.4: (a) The block diagram for a single cell of the ALU that consists of three switches and a half-adder. (b) 2-bit ALU using the single block cells [72].
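The switch combinations listed for the single cell can be checked with a short truth-table sketch. The wiring used here is an assumption, not taken from [72]: switch a is taken to pass the half-adder sum to the output, switch b to pass the half-adder carry to the same output through a merger, and switch c to pass the carry on to the next stage. With this assumption, OR appears because the sum (XOR) and carry (AND) of a half adder are never 1 at the same time. The function names are hypothetical.

```python
def half_adder(x, y):
    """Return (sum, carry) of two bits."""
    return x ^ y, x & y


def alu_cell(x, y, a=False, b=False, c=False):
    """Single ALU cell: a half adder followed by three switches (assumed wiring)."""
    s, cy = half_adder(x, y)
    out = (s if a else 0) | (cy if b else 0)  # merged output line
    carry_out = cy if c else 0                # carry toward the next stage
    return out, carry_out


def truth_table(**switches):
    """Output bit of the cell for all four input combinations."""
    return [alu_cell(x, y, **switches)[0] for x in (0, 1) for y in (0, 1)]
```

Calling `truth_table(a=True, b=True)` lists the cell output for inputs 00, 01, 10 and 11 with switches a and b closed.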

Figure 3.5 shows the block diagram of a 4-bit ALU based on HA cells and switches. As seen in this figure, each bit added to the arithmetic logic unit requires an extra stage of DFF and HA cells. Therefore, this architecture has a large latency at higher bit widths and is not efficient there. At lower bit widths, however, it is a simple architecture with good latency and a low bias current.



Figure 3.5: The block diagram for 4-bit ALU based on HA and switches [72].

Müller et al. from Stellenbosch University in South Africa designed a superconductor ALU in 2006 using RSFQ-AT gates [73]. The design was based on these new gates, but there is no evidence of a chip design or experimental measurements. The microprocessor design was carried out with Verilog models.

Work on RSFQ ALUs has been continued by different groups worldwide, but the most recent and notable efforts are those of Fujimaki et al. at Nagoya University, Japan. In 2015, they designed a 4-bit bit-slice ALU for use in a 32-bit RSFQ microprocessor [75]. They used the $10~\text{kA/cm}^2$ advanced process (ADP) with 9 Nb layers to fabricate the circuits. The chip size was $3\times1.7~\text{mm}^2$ and the working frequency was 50 GHz.

The Fujimaki laboratory at Nagoya University has designed several RSFQ-based microprocessors, called the CORE e series. The CORE e series are general-purpose microprocessors with random access memories and a bit-serial architecture [74]. Figure 3.6 lists the different CORE e designs with their characteristics and chip layouts. As seen there, the number of junctions in the CORE designs is much smaller than in their predecessors.



Figure 3.6: Processor designs in Nagoya University known as Core-e series [74].

There are two reasons for this reduction. The first lies in the architecture of the processors. Most earlier designs tried to mimic the architecture of CMOS processors, which made the RSFQ circuits much bulkier, whereas the architecture of the CORE designs is optimized for RSFQ logic. The second reason lies in the fabrication process. The older designs usually used a 3-layer niobium process with a ground plane, while the CORE designs use the advanced process, which has 7 niobium layers and therefore allows multiple wiring layers on top of each other, something that is not possible in a 3-layer process. As a result, the chips become much smaller and easier to measure in the ADP process.

Figure 3.7 shows the block diagram of a CORE e processor. The instruction memory stores the program and feeds the instruction register when the register signals it. The register file holds the data and inputs to be processed. The write-back buffer controls the flow of data: it receives data from the ALU, the memory block or the instruction register and feeds it to the register file. The data is then stored in the data memory at the address determined by the instruction register.



Figure 3.7: Block diagram of the processor architecture in the CORE e design [74].

An example of the work done by this group is the CORE e4 processor, a general-purpose 8-bit RSFQ microprocessor with a bit-serial architecture that can theoretically operate at 80 GHz [77], [9].

Figure 3.8 shows one of the main blocks of the CORE e4 processor. The register file block, as mentioned earlier, feeds the data to the ALU block. The register file has an extra copy of register zero. This way it can copy the data to the data memory and feed the ALU inputs at the same time, which makes the circuit faster.

It is worth mentioning here the NDROC gates of the superconductor cell library, which were not discussed in Chapter 2. The non-destructive readout (NDRO) cells, from the flip-flop family, act like a normal DFF cell, but the stored data is not destroyed when it is read by a clock signal. In these circuits the clock signal does not reset the state of the cell to zero.
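The difference between destructive and non-destructive readout can be illustrated with a minimal behavioral sketch. The class and method names are hypothetical, and the separate `reset` input of the NDRO model is an assumption added here so that the stored bit can be cleared; neither class models the actual library cells or their timing.

```python
class DFF:
    """Destructive readout: the clock returns the stored bit and clears it."""

    def __init__(self):
        self.state = 0

    def data(self):
        self.state = 1

    def clock(self):
        out, self.state = self.state, 0
        return out


class NDRO:
    """Non-destructive readout: the clock returns the bit without clearing it."""

    def __init__(self):
        self.state = 0

    def set(self):
        self.state = 1

    def reset(self):  # assumed separate input that clears the cell
        self.state = 0

    def clock(self):
        return self.state


def read_twice(cell, write):
    """Write a 1 using the given method name, then read the cell twice."""
    getattr(cell, write)()
    return cell.clock(), cell.clock()
```

Reading each cell twice after a single write, for example `read_twice(DFF(), "data")` and `read_twice(NDRO(), "set")`, shows whether the first read cleared the stored bit.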



Figure 3.8: The register file block in the CORE e4 processor [9].

Figure 3.9 shows the 4-bit ALU block with bit-sliced architecture that is incorporated in the 32-bit ALU. The final processor is 32 bits wide. To minimize the circuit requirements without losing much speed and computational power, the ALU is broken into 4-bit blocks that are connected in series. These blocks nevertheless have a bit-sliced architecture, which makes them faster.

All of these designs and experimental results were obtained in liquid helium cryostats. However, dwindling helium resources and the short time for which these cryostats can keep the circuit superconducting make them impractical for commercial use and suitable only for a laboratory environment. One way to make the chips commercially viable is to make them compatible with closed-cycle cryostats. On the other hand, closed-cycle systems have their own limitations, ranging from higher noise and vibration levels to limited cooling power at the target cold head.



Figure 3.9: 4-bit ALU block with bit-sliced architecture to incorporate in the 32-bit ALU [76].

### 3.2. ALU Architectures

Different ALU architectures are used for different applications. In graphics processing units (GPU), the architecture is optimized for calculating the sum of products (SOP) in the minimum number of clock cycles. In field-programmable gate array (FPGA) chips, the ALU consists of blocks of logic cells, full adders and DFFs with multiplexers. In a central processing unit (CPU), the ALU consists of a collection of multibit blocks interconnected with memory registers to increase the processing speed and enable parallel processing.

For our design, we wanted to make a general-purpose processor. A general-purpose processor should be able to perform the different tasks asked of it at a moderate speed and should give reliable outputs. Therefore, we considered four different architectures to choose from: serial, parallel, bit-sliced and Kogge-Stone. In the following sections, we discuss each of these architectures with its pros and cons.

#### **3.2.1.** Serial

One of the simplest and most widely used architectures for designing cascaded logic units such as adders and ALUs is the serial architecture. In the serial, or bit-serial, architecture the output data is calculated one bit at a time, and the data propagates inside the circuit on a single wire one bit at a time. Figure 3.10 shows the architecture of a serial ALU. The left part is the single cell block of the serial ALU, and the right part shows how these blocks are connected to form an N-bit ALU. Two bits are generated inside each block, the OUT and Carry bits. The OUT bit goes directly to the output while the Carry bit propagates to the next block. The critical path that determines the latency of this ALU is the Carry bit path.



Figure 3.10: Serial architecture for arithmetic logic unit.

As seen in the figure, the serial architecture is very simple and has no feedback bits or added complexity. Most importantly, since the bits propagate one at a time, there is no data bus and the wiring stays simple, which suits RSFQ logic with its fan-out of one. The bit-serial architecture is mostly used in massively parallel processing, since such processors have a smaller chip area and N serial processors perform better than one N-bit processor.
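As an illustration of the carry path described for the serial architecture, the following Python sketch adds two numbers one bit at a time, starting from the least significant bit. It is a software analogy only: the function names are hypothetical, and the loop index stands in for the N stages through which the Carry bit propagates.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder returning (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def serial_add(x, y, n_bits):
    """Add x and y bit by bit; returns (n-bit result, final carry)."""
    carry = 0
    result = 0
    for i in range(n_bits):  # one stage per bit: the carry path is N stages long
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For example, `serial_add(11, 6, 4)` walks through four stages and returns the low four bits of the sum together with the carry out of the last stage.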

#### 3.2.2. Parallel

The parallel, or bit-parallel, architecture is the opposite of a serial processor. In the parallel architecture, the bits do not propagate one at a time; all bits are processed at the same time and the result goes to the next stage via a data bus. Figure 3.11 shows a simple parallel structure for a 4-bit ALU. In this structure, the data are read from the registers simultaneously on a clock signal and are then processed and propagated through the ALU on 4-bit data buses. The outputs are then selected via 4-bit multiplexers.



Figure 3.11: Bit-parallel architecture for ALU.

The bit-parallel architecture has the advantage of lower latency, and pipelining of the input registers is easier because the data is read and written at the same time. However, this structure leads to bulkier interconnects in RSFQ logic, since the fan-out is one and a clock tree and data buses are needed.

# **3.2.3. Bit Slice**

Bit-slice architecture can be considered a combination of the serial and parallel architectures, as in Figure 3.12. A bit-slice design combines several processors of smaller bit width to build a processor of larger bit width.



Figure 3.13: Logic circuits of 74181 CMOS ALU with bit-sliced architecture [78].

Figure 3.13 shows the internal logic of the 74181 arithmetic logic unit, which was the main part of many processors in the late 1970s and 1980s. It is a 4-bit ALU that was used inside many 8-, 16- and even 32-bit processors from Intel and Texas Instruments. The bit-slice architecture was used because of its smaller size and fewer interconnects compared to bit-parallel designs, and its better latency, higher speed and easier pipelining compared to bit-serial designs. However, it died out in CMOS technology due to advances in the fabrication process.

Nowadays, the bit-sliced architecture is mostly used in quantum computing and superconductor logic. An example of this architecture for an RSFQ ALU, the CORE e processors, was discussed earlier in this chapter [79]. In these processors, a 4-bit ALU processes 32-bit data part by part.
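The idea of processing 32-bit data part by part with a 4-bit ALU can be sketched as follows. This is a generic illustration of bit slicing for addition, not a model of the CORE e design: the names `alu4_add` and `add32_bit_sliced` are hypothetical, and the slice is modeled with ordinary integer arithmetic.

```python
def alu4_add(a, b, carry_in):
    """4-bit slice: add two nibbles and a carry, return (nibble, carry_out)."""
    total = a + b + carry_in
    return total & 0xF, total >> 4


def add32_bit_sliced(x, y):
    """Add two 32-bit words by reusing one 4-bit slice eight times."""
    carry = 0
    result = 0
    for k in range(8):  # eight 4-bit slices, least significant first
        a = (x >> (4 * k)) & 0xF
        b = (y >> (4 * k)) & 0xF
        nibble, carry = alu4_add(a, b, carry)
        result |= nibble << (4 * k)
    return result, carry
```

Each iteration handles four bits in parallel, while the carry links consecutive slices in series, which is the combination of parallel and serial behavior described above.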

### 3.2.4. Kogge Stone

The Kogge-Stone architecture is probably one of the most complex architectures that can be used in RSFQ circuits. In a serial architecture, an N-bit adder needs N stages. Each stage generates one bit of the output and one carry bit that propagates to the next stage. In the Kogge-Stone architecture, an N-bit adder needs only $\log_2(N)+1$ stages. For example, a 32-bit adder needs 32 stages in the serial architecture but only six stages in the Kogge-Stone architecture.

Figure 3.14 shows the tree diagram of an 8-bit Kogge-Stone adder designed in our laboratory. The basic elements of the Kogge-Stone architecture are the gray cells and black cells, which are shown in the figure in their respective colors. Unlike the bit-serial architecture, a Kogge-Stone stage does not generate one output bit and propagate one carry bit. The outputs are generated by the gray cells.

The Kogge-Stone architecture is based on generating multiple outputs in each stage, which then propagate to the cells of the next stage. The next stage, or level, also consists of gray and black cells; it receives these outputs and processes them for the following level until the last level is reached. The number of levels is much smaller than in the serial architecture, especially at higher bit widths, but the interconnects are much more complicated than in the other structures discussed.



Figure 3.14: Tree diagram of a Kogge-Stone adder for the bit routes [80].
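The level structure of a Kogge-Stone adder can be illustrated with a short Python sketch of the textbook parallel-prefix formulation. It is not the circuit of Figure 3.14: every position here updates both a generate and a propagate signal, whereas the hardware distinguishes gray and black cells, and the function name and return values are hypothetical. The sketch counts only the prefix levels, $\log_2(N)$ for a power-of-two width; counting one more stage, for example for the final sum, gives the $\log_2(N)+1$ stages quoted in the text, although which stage the text counts is an assumption here.

```python
def kogge_stone_add(x, y, n_bits):
    """Return (sum mod 2**n_bits, carry_out, number of prefix levels)."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(n_bits)]  # bit generate
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n_bits)]  # bit propagate
    p_bit = p[:]  # keep the bitwise propagate for the final sum

    dist = 1
    levels = 0
    while dist < n_bits:
        g_new, p_new = g[:], p[:]
        for i in range(dist, n_bits):
            # combine the span ending at i with the span ending at i - dist
            g_new[i] = g[i] | (p[i] & g[i - dist])
            p_new[i] = p[i] & p[i - dist]
        g, p = g_new, p_new
        dist *= 2
        levels += 1

    # after the last level, g[i] is the carry out of bit position i
    result = 0
    for i in range(n_bits):
        carry_in = g[i - 1] if i > 0 else 0
        result |= (p_bit[i] ^ carry_in) << i
    return result, g[n_bits - 1], levels
```

For an 8-bit adder the loop runs three times, and for a 32-bit adder five times, compared with 8 and 32 carry stages in the serial case.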

Although the Kogge-Stone architecture does not need as high a fan-out as the Brent-Kung architecture, it still needs a much higher fan-out than the bit-serial architecture. It is therefore not ideal for RSFQ design, since the large number of fan-outs and the clock tree required for each gray and black cell make the circuits very bulky, and consequently the bias current and power consumption of the chips are high. However, the Kogge-Stone adder has very good latency, especially at higher bit widths, which is why it is used in CMOS architectures. Figure 3.15 shows an 8-bit Kogge-Stone adder designed in our laboratory. As seen in the figure, the adder is very bulky and its bias current is about 800 mA.

In this Kogge-Stone adder, the gray cells were built from standard library cells; therefore, the cells were somewhat bulky and their bias currents were high. However, the size of the gray cells was not the main problem: when these cells were replaced with compound gray cells, the bias current did not change much. The main issue arose from the complicated clock tree and the interconnects between the gray cells. The clock tree and interconnects not only distribute the signals but also have to adjust the delays between the cells.



Figure 3.15: 8-bit Kogge-Stone adder designed at TOBB ETU and fabricated by STP2 process [80].

### 4. DEVELOPED ARITHMETIC LOGIC UNIT SUB-CIRCUITS

After choosing the architecture of the ALU, we had to design all parts of the circuit and then connect these parts to each other with the right timing on the chip to form a functional circuit. However, a complex circuit like an ALU can have up to 10000 junctions, and if the circuit malfunctions, troubleshooting and locating the error would be practically impossible. Therefore, we decided to first design, simulate and fabricate each part separately; testing these parts and obtaining their bias margins makes the circuits easier to debug.

First we designed all the logic gates for the 4-bit ALU with their clock trees and delay lines. For the arithmetic part, we decided to have an adder, a multiplier and a subtractor. We had already designed and demonstrated an adder in the Kogge-Stone architecture [80]. This adder had very good latency, but because the Kogge-Stone architecture is complicated, the cell was too large for our purpose.

### 4.1. Logic Unit

In our ALU we have four gates: AND, OR, XOR and NOT. These gates were designed and fabricated separately in a 4-bit structure. Figure 4.1, Figure 4.2, Figure 4.3 and Figure 4.4 show the schematics and layouts of the 4-bit digital gates designed in Cadence Virtuoso. As seen in these figures, for test purposes we placed DC/SFQ converters at the inputs and SFQ/DC converters at the outputs. These circuits were fabricated in the STP2 standard process at the AIST CRAVITY foundry and were then tested in our cryocooler system. The test setup is discussed in detail in Chapter 5. Figure 4.5 shows the fabricated circuits under an optical microscope. Parts a, b, c and d each show one 4-bit logic structure; without the SFQ/DC and DC/SFQ converters, the cells are about 200 μm wide.



Figure 4.1: 4-bit JAND logic gates with clock tree.



Figure 4.2: 4-bit JOR logic gates with clock tree.



Figure 4.3: 4-bit JXOR logic gates with clock tree.



Figure 4.4: 4-bit JNOT logic gates with clock tree.



Figure 4.5: Fabricated logic cells with their clock trees. a) JOR, b) JAND, c) JNOT, d) JXOR.

To confirm the function of the circuits, we tested them and observed the outputs of the SFQ/DC converters. Figure 4.6 plots the output of one of the JOR gates. As seen in the plot, the output occurs at the rising edge of the clock pulse. Each change of state at the output corresponds to an SFQ pulse, that is, a logic true. The bias margins of the circuits were then measured.



Figure 4.6: Reported JOR gate output waveform.

### 4.2. Adder

At first, the adder circuit was designed with the Kogge-Stone architecture. As stated before, Kogge-Stone is faster because it has fewer stages. In RSFQ logic, however, the fan-out of one and the clock tree make this architecture very complex and the circuit large. Therefore, the bias requirement of the Kogge-Stone architecture is higher than that of conventional designs. Figure 4.7 and Figure 4.8 show the schematic and layout of a Kogge-Stone adder designed in Cadence Virtuoso.



Figure 4.7: Schematic of a 4-bit Kogge-Stone architecture adder.



Figure 4.8: Layout of a 4-bit Kogge-Stone architecture adder.

Since the Kogge-Stone adder was too large for our ALU design, we changed the architecture to a serial design. Figure 4.9 shows the block diagram and the fabricated adder circuit. In this architecture, we used T1 flip-flop cells as full adders. By merging the inputs and the carry and feeding them to the data input of a T1 cell between clock pulses, the cell behaves as a full adder with sum and carry outputs. The sum is extracted as one of the outputs, and the carry propagates to the next T1 cell as its carry-in bit.



Figure 4.9: Block diagram and layout of a 4-bit carry look ahead adder designed for the ALU structure.

This method is not as fast as Kogge-Stone, especially for higher bit widths. However, in RSFQ circuits, because of the minimal connections between the cells, the serial structure has a smaller circuit size and therefore smaller bias requirements. Comparing Figure 4.8 and Figure 4.9 shows this difference. Figure 4.10 shows the Verilog simulation result for some of the inputs to the adder circuit.



Figure 4.10: Verilog simulation result of adder stage.

### 4.3. Multiplier

The multiplier circuit is one of the most complicated circuits designed in this work. Conventional ALUs do not include a multiplier circuit in their architecture. However, we put a multiplier cell in our architecture so that the ALU could be used for specific purposes such as signal processing and digital filtering. Figure 4.11 shows the schematic and layout of a 2 by 2 multiplier circuit.



Figure 4.11: Two bit input multiplier circuit.

Figure 4.11 shows a 2 by 2 multiplier circuit with 4 output bits and a clock-out signal for checking the circuit. As seen in the schematic, the JAND gates perform the bitwise multiplication. After the AND gates, T1 cells are used as full adders to add the results of the bitwise multiplication. With the right delay times applied, the circuit can complete the operation within one clock cycle.
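The data flow of the 2 by 2 multiplier can be sketched at the logic level as follows. This is only an illustration of AND partial products followed by adders; the exact wiring of the fabricated cell is not reproduced, and `t1_add` is a hypothetical stand-in that treats a T1 cell as a pulse counter with sum and carry outputs.

```python
def t1_add(*pulses):
    """Model a T1 cell as a pulse counter: return (sum, carry) for up to three inputs."""
    n = sum(pulses)
    return n & 1, n >> 1


def multiply_2x2(a, b):
    """Multiply two 2-bit numbers using AND gates and T1-style adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    # JAND gates form the bitwise partial products.
    p00, p01, p10, p11 = a0 & b0, a0 & b1, a1 & b0, a1 & b1

    out0 = p00
    out1, c1 = t1_add(p01, p10)   # middle column
    out2, out3 = t1_add(p11, c1)  # upper column, its carry is the MSB
    return out0 | (out1 << 1) | (out2 << 2) | (out3 << 3)


print(multiply_2x2(3, 3))
```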

Figure 4.12 is a photo of the 2 by 2-bit multiplier, fabricated at the AIST CRAVITY foundry with the standard process, STP2. The total size of the test circuit is about 850 $\mu$m. Removing the DC/SFQ and SFQ/DC cells at the inputs and outputs reduces the circuit size to about 500 $\mu$m. This cell consumes about 80 mA of bias current and can be used up to a 25 GHz clock frequency.



Figure 4.12: Fabricated circuit for a 2-bit multiplier circuit. The size of the circuit is about 500 $\mu$m without considering the DC/SFQ and SFQ/DC cells.



Figure 4.13 : Schematic of the designed 4-bit multiplier cell for using in parallel ALU.

Figure 4.13 shows the schematic of a 4 by 4-bit multiplier with a 4-bit output, and Figure 4.14 shows the layout. The cell calculates the 4 least significant bits of the product of the inputs. The multiplication algorithm is the same as for the 2-bit multiplier discussed before. Since this circuit is much bigger than the 2-bit version, we have to delay the clock in the clock tree so that the full adders have time to calculate the result. There are two approaches for delaying SFQ pulses in RSFQ circuits. The first is to place JTLs in the path of the pulse. Each JTL, depending on the fabrication process and cell design, adds some picoseconds of delay to the pulse. However, this method is effective only for small delays. If JTLs are overused in the signal path, the jitter can cause stability issues and the bias current of the circuit can become very large. The other method is to use flip-flop buffers such as DFFs in the signal path to make the signal wait. We use the latter in the design of the multiplier circuit.
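Keeping only the 4 least significant bits means that partial products whose weight is $2^4$ or higher never need to be generated. The sketch below illustrates this truncated multiplication at the arithmetic level; it is an assumption about the data flow, not the actual cell arrangement, and `multiply_low_bits` is a hypothetical name.

```python
def multiply_low_bits(a, b, width=4):
    """Return the `width` least significant bits of a * b from AND partial products."""
    columns = [0] * width
    for i in range(width):
        # Partial products with i + j >= width only affect discarded bits.
        for j in range(width - i):
            columns[i + j] += ((a >> i) & 1) & ((b >> j) & 1)

    result, carry = 0, 0
    for k in range(width):
        total = columns[k] + carry      # adders sum the column plus incoming carry
        result |= (total & 1) << k
        carry = total >> 1
    return result


for a, b in [(1, 3), (2, 2), (3, 5)]:
    print(a, "x", b, "->", multiply_low_bits(a, b))
```

The three operand pairs are the ones used in the measurements of this section; their products fit in 4 bits, so the truncation does not affect them.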



Figure 4.14: Layout of the designed 4-bit multiplier cell for using in parallel ALU.

Figure 4.14 shows the layout of the 4-bit multiplier circuit, designed in Cadence Virtuoso using standard library cells. Since the circuit is large and contains many wiring cells, it needs many bias lines. In big cells we also need to place enough moat cells around the main circuit. The purpose of the moat cells is to trap the magnetic flux caused by the large currents in the bias lines. For big circuits, it is better to place extra layers of moats surrounding the circuit.

Figure 4.15 shows the circuit fabricated at AIST CRAVITY with the standard process STP2. As we can see in the picture, there are a total of 8 main bias lines to the circuit and two lines for the SFQ/DC and DC/SFQ cells. The circuit has a total bias of 320 mA at 2.5 mV and a total power of 0.8 mW at 25 GHz. The size of the circuit, as seen in the picture, is 1.3×1.3 mm<sup>2</sup>, and it has a total of 2000 Josephson junctions.
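As a quick consistency check of these figures, the quoted power follows directly from the bias current and bias voltage. The per-junction averages are simple divisions added here for orientation and are not values reported from the measurements:

$$P = I_{bias} V_{bias} = 320\,\text{mA} \times 2.5\,\text{mV} = 0.8\,\text{mW}, \qquad \frac{P}{N_{JJ}} = \frac{0.8\,\text{mW}}{2000} = 0.4\,\mu\text{W}, \qquad \frac{I_{bias}}{N_{JJ}} = \frac{320\,\text{mA}}{2000} = 160\,\mu\text{A}$$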



Figure 4.15: Fabrication result of the designed 4-bit multiplier cell for using in parallel ALU. The circuit is fabricated with standard process STP2.

Figure 4.16 shows the result obtained from measuring the multiplier circuit in our closed cycle cryocooler system. The figure shows the result for the $1\times3$ operation. As seen in the output, the state changes in outputs 1 and 2 indicate that the result is 3, as expected.



Figure 4.16: Result of the measurements made on the multiplier circuit. The figure shows the result for  $1\times3$  operation and as we see the output is 3 as well.

Figure 4.17 and Figure 4.18 show other results from the measurements made on the multiplier cell. These results confirm the function of the circuit. However, the bias margins could not be measured because of surface heating caused by the high bias currents. This problem is explained in detail in Chapter 5. We tackled the high bias problem of the circuits by changing the packaging of the chips and revising the wiring. However, the measurement results were taken before these changes; hence the margins in the results are very small.



Figure 4.17 : The result for  $2\times 2$  operation and as we see the output is 4 as well.



Figure 4.18 : The result for  $3\times 5$  operation and as we see the output is 15 as well.

### 4.4. Multiplexer

The multiplexer circuit is an essential part of every ALU design. A multiplexer is a circuit that transfers data from one of its input lines to the output port. If a multiplexer has 2<sup>n</sup> inputs, it needs n select bits. In an ALU, multiplexers transfer the data from the outputs of the logic or arithmetic circuits to the output register, depending on the select bits. We have designed several different structures for the multiplexer. Figure 4.19 shows the layout and schematic of a 2 to 1 multiplexer circuit using a toggle flip-flop and NOT gates.



Figure 4.19: Layout and schematic of 2 to 1 multiplexer circuit using toggle flip-flop and not gate.

In this structure, the inputs enter the inputs of the JNOT gates. The clocks of the JNOT gates are connected to the outputs of a toggle flip-flop cell. When the select signal is applied to the flip-flop and it contains one pulse, the flip-flop clocks one of the JNOT gates and, depending on the input, there is a pulse at the output of that gate. If the select signal contains two pulses, the other JNOT gate is clocked and generates the output.

The problem with this structure is that the JNOT gate needs an input pulse to be activated at first, so some of the data is lost. In addition, the toggle flip-flop cannot be reset to the zero state, so its state persists into the next operation and can cause errors in the output. Figure 4.20 shows the layout and schematic of a 4-bit 2 to 1 multiplexer circuit using only toggle flip-flops and mergers.



Figure 4.20 : Layout and schematic of 4-bit 2 to 1 multiplexer circuit using only toggle flip-flops and D-type flip-flop.

In this structure, the JNOT gates are omitted and the input is buffered via a DFF circuit. The clocks of the DFF gates are connected to the outputs of a toggle flip-flop cell. Same as the last structure, the select signal would be applied to TFF cell and depending on the number of pulses, one of the DFFs would generate signal for the output.

The same TFF problem also applies to this structure. In addition, the state of the flip-flops is not known when the circuit starts, so the first sets of data would not be correct. The other problem is that the DFF needs to be reset after each cycle, and if two pulses enter the DFF without a clock pulse, the cell malfunctions. Therefore, this architecture is not effective. Figure 4.21 shows the layout and schematic of a 4-bit 2 to 1 multiplexer circuit using T1 flip-flops and JAND gates. In this architecture, there is no DFF; JAND gates are used instead. The output of the T1 cell is connected not to the clock but to the input of the JAND cells. In this design, the select signal does not act as the clock of the cells; the clock signal is applied from the main circuit, which makes it easier to synchronize with other parts. The select signal is applied to the data input of the T1 cell.



Figure 4.21: Layout and schematic of 4-bit 2 to 1 multiplexer circuit using T1 flip-flops and JAND gates.

When the clock pulse arrives, if the select is one, that is, there is an SFQ pulse at the select input, the carry output is generated; if the select is zero, the sum output is generated. Each of these outputs is connected to input A of one of the two JAND cells. Input B is connected to the input registers. Since the AND operation is applied to the input registers and the outputs of the T1 cell, depending on the state of the T1 cell only one JAND cell passes its input to the output, and the other one gives zero.
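The selection mechanism can be summarized in a short logic-level sketch. It assumes that the T1 cell produces exactly one of its two outputs per clock cycle (carry for select = 1, sum for select = 0) and that the two JAND outputs are merged; which input register is paired with which T1 output is an assumption made for illustration.

```python
def t1_select(select):
    """Return (sum_out, carry_out) of the T1 cell for one clock cycle."""
    return (0, 1) if select else (1, 0)


def mux_2to1(in0, in1, select, width=4):
    """4-bit 2 to 1 multiplexer built from a T1 cell and JAND gates (behavioral)."""
    sum_out, carry_out = t1_select(select)
    out = 0
    for k in range(width):
        gate0 = sum_out & ((in0 >> k) & 1)     # JAND enabled when select = 0
        gate1 = carry_out & ((in1 >> k) & 1)   # JAND enabled when select = 1
        out |= (gate0 | gate1) << k            # merger combines both gate outputs
    return out


print(bin(mux_2to1(0b1010, 0b0101, select=0)), bin(mux_2to1(0b1010, 0b0101, select=1)))
```

Because the T1 cell is read out and reset by the main clock, this model has no state that carries over between cycles, which is the property the TFF-based variants lack.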

Figure 4.22 shows the 4-bit 2 to 1 multiplexer circuit using toggle flip-flops, as fabricated at AIST CRAVITY with the standard process STP2. As seen in the figure, the MUX is a rather compact circuit and does not demand much bias current, so the bias margins were expected to be similar to the design values.



Figure 4.22: Fabricated circuit of 4-bit 2 to 1 multiplexer circuit using toggle flip-flops. The circuit is fabricated in AIST CRAVITY with standard process STP2.

Figure 4.23 shows the experimental result of a single cell from 2 to 1 4-bit multiplexer. As stated before, since there is no reset option in toggle flip-flop, the state of the flip-flop is unknown to us at the start of the cycle and can result in error.



Figure 4.23: Waveform of the inputs and output from the experimental result of a single cell of the 4-bit 2 to 1 multiplexer.

### 4.5. Passive Transmission Lines (PTLs)

As mentioned in Chapter 2, the main drawback of PTLs is that they consume much space, especially in a 4-layer process such as the AIST Standard Process (STP2) [66]. Since a change in the stripline width alters its impedance, narrower PTLs need newly designed receiver and transmitter circuits. Due to the limitations of microwave simulation tools, it is not possible to completely model JJ-based circuits such as the driver and receiver together with the PTLs.

Much effort has been put into the design of stable transmission lines in previous works [81]–[83]. Unfortunately, there is no report on a systematic approach for the design of PTLs together with their receiver and driver circuits. The widths of these PTLs are mostly close to the unit cell size of the library, and it is not practical to pass more than one PTL through an area that would be occupied by a unit cell. By reducing the widths of the lines, multiple signal paths can fit in a smaller area; hence the size of the chips and the delay of the logic blocks could be reduced substantially. This is especially important for a 4-layer process, as PTLs and active circuits share the same layers.

We have designed and simulated various narrower PTLs. The target widths of the PTLs were 5 $\mu$m, 7 $\mu$m, 10 $\mu$m, and 20 $\mu$m. Because of the superconducting nature of the striplines and the high frequency of the SFQ pulses, many unknown parameters emerged in the modeling of the lines, which made an analytical approach impractical. Therefore, the finite element method was used to extract the stripline model parameters for the STP2 process with SONNET, a microwave simulation tool [84]. Using the extracted parameters and the resistively and capacitively shunted Josephson junction (RCSJ) model for the Josephson junctions, we matched the impedance of the line to the driver and receiver by varying the parameters in numerical calculation software. The best matching parameters were selected, and the lines with their driver and receiver circuits were designed and fabricated in the AIST STP2 process. These designs match in impedance with the CONNECT cells [63].

The mechanism of the pulse propagation on the strip-line has been studied. This shows us that we should characterize the superconducting line at about 165 GHz for SFQ pulse propagation. At this frequency the SFQ pulse has the highest energy. In order to simulate the superconducting stripline in the SONNET environment, we defined it as a conductor with the complex conductivity with  $\sigma_1$ =1.0642×10<sup>7</sup> $\mu$ s<sup>-1</sup> and  $\sigma_2$ =9.1286×10<sup>8</sup> $\mu$ s<sup>-1</sup> where  $\sigma_1$  and  $\sigma_2$  are the real and imaginary parts of the complex conductivity model as reported in [85], [86]. On the other hand, it is possible to make the superconductor model simulation by implementing S-parameters and adding surface thickness for penetration depth modeling [87]. However, our model was accurate enough compared to theoretical calculations to obtain the estimated impedance that was needed for the design of the matching parameters for the driver circuit. Table 4.1 presents the parameters extracted for the  $\pi$ -model of the superconductor striplines in the AIST standard process for 20  $\mu$ m long PTLs.

The layer definitions of the standard process [66] were set up in the SONNET software environment. The striplines with different widths were simulated at the target frequency, and the $\pi$-model parameters of the lines were then extracted for a 20 $\mu$m long PTL. Figure 4.24 shows the model for the line with a width of 20 $\mu$m. Using these parameters, we built the ladder model of a 500 $\mu$m long stripline from 25 segments of the $\pi$-model.



Figure 4.24 : Ladder  $\pi$ -model for the strip-line in standard process. For 20  $\mu$ m PTL, the values are: L $\pi$ =0.25pH and C $\pi$ =0.037pF.

Table 4.1 :  $\pi$ -model parameters extracted for the striplines with no sky-plane.

| Stripline width | $C_{\pi}$ (pF) | $L_{\pi}$ (pH) |
|-----------------|----------------|----------------|
| 20 μm           | 0.037          | 0.25           |
| 10 μm           | 0.02           | 0.49           |
| 7 μm            | 0.016          | 0.69           |
| 5 μm            | 0.012          | 0.94           |
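For orientation, the values in Table 4.1 can be turned into rough line estimates with the lossless-line relations $Z_0 = \sqrt{L/C}$ and $t_d = \sqrt{LC}$ per segment. This is only a sketch: it assumes that $C_{\pi}$ is the total shunt capacitance of one 20 $\mu$m segment, and the resulting numbers are not values reported from the simulations or measurements. The function name `line_estimates` is hypothetical.

```python
import math

# Stripline width in um -> (C_pi in pF, L_pi in pH) per 20 um segment, from Table 4.1.
PI_MODEL = {20: (0.037, 0.25), 10: (0.02, 0.49), 7: (0.016, 0.69), 5: (0.012, 0.94)}


def line_estimates(c_pf, l_ph, segments=25):
    """Return (characteristic impedance in ohm, total delay in s) of a lossless ladder.

    Assumes c_pf is the total shunt capacitance of one segment.
    """
    c = c_pf * 1e-12
    l = l_ph * 1e-12
    z0 = math.sqrt(l / c)
    delay = segments * math.sqrt(l * c)
    return z0, delay


for width, (c_pf, l_ph) in PI_MODEL.items():
    z0, delay = line_estimates(c_pf, l_ph)
    print(f"{width:>2} um: Z0 ~ {z0:.2f} ohm, delay over 500 um ~ {delay * 1e12:.2f} ps")
```

Under this assumption the estimated impedance rises as the line gets narrower, which is consistent with the need to redesign the driver and receiver for each width.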

The driver and receiver circuits were also modeled in a numerical simulating program using the RCSJ model for the Josephson junctions used. In chapter 2 we showed the model for the driver and receiver circuits matched for unshielded 20  $\mu$ m transmission line. Figure 4.25 displays the block diagram of the test configuration used for each PTL. The driver and receiver cells were designed in a way that they also match the cells of the CONNECT library.



Figure 4.25: Block diagram of the test setup for the designed cells. The figure numbers indicate the designed parts.

For designing the driver circuit, we used the RCSJ circuit model of the Josephson Junctions at the output junctions as shown in equation 4.1 [88].

$$\begin{aligned} G_{a} &= C_{j}\omega j - \frac{2\pi I_{C} j}{\omega \Phi_{0}} + \frac{1}{R_{D}} \\ R_{O} &= \frac{1}{G_{a}} + R_{S} + L_{S}\omega j \end{aligned} \tag{4.1}$$

where $G_a$ is the admittance of the Josephson junction from the RCSJ model, $\omega$ is the characteristic angular frequency of the SFQ pulse at which the pulse has its maximum energy, $C_j$ is the junction capacitance, $I_C$ is the critical current of the junction, $\Phi_0$ is the magnetic flux quantum, and $R_D$ is the shunt resistor of the junction. In all the calculations $\omega$ is taken as $1.036 \times 10^{12}$ rad/s [89], [90]. $R_O$ is the output impedance of the driver cell that was matched to the PTL, and $R_S$ and $L_S$ are the output resistance and inductance used for impedance matching with the transmission line.
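Equation 4.1 can be evaluated directly with complex arithmetic, as in the sketch below. The element values passed in the example call are hypothetical placeholders chosen only to exercise the formula; they are not the design values of the fabricated driver.

```python
import math

PHI0 = 2.067833848e-15   # magnetic flux quantum, Wb
OMEGA = 1.036e12         # characteristic angular frequency from the text, rad/s


def junction_admittance(c_j, i_c, r_d, omega=OMEGA):
    """RCSJ admittance G_a: capacitive, Josephson-inductive and shunt branches in parallel."""
    return 1j * omega * c_j - 1j * 2 * math.pi * i_c / (omega * PHI0) + 1 / r_d


def driver_output_impedance(c_j, i_c, r_d, r_s, l_s, omega=OMEGA):
    """R_O of equation 4.1: junction impedance in series with R_S and L_S."""
    return 1 / junction_admittance(c_j, i_c, r_d, omega) + r_s + 1j * omega * l_s


# Hypothetical element values (assumptions, not taken from the design).
z_out = driver_output_impedance(c_j=0.2e-12, i_c=200e-6, r_d=2.0, r_s=1.0, l_s=1.0e-12)
print(f"characteristic frequency: {OMEGA / (2 * math.pi) / 1e9:.1f} GHz")
print(f"|R_O| = {abs(z_out):.2f} ohm")
```

The angular frequency of $1.036 \times 10^{12}$ rad/s corresponds to about 165 GHz, the frequency at which the stripline was characterized earlier in this section.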

The receiver circuit consists of a series inductance and resistance at the input, cascaded with a JJ-L-JJ loop. The loop inductance and the critical currents of the Josephson junctions could not be changed, since they determine the matching characteristics with the other cells in the library.

The input impedance for the receiver circuit is calculated in equation 4.2. The effects of junctions are determined by  $G_{a1}$  and  $G_{a2}$  parameters. Nevertheless these parameters have a small effect on the input impedance as the input impedance of the receiver cell is mainly determined by the input inductor,  $L_{IN}$ , and input resistor,  $R_{IN}$ .

$$\begin{aligned} G_{1} &= \frac{1}{1/G_{a1} + L_{P1}\omega j} \\ G_{2} &= \frac{1}{1/G_{a2} + L_{Loop}\omega j} \\ R_{INrec} &= \frac{1}{G_{1} + G_{2}} + R_{IN} + L_{IN}\omega j \end{aligned} \tag{4.2}$$
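The structure of equation 4.2 is two junction branches in parallel, followed by the series input elements. A minimal sketch of that calculation is given below; the junction admittances are passed in as complex numbers, and the function name and the values in the example are assumptions made for illustration only.

```python
def receiver_input_impedance(g_a1, g_a2, l_p1, l_loop, r_in, l_in, omega=1.036e12):
    """R_INrec of equation 4.2 from the junction admittances and the passive elements."""
    g1 = 1 / (1 / g_a1 + 1j * omega * l_p1)      # first junction in series with L_P1
    g2 = 1 / (1 / g_a2 + 1j * omega * l_loop)    # second junction in series with L_Loop
    return 1 / (g1 + g2) + r_in + 1j * omega * l_in


# Hypothetical values: two purely resistive 1-ohm junction branches, no inductance.
print(receiver_input_impedance(1.0, 1.0, 0.0, 0.0, r_in=2.0, l_in=0.0))
```

In this degenerate case the two 1 $\Omega$ branches combine to 0.5 $\Omega$ and add to the 2 $\Omega$ input resistor, which makes the parallel-then-series structure easy to verify by hand.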

The resistors at the driver's output  $(R_s)$  and receiver's input  $(R_{IN})$  cause negligible drop in the bias margins as long as they are smaller than the transmission lines' impedance [82].

After the design, 8 different striplines were sent for fabrication. The line widths were 5 $\mu$m, 7 $\mu$m, 10 $\mu$m, and 20 $\mu$m, each with and without a sky plane. The length of the lines was 500 $\mu$m, and two JTLs were placed at the input of the driver and at the output of the receiver for pulse shaping with the DC/SFQ and SFQ/DC converters. The fabrication was done with the AIST STP2 process, which has four layers of niobium. The $J_c$ of the fabricated circuits was shifted by about 17%. This global shift caused a rise in the junctions' critical currents and a slight change in the values of the inductances. Photos of the fabricated receiver and driver circuits are shown in Figure 4.26 and Figure 4.27.



Figure 4.26: The receiver circuits with the JTLs and the SFQ/DC converters.



Figure 4.27: The driver circuits with respective JTLs and the DC/SFQ converters.

The chips were cryo-packaged for the 4 K pulse-tube cryocooler system [91]. We applied a series of 80,000 pulses at the input and measured the output for the different lines at low frequency, sweeping the bias of the receiver or the driver circuit independently while the other biases were set to their nominal values. Figure 4.28 shows the results for the non-shielded 20 $\mu$m stripline.

After obtaining the output data by sweeping the bias values separately for the driver and receiver cells, the bit error rates out of 80,000 pulses were calculated for the PTLs of different widths. To calculate the BER, we checked the number of incoming pulses and the arrival time of each pulse in relation to the input signal, as shown in Figure 4.28.



Figure 4.28 : Input/output of the PTL line of  $20\mu m$  width at 4.2 K. The expected output signal is generated externally to compare with output of PTL.

The BER was then calculated by comparing the collected data with the expected output signal. The obtained graphs show the reliable margin within which the cells function. As the width decreases, the driver circuit needs to drive a higher current into the PTL for the pulse to pass through. Therefore, the margin becomes more vulnerable to changes in the junctions' critical current, and the narrower lines (5 $\mu$m and 7 $\mu$m) did not function correctly. We attribute this to the high $J_c$ spread of the fabrication. The driver and receiver of the 20 $\mu$m PTL were also affected by the global shift of the parameters, and the receiver's bias margin dropped to $\pm 10\%$ instead of the expected design value of more than $\pm 30\%$. To measure the BER of each cell, the other bias currents were set to their design values. Figure 4.29 shows the BER graph for the 20 $\mu$m stripline at 4.2 K.
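The BER evaluation can be sketched as a slot-by-slot comparison of pulse counts and arrival times against the expected signal. The slot layout, the timing window and the function name `bit_error_rate` are assumptions made for illustration; the actual acquisition and post-processing scripts are not reproduced here.

```python
def bit_error_rate(expected_bits, arrival_times, period, window):
    """Fraction of time slots whose measured pulses disagree with the expected bits.

    Slot k spans [k * period, (k + 1) * period). A slot is in error if the number
    of pulses differs from the expected bit, or if a pulse arrives later than
    `window` after the start of its slot.
    """
    counts = [0] * len(expected_bits)
    late = [False] * len(expected_bits)
    for t in arrival_times:
        k = int(t // period)
        if 0 <= k < len(counts):
            counts[k] += 1
            if t - k * period > window:
                late[k] = True
    errors = sum(
        1 for k, bit in enumerate(expected_bits) if counts[k] != bit or late[k]
    )
    return errors / len(expected_bits)


# Four slots, one pulse arriving too late in the last slot.
print(bit_error_rate([1, 0, 1, 1], [0.1, 2.1, 3.5], period=1.0, window=0.3))
```

With 80,000 pulses per bias point, the smallest nonzero BER that such a count can resolve is 1/80,000, that is, $1.25 \times 10^{-5}$.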



Figure 4.29: Bit error rate (BER) measurement vs bias.

Figure 4.29 shows the BER versus the bias of the receiver (a) and driver (b) circuits at 4.2 K for the 20 $\mu$m stripline. During the measurement of the driver (receiver) cell, the receiver (driver) cell bias is set to its nominal value. By applying numerical methods, we found the optimum parameters for the best matching point between the stripline and the driver and receiver circuits. Using this approach, we reduced the width of the PTLs while keeping bias margins of about 10% for the 20 $\mu$m and 5% for the 10 $\mu$m wide PTLs, despite the global shift of the parameters in fabrication.

### 4.6. AQFP Cells

To investigate the properties of the AQFP technology, we designed and fabricated some of its cells. Figure 4.30 shows the basic cell in AQFP, which is the buffer gate. The buffer acts as a router and reshapes the signal. It is placed at every input and output stage of an AQFP circuit.



Figure 4.30: The buffer gate schematic and layout designed in Cadence Virtuoso software. The coupling of the inductances is not shown in the picture.

The circuits were simulated with the JSIM program [92]. JSIM is a SPICE-based code that solves the Josephson junction equations for a superconductor circuit. To run a simulation in JSIM, we first generate a netlist with the elements and all the connecting nodes. JSIM then solves the circuit numerically at the time steps that we specify. The output file contains the values of the parameters that we requested in the netlist file. The graphs are generated using MATLAB. Figure 4.31 shows the simulation results for the buffer gate. The values of the junctions and inductances were then optimized with MATLAB to obtain a good margin.



Figure 4.31: JSIM simulation results for the buffer gate.

The circuits were then fabricated with the standard process. The fabrication process is discussed later in this chapter. Figure 4.32 shows the fabricated circuits. Figure 4.32a shows the circuit without a magnetic shield, and Figure 4.32b shows the circuit with a magnetic shield. The magnetic shield is a superconductor layer that is shorted to the ground and does not allow flux to penetrate the circuit.

As seen in Figure 4.31, the states in AQFP logic circuits are very different from those in RSFQ. In AQFP, a negative signal represents the zero state and a positive signal represents the one state. The pulse amplitude in AQFP depends on the fabrication process. In the standard process, the pulse level is about $5\mu A$.



Figure 4.32 : Fabricated buffer circuit. a) without shield. b) with superconductor shield.

The other gate in the AQFP technology that has multiple uses is the majority gate. The majority gate acts as a polling gate, meaning that the output is the majority of the inputs. Therefore, this gate should have an odd number of inputs. We have designed a three-input majority gate, as seen in Figure 4.33.



Figure 4.33: Majority gate in AQFP technology, left is the schematic and right is the layout. The coupling of the inductances is not shown in the picture.

The input stages act as buffer gates and put a flux in each loop depending on the input current. The output inductances are then coupled to a superconductor loop, which provides the input for the output buffer gate. The coupled current has the polarity of the dominant logic value; hence the output is the dominant bit of all the inputs. Figure 4.34 shows the output of the majority gate from the JSIM simulation. As seen in this figure, the output follows the dominant signal but is inverted because of the NOT buffer at the output.
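The polarity-based logic of the majority gate can be illustrated with a short behavioral sketch. It assumes ideal buffers that regenerate a signal to a fixed amplitude (5 $\mu$A is used, following the pulse level quoted for the standard process) and an ideal summation of the three coupled currents; the function names are hypothetical and no flux dynamics are modeled.

```python
AMPLITUDE = 5e-6  # assumed regenerated signal level in A


def to_current(bit):
    """Logic 1 is a positive current, logic 0 a negative current."""
    return AMPLITUDE if bit else -AMPLITUDE


def buffer(current):
    """Ideal AQFP buffer: regenerate the amplitude, keep the polarity."""
    return AMPLITUDE if current > 0 else -AMPLITUDE


def majority3(a, b, c, inverting_output=True):
    """Three-input majority gate; the output stage inverts when it is a NOT buffer."""
    coupled = sum(buffer(to_current(x)) for x in (a, b, c))  # never zero for 3 inputs
    out = buffer(coupled)
    if inverting_output:
        out = -out
    return 1 if out > 0 else 0


print(majority3(1, 1, 0), majority3(1, 1, 0, inverting_output=False))
```

With an odd number of inputs the summed current can never be zero, which is why the gate always resolves to a definite polarity.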

Some more complicated circuits, such as a full adder, were also designed in this technology using majority and buffer gates, but they were never measured. A fabrication error, the leakage between the superconductor base layer and the ground plane, caused the circuits to malfunction, and therefore we could not obtain any results from experimental measurements. The last section of this chapter discusses this problem in the fabrication process in more detail.



Figure 4.34: The output result of the majority gate as we apply two logics at same value. The output is inverted.



Figure 4.35: The a) unshielded and b) shielded majority gates fabricated with the standard process.

#### 4.7. Interface Circuits

The roles of the interface circuits in the architecture of the final design are described in detail in Chapter 6. The main goal of these circuits is to maintain the speed of the RSFQ circuits while interfacing with CMOS circuits, without losing any data as they communicate.

The registers provide a transition stage from the low-frequency CMOS clock to the high-frequency RSFQ clock simply by taking more data lines from the CMOS circuits in parallel and feeding them to the RSFQ circuits serially. There should also be handshake signals for both the RSFQ and CMOS parts in order to make the connection without data loss. These stages need a memory that acts as a data buffer to prevent miscommunication. The handshake signals chosen for this design are simply the clock signals of the RSFQ and CMOS circuits.

## 4.7.1. Input register stage

As stated in Chapter 1, RSFQ circuits still lack a robust, compact memory. Therefore, to store and recover data, we need to rely on CMOS memories. The CMOS circuit has a much lower clock frequency than RSFQ logic. The input register circuit converts the many parallel data lines from the CMOS circuit into serial data lines for the RSFQ circuit. In our design, the stage converts a 16-bit input to a 4-bit output. This way, if the CMOS circuit works at 6 GHz, the RSFQ circuit can function at 24 GHz with no problem. Figure 4.36 shows the schematic and layout of a simple input stage.



Figure 4.36: Layout and schematic of 4-bit 4 to 1 Input register circuit.

The data arrives at the input stage and is loaded into the input DFF cells with the CMOS clock. After that, the data moves to the output DFF cells and waits there. The clock lines of the output DFFs are connected to the outputs of the TFF cells. When the RSFQ clock reaches the TFF cells, they change state and their outputs fire in order; therefore, the DFF cells are unloaded to the output line one after another. The final results are merged, and in this way a 4-bit input is converted to one serial output line without losing any data.
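The parallel-to-serial conversion can be modeled behaviorally as follows. The TFF sequencing is abstracted as a modulo-4 pointer and the DFF readout is treated as destructive; both are simplifying assumptions, and the class name `InputRegister4to1` is hypothetical.

```python
class InputRegister4to1:
    """Behavioral model of one 4 to 1 input register line."""

    def __init__(self):
        self.dff = [0, 0, 0, 0]
        self.pointer = 0

    def cmos_clock(self, bits):
        """Load four parallel bits with one slow (CMOS) clock pulse."""
        self.dff = list(bits)
        self.pointer = 0

    def rsfq_clock(self):
        """Unload the next DFF with one fast (RSFQ) clock pulse."""
        bit = self.dff[self.pointer]
        self.dff[self.pointer] = 0              # readout empties the DFF
        self.pointer = (self.pointer + 1) % 4   # TFF chain selects the next DFF
        return bit


reg = InputRegister4to1()
reg.cmos_clock([1, 0, 0, 1])
print([reg.rsfq_clock() for _ in range(4)])
```

Four fast clock pulses are needed per slow clock pulse, which is the factor of four between the 6 GHz CMOS clock and the 24 GHz RSFQ clock mentioned above.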

Figure 4.37 shows the fabricated input register stage. As seen in the image, for a four-times clock frequency conversion and a four-bit input register, the size of the cell is about $1400\mu m$. For two input registers and one instruction set, about 3.8 mm of space is needed in the standard process. The bias current needed for the cell, without considering the SFQ/DC and DC/SFQ cells, was about 140 mA at 2.5 mV.



Figure 4.37: Fabricated circuit of 4-bit 4 to 1 Input register circuit. The circuit is fabricated in AIST CRAVITY with standard process STP2.

Figure 4.38 shows the measured input and output waveforms of a single cell of the 4-bit 4 to 1 input stage. As seen in the picture, the inputs are loaded into the cell with a single clock pulse, which is not shown here, and then, with a clock at four times the frequency, the inputs are converted to a serial line at the output. The first and fourth input pulses appear at the output at the first and fourth clock pulses.



Figure 4.38: Waveform of the inputs and output from the experimental result of a single cell of the 4-bit 4 to 1 input stage.

## 4.7.2. Output register stage

The reverse structure of input stage is the output register stage. The output register stage acts as the serial to parallel conversion to match the high frequency outputs of the RSFQ circuit to low frequency inputs of the CMOS logics and memory.

Figure 4.39 shows the schematic and layout of the 4-bit 1 to 3 output register circuit designed in Cadence Virtuoso. The idea behind this design is the same as for the input register stage, with the data paths reversed. In this structure, the TFF cells are clocked with the RSFQ clock to unload the data from the output of the RSFQ circuit into the inputs of the EDFF cells. EDFF cells are used instead of DFFs since the inputs are distributed with splitter cells, which could cause a DFF to malfunction.

Figure 4.40 shows the fabricated circuit of 4-bit 1 to 4 outputs registers circuit. The circuit is fabricated in AIST CRAVITY with standard process STP2. The circuit size is about 1.8mm for a four bit output. The bias is higher than the input stage and is about 210mA at 2.5mV.



Figure 4.39: Layout and schematic of 4-bit 1 to 3 outputs register circuit.

Note that the difference in cell size between the layout in Figure 4.39 and the fabricated circuit in Figure 4.40 comes from the SFQ/DC cells, which are not shown in Figure 4.39. These converters are added for the stand-alone tests and are not needed in the final design of the complete coprocessor circuit.



Figure 4.40: Fabricated circuit of 4-bit 1 to 4 output register circuit. The circuit is fabricated with standard process STP2.

#### 5. IMPLEMENTATION OF TEST SETUP

A cryogenic environment is unavoidable for superconductor applications. Because of the limited helium resources and its rising price, as well as for practical reasons, liquid helium is acceptable for research and scientific applications but is not feasible for commercial applications. Thanks to advances in cryogenic technology, coolers are becoming more efficient and smaller. In particular, the one-stage coolers used for high temperature superconductor devices have improved over the years [93], [94]. Unfortunately, one-stage coolers can only reach ~35 K, and low temperature superconductor circuits need at least two stages to reach 4.2 K [91]. In addition, closed cycle cryocoolers let the user utilize intermediate temperature levels for mounting auxiliary components such as interface circuitry or filters.

There are different classes of commercial multi-stage coolers, based on Gifford-McMahon (GM) and Stirling-type pulse tube designs. The moving parts inside the GM and Stirling cryocoolers have many disadvantages. The displacer can cause vibration, which reduces the lifetime of the cooler, and may also cause heat conduction along its moving axis as well as heat loss due to friction. This part is absent in pulse tube type cryocoolers [95].

Successful operation of SFQ circuits in closed cycle refrigeration systems has already been demonstrated by groups from the USA, Japan and Europe. For instance, Hashimoto et al. used a 4 K two-stage 1-W Gifford-McMahon cryocooler to demonstrate the function of an SFQ 2×2 switch chip at 40 GHz [96]. R.J. Webber et al. used a Gifford-McMahon cryocooler for a DC voltage reference based on Josephson junctions at a frequency of 20 GHz for a fully automatic voltage standard system [97]; they also demonstrated the function of an SFQ receiver with about 11000 junctions for a 7.7 GHz satellite receiver [98], [99]. There is also a review from this group at HYPRES of a 4-stage 4 K Stirling-type pulse-tube cooler by Lockheed-Martin, which demonstrates the stability of cryocoolers for SFQ circuits and the advantage of multistage systems for mounting auxiliary components and HTS filters at different temperatures [63], [97]. Yoshikawa et al. also showed the functionality of a GM type cryocooler with high-throughput and high-bandwidth SFQ circuits. They demonstrated a flip-flop circuit with a very good BER at 10 GHz [100]. Ortlepp et al. showed the capabilities of cryocoolers for precise measurements [101]. For more details see references [63], [96]–[99], [101], [102].

In the following section the cryocooler's structure will be discussed and the wiring scheme to cold head and the casket for the chip carriers will be demonstrated. We also mention the electronics that are used in low and high frequency tests. The connections to the room temperature electronics are also discussed. Shielding conditions for low noise measurements are mentioned. In following section the heat transfer from chip to the cold head, power limits of the cryocooler and our efforts to improve it are explained. Finally, the efforts to automate the cooler and the cooler's operation results and stability are explained.

Section 3 discusses the measurement results of the un-shunted Josephson Junction, series DC-SQUIDs and a logical AND gate, one of the basic cells from CONNECT library [63], in the cryocooler setup. These are basic RSFQ logic elements that are used to make more complex circuits.

## 5.1. System Integration

### 5.1.1. Cryocooler

For superconductor logic applications we needed a low-noise environment with sufficient cooling power at 4 K. We chose a pulse-tube cooler, which has one order of magnitude smaller vibration than a GM-type cooler and no moving metal parts. Therefore, it has a longer maintenance-free operation period and lower vibration noise. With 500 mW of cooling power at 4.2 K, the Sumitomo RP-062B 4 K pulse-tube cryocooler with the F-50 compressor fit our purpose. It provides sufficient cooling power even after all the cabling, damping and additional shielding.

In order to further reduce the vibration of the chip holder, mechanical damping braids were installed at the first and second stages as shown in Figure 5.1. The braids are made of high-purity deoxygenated copper. The Earth's magnetic field can cause magnetic noise if the cooler has vibrating parts. These damping braids reduced the vibration of the 4 K stage by one order of magnitude, from $40\mu m$ to less than $4\mu m$, but also reduced the second-stage cooling power from 500 mW to 250 mW.

Figure 5.1 shows the block diagram of the cryocooler. In this figure, the red plate is the steel plate that is connected to the outside; the O-ring for the vacuum seal is placed on it. The orange plate represents the first stage of the cooler. This stage is connected to the compressor head via copper braids to reduce the vibration. An aluminum cover carrying a radiation shield is screwed to this stage to separate the inside of the first stage from the outer part. The purple area shown on the compressor head is the intermediate stage. This stage can reach 8 K and is used to connect the wires before they go to the second stage, which reduces the heat load. The blue area is the second stage of the cooler, which can reach 3.7 K. The chip is placed on this stage, and the magnetic shield is also thermally connected to it.

Figure 5.2 shows the temperature oscillation of our system at 4.2 K. As seen in Figure 5.2, the temperature fluctuation on the cold head is about 10 mK over a 20-hour period. More details on the use of cryocoolers in superconducting circuitry are available in [99], [101].

The top graph in Figure 5.2 shows the temperature changes in the first stage of the cryocooler. This stage is shown by the orange plate in Figure 5.1. The temperature oscillation in the first stage is not an issue for us, since this stage does not hold any sensitive RSFQ circuitry. The oscillation in this stage is caused by the large thermal load placed on it.

The bottom graph in Figure 5.2 shows the temperature changes in the second stage of the cryocooler. This stage is shown by the blue plate in Figure 5.1. The temperature oscillation in the second stage is small, as seen in the graph. This stage holds the RSFQ circuits; therefore, the PID parameters of the temperature control should be fine-tuned to minimize any overshoot or unwanted changes.



Figure 5.1: Schematic of pulse-tube cryocooler



Figure 5.2 : Temperature oscillation in the a) First stage, b) Second stage of the system under load.

To control the system, we used a LabVIEW program. The settings for the temperature controller and the vacuum level are all configurable in the graphical interface. The PID parameters for the heater are set by the program; the PID loop manages the heating rate and the temperature overshoot at the cold head. The vacuum controller also ensures the safety of the molecular pump by checking the backing-pump level before starting the process and shutting down the molecular pump in emergency situations. Figure 5.3 shows the program interface and the temperature levels for the first and second stages. The first-stage sensor is mounted on the 55 K plate (orange plate in Figure 5.1) and the second-stage sensor is on the 4.2 K plate (blue plate in Figure 5.1).
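As a rough illustration of how a PID loop trades heating speed against overshoot, the following Python sketch drives a toy first-order thermal model of a cold stage with a clamped PID heater output. The plant constants, the gains and the function name `simulate_pid` are assumptions chosen for illustration; they are not the parameters of our temperature controller or of the LabVIEW program.

```python
def simulate_pid(setpoint_k=4.2, base_k=3.7, kp=0.5, ki=0.2, kd=0.0,
                 dt=0.1, steps=2000, max_power_w=0.25):
    """Return the temperature trace of a toy stage under PID heater control."""
    # Toy plant (assumed): C * dT/dt = P_heater - G * (T - base_k)
    heat_capacity = 1.0   # J/K, hypothetical
    conductance = 0.3     # W/K, hypothetical
    temp = base_k
    integral = 0.0
    prev_error = setpoint_k - temp
    trace = []
    for _ in range(steps):
        error = setpoint_k - temp
        derivative = (error - prev_error) / dt
        candidate_integral = integral + error * dt
        power = kp * error + ki * candidate_integral + kd * derivative
        # Simple anti-windup: keep integrating only while the heater is not saturated.
        if 0.0 <= power <= max_power_w:
            integral = candidate_integral
        power = min(max(power, 0.0), max_power_w)
        # Explicit Euler step of the toy plant.
        temp += dt * (power - conductance * (temp - base_k)) / heat_capacity
        prev_error = error
        trace.append(temp)
    return trace


trace = simulate_pid()
print(f"final temperature: {trace[-1]:.4f} K, peak: {max(trace):.4f} K")
```

In this toy model, raising `kp` or `ki` shortens the warm-up to the setpoint but increases the overshoot, which is the trade-off the PID parameters in the interface are tuned for.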



Figure 5.3 : LabVIEW program for temperature and vacuum control of the system during measurements.

### 5.1.2. Wiring and connections

Figure 5.4 shows the wiring configuration of the cryocooler between the different stages. The second stage plate has a gold layer to avoid radiation losses and the heaters and Si sensors are placed on the stage plates.



Figure 5.4: The wiring configuration of the cryocooler between stages.

To operate superconductor circuits, a low-noise environment is needed; hence the wiring to the circuits should be properly shielded. However, shielded wires between stages significantly raise the thermal load on the 4 K stage. In large circuits the bias currents can easily reach 1 A, which causes considerable Joule heating if the wires have high resistance. Therefore, using longer wires to decrease the thermal load could increase the Joule heating. One should also keep in mind that any kind of magnetic material causes magnetic noise in SFQ circuits. Conventional wiring used to limit thermal load usually contains steel or other iron-based metals and could cause magnetic pollution at the circuit. To overcome these problems, we use phosphor-bronze twisted wires for the bias lines. These wires have significantly lower thermal conductance than normal copper lines and have acceptable electrical properties. The bias line resistance between room temperature and the chip holder was measured as $2.1\Omega$, of which $0.6\Omega$ was between the middle and second stages. By thermally anchoring the wires well and putting most of the thermal load on the first and middle stages, significant heating at the 4 K stage is avoided.

The signal lines need very good RF shielding while maintaining a high-frequency response for testing at high frequencies. For the signal lines, double-shielded coaxial cables were connected from the source to the first stage of the cooler using G3PO connectors to save space. From the first stage to the second stage, shielded coplanar Be-Cu flexible flat cables were used [103]. These cables place a thermal load of 4 mW per 10 signal lines on the second stage, so the 40 signal lines in total cost the system 16 mW of thermal power at the second stage. The coplanar Be-Cu signal lines have a 2.5 GHz frequency response, which limits the maximum signal frequency. From the second stage to the chip holder, rigid copper shielded cables were used and then connected to the chip carrier via Be-Cu spring contact pins. The chip holder is placed inside a three-layer µ-metal magnetic shield, which is connected to the second stage. The magnetic shield attenuates low-frequency magnetic fields by up to 70 dB, which brings the magnetic noise level down to ~3 nT as measured by a fluxgate magnetic sensor. In total there are 20 bias lines and 40 signal lines in the system. The thermal radiation from the second stage and the magnetic shield, and the Joule heating from the bias lines, consume about 65 mW in total at the second stage, which leaves about 185 mW of power at the chip holder.
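The second-stage power budget quoted above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the function names are hypothetical, and the text does not say whether the signal-line load is already included in the 65 mW figure, so the two quantities are kept separate.

```python
def signal_line_load_mw(n_lines=40, mw_per_10_lines=4.0):
    """Thermal load of the flat signal cables on the second stage (mW)."""
    return n_lines / 10 * mw_per_10_lines


def remaining_power_mw(cooling_mw=250.0, parasitic_mw=65.0):
    """Cooling power left at the chip holder after the quoted parasitic loads (mW)."""
    return cooling_mw - parasitic_mw


print(signal_line_load_mw())   # load of the 40 signal lines
print(remaining_power_mw())    # power left at the chip holder
```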

### 5.1.3. Electronics

There are two Si-diode temperature sensors, calibrated for our setup, and two heaters at the first- and second-stage plates. These are connected to a Lakeshore 340 temperature controller, which drives the heaters and measures the excess power in the system. The vacuum system consists of a rotary backing pump and a turbomolecular pump. The temperature controller and the vacuum units are connected to a PC via GPIB and serial port, and they are monitored and controlled by a LabVIEW program. Figure 5.5 shows the block diagram of the low-frequency test setup.

To generate the input signals and the clock waveform, a pattern generator system with 300 MHz bandwidth is used; its Double Data Rate (DDR) capability can increase the frequency to 600 MHz. The waveform goes through a band-pass filter to obtain an RSFQ-compatible waveform. The DC/SFQ converter generates an SFQ pulse when the input amplitude reaches 1 mA. A precise multichannel current source supplies the bias currents for circuits with low bias values. For circuits with high bias levels, the Hypres 48-channel current source is used. Custom low-pass filters with 100 Hz bandwidth were used on the precise current source card. The filters prevent sharp changes in the current output that could damage the circuit. A data acquisition card and/or logic analyzer samples the output signals after they pass through an NF low-noise preamplifier with a maximum gain of 60 dB. At low frequency, the bandwidth of the system is limited by the NF preamplifiers to about 1 MHz. For high-frequency tests, the amplifiers are replaced by HYPRES high-frequency amplifiers with a 2 GHz cutoff frequency. All the cards are controlled via a LabVIEW program. Figure 5.6 shows the system used for testing the chips at low frequency with high precision and at higher frequencies for Bit Error Rate (BER) calculations.



Figure 5.5 : Test setup hardware for low noise and low frequency RSFQ circuit measurements.



Figure 5.6: The cryostat and the test setup for low and high frequency tests.

### 5.1.4. Packaging

Even though RSFQ chips have extremely low power consumption, on the order of a few mW, some Joule heating occurs due to the bias wires and the switching of the logic gates. This heat is mainly generated close to the surface of the chip. In liquid-helium cryostats, the chip is in direct contact with the liquid helium, which results in a low thermal resistance between the chip surface and the cold medium. Besides, liquid-helium cryostats have much higher cooling power. Therefore, the heat generated by the bias and shunt resistors in the circuits is removed sufficiently fast, and there is no thermal gradient between the circuits and the cold medium. In closed-cycle refrigerator systems, however, not only is the cooling power limited, but it is also difficult to benefit from the available power, since the thermal contact between the chip surface and the cold head is not very good. The chip is attached to the copper holder with silver paste to ensure good thermal contact. Since the chip is in vacuum, the heat generated on its surface can be removed either via the substrate (silicon in our case) or via the wire bonds that provide the bias currents or the I/O interface to the circuits. As the temperature approaches 4 K, the carrier density in the silicon substrate decreases and the heat conductance of the substrate decreases drastically [104]. Also, when the niobium (Nb) of the circuits becomes superconducting, the surface heat conductance of the chip becomes very low, and therefore heat removal through the wire bonds at the sides of the chip is not effective enough [105]. The temperature controller keeps the second stage at 4.2 K, but some thermal gradient may occur between the RSFQ circuits and the second stage for the reasons discussed.

Figure 5.7 shows the package setup for the chip used in the heat dissipation and removal measurements. As shown in Figure 5.7, the chip is directly in contact with the copper plate, which is in contact with the cryostat second-stage plate at 4.2 K. The parts are:

1) The series SQUID that works as a temperature sensor.
2) The chip with the circuit on it.
3) Epoxy on the surface of the chip.
4) The copper layer that the chip is attached to.
5) The PCB with gold traces, to which the chip is wire bonded.



Figure 5.7: Chip packaging of the test circuit used for power measurements.

Such problems are normally negligible for small-scale circuits. However, during our experiments, which require about 2 A of bias current, we realized that we have to overcome the surface heating problem of the chip in vacuum. Otherwise, positive feedback caused by the heat at the bias resistors would force the temperature of the circuits to rise well above the critical temperature ($T_C$) of niobium. This could damage the circuit and make the chip unusable.

In order to measure the heat dissipation and the heat removal capacity of the system configuration, we used a sample chip with RSFQ circuits that can withstand 2 A. The bias current was fed to the chip via 8 different pins. We used two different wire bond diameters, 25 µm and 75 µm, to see the effect of heat removal through the wire bonds. On this chip we also included a series SQUID circuit with 300 DC-SQUIDs to monitor the surface temperature of the chip based on the I-V characteristic of the series SQUID. Figure 5.8 shows the I-V curve of the series SQUID.



Figure 5.8 : Measuring the temperature on chip surface using series SQUID I-V curve.

At the top (a) is the case when no bias current is applied; at the bottom (b), about 400 mA of bias current was applied over 8 pins while the second stage of the cryocooler was fixed at 4.2 K. The Y-axis of the scope shows the critical current; the current is measured over a $1k\Omega$ resistor and passed through 10 times amplification. The X-axis (voltage) scales with the number of series SQUIDs.

To measure the chip surface temperature, ideally a temperature sensor would be attached to the chip surface. However, an external sensor is not practical: it would damage the surface of the chip, and because of the thermal resistance between the sensor and the chip surface, the readings would not be accurate. Therefore, to measure the surface temperature, we placed a series DC-SQUID array on the chip surface as shown in Figure 5.7. A side benefit of this array is that it allows monitoring of the junction critical current spread and the setup noise level.

As the temperature on the chip surface rises due to bias current, the I-V characteristic of the series SQUIDs would change accordingly. The relation between the critical current of the SQUIDs and the temperature of the chips is described in (5.1).

$$\frac{J_{C}(T)}{J_{C}(0)} = \left[1 - \left(\frac{T}{T_{C}}\right)^{2}\right] \sqrt{1 - \left(\frac{T}{T_{C}}\right)^{4}}$$
 (5.1)

The I-V curve of the series DC-SQUID was measured with a DAQ card, and the temperature of the system was calculated based on (5.1) [23]. In Figure 5.8 (a), the critical current of the junctions at 4.2 K with no applied bias current is about 100 µA. If we assume that the chip is at 4.2 K when no current is applied and that the critical temperature of our niobium thin films is about 9.1 K, the temperature of the chip surface can be calculated from the percentage drop of the critical current. After applying 400 mA of current, the critical current drops to 38% of its original value at 4.2 K. This corresponds to a temperature of about 7.2 K, as seen in Table 5.1.
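The conversion from a measured critical current drop to a surface temperature can be sketched as a numerical inversion of (5.1). The Python example below is a minimal illustration, assuming $T_C = 9.1$ K and a 4.2 K reference as stated in the text; the function names and the bisection approach are our own choices for illustration and not the routine used in the measurements.

```python
import math


def jc_ratio(temp_k, tc_k=9.1):
    """J_C(T) / J_C(0) according to (5.1)."""
    t = temp_k / tc_k
    return (1.0 - t ** 2) * math.sqrt(1.0 - t ** 4)


def surface_temperature(ic_fraction, ref_temp_k=4.2, tc_k=9.1):
    """Temperature at which I_C has dropped to ic_fraction of its value at ref_temp_k.

    (5.1) decreases monotonically between 0 and T_C, so bisection is sufficient.
    """
    target = ic_fraction * jc_ratio(ref_temp_k, tc_k)
    lo, hi = 0.0, tc_k
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if jc_ratio(mid, tc_k) > target:
            lo = mid   # critical current still too high: the chip is warmer than mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Critical current at 38% of its 4.2 K value, as measured with 400 mA of bias.
print(f"{surface_temperature(0.38):.2f} K")
```

With the 38% value quoted above, the inversion gives a temperature close to the 7.2 K listed in Table 5.1.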

Figure 5.9 shows the electrical resistances that the bias current has to go through to reach the circuit on the chip. $R_{B1}$, $R_{B2}$ and $R_{B3}$ are the portions of the bias wire resistance from 300 K to 45 K, 45 K to 8 K, and 8 K to 4.2 K, respectively. $R_{contact}$ is the contact resistance between the chip holder and the spring contacts, $R_{bond-wire}$ is the resistance of the bond wires from the chip to the chip holder, and $R_{chip-bias}$ is the effective resistance on the chip due to the bias resistors. $R_{P-H}$ is the thermal resistance between the bottom of the chip and the copper plate, $R_{C-P}$ is the thermal resistance of the bond wires from the chip to the chip holder, and $R_{S-H}$ is the thermal resistance between the chip surface and the copper plate.



Figure 5.9 : a) Thermal resistance from the circuit on the chip surface to the environment. b) Electrical resistance for one line of bias current path.

These resistors are the main cause of the Joule heating, and the Joule heating calculations are based on this model. Since the cooling power at the first stage is very high, $R_{B1}$ is neglected in the calculations. We assumed that the total resistance is about $2.17\Omega$ on average. Figure 5.9 also shows the thermal resistances through which the generated heat passes to be removed from the chip. Since the chip is placed in vacuum, the thermal resistance between the chip surface and the holder ($R_{S-H}$) is very high, and its effect can be neglected for the time being.

The excess power of the second stage is measured by monitoring the power supplied by the temperature controller heater to keep the stage at 4.2 K (as shown in the top left corner of Figure 5.3). The expected excess power is defined as the difference between the total available power at the second stage and the Joule heat calculated using the model shown in Figure 5.9. As shown in Table 5.1, when the total applied current increases, the difference between the second-stage and chip surface temperatures also increases in our sample chip, even though there is still remaining cooling power at the second stage at 4.2 K. This is not an expected result, as the total heat generated through the bias path shown in Figure 5.9 is much lower than the excess power, as shown in Table 5.1 and Figure 5.10. We attribute the growing difference between the measured and expected excess power to heat dissipation on the surface of the chip itself.

Table 5.1: Power and thermal gradient characteristic of cooler while applying different bias current via 4 wires. Second stage temperature is at 4.2K.

| Current (mA) | Measured<br>Excess power<br>(mW) | Temperature at chip surface (K) | Expected Excess Power (mW) | Power loss at chip surface (mW) |
|--------------|----------------------------------|---------------------------------|----------------------------|---------------------------------|
| 0            | 168                              | 4.2                             | 168                        | 0                               |
| 100          | 157                              | 4.2                             | 163                        | 6                               |
| 200          | 134                              | 4.9                             | 148                        | 14                              |
| 400          | 17                               | 7.2                             | 87                         | 70                              |
| 600          | 0                                | -                               | -13                        | -                               |

The graph in Figure 5.10 shows how the difference between the calculated value and the measured value grows exponentially as the current increases. The expected excess power is based on the calculated Joule heating of the bias path; the measured excess power is the result of direct measurement of the excess cooling power at the second stage. The solid line shows the discrepancy between the expected and measured excess power at the second stage.
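The last column of Table 5.1 is simply the expected excess power minus the measured excess power. The short Python sketch below reproduces that column and, as an additional consistency check, back-calculates the effective resistance that the expected values imply through $P = I^2 R$. The effective resistance is our back-calculation from the table, not a value stated in the text, and the function names are hypothetical.

```python
# (current in mA, measured excess power in mW, expected excess power in mW), from Table 5.1
TABLE_5_1 = [(0, 168, 168), (100, 157, 163), (200, 134, 148), (400, 17, 87)]


def surface_loss_mw(measured_mw, expected_mw):
    """Power unaccounted for by the bias-path Joule heating model (mW)."""
    return expected_mw - measured_mw


def implied_path_resistance_ohm(current_ma, zero_bias_mw, expected_mw):
    """Effective resistance R such that I^2 * R equals the modelled Joule heat."""
    joule_w = (zero_bias_mw - expected_mw) * 1e-3
    return joule_w / (current_ma * 1e-3) ** 2


for current, measured, expected in TABLE_5_1:
    print(current, surface_loss_mw(measured, expected))

print(implied_path_resistance_ohm(200, 168, 148))
```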



Figure 5.10: The power graph for Table 5.1.

As seen in Table 5.1 and Figure 5.10, once a large enough current is applied, the lack of heat removal causes a thermal gradient between the chip surface and the second stage. Beyond some point, the gradient becomes so large that the niobium leaves its superconducting state, which causes positive heating feedback under constant bias current, and the chip heats up rapidly.

To determine the source of the thermal gradient problem, different parameters and sources of thermal resistance and power loss were investigated. One of these sources was the Joule heating of the contacts between the chip package and the spring contacts in the package. The other possible source of power loss was the bond wires that connect the circuit pads to the chip package, as seen in Figure 5.12. We measured the contact and $25\mu m$ bond wire resistances to be about $0.5\Omega$ and $0.17\Omega$, respectively. Then we changed the bond wire diameter to $75\mu m$ with the expectation of lower Joule heating and lower thermal resistance ( $R_{C-P}$ ). Figure 5.11 shows the results of the measurements with different bond wire diameters. For this measurement four different structures were investigated: $25\mu m$ and $75\mu m$ diameter bond wires, each connected either to the chip surface or directly to the copper holder under the chip substrate. As shown in Figure 5.11, the diameter of the bond wire has no effect on the breaking point of the power vs. current curve, and both breaks happened at the same current of about 40 mA. The break does not occur for the lines that connect the package directly to the copper holder. The graphs for all 4 measurements match each other up to the break point. After the break point, the cases in which the wires are connected to the chip holder ground follow the profile of Joule heating ( $\propto I^2$ ), while the cases in which the bonds are connected to the chip surface rise much more sharply.



Figure 5.11: The power loss of each pin versus the current the pin carries (total current fed through 8 pins).

From the results shown in Figure 5.11, we conclude that the heat loss problem lies in the chip itself. To solve this problem, we decided to cover the chip surface with a layer of epoxy, creating a direct thermal link between the chip surface and the holder that decreases the chip-surface-to-holder thermal resistance ($R_{S-H}$). For this purpose, the epoxy must withstand 4 K without fracturing, and its thermal expansion should not put stress on the wire bonds, the chip or the junctions themselves. After some trial and error with different materials, we decided that Stycast 2850 FT epoxy satisfies these conditions. Figure 5.12 shows the chip package for the cooler before and after applying epoxy on the surface.



Figure 5.12: The chip before applying the epoxy and after applying epoxy on it.

Table 5.2 and Figure 5.13 show the remaining power in the second stage and the temperature at the chip surface as we apply current to the bias pins with epoxy on the chip surface. The temperature at the second-stage sensor was fixed at 4.2 K. The epoxy on the chip surface decreases the thermal resistance between the chip surface and the holder ($R_{S-H}$ in Figure 5.9) dramatically; hence the thermal gradient between the chip and the second stage drops, as seen in Table 5.2 and Figure 5.13, and the difference between the calculated and measured values stays below 8 mW even at 1 A.

Table 5.2 : Power and thermal gradient characteristic of cooler while applying different bias current via 8 Be-Cu bias pins.

| Current (mA) | Measured<br>Excess power<br>(mW) | Temperature at chip surface (K) | Expected<br>Excess Power<br>(mW) | Power loss at chip surface (mW) |
|--------------|----------------------------------|---------------------------------|----------------------------------|---------------------------------|
| 0            | 165                              | 4.22                            | 165                              | 0                               |
| 100          | 164                              | 4.21                            | 164                              | 0                               |
| 200          | 162                              | 4.21                            | 162                              | 0                               |
| 400          | 154                              | 4.23                            | 153                              | -1                              |
| 800          | 112                              | 4.29                            | 115                              | 3.5                             |
| 1000         | 80                               | 4.32                            | 87                               | 7.6                             |



Figure 5.13: The power graph for Table 5.2.

Figure 5.14 shows the results of the power measurements on different chips of the same circuit with and without epoxy on the surface. The graph shows the power associated with one bias line as the current is applied via that line. The effect of better thermal contact between the chip surface and the second stage is clear in Figure 5.14. An additional benefit of using epoxy is that it protects the chip and the bonds from physical damage. Epoxy Glass also showed good thermal properties, but it cracked at 4.2 K and was no longer usable.



Figure 5.14: The power consumption graphs per current of each pin by different coverings of the chip.

### 5.1.5. Shielding

Because of their high intrinsic frequency, Josephson junctions are very sensitive to high-frequency and RF noise. To conduct the experiments in a low-noise environment, the system was installed in a shielded room. The shielded room reduces the electromagnetic noise at high frequencies by at least 100 dB. The main frequencies that caused problems in the system were between 2 and 4 GHz, from the cellular phone network. To isolate the cryocooler's compressor and vacuum pumps from the measurement system and eliminate any possible noise from them, the helium and vacuum pipes were brought into the shielded room through a copper-dust-filled medium that connects the pipes to the shield ground while keeping the pipes free to move. Figure 5.15 shows the insertion medium for the pipes.



Figure 5.15: The copper dust medium for insertion of the pipes.

The control cables are another connection between the pulse-tube cooler head and the compressor. These cables also had to be brought into the shielded room. Any ungrounded wire that enters the shielded room acts as an antenna and reduces the shielding capability drastically. To eliminate the antenna effect of the cables, they should pass through a low-pass filter between the outside and the inside of the shielded room. After examining the current-voltage specifications of the pulse tube-compressor connection, we realized that an ordinary powerline filter works for this purpose too.

Hence, by implementing these precautions, we could operate the pulse tube in a shielded room with about 10 dB deterioration from the designed value of 110 dB between 1 GHz and 18 GHz.

## 5.2. Testing the Noise and Stability of the System

To test the robustness and noise level of our system, we measured three different structures covering both the digital and analog response of the system. To test the digital response, a basic Josephson-AND [64] cell from the CONNECT library, which is used repeatedly in larger integrated circuits, was measured. The other two structures, for testing the analog response and capabilities of the system, were an un-shunted Josephson junction and a series DC-SQUID with a junction critical current of 100 $\mu A$. The quality of the I-V curves shows the precision of the system.

To test the digital measurement capabilities, a JAND cell with SFQ/DC and DC/SFQ converter cells was placed in the cooler, and the bit error rate of this cell was measured while varying the bias values. Figure 5.16 shows the schematic and cell view of the J-AND.



Figure 5.16: Top: schematic of the J-AND cell. Bottom: picture of the fabricated cell.

### 5.2.1. Josephson junction

To measure the Josephson junctions, we used a four-probe structure to eliminate the effect of unwanted resistances. The two outer probes supply the current while the two inner probes measure the voltage. The data were acquired with a LeCroy digital oscilloscope. Since the changes at the breakpoints of the graph are very sharp, little to no data is acquired at these points, which may cause a slight tilt at the center. The critical current of the un-shunted junction is $100\mu A$. Figure 5.17 shows the result of the I-V curve measurement.

The I-V curves are better observed with an analog oscilloscope. Figure 5.18 shows the result of the I-V measurement of 300 series DC-SQUIDs.



Figure 5.17: Un-shunted Josephson junction I-V curve.



Figure 5.18: 300 series DC-SQUID I-V characteristics.

The difference between the measured current and the design value is due to the $J_C$ spread in the fabrication process (changes in the critical current of the superconducting layers due to fabrication imperfections); therefore, the junctions switch at slightly different current levels. Other than that, there is very little fluctuation in the graph, which indicates that the system has an acceptable noise level for further measurements [106].

### 5.2.2. CONNECT JAND cell

Figure 5.19 shows the inputs and output of a J-AND cell. The test frequency is 10 kHz and the input amplitude is 50 mV, which creates 1 mA of current after passing through the on-chip $50 \Omega$ resistors. The bias values were near the design value, which indicates acceptable fabrication parameters. This test was conducted outside the RF-shielded room.

As shown in Figure 5.19, despite the fact that there was no shielded room and the output was amplified about 200 times, the noise level in the output is very low and comparable with that of liquid-helium cryostats.



Figure 5.19: Input and output results of the single J-AND cell.

One of the most important parameters of a cell for use in large-scale designs is the bit error rate (BER). This parameter determines the bias margin within which the cell works with an acceptably low error rate. The cells should show very low BER within the design margins so that larger circuits still have a sufficiently wide working region after fabrication. We calculated the BER of the JAND cell, which is shown in Figure 5.20. The bias margin was even better than the predicted design value, which may have been caused by changes in the fabrication process parameters.



Figure 5.20: Bit error rate of the AND cell at different bias values.

The design value for the JAND cell is 3.7mA. The margin is calculated in (5.2).

$$\begin{cases} \frac{I_{\text{max}} - I_{\text{design}}}{I_{\text{design}}} = 25\%\\ \frac{I_{\text{design}} - I_{\text{min}}}{I_{\text{design}}} = 50\% \end{cases}$$
(5.2)

The upper margin of the bias current is 25%, while the lower margin is about 50%. The width of the bias margins and the sharp transitions at the margin boundaries indicate very low noise and robust operation of the system.
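The margin definition in (5.2) can be written as a small helper. In the Python sketch below, the design value of 3.7 mA is taken from the text, while the limits of 1.85 mA and 4.625 mA are back-calculated from the quoted 50% and 25% margins and are not measured values reported here; the function name is hypothetical.

```python
def bias_margins(i_min_ma, i_design_ma, i_max_ma):
    """Return (lower, upper) bias margins as fractions of the design value, as in (5.2)."""
    lower = (i_design_ma - i_min_ma) / i_design_ma
    upper = (i_max_ma - i_design_ma) / i_design_ma
    return lower, upper


# Hypothetical working range consistent with the margins quoted for the JAND cell.
lower, upper = bias_margins(i_min_ma=1.85, i_design_ma=3.7, i_max_ma=4.625)
print(f"lower margin: {lower:.0%}, upper margin: {upper:.0%}")
```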

## 5.3. System Automation

In an attempt to build an automated setup requiring minimal user interaction, we used various computer-controlled instruments and a wiring configuration integrated with LabVIEW programs to apply the waveforms obtained from the Verilog digital simulator. By comparing the circuit output with the expected output, also taken from the Verilog test bench, and changing the bias values accordingly, we calculate the circuit's bit error rate at the design bias values. Then, by running an optimization algorithm similar to our previously reported work on circuit optimization, we determine the optimum bias point of the circuit by changing the bias values in an automated manner. The objective function is set to minimize the bit error rate and maximize the bias margin of the circuit.
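To make the idea of an automated, non-gradient search for the best bias point concrete, the following Python sketch runs a basic particle swarm optimization (PSO) over two bias currents. Everything in it is an assumption for illustration: `toy_cost` is a smooth stand-in for a cost derived from measured BER and margin, the optimum of (3.7, 2.0) mA is hypothetical, and the swarm parameters are common textbook choices rather than the settings of our LabVIEW implementation.

```python
import random


def toy_cost(bias_ma, optimum_ma=(3.7, 2.0)):
    """Hypothetical stand-in for a measured cost: a smooth bowl around the optimum."""
    return sum(((b - o) / o) ** 2 for b, o in zip(bias_ma, optimum_ma))


def pso(cost, bounds, n_particles=20, iterations=100, seed=0,
        inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimize cost over the box given by bounds; return (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia plus pulls toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = cost(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


best, value = pso(toy_cost, bounds=[(1.0, 6.0), (0.5, 4.0)])
print(best, value)
```

In a real measurement, each call to the cost function would correspond to setting the bias currents, applying the test waveform and evaluating the acquired output, so the number of particles and iterations directly determines the measurement time.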

Traditionally, Verilog is used as a digital simulator in circuit design workflows. The designer generates the digital waveforms and sets the timings in order to get the correct output. Therefore, for every digital circuit the inputs, outputs and timings are known to the user. The bias margins and operating parameters at the design level are also known from analog simulators combined with Verilog. However, after fabrication, due to the tolerances in the process, the optimum values and margins tend to change, and in larger circuits with multiple bias points, determining the optimum values for circuit operation becomes a very complex and time-consuming process. Moreover, if one needs to measure the gray-zone behavior or the bit error rate of a circuit, the testing procedure gets even more cumbersome, as many waveforms must be measured for a number of bias parameters, and it may become impractical to fully characterize a circuit. In the end, we determine the bias margins around the optimal bias point and, if needed, other parameters of the circuits.

#### **5.3.1.** Methodology

By combining a multi-channel function generator and data acquisition cards with non-gradient-based optimization algorithms in the LabVIEW environment, we were able to send the proper waveforms and bias currents to the test chip in the 4 K cryostat environment. The output signals are acquired and then compared to the expected outputs. By taking enough samples, we can calculate the output probability of the cell and the BER for every specific test condition. The setup and test flowcharts are shown in Figure 5.21.



Figure 5.21: (a) Hardware setup of the test bench and (b) Forward solution for determining the best working point.
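
The comparison of sampled outputs with expected outputs described above can be sketched as follows. This is a minimal illustration, not the LabVIEW implementation used in the setup; the function names and the sample data are hypothetical.

```python
from typing import Sequence


def bit_error_rate(measured: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of sampled output bits that differ from the expected bits."""
    if len(measured) != len(expected):
        raise ValueError("measured and expected must have the same length")
    if not measured:
        raise ValueError("at least one sample is required")
    errors = sum(m != e for m, e in zip(measured, expected))
    return errors / len(measured)


def output_probability(measured: Sequence[int]) -> float:
    """Fraction of samples in which the cell produced a logic 1."""
    if not measured:
        raise ValueError("at least one sample is required")
    return sum(measured) / len(measured)


# Made-up example data: 8 samples, 2 of which disagree with the test bench.
expected = [1, 0, 1, 1, 0, 0, 1, 0]
measured = [1, 0, 1, 0, 0, 0, 1, 1]
print(bit_error_rate(measured, expected))  # 0.25
print(output_probability(measured))        # 0.5
```

In practice the number of samples sets the smallest BER that can be resolved, so a test condition that shows zero errors only bounds the BER from above.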

The tests were carried out at low frequency because of the limitations of the waveform generators and amplifiers. An NI PCI-6551 digital waveform generator card was used to generate the input and clock waveforms. The waveforms passed through a band-pass filter to obtain RSFQ-compatible waveforms. An NI PCI-6704 multichannel current source supplied the bias currents for the Josephson junctions. Custom low-pass filters with a bandwidth of 10 Hz were used on this card to prevent sharp changes in the output current. Finally, an NI PCI-6133 data acquisition card sampled the output signals after they passed through a low-noise preamplifier [107].

The whole system was controlled by an automated LabVIEW program. To achieve lower noise levels at the cryostat, the cards were connected to the PC via fiber-optic cables. The bias currents could be set automatically or manually for single tests, or swept to obtain the margins of the cells.
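
Extracting margins from a bias sweep can be illustrated with the short sketch below. It assumes that each sweep point is a pair of bias value and measured BER, and that the margin is the contiguous working range around the nominal bias; the function name, the threshold parameter and the data are hypothetical.

```python
from typing import List, Tuple


def bias_margins(sweep: List[Tuple[float, float]], nominal: float,
                 max_ber: float = 0.0) -> Tuple[float, float]:
    """Return (lower, upper) margins in percent of the nominal bias.

    sweep holds (bias, ber) pairs; a point "works" if ber <= max_ber.
    """
    points = sorted(sweep)
    # Index of the sweep point closest to the nominal bias.
    centre = min(range(len(points)), key=lambda i: abs(points[i][0] - nominal))
    if points[centre][1] > max_ber:
        raise ValueError("circuit does not work at the nominal bias")
    lo = centre
    while lo > 0 and points[lo - 1][1] <= max_ber:
        lo -= 1
    hi = centre
    while hi < len(points) - 1 and points[hi + 1][1] <= max_ber:
        hi += 1
    lower = 100.0 * (nominal - points[lo][0]) / nominal
    upper = 100.0 * (points[hi][0] - nominal) / nominal
    return lower, upper


# Made-up sweep in percent of the design bias (100 = nominal).
sweep = [(40.0, 0.5), (50.0, 0.0), (75.0, 0.0), (100.0, 0.0),
         (125.0, 0.0), (150.0, 0.3)]
print(bias_margins(sweep, nominal=100.0))  # (50.0, 25.0)
```

The resolution of the reported margins is limited by the step size of the sweep.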

In our earlier work we optimized the cell delays and margins at the design stage using the particle swarm optimization (PSO) algorithm [108], [109]. The goal of the system implemented here was to calculate the BER of the fabricated RSFQ cells and to optimize the bias voltages in order to get the lowest possible bit error rate with the highest bias margin, and thus achieve a robust system that is operational in real-life applications. The PSO algorithm was selected because of its simplicity and its ability to find the global best working condition. The initial guess was taken from the simulation results and the designed parameters [110].
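
A textbook PSO loop of the kind referred to above is sketched below. It is not the implementation used in this work: the swarm size, coefficients and the `toy_cost` function are assumptions chosen for illustration. In the real setup the cost of a bias vector would come from a measurement (for example a combination of BER and margin), whereas here a made-up quadratic stands in for it. The first particle starts at the design point, mirroring the use of simulation results as the initial guess.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(cost: Callable[[Sequence[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 initial: Sequence[float],
                 n_particles: int = 12, n_iter: int = 60,
                 inertia: float = 0.7, c_personal: float = 1.5,
                 c_global: float = 1.5, seed: int = 0
                 ) -> Tuple[List[float], float]:
    """Minimize cost inside bounds with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    # First particle sits at the design point, the rest are random.
    pos = [list(initial)]
    pos += [[rng.uniform(lo, hi) for lo, hi in bounds]
            for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    swarm_pos, swarm_cost = best_pos[g][:], best_cost[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (swarm_pos[d] - pos[i][d]))
                # Keep the bias inside the allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < swarm_cost:
                    swarm_pos, swarm_cost = pos[i][:], c
    return swarm_pos, swarm_cost


def toy_cost(bias: Sequence[float]) -> float:
    """Stand-in for a measured cost; the optimum (2.6, 2.4) mV is made up."""
    optimum = (2.6, 2.4)
    return sum((b - o) ** 2 for b, o in zip(bias, optimum))


design_point = [2.5, 2.5]            # initial guess from simulation
bias_bounds = [(2.0, 3.0), (2.0, 3.0)]
found, found_cost = pso_minimize(toy_cost, bias_bounds, design_point)
print(found, found_cost)
```

Because PSO needs only cost evaluations and no gradients, each evaluation can be a full measurement run, which matches the non-gradient-based approach described in the methodology.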

With this method we eliminated the time-consuming manual test process and the human error factor. Figure 5.22 and Figure 5.23 show the margin results obtained from our custom-designed JOR and DFF cells.



Figure 5.22 : DFF and JOR output probability.

Calculating the BER and finding the optimized bias gives us insight into designing robust cells that can be used in more advanced and complex circuits. The optimization algorithm in this system was very basic and will be replaced with a more advanced algorithm in the future. To get precise results, these tests need to be repeated at high frequency, which will be done by implementing a high-frequency test bed in the future.



Figure 5.23: Bias margin percentage for each stage of the circuit.

# 6. RESULTS AND CONCLUSION

Nowadays, the need for faster and lower-power computers has led to a search for alternative logic families to CMOS technology. Recent advances in superconductor logic technology and superconducting very large scale integration (VLSI) circuit fabrication allow us to design complex rapid single flux quantum (RSFQ) circuits and structures with a high number of Josephson junctions on one chip. These advances have led to logic circuits that consume orders of magnitude less power than MOSFET-based circuits while working at higher frequencies [13], [111].

Many efforts have been made to develop stable superconducting processors and coprocessors [70], [75], [112]. However, all these circuits were designed without any consideration for power dissipation, since they were tested in liquid helium cryostats. This method is not very practical for commercial use for various reasons. Limited helium sources, which lead to a rising price of liquid helium, will make helium-based cryostats obsolete in the near future. More importantly, the short operating time of liquid helium cryostats between recharges makes these circuits fit only for research purposes. Stable, long-term use of RSFQ logic requires a low-noise environment with a very stable temperature.

Figure 6.1 shows the structure of an arithmetic logic unit with the designed interface circuits. These interface circuits synchronize the low-frequency CMOS clock with the high-frequency RSFQ clock. The function of the input and output registers is described in Chapter 4. The parts filled with blue are the superconductor circuits, which work at the second stage of the cryocooler. The CMOS circuits could be mounted at the first stage.

In this chapter we discuss the design of 4-bit parallel and serial arithmetic logic units in RSFQ logic, to be tested in a closed-cycle cryostat system. The parallel architecture is relatively simple in comparison with others and much faster than the serial ALU, but it needs a higher bias current. Since RSFQ gates need a clock signal to operate, the clock tree and the wiring between gates consume most of the chip area and bias current.



Figure 6.1 : Superconductor ALU with interface circuits in relation to the CMOS circuits.

Operating such a relatively large circuit in a cryocooler requires some conditions to be considered. One of the main problems with such large circuits is the bias value. RSFQ circuits are current biased. Although the power consumption of the circuits is low, the low bias voltage of 2.5 mV means that the required current is large. The large current can cause heating in the wires, which lowers the remaining cooling power at the second stage of the cryocooler. Therefore, special wires were used for the bias lines and the main heat load was placed on the first and middle stages of the cooler.
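
The relation between low power and large current can be made concrete with a rough estimate. As a simplifying assumption (not a measured result), take the static dissipation on the chip to be the bias voltage times the total bias current, and use the bias current of about 1 A quoted for the parallel ALU in Section 6.1.1:

$$
P_{\mathrm{chip}} \approx V_b I_b = 2.5\,\mathrm{mV} \times 1\,\mathrm{A} = 2.5\,\mathrm{mW}
$$

The dissipation on the chip is therefore in the milliwatt range, while the bias wiring has to carry a current of the order of an ampere. The Joule heating in a bias wire of resistance $R_w$ (a symbol introduced here only for illustration) scales as $I_b^2 R_w$, so the heat load of the wiring has to be considered separately from the dissipation of the chip itself.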

Superconductor circuits are highly sensitive to magnetic noise. The other problem caused by the high bias current is the magnetic field generated around the bias line. To solve this problem, large circuits need a return line carrying the negative of the applied bias, which drains the bias current and cancels the magnetic field.

## **6.1.** Parallel Arithmetic Logic Unit

Memory design is still a problem in RSFQ technology [18], [19], [113]. To overcome this issue, the coprocessor was designed to work alongside a CMOS processor. The CMOS processor gives the instructions to the RSFQ processor and stores the inputs and outputs in the external memory. The FPGA and CMOS memory could be placed at the first stage of the cryocooler at 55 K. However, the RSFQ clock is much faster than the CMOS clock, and the CMOS circuits cannot be overclocked to the RSFQ frequency. To communicate with the CMOS processor without losing computing power or data, dedicated input and output stages are used. These stages, or registers, act as serial-to-parallel and parallel-to-serial converters with a data buffer to synchronize the two clocks. The data are stored in the input buffer of the input stage by the incoming CMOS clock and are unloaded to the RSFQ ALU by a series of pulses from the fast clock generator placed on the chip. Figure 6.2 shows the block diagram of the ALU in this computational structure.
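
The role of the input stage can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class and method names are hypothetical, the bits are assumed to arrive most significant bit first, and the buffer holds a single 4-bit word. The real registers are described in Chapter 4.

```python
class SerialToParallelInput:
    """Collects bits on the slow clock and releases a word on the fast clock."""

    def __init__(self, width: int = 4) -> None:
        self.width = width
        self.bits = []

    @property
    def full(self) -> bool:
        return len(self.bits) == self.width

    def cmos_clock(self, bit: int) -> None:
        """One slow CMOS clock edge: shift one bit into the buffer."""
        if bit not in (0, 1):
            raise ValueError("bit must be 0 or 1")
        if self.full:
            raise OverflowError("buffer full, unload it first")
        self.bits.append(bit)

    def rsfq_unload(self) -> int:
        """Fast on-chip clock burst: hand the whole word to the ALU."""
        if not self.full:
            raise RuntimeError("word is not complete yet")
        word = 0
        for bit in self.bits:          # MSB first (assumption)
            word = (word << 1) | bit
        self.bits = []
        return word


stage = SerialToParallelInput(width=4)
for bit in (1, 0, 1, 0):               # four slow clock cycles
    stage.cmos_clock(bit)
print(bin(stage.rsfq_unload()))        # 0b1010
```

The point of the buffer is that the slow side and the fast side never have to run at the same rate: the word is complete before the fast burst begins.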

There are various architectures to choose from when designing an arithmetic logic unit, including serial, parallel, bit-sliced and Kogge-Stone. Each has its own advantages and disadvantages. Since the gates used in RSFQ circuits are clocked, the clock tree consumes most of the power and causes most of the delay as the architecture gets more complex. The Kogge-Stone architecture has low delay and latency; however, its complex tree consumes much power and makes the circuit bulky, as seen in our previous work [114]. The parallel architecture is not as complex as Kogge-Stone, but its output delay is higher when scaled to more bits.
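
To make the trade-off concrete, the sketch below shows the textbook Kogge-Stone prefix computation in software. It is not the RSFQ design of [114]; it only illustrates why the number of prefix stages grows as the base-2 logarithm of the word width while every stage touches almost every bit, which is what makes the wiring and clock tree dense. The function name and the returned stage count are choices made for this illustration.

```python
from typing import Tuple


def kogge_stone_add(a: int, b: int, width: int = 4,
                    carry_in: int = 0) -> Tuple[int, int, int]:
    """Return (sum, carry_out, number_of_prefix_stages)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Bitwise generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & carry_in)    # fold the carry-in into bit 0
    stages = 0
    dist = 1
    while dist < width:
        new_g, new_p = G[:], P[:]
        for i in range(dist, width):
            # Combine each group with the group 'dist' bits below it.
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
        stages += 1
    carries = [carry_in] + G[:-1]      # carry into each bit position
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, G[width - 1], stages


print(kogge_stone_add(0b1010, 0b0110))  # (0, 1, 2)
```

For a 4-bit word two prefix stages are enough, and doubling the width adds only one more stage, at the cost of a much denser interconnect.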



Figure 6.2: The ALU used inside the coprocessor.

Figure 6.3 shows the block diagram of the design. The RSFQ cells are all clocked and need a clock tree to distribute the clock signal across the entire chip at the right time. For this purpose, buffers are placed at the input of each stage. These buffers are built from simple D flip-flop (DFF) cells. They hold the input of each stage until the clock signal arrives and synchronize the outputs of all the stages. This drastically reduces the bit error rate of the circuit and gives a consistent latency for the arithmetic and logic outputs. The other advantage of using DFFs at the outputs to delay the signals is a reduction in the bias current of the circuit. However, these DFF cells need clock signals, which may add to the clock tree complexity.



Figure 6.3 : Block diagram of parallel ALU.

In total, the ALU has eight main instructions. Later, this ALU will be used with another processor working at room temperature to overcome the memory problem of superconductor circuits. The ALU consists of a 4-bit multiplier, 4-bit logic gates including AND, OR, XOR and NOT, adder and subtractor circuits, and the required flag bits. Table 6.1 shows the operations and their select bits. The ALU generates three flag signals: Zero, Negative and Carry. To synchronize the inputs and outputs, input and output registers were also designed and fabricated.

Table 6.1: The operations of the parallel ALU and the select bits for them.

|       | XOR   | OR    | AND   | NOT  | ADD   | SUB   | MUL   |
|-------|-------|-------|-------|------|-------|-------|-------|
|       | (A^B) | (A\|B) | (A&B) | (~A) | (A+B) | (A-B) | (A×B) |
| $S_0$ | 10    | 11    | 10    | 11   | 00    | 10    | XX    |
| $S_1$ | 10    | 11    | 11    | 10   | 10    | 10    | 11    |
| $S_2$ | 11    | 11    | 11    | 11   | 10    | 10    | 10    |
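
The operations and flags listed above can be summarized in a short behavioral model. This is only a sketch: the operation is selected by name rather than by the select bits of Table 6.1, and the flag conventions (carry meaning "no borrow" for SUB, the full 8-bit product for MUL, Negative taken from the most significant result bit) are assumptions that the text does not specify.

```python
from typing import Dict, Tuple

WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111


def alu(op: str, a: int, b: int = 0) -> Tuple[int, Dict[str, int]]:
    """Behavioral 4-bit ALU: returns (result, flags)."""
    a &= MASK
    b &= MASK
    carry = 0
    result_bits = WIDTH
    if op == "XOR":
        result = a ^ b
    elif op == "OR":
        result = a | b
    elif op == "AND":
        result = a & b
    elif op == "NOT":
        result = ~a & MASK
    elif op == "ADD":
        total = a + b
        result, carry = total & MASK, total >> WIDTH
    elif op == "SUB":
        # Assumed convention: carry = 1 means "no borrow" (a >= b).
        result, carry = (a - b) & MASK, int(a >= b)
    elif op == "MUL":
        # Assumption: the full 8-bit product is returned.
        result = a * b
        result_bits = 2 * WIDTH
    else:
        raise ValueError(f"unknown operation: {op}")
    flags = {
        "zero": int(result == 0),
        "negative": (result >> (result_bits - 1)) & 1,
        "carry": carry,
    }
    return result, flags


print(alu("ADD", 0b1010, 0b0001))  # (11, {'zero': 0, 'negative': 1, 'carry': 0})
print(alu("MUL", 15, 15)[0])       # 225
```

Such a model is useful mainly as a reference when checking measured or simulated outputs against expected values.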

#### **6.1.1.** Fabricated circuit

The circuits were designed and then fabricated at AIST CRAVITY with the STP2 process [66]. The chip was built from CONNECT cells [63]. The ALU has more than 9000 JJs in total and consumes about 1 A of bias current. Since RSFQ circuits are very sensitive to magnetic fields, the bias could not be fed to the chip directly. As mentioned before, the high bias value should be compensated by applying reverse biases beside the main bias points [115]. In reverse biasing, the applied bias current is drained by a negative bias through a ground pin placed beside the original bias pin. The magnetic fields caused by the large bias currents then cancel each other, which gives a higher bias margin. However, this method doubles the amount of bias applied to the chip, so the final value is about 2 A. The chip has 20 bias points in total: four are dedicated to the DC/SFQ and SFQ/DC modules and the remaining 16 are for the main circuit. To compensate for the large magnetic field generated by the main bias, 8 of these 16 bias points are dedicated to bias reversing, in which the same bias value that is applied to the chip is applied to the ground with negative polarity. Figure 6.4 shows the fabricated chip with the different stages of the ALU. A series DC-SQUID was also placed on the chip. By monitoring the I-V characteristic of the series DC-SQUIDs, we can determine not only the quality of the fabricated chip but also the surface temperature of the chip at different bias values, as discussed in Chapter 5. The chip includes the following interconnected blocks:

1) The XOR gates at the input of the adder for subtraction. These gates form the two's complement of register B on the arrival of the subtractor select bit.
2) The carry look-ahead adder stage discussed in Chapter 4.
3) The multiplier stage, which is also discussed in Chapter 4 and is the biggest block of the circuit.
4) The logic unit with NOT, AND, OR and XOR gates.
5) The multiplexer stages using the T1 cells.
6) The Zero and Carry flags to monitor the state of the chip and the output accuracy.
7) The series SQUIDs for temperature and fabrication parameter measurement.
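
The subtraction mechanism in block 1 can be sketched in a few lines: each bit of B is XORed with the select bit, and the same select bit supplies the +1 of the two's complement. Feeding the select bit into the adder as a carry-in is the usual textbook arrangement and is an assumption here, as is the function name.

```python
from typing import Tuple


def add_or_subtract(a: int, b: int, subtract: bool,
                    width: int = 4) -> Tuple[int, int]:
    """Return (result, carry_out) of a + b or a - b on 'width' bits."""
    mask = (1 << width) - 1
    select = 1 if subtract else 0
    # XOR gates: invert every bit of B only when the select bit is 1.
    b_xored = (b ^ (mask * select)) & mask
    # Adder with the select bit as carry-in completes the two's complement.
    total = (a & mask) + b_xored + select
    return total & mask, total >> width


print(add_or_subtract(7, 5, subtract=False))  # (12, 0)
print(add_or_subtract(7, 5, subtract=True))   # (2, 1)
print(add_or_subtract(3, 5, subtract=True))   # (14, 0), i.e. -2 in two's complement
```

With this arrangement the same adder serves both ADD and SUB, so subtraction costs only one row of XOR gates.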



Figure 6.4: The fabricated ALU with STP2 process.

#### **6.1.2.** Results

The analog simulation was done with JSIM, a SPICE-based simulator for Josephson junction circuits. The output clock signal, which follows the most critical path of the chip, was monitored for different bias values. Figure 6.5 shows the clock signal at the input and output of the logic unit on the circuit's longest path. As seen in this figure, the latency of the output on the most critical path is about 1 ns under normal conditions.



Figure 6.5: The JSIM analog simulation of the ALU circuit on its most critical path (The clock tree).

The ALU was simulated in Verilog with different bias values to determine the latency and working conditions for all possible input values. The digital simulation result for the ALU is shown in Figure 6.6. In this figure, all possible input combinations are fed to the circuit and the outputs are analyzed.



Figure 6.6 : Simulation results for the ALU in various input conditions.

Figure 6.6 includes: a) the result for the adder as the inputs change; b) the result for the multiplier; c) the result for the Josephson OR gate. The bias was swept around the 2.5 mV design value and the correct function of the ALU at different biases was investigated. All the sub-circuits were also simulated with Verilog and JSIM. Their bias margins and output probabilities were also tested using analog and digital simulation. These sub-circuits were then fabricated independently and tested in our cryocooler system, as seen in Chapter 4.

Figure 6.7 shows the results of measurements made in a liquid helium cryostat at Nagoya University. The fabrication problems mentioned in the conclusion prevented us from obtaining meaningful results from this measurement.



Figure 6.7: Experimental results from 4-bit parallel ALU.

## **6.2.** Serial Arithmetic Logic Unit

We also designed an ALU with a serial architecture. The serial architecture has a basic cell; by repeating this cell and connecting the clock and carry signals from each cell to the next, the ALU can be extended to any number of bits. Figure 6.8 shows the architecture of a single serial ALU cell block and the connections between cells that form an n-bit ALU.

While the serial architecture is very easy to scale, it is not very useful at higher bit counts, where the output delay negates the high clock frequency of RSFQ. Since the serial architecture is easy to implement and the number of bits can be increased just by cascading the units, many RSFQ logic units are designed with this architecture [75], [79], [112], [116], [117].
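
The cascading of identical cells can be sketched with a one-bit cell function that is called once per bit, with the carry handed from each cell to the next. This is a behavioral illustration only; the function names and the way SUB is handled (inverted B bit and an initial carry of 1) are assumptions, not a description of the fabricated cell.

```python
from typing import Tuple


def alu_cell(op: str, a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """One-bit ALU cell: returns (result_bit, carry_out)."""
    if op == "SUB":
        b ^= 1                     # invert B, then add
        op = "ADD"
    if op == "ADD":
        return a ^ b ^ carry_in, (a & b) | (carry_in & (a ^ b))
    if op == "AND":
        return a & b, 0
    if op == "OR":
        return a | b, 0
    if op == "XOR":
        return a ^ b, 0
    if op == "NOT":
        return a ^ 1, 0
    raise ValueError(f"unknown operation: {op}")


def serial_alu(op: str, a: int, b: int, n_bits: int = 4) -> Tuple[int, int]:
    """Cascade n_bits cells, least significant bit first."""
    carry = 1 if op == "SUB" else 0
    result = 0
    for i in range(n_bits):        # cell i waits for the carry of cell i-1
        bit, carry = alu_cell(op, (a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result, carry


print(serial_alu("ADD", 0b1010, 0b0001))       # (11, 0)
print(serial_alu("ADD", 200, 100, n_bits=8))   # (44, 1)
```

Changing `n_bits` is the only step needed to widen the ALU, which mirrors the easy scaling noted above, while the loop makes the linear growth of the carry path with the number of bits explicit.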



Figure 6.8: Block diagram of serial ALU.

Table 6.2 lists the operations available in the serial ALU, together with the select bit sequences that choose them. The main difference between the operations of the serial and parallel ALUs is the multiplier, which is available only in the parallel architecture. In the serial architecture, the multiplier cannot be included because of the nature of the circuit.

Table 6.2: The operations of the serial ALU and the select bits for them.

|       | XOR   | OR    | AND   | NOT  | ADD   | SUB   |
|-------|-------|-------|-------|------|-------|-------|
|       | (A^B) | (A\|B) | (A&B) | (~A) | (A+B) | (A-B) |
| $S_0$ | 10    | 11    | 10    | 11   | 00    | 10    |
| $S_1$ | 10    | 11    | 11    | 10   | 10    | 10    |
| $S_2$ | 11    | 11    | 11    | 11   | 10    | 10    |

#### **6.2.1.** Fabricated circuit

After designing the single cell for the serial ALU architecture, we fabricated it separately to test it and confirm its functionality. Figure 6.9 shows the design of a single cell block for the serial ALU. The cell size is about 1.6 mm and it consumes about 110 mA of bias current at 2.5 mV. By combining four of these cells, we could form a functional 4-bit serial ALU. Figure 6.10 shows the serial ALU fabricated with the standard STP2 process. The final design of the ALU consumes about 480 mA of bias current at 2.5 mV, which is about half the amount that the parallel ALU consumes.



Figure 6.9 : A single cell of a serial ALU fabricated with standard process STP2.

In the serial structure, the carry bit is generated in each cell after the operation is done and this bit is propagated to the other cells via passive transmission lines. The other bit that propagates between the cells is the clock signal. The clock signal path determines the most critical path in the ALU and therefore determines the latency of the circuit.



Figure 6.10: 4-bit serial ALU fabricated with standard process STP2.

#### **6.2.2.** Results

To generate the input signals and clock waveform, a pattern generator system with 300 MHz bandwidth and double data rate (DDR) capability, which can increase the frequency to 600 MHz, is used. The waveform goes through a band-pass filter to obtain an RSFQ-compatible waveform. The DC/SFQ converter generates an SFQ pulse when the input amplitude reaches 1 mA. A precise multichannel current source supplies the bias currents for circuits with low bias values. For circuits with high bias levels, the 48-channel current source is used. Custom low-pass filters with 100 Hz bandwidth were used on the precise current source card. The filters prevent sharp changes in the output current that could damage the circuit. A data acquisition card and/or logic analyzer samples the output signals after they pass through an NF low-noise preamplifier with a maximum gain of 60 dB. At low frequency, the bandwidth of the system is limited to about 1 MHz by the NF preamplifiers. All the equipment is controlled by a LabVIEW program to minimize user intervention in the tests.

The functionality of the serial ALU circuit was confirmed by applying clock signal patterns at low frequency and observing the output clock. Figure 6.11 shows the result of the clock test. The output clock is the signal extracted from the clock tree after traveling through the whole circuit, so it shows the most critical path in the circuit.



Figure 6.11: Clock in and clock out from a serial ALU tested in our cryocooler system.

Figure 6.12 shows the output of the serial ALU after applying the instruction for the ADD operation on two input registers. These signals command the ALU to add 1010 in the B register to 0001 in the A register. The result should be 1011, but here we see only 0011. The fourth output bit is not functioning correctly, which is due to an error in the fabrication rules.
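
The comparison in this example can be reproduced with a few lines that XOR the expected and observed words to locate the faulty output bit. The function name is hypothetical, and bit positions are counted from 0 at the least significant bit, so the "fourth output" corresponds to index 3.

```python
from typing import List


def faulty_bits(expected: int, observed: int, width: int = 4) -> List[int]:
    """Bit positions (0 = least significant) where the two words differ."""
    diff = (expected ^ observed) & ((1 << width) - 1)
    return [i for i in range(width) if (diff >> i) & 1]


a, b = 0b0001, 0b1010
expected = (a + b) & 0xF           # 0b1011
observed = 0b0011                  # value read from the measurement
print(format(expected, "04b"))     # 1011
print(faulty_bits(expected, observed))  # [3]
```

Only the most significant output differs, which is consistent with a localized fault rather than a general failure of the adder.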

Since the architecture of the bit-serial ALU is less complicated than that of the parallel ALU, we were able to make some measurements despite the fabrication errors, which specifically affected one of the most used cells in the ALU design.



Figure 6.12: Inputs and output signals of the serial ALU circuit.

## **6.3.** Conclusion

A 4-bit parallel arithmetic logic unit (ALU) in rapid single flux quantum (RSFQ) logic was designed. The parallel architecture allows a simpler structure than Kogge-Stone while maintaining good latency. The ALU was designed using a standard cell library to be fabricated with the STP2 (2.5 kA/cm$^2$) process and has a latency of 620 ps on the most critical path at 2.5 mV bias and 25 GHz. The ALU consists of more than 9000 junctions, has 8 different operations including multiplication, addition and subtraction, and needs about 2.5 mW of power in the reverse bias regime. This logic unit was designed to be used as a coprocessor with external CMOS processors and CMOS memories. To confirm the operation of the ALU, all the parts were first fabricated separately and tested in the 4 K pulse-tube cryocooler at TOBB ETU. The results of these tests are presented in the thesis.

The sub-circuits that make the RSFQ ALU compatible in speed with an external CMOS processor were also designed. All the sub-circuits were tested in our 4 K pulse-tube cooler with satisfactory bias margins. Because of the limited cooling power of the cooler and the high bias current of the ALU circuit in reverse biasing mode, the bias current could not reach its design value, so the circuit could not be tested as a whole. To overcome this problem, several different methods were tried and the packaging was improved. The simulation results showed a bias margin of about ±18% for the parallel ALU. The tests were also carried out in the liquid helium cryostat, but there were no satisfactory results because of unforeseen errors in the fabrication rules. These errors cause leakage in the pins and in some cells, which prevents the signal from propagating correctly in the circuit.

A bit-serial ALU was also designed and fabricated. The designed ALU consumes about 1.3 mW of power, and we were able to test the fabricated circuit in our cryocooler system. The test system was very robust during the whole process, but the fabrication errors mentioned earlier prevented us from obtaining the bias margins of the designed ALU.

# REFERENCES

- [1] **Nagasawa, S., et al.,** (2004). Nb 9-Layer Fabrication Process for Superconducting Large-Scale SFQ Circuits and Its Process Evaluation, *IEICE Trans. Electron.*, E97–C, 132–140.
- [2] **Tolpygo, S. K., et al.,** (2015). Inductance of Circuit Structures for MIT LL Superconductor Electronics Fabrication Process With 8 Niobium Layers, *IEEE Trans. Appl. Supercond.*, 25, 1–5.
- [3] **Holmes, D. S., Ripple, A. L., Manheimer, M. A.**, (2013). Energy-Efficient Superconducting Computing Power Budgets and Requirements, *IEEE Trans. Appl. Supercond.*, 23, 1701610–1701610.
- [4] **Likharev, K. K., Semenov, V. K.,** (1991). RSFQ logic/memory family: a new Josephson-junction technology for sub-terahertz-clock-frequency digital systems, *IEEE Trans. Appl. Supercond.*, 1, 3–28.
- [5] **Brock, D. K.,** (2001). RSFQ technology: Circuits and systems, *High-Speed Integr. Circuit Technol. 100 GHz Log.*, 11, 307–362.
- [6] **Bronk, C., Lingamneni, A., Palem, K.,** (2010). Innovation for sustainability in information and communication technologies (ICT), *Inst. Public Policy, Rice Univ., Houston, TX, USA, Technical Report.*
- [7] <https://www.top500.org/>, Accessed: 17-Nov-2017.
- [8] **Koomey, J. G.,** (2008). Worldwide electricity used in data centers, *Environ. Res. Lett.*, 3, 34008.
- [9] **Van Heddeghem, W., Lambert, S., Lannoo, B., Colle, D., Pickavet, M., Demeester, P.,** (2014). Trends in worldwide ICT electricity consumption from 2007 to 2012, *Comput. Commun.*, 50, 64–76.
- [10] **Shehabi, A., et al.,** (2016). United States Data Center Energy Usage Report.
- [11] **Markov, I. L.,** (2014). Limits on fundamental limits to computation, *Nature*, 512, 13570.
- [12] **Meindl, J. D., Davis, J. A.,** (2000). The fundamental limit on binary switching energy for terascale integration (TSI), *IEEE J. Solid-State Circuits*, 35, 1515–1516.
- [13] **Lloyd, S.,** (2000). Ultimate physical limits to computation, *Nature*, 406, 35023282.
- [14] **Likharev, K. K.**, (1986), Dynamics of Josephson junctions and circuits. *CRC*.

- [15] **Chen, W., Rylyakov, A. V., Patel, V., Lukens, J. E., Likharev, K. K.,** (1999). Rapid single flux quantum T-flip flop operating up to 770 GHz, *IEEE Trans. Appl. Supercond.*, 9, 3212–3215.
- [16] **Yamanashi, Y., Nishigai, T., Yoshikawa, N.,** (2007). Study of LR-Loading Technique for Low-Power Single Flux Quantum Circuits, *IEEE Trans. Appl. Supercond.*, 17, 150–153.
- [17] **Ortlepp, T., Wetzstein, O., Engert, S., Kunert, J., Toepfer, H.,** (2011). Reduced Power Consumption in Superconducting Electronics, *IEEE Trans. Appl. Supercond.*, 21, 770–775.
- [18] **Duzer, T. V., Feng, Y., Meng, X., Whiteley, S. R., Yoshikawa, N.,** (2002). Hybrid Josephson-CMOS memory: a solution for the Josephson memory problem, *Supercond. Sci. Technol.*, 15, 1669.
- [19] **Duzer, T. V., et al.,** (2013). 64-kb Hybrid Josephson-CMOS 4 Kelvin RAM With 400 ps Access Time and 12 mW Read Power, *IEEE Trans. Appl. Supercond.*, 23, 1700504–1700504.
- [20] **Cyrot, M., Pavuna, D.,** (1992). Introduction to Superconductivity and High-Tc Materials. *World Scientific Publishing Company*.
- [21] **Tinkham, M.,** (1996). Introduction to Superconductivity, *Courier Corporation*.
- [22] **Van Duzer, T., Turner, C. W.,** (1981). Principles of superconductive devices and circuits, *John Wiley & Sons*.
- [23] **Poole, C. K., Farach, H. A., Creswick, R. J.,** (1999). Handbook of Superconductivity, *Academic Press*.
- [24] **Clarke, J., Braginski, A. I.,** (2006). The SQUID Handbook: Applications of SQUIDs and SQUID Systems, *John Wiley & Sons*.
- [25] **Braginski, A. I., Clarke, J.,** (2005). Introduction, in The SQUID Handbook, J. Clarke and A. I. Braginski, Eds. Wiley-VCH Verlag GmbH & Co. KGaA, 1–28.
- [26] **Chesca, B., Kleiner, R., Koelle, D.,** (2005). SQUID Theory, in The SQUID Handbook, J. Clarke and A. I. Braginski, Eds. Wiley-VCH Verlag GmbH & Co. KGaA, 29–92.
- [27] **Larbalestier, D., Gurevich, A., Feldmann, D. M., Polyanskii, A.,** (2010). High-Tc superconducting materials for electric power applications, in *Materials for Sustainable Energy, Co-Published with Macmillan Publishers Ltd, UK*, 311–320.
- [28] **Leung, E. M.,** (2000). Superconducting fault current limiters, *IEEE Power Eng. Rev.*, 20, 15–18, 30.
- [29] **Lee, A. T., Richards, P. L., Nam, S. W., Cabrera, B., Irwin, K. D.,** (1996). A superconducting bolometer with strong electrothermal feedback, *Appl. Phys. Lett.*, 69, 1801–1803.
- [30] **Matthaei, G. L.**, (2003). Narrow-band, fixed-tuned, and tunable bandpass filters with zig-zag hairpin-comb resonators, *IEEE Trans. Microw. Theory Tech.*, 51, 1214–1219.

- [31] **Buck, D. A.,** (1956). The Cryotron-A Superconductive Computer Component, *Proc. IRE*, 44, 482–493.
- [32] **Benz, S. P., Hamilton, C. A.**, (1996). A pulse-driven programmable Josephson voltage standard, *Appl. Phys. Lett.*, 68, 3171–3173.
- [33] **Rose-Innes, A. C.**, (2012). Introduction to Superconductivity, *Elsevier*.
- [34] **Kadin, A. M.,** (1999). Introduction to superconducting circuits, *New York*.
- [35] **ITRS**, (2004). International Technology Roadmap for Semiconductors 2004 Emerging Research Devices.
- [36] **NSA**, (2005). Superconducting Technology Assessment, *National Security Agency of America (NSA)*.
- [37] **Filippov, T. V., et al.,** (2012). 20 GHz Operation of an Asynchronous Wave-Pipelined RSFQ Arithmetic-Logic Unit, *Phys. Procedia*, 36, 59–65.
- [38] **Dorojevets, M., Kasperek, A. K., Yoshikawa, N., Fujimaki, A.**, (2013). 20-GHz 8 × 8-bit Parallel Carry-Save Pipelined RSFQ Multiplier, *IEEE Trans. Appl. Supercond.*, 23, 1300104.
- [39] **Dorojevets, M., Ayala, C. L., Yoshikawa, N., Fujimaki, A.**, (2013). 8-Bit Asynchronous Sparse-Tree Superconductor RSFQ Arithmetic-Logic Unit With a Rich Set of Operations, *IEEE Trans. Appl. Supercond.*, 23, 1700104.
- [40] **Filippov, T., Dorojevets, M., Sahu, A., Kirichenko, A., Ayala, C., Mukhanov, O.,** (2011). 8-Bit Asynchronous Wave-Pipelined RSFQ Arithmetic-Logic Unit, *IEEE Trans. Appl. Supercond.*, 21, 847–851.
- [41] **Dorojevets, M., Bunyk, P.,** (2003). Architectural and implementation challenges in designing high-performance RSFQ processors: a FLUX-1 microprocessor and beyond, *IEEE Trans. Appl. Supercond.*, 13, 446–449.
- [42] **Fujimaki, A., Tanaka, M., Yamada, T., Yamanashi, Y., Park, H., Yoshikawa, N.,** (2008). Bit-Serial Single Flux Quantum Microprocessor CORE, *IEICE Trans. Electron.*, E91–C, 342–349.
- [43] **Bunyk, P., Leung, M., Spargo, J., Dorojevets, M.,** (2003). Flux-1 RSFQ microprocessor: physical design and test results, *IEEE Trans. Appl. Supercond.*, 13, 433–436.
- [44] **Chen, W., Rylyakov, A. V., Patel, V., Lukens, J. E., Likharev, K. K.,** (1999). Rapid single flux quantum T-flip flop operating up to 770 GHz, *IEEE Trans. Appl. Supercond.*, 9, 3212–3215.
- [45] **Chen, W., Rylyakov, A. V., Patel, V., Lukens, J. E., Likharev, K. K.,** (1998). Superconductor digital frequency divider operating up to 750 GHz, *Appl. Phys. Lett.*, 73, 2817–2819.
- [46] **Askerzade, I., Bozbey, A., Cantürk, M.**, (2017). Digital Superconductivity Electronics, in Modern Aspects of Josephson Dynamics and Superconductivity Electronics, *Springer, Cham*, 89–118.

- [47] **Kirichenko, D. E., Sarwana, S., Kirichenko, A. F.**, (2011). Zero Static Power Dissipation Biasing of RSFQ Circuits, *IEEE Trans. Appl. Supercond.*, 21, 776–779.
- [48] **Ball, P.,** (2012). Computer engineering: Feeling the heat, *Nat. News*, 492, 174.
- [49] **Service, R. F.**, (2012). Computer science. What it'll take to go exascale, *Science*, 335, 394–396.
- [50] **Takeuchi, N., Yamanashi, Y., Yoshikawa, N.**, (2014). Reversible logic gate using adiabatic superconducting devices, *Sci. Rep.*, 4, 6354.
- [51] **Landauer, R.**, (1961). Irreversibility and Heat Generation in the Computing Process, *IBM J. Res. Dev.*, 5, 183–191.
- [52] **Fredkin, E., Toffoli, T.**, (1982). Conservative logic, *Int. J. Theor. Phys.*, 21, 219–253.
- [53] **Fredkin, E.**, (1990). An informational process based on reversible universal cellular automata, *Phys. Nonlinear Phenom.*, 45, 254–270.
- [54] **Keyes, R. W., Landauer, R.**, (1970). Minimal Energy Dissipation in Logic, *IBM J. Res. Dev.*, 14, 152–157.
- [55] **Likharev, K.**, (1977). Dynamics of some single flux quantum devices: I. Parametric quantron, *IEEE Trans. Magn.*, 13, 242–244.
- [56] **Semenov, V. K., Danilov, G. V., Averin, D. V.,** (2003). Negative-inductance SQUID as the basic element of reversible Josephson-junction circuits, *IEEE Trans. Appl. Supercond.*, 13, 938–943.
- [57] **Phillips, I., Ulidowski, I.**, (2014). Event Identifier Logic, *Math. Struct. Comput. Sci.*, 24.
- [58] **Phillips, I., Ulidowski, I., Yuen, S.,** (2013). Modelling of Bonding with Processes and Events, *in Reversible Computation*, 141–154.
- [59] **Phillips, I., Ulidowski, I., Yuen, S.,** (2014). Concurrency and Reversibility, *in Reversible Computation*, 1–14.
- [60] **Takeuchi, N., Yamanashi, Y., Yoshikawa, N.,** (2014). Reversible Computing Using Adiabatic Superconductor Logic, *in Reversible Computation*, 15–25.
- [61] **Hosoya, M., et al.,** (1991). Quantum flux parametron: a single quantum flux device for Josephson supercomputer, *IEEE Trans. Appl. Supercond.*, 1, 77–89.
- [62] **Fujimaki, A., Tanaka, M.,** (2003). CONNECT cell library handbook.
- [63] **Yorozu, S., Kameda, Y., Terai, H., Fujimaki, A., Yamada, T., Tahara, S.,** (2002). A single flux quantum standard logic cell library, *Phys. C Supercond.*, 378–381, 1471–1474.
- [64] **Likharev, K. K., Semenov, V. K.,** (1991). RSFQ logic/memory family: a new Josephson-junction technology for sub-terahertz-clock-frequency digital systems, *IEEE Trans. Appl. Supercond.*, 1, 3–28.

- [65] **Tanaka, M., et al.**, (2005). Demonstration of a single-flux-quantum microprocessor using passive transmission lines, *IEEE Trans. Appl. Supercond.*, 15, 400–404.
- [66] **Hidaka, M., Nagasawa, S., Satoh, T., Hinode, K., Kitagawa, Y.,** (2006). Current status and future prospect of the Nb-based fabrication process for single flux quantum circuits, *Supercond. Sci. Technol.*, 19, S138.
- [67] <http://unit.aist.go.jp/riif/openi/cravity/en/index.html>, Accessed: 25-Dec-2013.
- [68] <https://unit.aist.go.jp/neri/cravity/en/index.html>, Accessed: 14-Jan-2018.
- [69] **Bunyk, P., Semenov, V. K.,** (1995). Design of an RSFQ microprocessor, *IEEE Trans. Appl. Supercond.*, 5, 3325–3328.
- [70] **Dorojevets, M., Bunyk, P., Zinoviev, D.,** (2001). FLUX chip: design of a 20-GHz 16-bit ultrapipelined RSFQ processor prototype based on 1.75-µm LTS technology, *IEEE Trans. Appl. Supercond.*, 11, 326–332.
- [71] **Dorojevets, M., Bunyk, P.,** (2003). Architectural and implementation challenges in designing high-performance RSFQ processors: a FLUX-1 microprocessor and beyond, *IEEE Trans. Appl. Supercond.*, 13, 446–449.
- [72] **Kim, J. Y., Kim, S., Kang, J.**, (2005). Construction of an RSFQ 4-bit ALU with half adder cells, *IEEE Trans. Appl. Supercond.*, 15, 308–311.
- [73] **Gerber, H. R., Fourie, C. J., Perold, W. J., Muller, L. C.,** (2007). Design of an asynchronous microprocessor using RSFQ-AT, *IEEE Trans. Appl. Supercond.*, 17, 490–493.
- [74] **Tanaka, M., et al.,** (2015). Development of Bit-Serial RSFQ Microprocessors Integrated with Shift-Register-Based Random Access Memories, in 2015 15th International Superconductive Electronics Conference (ISEC), 1–3.
- [75] Tang, G. M., Takata, K., Tanaka, M., Fujimaki, A., Takagi, K., Takagi, N., (2016). 4-bit Bit-Slice Arithmetic Logic Unit for 32-bit RSFQ Microprocessors, *IEEE Trans. Appl. Supercond.*, 26, 1–6.
- [76] Ando, Y., Sato, R., Tanaka, M., Takagi, K., Takagi, N., Fujimaki, A., (2016). Design and Demonstration of an 8-bit Bit-Serial RSFQ Microprocessor: CORE e4, *IEEE Trans. Appl. Supercond.*, 26, 1–5.
- [77] Ando, Y., Sato, R., Tanaka, M., Takagi, K., Takagi, N., (2015). 80-GHz Operation of an 8-Bit RSFQ Arithmetic Logic Unit, in 2015 15th International Superconductive Electronics Conference (ISEC), 1–3.
- [78] <https://www.righto.com/2017/03/inside-vintage-74181-alu-chip-how-it.html>, Accessed: 20-Mar-2017.
- [79] **Fujimaki, A., Tanaka, M., Yamada, T., Yamanashi, Y., Park, H., Yoshikawa, N.,** (2008). Bit-Serial Single Flux Quantum Microprocessor CORE, *IEICE Trans. Electron.*, E91–C, 342–349.
- [80] Ozer, M., Tukel, Y., Çelik, M. E., Bozbey, A., (2014). Design of RSFQ Asynchronous Pipelined Kogge-Stone Adder and Developing Custom Compound Gates, *Cryogenics*, 63, 174–179.

- [81] Suzuki, H., Nagasawa, S., Miyahara, K., Enomoto, Y., (2000). Characteristics of driver and receiver circuits with a passive transmission line in RSFQ circuits, *IEEE Trans. Appl. Supercond.*, 10, 1637–1641.
- [82] **Hashimoto, Y., Yorozu, S., Kameda, Y., Semenov, V. K.**, (2003). A design approach to passive interconnects for single flux quantum logic circuits, *IEEE Trans. Appl. Supercond.*, 13, 535–538.
- [83] **Polonsky, S. V., Semenov, V. K., Schneider, D. F.,** (1993). Transmission of single-flux-quantum pulses along superconducting microstrip lines, *IEEE Trans. Appl. Supercond.*, 3, 2598–2600.
- [84] <http://www.sonnetsoftware.com/products/sonnet-suites/>, Accessed: 26-Aug-2013.
- [85] **Mattis, D. C., Bardeen, J.,** (1958). Theory of the Anomalous Skin Effect in Normal and Superconducting Metals, *Phys. Rev.*, 111, 412–417.
- [86] Rafique, M. R., Kataeva, I., Engseth, H., Tarasov, M., Kidiyarova-Shevchenko, A., (2005). Optimization of superconducting microstrip interconnects for rapid single-flux-quantum circuits, *Supercond. Sci. Technol.*, 18, 1065.
- [87] **Takeuchi, N., Yamanashi, Y., Saito, Y., Yoshikawa, N.,** (2009). 3D simulation of superconducting microwave devices with an electromagnetic-field simulator, *Phys. C Supercond.*, 469, 1662–1665.
- [88] **Clarke, J., Braginski, A. I.,** (2002). The SQUID Handbook: Fundamentals and Technology of SQUIDs and SQUID Systems. *Weinheim; Cambridge: Wiley-VCH*.
- [89] **Yuce, B., Bozbey, A.**, (2010). Design of Relaxation Oscillator Based Ultrawideband SFQ Amplifier for Chip to Chip Interconnection, *J. Supercond. Nov. Magn.*, 24, 1071–1075.
- [90] **Ortlepp, T., Uhlmann, F. H.**, (2009). Impedance Matching of Microstrip Inductors in Digital Superconductive Electronics, *IEEE Trans. Appl. Supercond.*, 19, 644–648.
- [91] **Radenbaugh, R.**, (2004). Refrigeration for superconductors, *Proc. IEEE*, 92, 1719–1734.
- [92] **Fang, E. S.,** (1989). A Josephson integrated circuit simulator (JSIM) for superconductive electronics application, *Ext. Abstr. 1989 Int. Supercond. Electron. Conf. ISEC'89*.
- [93] **Kusaka, K., et al.**, (2013). Long-Term Operation of the Superconducting Triplet Quadrupoles With Small Cryocoolers for BigRIPS In-Flight Separator and RI-Beam Delivery Line at RIKEN, *IEEE Trans. Appl. Supercond.*, 23, 4101305–4101305.
- [94] Kirkconnell, C. S., Hon, R. C., Perella, M. D., Crittenden, T. M., Ghiaasiaan, S. M., (2017). Development of a miniature Stirling cryocooler for LWIR small satellite applications, *Tri-Technology Device Refrigeration (TTDR) II*, 10180, 1018002.

- [95] **Hashimoto, Y., Yorozu, S., Kameda, Y.**, (2008). Development of Cryopackaging and I/O Technologies for High-Speed Superconductive Digital Systems, *IEICE Trans. Electron.*, E91–C, 325–332.
- [96] **Webber, R. J., Burroughs, C. J., Radparvar, M.,** (2007). Performance of a Cryocooled Nb DC Programmable Voltage Standard at 4 K, *IEEE Trans. Appl. Supercond.*, 17, 3857–3861.
- [97] **Webber, R. J., Dotsenko, V., Talalaevskii, A., Miller, R., Tang, J. C.,** (2008). Operation of Superconducting Digital Receiver Circuits on 2-Stage Gifford-McMahon Cryocooler, *Adv. Cryog. Eng.*, 42, 927–932.
- [98] Webber, R. J., Dotsenko, V. V., Delmas, J., Kadin, A. M., Track, E. K., (2009). Evaluation of a 4 K 4-stage Pulse Tube Cryocooler for Superconducting Electronics, in Cryocoolers 15th International Cryocooler Conference, 657–664.
- [99] Hashimoto, Y., Yorozu, S., Miyazaki, T., Kameda, Y., Suzuki, H., Yoshikawa, N., (2007). Implementation and Experimental Evaluation of a Cryocooled System Prototype for High-Throughput SFQ Digital Applications, *IEEE Trans. Appl. Supercond.*, 17, 546–551.
- [100] **Ortlepp, T., Fourie, C.,** (2008). Cryocooler-cooled RSFQ circuits, *Present. Cryocooler Workshop*.
- [101] **Dotsenko, V. V., et al.,** (2009). Integration of a 4-Stage 4 K Pulse Tube Cryocooler Prototype With a Superconducting Integrated Circuit, *IEEE Trans. Appl. Supercond.*, 19, 1003–1007.
- [102] **Webber, R. J., et al.,** (2008). Operation of Superconducting Digital Receiver Circuits on 2-Stage Gifford-McMahon Cryocooler, *in AIP Conference Proceedings*, 985, 927–932.
- [103] Engseth, H., Rafique, R., Kataeva, I., Intiso, S., Kidiyarova-Shevchenko, A., (2007). Room Temperature Interface for RSFQ Digital Signal Processor, *IEEE Trans. Appl. Supercond.*, 17, 979–982.
- [104] **Glassbrenner, C. J., Slack, G. A.,** (1964). Thermal Conductivity of Silicon and Germanium from 3K to the Melting Point, *Phys. Rev.*, 134, A1058–A1069.
- [105] **Maddock, B. J., James, G. B., Norris, W. T.,** (1969). Superconductive composites: Heat transfer and steady state stabilization, *Cryogenics*, 9, 261–273.
- [106] **Lee, P. A.,** (1971). Effect of Noise on the Current-Voltage Characteristics of a Josephson Junction, *J. Appl. Phys.*, 42, 325–334.
- [107] Harnisch, T., Kunert, J., Toepfer, H., Uhlmann, H. F., (1997). Design centering methods for yield optimization of cryoelectronic circuits, *IEEE Trans. Appl. Supercond.*, 7, 3434–3437.
- [108] **Tukel, Y., Bozbey, A., Tunc, C. A.,** (2013). Development of an Optimization Tool for RSFQ Digital Cell Library Using Particle Swarm, *IEEE Trans. Appl. Supercond.*, 23, 1700805–1700805.

- [109] **Tukel, Y., Bozbey, A., Tunc, C. A.**, (2013). Optimization of Single Flux Quantum Circuit Based Comparators Using PSO, *J. Supercond. Nov. Magn.*, 26, 1837–1841.
- [110] **Fourie, C. J., Perold, W. J.,** (2003). Comparison of genetic algorithms to other optimization techniques for raising circuit yield in superconducting digital circuits, *IEEE Trans. Appl. Supercond.*, 13, 511–514.
- [111] **Holmes, D. S., Ripple, A. L., Manheimer, M. A.,** (2013). Energy-Efficient Superconducting Computing: Power Budgets and Requirements, *IEEE Trans. Appl. Supercond.*, 23, 1701610–1701610.
- [112] **Filippov, T. V., et al.,** (2012). 20GHz Operation of an Asynchronous Wave-Pipelined RSFQ Arithmetic-Logic Unit, *Phys. Procedia*, 36, 59–65.
- [113] **Tanaka, M., Sato, R., Hatanaka, Y., Fujimaki, A.,** (2016). High-Density Shift-Register-Based Rapid Single-Flux-Quantum Memory System for Bit-Serial Microprocessors, *IEEE Trans. Appl. Supercond.*, 26, 1–5.
- [114] Ozer, M., Eren Çelik, M., Tukel, Y., Bozbey, A., (2014). Design of RSFQ wave pipelined Kogge-Stone Adder and developing custom compound gates, *Cryogenics*, 63, 174–179.
- [115] Kadin, A. M., Webber, R. J., Sarwana, S., (2005). Effects of superconducting return currents on RSFQ circuit performance, *IEEE Trans. Appl. Supercond.*, 15, 280–283.
- [116] **Dorojevets, M., Ayala, C. L., Yoshikawa, N., Fujimaki, A.**, (2013). 8-Bit Asynchronous Sparse-Tree Superconductor RSFQ Arithmetic-Logic Unit With a Rich Set of Operations, *IEEE Trans. Appl. Supercond.*, 23, 1700104–1700104.
- [117] Filippov, T., Dorojevets, M., Sahu, A., Kirichenko, A., Ayala, C., Mukhanov, O., (2011). 8-Bit Asynchronous Wave-Pipelined RSFQ Arithmetic-Logic Unit, *IEEE Trans. Appl. Supercond.*, 21, 847–851.

# **RESUME**

Name : Sasan Razmkhah

**Nationality** : Iranian

Date and place of birth : 10.05.1987, Tabriz

E-mail : srazmkhah@etu.edu.tr

**ACADEMIC RECORDS:** 

• **B.Sc.** : Electrical Engineering, Sharif University of Technology, Tehran, Sep. 2009.

• **M.Sc.** : Device and Nano Electronics, Sharif University of Technology, Tehran, Dec. 2011.

• **Ph.D.** : Electronics and Electrical Engineering, TOBB University, Ankara.

**EDUCATIONAL HONORS:** 

2015 TOBB ETU (Turkey) Special success scholarship.

2009 National Elites Foundation (Iran) Special talents scholarship.

2005 Entrance exam of Iranian universities Ranked 4th

2005 Entrance exam of Azad University (Iran) Ranked 8th

2004 Iran Bronze medal of Physics Olympiad

### LANGUAGE PROFICIENCY:

Farsi: Native
Turkish: Fluent
Azeri: Native
English: Fluent

### PUBLICATIONS, PRESENTATIONS AND PATENTS RELEVANT TO THESIS:

- **Razmkhah, S.**, Bozbey, A., 2017. Novel cryogenic packaging for integrated circuits, Patent number: GE-461088.

- **Razmkhah, S.**, Bozbey, A., 2016. Design of passive transmission lines for different stripline widths and impedances, *IEEE Trans. on App. Superconductivity*, 26, 1-6.
- **Razmkhah, S.**, Bozbey, A., 2013. Automatic characterization and measurement of custom designed RSFQ chips, Proceedings of: ASCAS2013, Asian Conference on Applied Superconductivity and Cryogenics, Cappadocia, Turkey.
- Razmkhah, S., Bozbey, A., 2014. Fully Automated SFQ Chip Measurement Setup for Evaluation of Operating Condition and Bias Margins, Proceedings of: SSV2014-YS, Superconducting SFQ VLSI Workshop, Nagoya, Japan.
- Razmkhah, S., Bozbey, A., 2014. Representation of an 8-bit, 20GHz Pipelined RSFQ ALU as a Coprocessor, Proceedings of: ICSM2014, International Conference on Superconductivity and Magnetism, Antalya, Turkey.
- **Razmkhah, S.**, Bozbey, A., 2015. Design of narrow passive transmission lines, driver and receiver cells for SFQ circuits, Proceedings of: ISEC2015, International Superconductive Electronics Conference, Nagoya, Japan.
- Razmkhah, S., Bozbey, A., 2016. Demonstration of a Different Single Flux Quantum Logic and Arithmetic Circuits for Incorporating in Custom Designed Parallel Pipe-line Microprocessor, Proceedings of: ICSM2016, International Conference on Superconductivity and Magnetism, Fethiye, Turkey.
- Razmkhah, S., Bozbey, A., 2016. Design of a Stackable Single-Flux-Quantum Microprocessor with Bit-slicing Architecture in Close Cycle Cryocooler Environment, Proceedings of: ICSM2016, International Conference on Superconductivity and Magnetism, Fethiye, Turkey.
- Eren Çelik, M., Özer, M., **Razmkhah, S.**, Bozbey, A., 2016. Design and demonstration of large scale RSFQ logic by applying STATS tool for timing variants analysis, Proceedings of: ICSM2016, International Conference on Superconductivity and Magnetism, Fethiye, Turkey.
- Razmkhah, S., Bozbey, A., 2016. Demonstration of a Different Single Flux Quantum Logic and Arithmetic Sub-Circuits for Custom Designed Parallel Pipe-line Microprocessor in Cryocooler Environment, Proceedings of: SSV2016, Superconducting SFQ VLSI Workshop, Yokohama, Japan.

#### OTHER PUBLICATIONS, PRESENTATIONS AND PATENTS:

- Karamuftuoglu, M. A., **Razmkhah, S.**, Bozbey, A., 2018. Neuron circuit, Patent number: pending.
- Aydoğan, E. C., **Razmkhah, S.**, Bozbey, A., 2018. Superconductor addressing and readout circuit, Patent number: pending.
- Kokabi, A., Khoshaman, A., **Razmkhah, S.**, Hoseini, M., 2009. Response Analysis of Free-membrane Transition-Edge Detectors with thin substrate, *Journal of physics*, 153.
- Sarreshtedari, F., **Razmkhah, S.**, Alavi, M. H., Fardmanesh, M., 2010. RF SQUID electronic readout system for the frequency range of 650MHz to 1GHz with automatic parameter setting. IEEE Xplore ICEE, 442–445.
- Sarreshtedari, F., Hosseini, M., **Razmkhah, S.**, Mehrany, K., Kokabi, H., Schubert, J., Banzet, M., Krause, H., Fardmanesh, M., 2011. Analytical Model for the Extraction of Flaw-Induced Current Interactions for SQUID NDE, *IEEE Trans. on App. Superconductivity*, 21, 3442–3446.
- Razmkhah, S., Eshraghi, M. J., Forooghi, F., Sarreshtedari, F., Fardmanesh, M., 2011. Fundamental mode fluxgate magnetometers for active magnetic shielding, IEEE Xplore ICEE, 1–4.
- Sarreshtedari, F., Razmkhah, S., Hosseini, N., Schubert, J., Banzet, M., Fardmanesh, M., 2011. An efficient SQUID NDE defect detection algorithm by using an adaptive finite element modeling, *Journal of Superconductivity and Novel Magnetism*, 24, 1077–1081.
- Kokabi, A., Khoshaman, A., Razmkhah, S., Hoseini, M., Fardmanesh, M., 2008. Response Analysis of Free-membrane Transition-Edge Detectors with thin substrate, Proceedings of: ICSM2008, International Conference on Superconductivity and Magnetism, Istanbul, Turkey.
- Sarreshtedari, F., **Razmkhah, S.**, Alavi, M. H., Fardmanesh, M., 2010. RF SQUID electronic readout system for the frequency range of 650MHz to 1GHz with automatic parameter setting, Proceedings of: ICEE, Iranian Conference on Electrical Engineering, 442-445, Tehran, Iran.
- Sarreshtedari, F., Razmkhah, S., Hosseini, N., Mehrany, K., Kokabi, H., Schubert, J., Banzet, M., Krause, H., Fardmanesh, M., 2010. Model based inverse solution of SQUID NDE using a new numerical model for the extraction of flaw-induced current interactions, Proceedings of: ASC2010, Applied Superconductivity Conference, Washington, USA.
- Sarreshtedari, F., Razmkhah, S., Hosseini, N., Schubert, J., Banzet, M., Fardmanesh, M., 2010. An efficient SQUID NDE defect detection algorithm by using an adaptive finite element modeling, Proceedings of: ICSM2010, International Conference on Superconductivity and Magnetism, Antalya, Turkey.
- Razmkhah, S., Eshraghi, M. J., Sarreshtedari, F., Forooghi, F., Fardmanesh, M., 2011. Fundamental mode fluxgate magnetometer for active magnetic shielding, Proceedings of: ICEE, Iranian Conference on Electrical Engineering, Tehran, Iran.
- Sarreshtedari, F., **Razmkhah, S.**, Eshraghi, M.J., Mehrani, K., Schubert, J., Banzet, M., Fardmanesh, M., 2011. A novel Eddy current SQUID NDE optimization method for identification of unknown hidden defects, Proceedings of: EUCAS2011, European Conference on Applied Superconductivity, Den Haag, Netherlands.
- Sarreshtedari, F., Razmkhah, S., Eshraghi, M. J., Forooghi, F., Fardmanesh, M., 2012. Development of Two Stage Local Active Shield for High Tc SQUID Based MCG, Proceedings of: ICSM2012, International Conference on Superconductivity and Magnetism, Istanbul, Turkey.
- Bozbey, A., Balaban, D., Razmkhah, S., Febvre, P., Celik, C., Gaffet, S., Di Borgo, E., 2015. Magnetic field noise investigation for site selection of a SQUID-based Earth magnetic field recording station, Proceedings of: EUCAS2015, European Conference on Applied Superconductivity, Lyon, France.
- Bozbey, A., **Razmkhah, S.**, Fujimaki, A., 2015. Implementation of a closed cycle refrigerator test system for superconducting stripline detectors, Proceedings of: EUCAS2015, European Conference on Applied Superconductivity, Lyon, France.