A Direct Digital Frequency Synthesizer With Fourth-Order Phase Domain $\Delta \Sigma$ Noise Shaper and 12-bit Current-Steering DAC

Fa Foster Dai, Senior Member, IEEE, Weining Ni, Shi Yin, Member, IEEE, and Richard C. Jaeger, Fellow, IEEE

Abstract—This paper presents a direct digital frequency synthesizer (DDFS) with a 16-bit accumulator, a fourth-order phase domain single-stage $\Delta \Sigma$ interpolator, and a 300-MS/s 12-bit current-steering DAC based on the $Q^2$ Random Walk switching scheme. The $\Delta \Sigma$ interpolator is used to reduce the phase truncation error and the ROM size. The implemented fourth-order single-stage $\Delta \Sigma$ noise shaper reduces the effective phase bits by four and reduces the ROM size by 16 times. The DDFS prototype is fabricated in a 0.35-$\mu$m CMOS technology with active area of 1.11 mm$^2$ including a 12-bit DAC. The measured DDFS spurious-free dynamic range (SFDR) is greater than 78 dB using a reduced ROM with 8-bit phase, 12-bit amplitude resolution and a size of 0.09 mm$^2$. The total power consumption of the DDFS is 200 mW with a 3.3-V power supply.

Index Terms—CMOS integrated circuits, data conversion, delta-sigma modulation, digital-to-analog conversion, direct digital synthesizer, frequency synthesizers, integrated circuit design, sigma-delta modulation.

I. INTRODUCTION

DIRECT digital frequency synthesis (DDFS) is an important frequency synthesis technique that provides low-cost synthesis with ultrafine resolution. Modern communication systems are placing increasing demands on the frequency resolution, channel switching speed, and bandwidth of frequency synthesizers. For instance, spread-spectrum applications require frequency synthesizers that can tune to different output frequencies with extremely fine frequency resolution and switching speeds on the order of nanoseconds. The resolution and switching speed requirements of many systems are surpassing the performance capabilities of conventional analog phase-locked loops (PLLs). DDFS is a digital technique for frequency synthesis, waveform generation, sensor excitation, and digital modulation and demodulation. Since there is no feedback in a DDFS structure, it is capable of extremely fast frequency switching or hopping at the speed of the clock frequency. DDFS provides many other advantages, including fine frequency-tuning resolution and continuous-phase switching. In addition, a DDFS can provide various modulations and generate arbitrary waveforms in the digital domain. The increasing availability of ultrahigh-speed digital-to-analog converters (DACs) allows a DDFS to operate at clock frequencies of more than 10 GHz.

A conventional DDFS includes a digital accumulator and a look-up table that converts the phase word to a sinusoidal amplitude word. While a pure sinusoidal waveform is desired at the DDFS output, spurious tones can also occur due to the following processes. 1) The DDFS can achieve fine resolution using a large accumulator. However, addressing the look-up table with the full accumulator output would require a huge table with $2^N$ addresses, where $N$ is the accumulator size. The look-up table normally occupies the majority of the DDFS area. To reduce the look-up table ROM size, the phase word is normally truncated prior to the ROM input, which introduces quantization noise. 2) The ROM word width is normally limited by the finite number of bits in the DAC. In practical designs, the number of phase bits should be larger than the number of DAC bits to ensure that the DDFS output noise is dominated by the finite number of DAC bits.

To reduce the quantization noise associated with phase word truncation, $\Delta \Sigma$ techniques have been applied at different locations in a DDFS [1], [2]. A complicated second-order $\Delta \Sigma$ modulator with LSB dithering is implemented in the frequency domain in [1]. That design forms only a numerically controlled oscillator (NCO) without a DAC and achieves a theoretical spurious-free dynamic range (SFDR) of 110 dB. In [2], a simple delay unit is used to generate a first-order $\Delta \Sigma$ shaping effect, and its output spectrum shows insufficient noise shaping. The DDFS in [1] did not implement the DAC, and the work in [2] was not verified in hardware. None of the $\Delta \Sigma$ DDFS techniques reported so far includes a high-order $\Delta \Sigma$ interpolator, so they provide only limited noise shaping. A high-order $\Delta \Sigma$ interpolator is needed to achieve a large SFDR, fine resolution, and a small look-up table. This paper presents a novel DDFS architecture that uses a fourth-order phase domain single-stage $\Delta \Sigma$ interpolator to remove the phase truncation error. A 300-MS/s 12-bit current-steering DAC with the $Q^2$ Random Walk switching scheme is also implemented on the chip.

In Section II, we model the quantization noise and spurs associated with the DDFS. In Section III, we present a single-stage $\Delta \Sigma$ interpolator for noise shaping in the phase domain. Section IV discusses the implementation of the proposed generic single-stage $\Delta \Sigma$ interpolator of arbitrary order. The design of the 12-bit current-steering DAC is discussed in Section V, and the measured results of the DDFS implemented in a 0.35-μm CMOS technology are presented in Section VI.

II. SPURS AND QUANTIZATION NOISE IN DDFS

A. Prior Art DDFS Architecture and Its Spectral Purity

A basic DDFS system as shown in Fig. 1 consists of an NCO to generate the sampled signal followed by a DAC used to convert the digital waveform to an analog signal. Since the DAC output is sampled at the reference clock frequency, a de-glitch low-pass filter (LPF) is typically used to smooth the waveform. The NCO uses an N-bit accumulator to generate a phase ramp based on the N-bit input frequency control word (FCW). A read-only memory (ROM) stores the amplitude information of the desired waveform. With the phase word as the address, the ROM’s output is the amplitude word of the synthesized waveform. The FCW is continuously added to the last sampled phase value by an N-bit adder. When the accumulator reaches the N-bit maximum value, the accumulator rolls over and continues. The rollover rate of the accumulator is hence the DDFS output frequency, namely, \( f_o = f_{\text{clk}}(\text{FCW}/2^N) \), where \( f_{\text{clk}} \) is the DDFS clock frequency.

Since the FCW can be stepped by unity, the resolution of the DDFS is given by \( f_{\text{clk}}/2^N \), and a DDFS can achieve a very fine resolution if the accumulator size \( N \) is large. In order to reduce the ROM size while keeping a fine step size, only the \( P \) most significant bits of the phase word are used to address the ROM. This truncation at the accumulator output causes a quantization error that will be discussed later. The ROM size is equal to \( 2^P \cdot D \), where \( D \) is the number of amplitude bits and equals the number of DAC input bits. While increasing the number of phase bits is feasible, increasing the number of DAC input bits is costly due to largely increased die size and power consumption.

Therefore, the goal of DDFS design is to minimize the phase truncation error such that the DDFS output noise is dominated by the DAC quantization noise. DDFS is a superior digital frequency synthesis and modulation technique owing to features such as fine frequency tuning resolution using a large accumulator, fast frequency switching at the speed of the clock frequency, accurate quadrature outputs from digitally controlled sine and cosine waveform generation, direct modulation capability in the frequency, phase, and amplitude domains, and compatibility with digital CMOS processing. However, a conventional DDFS suffers from quantization noise and spurious tones. The DDFS spectral purity depends on a number of factors, including the clock phase noise, the number of phase bits applied to the sine lookup function, the number of bits in the lookup table, the DAC errors including nonlinearities and quantization noise, and the de-glitch filter noise, as shown in Fig. 1.

A DDFS has four principal noise and spurious sources: the reference clock (\( E_{\text{CLK}} \)), phase truncation (\( E_P \)), amplitude truncation (\( E_A \)), and DAC nonlinearities (\( E_{\text{DAC}} \)). For an NCO without the DAC, the quantization noise and spurs are mainly caused by two nonlinear operations: 1) the truncation of the phase accumulator output bits in order to reduce the ROM size, and 2) the finite precision of the sinusoidal magnitudes stored in the ROM using a finite number of bits. To reduce the ROM size, various ROM compression algorithms have been developed. ROM compression also causes amplitude error, and its effect can be modeled as an additive error \( E_{\text{COM}} \) as shown in Fig. 1. For a DDFS, the nonlinearity and additional noise due to the DAC (\( E_{\text{DAC}} \)) and the de-glitch filter (\( E_{\text{LPF}} \)) further degrade the output spectrum. Next, we analyze the quantization noise and spurs due to the phase truncation error (\( E_P \)).
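As a rough illustration of Fig. 1, the following Python sketch models the NCO at the bit level. It has three parts:

- an $N$-bit accumulator updated modulo $2^N$;
- truncation of the phase word to its $P$ MSBs;
- a full-period sine ROM of $2^P$ words with $D$ bits each.

The signed two's-complement amplitude format, the absence of ROM compression, and the function names `nco_samples` and `output_frequency` are assumptions made for illustration only.

```python
import math


def output_frequency(fcw: int, n_bits: int, f_clk: float) -> float:
    """DDFS output frequency f_o = f_clk * FCW / 2^N."""
    return f_clk * fcw / (1 << n_bits)


def nco_samples(fcw: int, n_bits: int, p_bits: int, d_bits: int, count: int) -> list[int]:
    """Bit-level NCO sketch: N-bit accumulator, P-bit phase truncation, sine ROM.

    Assumes a signed D-bit amplitude word and an uncompressed ROM of 2^P words.
    """
    mask = (1 << n_bits) - 1
    full_scale = (1 << (d_bits - 1)) - 1  # largest positive D-bit two's-complement value
    rom = [round(full_scale * math.sin(2 * math.pi * addr / (1 << p_bits)))
           for addr in range(1 << p_bits)]
    phase, samples = 0, []
    for _ in range(count):
        samples.append(rom[phase >> (n_bits - p_bits)])  # only the P MSBs address the ROM
        phase = (phase + fcw) & mask                     # modulo-2^N accumulation
    return samples


# 16-bit accumulator, 8 phase bits, 12 amplitude bits, 300-MHz clock (illustrative values)
print(output_frequency(1, 16, 300e6))     # frequency resolution f_clk / 2^N
print(nco_samples(4096, 16, 8, 12, 16))   # FCW = 2^12: one output period every 16 clocks
```

In this example, the ROM holds $2^P \cdot D = 256 \times 12$ bits, whatever the accumulator size.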

B. Spur Due to Discrete Phase Accumulation

For an \( N \)-bit phase accumulator, the desired output period is given by \( T_o = (2^N/\text{FCW}) \cdot T_{\text{clk}} \). In addition, there is another periodicity in the discrete phase accumulation process, namely, \( T_{\text{spur}} = (2^N/\text{GCD}(\text{FCW}, 2^N)) \cdot T_{\text{clk}} \), where \( \text{GCD}(a,b) \) denotes the greatest common divisor of \( a \) and \( b \). The accumulator repeats its value at intervals of \( T_{\text{spur}} \), which generates spurious tones in the frequency domain. Note that the spur period is an integer multiple of the desired output period, since the FCW must be an integer multiple of \( \text{GCD}(\text{FCW}, 2^N) \). In general, the spurs of an \( N \)-bit phase accumulation are equally spaced and located at frequencies of

\[
 f_{\text{spur}} = F_n \cdot \frac{\text{GCD}(\text{FCW}, 2^N)}{2^N} \cdot f_{\text{clk}} 
\]  

(1)
where \( F_n \) is an integer used to sequentially number the spurs. When the input frequency word is a power of two, that is, \( FCW = 2^i \), there are no spurs due to phase accumulation. In this case, \( \text{GCD}(FCW, 2^N) = FCW \), so the accumulator returns to the same value after every overflow. As a result, the spurs given by (1) coincide with the fundamental tone \( f_{\text{spur1}} = (FCW \cdot f_{\text{clk}})/2^N = f_o \) and its harmonics.
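The two periodicities can be checked numerically with the short Python sketch below; the function names are hypothetical and `fcw` is assumed to be a positive integer.

- It compares $T_{\text{spur}} = 2^N/\text{GCD}(FCW, 2^N)$ clock cycles with a brute-force count of accumulator steps.
- It shows that for $FCW = 2^i$ the repetition period collapses to the output period.

```python
from math import gcd


def accumulator_periods(fcw: int, n_bits: int) -> tuple[float, int]:
    """Return (T_o, T_spur) in clock cycles for an N-bit phase accumulator."""
    modulus = 1 << n_bits
    t_out = modulus / fcw                   # desired output period, generally non-integer
    t_spur = modulus // gcd(fcw, modulus)   # accumulator repetition period, always integer
    return t_out, t_spur


def measured_repetition(fcw: int, n_bits: int) -> int:
    """Brute-force check: clock cycles until the accumulator returns to zero."""
    modulus = 1 << n_bits
    phase, steps = fcw % modulus, 1
    while phase != 0:
        phase = (phase + fcw) % modulus
        steps += 1
    return steps


print(accumulator_periods(3, 4), measured_repetition(3, 4))  # spurs spaced f_clk/16
print(accumulator_periods(4, 4), measured_repetition(4, 4))  # FCW = 2^i: T_spur = T_o
```

For $FCW = 3$ and $N = 4$, the output period is $16/3$ clock cycles. The accumulator, however, repeats only every 16 cycles, so by (1) the spurs fall on multiples of $f_{\text{clk}}/16$.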

C. Spurs and Quantization Noise Due to Phase Truncation

The phase truncation process introduces quantization noise, which can be modeled as a linear additive noise to the phase of the sinusoidal wave. At time step \( n \), the \( N \)-bit phase word at the output of the \( N \)-bit accumulator is updated as [1]

\[
\Phi[n + 1] = (\Phi[n] + FCW) \mod 2^N
\]

(2)

where \( \Phi[n] \) represents the phase at time step \( n \), and \( \lambda \mod B \) represents the integer residue of \( \lambda \) modulo \( B \). For example, \( 26 \mod 16 = 10 \). To reduce the ROM size, only the \( P \) most significant bits (MSBs) of the accumulator output are used to address the look-up table. Truncating the \( N \)-bit phase word to \( P \) bits causes a truncation error \( E_P \) expressed as

\[
E_P[n + 1] = (E_P[n] + R) \mod 2^{N-P}
\]

(3)

where \( R \) is the least significant \( (N-P) \)-bits of the FCW value. Hence, the output amplitude of the NCO can be expressed as

\[
S[n] = \sin \left( \frac{2\pi (\Phi[n] - E_P[n])}{2^N} \right)
\approx \sin \left( \frac{2\pi \Phi[n]}{2^N} \right) - \frac{2\pi E_P[n]}{2^N} \cdot \cos \left( \frac{2\pi \Phi[n]}{2^N} \right)
\]

(4)

where \( S[n] \) is the amplitude at time step \( n \) and a small truncation error is assumed. The first term gives the desired sinusoidal output, and the second term is the error introduced by phase truncation. As shown, the phase truncation error appears as an amplitude-modulated term on the quadrature component \( \cos(2\pi \Phi[n]/2^N) \). The phase error sequence represented by the truncated \( N-P \) bits satisfies \( 0 \le E_P[n] < 2^{N-P} \). The phase truncation causes errors only when the greatest common divisor \( \text{GCD}(FCW, 2^N) < 2^{N-P} \). Otherwise, the \( N-P \) least significant bits (LSBs) of the phase word vanish and the phase truncation does not cause any error.

Phase truncation error is periodic. It is evident from (3) that \( E_P[n] \) can be modeled by a small \( (N-P) \)-bit accumulator with \( R \) as its input. Hence, the period of the error sequence \( E_P[n] \) is \( 2^{N-P}/\text{GCD}(R, 2^{N-P}) = 2^{N-P}/\text{GCD}(FCW, 2^{N-P}) \) clock cycles. This periodic phase error modulates the DDFS output, generating spurs at offset frequencies of

\[
f_{\text{spur}} = F_n \cdot \frac{\text{GCD}(FCW, 2^{N-P})}{2^{N-P}} \cdot f_{\text{clk}}.
\]

(5)

Note that the spurs due to the \( N-P \) bit accumulator form a subset of the spurs due to the \( N \)-bit accumulator. The number of spectral lines in the Nyquist band due to the truncated accumulator with length of \( N-P \) bits is given by [3]

\[
\Lambda = \frac{2^{N-P-1}}{\text{GCD}(FCW, 2^{N-P})}.
\]

(6)
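The Python sketch below checks three quantities: the phase error recursion (3), its period, and the line count (6). The helper names are hypothetical, and the accumulator is assumed to start at $\Phi[0] = 0$. The sketch takes the period of $E_P[n]$ to be $2^{N-P}/\text{GCD}(FCW, 2^{N-P})$ clock cycles. That value is an integer even when $R$ does not divide $2^{N-P}$.

```python
from math import gcd


def truncation_error(fcw: int, n_bits: int, p_bits: int, count: int) -> list[int]:
    """E_P[n] from eq. (3): the discarded (N-P) LSBs, assuming Phi[0] = 0."""
    r_mod = 1 << (n_bits - p_bits)
    r = fcw % r_mod                      # R: the (N-P) LSBs of the FCW
    e, seq = 0, []
    for _ in range(count):
        seq.append(e)
        e = (e + r) % r_mod
    return seq


def error_period(fcw: int, n_bits: int, p_bits: int) -> int:
    """Period of E_P[n] in clock cycles: 2^(N-P) / GCD(FCW, 2^(N-P))."""
    r_mod = 1 << (n_bits - p_bits)
    return r_mod // gcd(fcw, r_mod)


def spectral_lines(fcw: int, n_bits: int, p_bits: int) -> float:
    """Lambda from eq. (6); meaningful only when GCD(FCW, 2^(N-P)) < 2^(N-P)."""
    r_mod = 1 << (n_bits - p_bits)
    return (r_mod // 2) / gcd(fcw, r_mod)


# N = 16, P = 12, FCW = 6: R = 6, GCD = 2, so the period is 8 (not 16/6) and Lambda = 4
print(truncation_error(6, 16, 12, 9), error_period(6, 16, 12), spectral_lines(6, 16, 12))
```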

The above simple analysis gives all the potential spur locations due to phase truncation. A more complete analysis of spur locations and magnitudes involves a Fourier transformation [3], [4]. The worst-case spur magnitude normalized to a signal magnitude of unity was found as [3]

\[
\xi_{\text{worst spur}} = 2^{-P} \cdot \frac{\pi \, \text{GCD}(FCW, 2^{N-P})/2^{N-P}}{\sin\left(\pi \, \text{GCD}(FCW, 2^{N-P})/2^{N-P}\right)}.
\]

(7)

Note that the spurs due to phase truncation are nonexistent when the least significant \( (N-P) \) bits of the FCW value are zeros, that is, \( \text{GCD}(FCW, 2^{N-P}) = 2^{N-P} \). For \( \text{GCD}(FCW, 2^{N-P}) < 2^{N-P} \), however, the magnitude of the worst spur is a decaying function of \( 2^{N-P}/\text{GCD}(FCW, 2^{N-P}) \). Its maximum value is \( 20\log_{10}(\frac{\pi}{2} \cdot 2^{-P}) \approx (-6.02 \cdot P + 3.92) \) dBc, reached when \( \text{GCD}(FCW, 2^{N-P}) = 2^{N-P-1} \), which means there is only one spur (\( \Lambda = 1 \)). If the number of spurs is large, i.e., \( \text{GCD}(FCW, 2^{N-P}) \ll 2^{N-P} \), the worst spur magnitude asymptotically approaches the lower bound of \( -6.02 \cdot P \) dBc. In summary, the worst-case spur magnitude due to phase truncation can be estimated by

\[
\begin{aligned}
\xi_{\text{max}} &= 20 \log_{10}\left(\frac{\pi}{2} \cdot 2^{-P}\right) \approx -6.02 \cdot P + 3.92 \text{ dBc}, & \text{if } \Lambda = 1 \\
\xi_{\text{min}} &= 20 \log_{10}\left(2^{-P}\right) \approx -6.02 \cdot P \text{ dBc}, & \text{if } \Lambda = \infty \\
\xi &= 0 \text{ (no spur)}, & \text{if } \text{GCD}(FCW, 2^{N-P}) = 2^{N-P}.
\end{aligned}
\]

(8)
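A short Python sketch reproduces the two limits of (8). It evaluates the worst-case spur estimate from [3]:

$$\xi = 2^{-P} \cdot \frac{\pi x}{\sin(\pi x)}, \qquad x = \frac{\text{GCD}(FCW, 2^{N-P})}{2^{N-P}}$$

The function name is hypothetical, and the result is expressed in dBc relative to a unit-amplitude carrier.

```python
import math


def worst_spur_dbc(fcw: int, n_bits: int, p_bits: int) -> float:
    """Worst-case phase-truncation spur 2^-P * (pi x) / sin(pi x), x = GCD/2^(N-P), in dBc."""
    r_mod = 1 << (n_bits - p_bits)
    g = math.gcd(fcw, r_mod)
    if g == r_mod:
        return -math.inf                  # (N-P) LSBs of FCW are zero: no truncation spur
    x = g / r_mod
    xi = 2.0 ** -p_bits * (math.pi * x) / math.sin(math.pi * x)
    return 20 * math.log10(xi)


# P = 8 phase bits out of N = 16: one spur (Lambda = 1) versus many spurs
print(round(worst_spur_dbc(128, 16, 8), 2))   # GCD = 2^(N-P-1): about -6.02*P + 3.92
print(round(worst_spur_dbc(1, 16, 8), 2))     # GCD << 2^(N-P): approaches -6.02*P
```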

Summing all the spur energy gives the total noise power. With \( \Lambda \) from (6), the noise-to-signal ratio (NSR) due to phase truncation can be found as [5]

\[
\text{NSR} = \left( \frac{2\Lambda \sin\left(\pi / (2^{P+1}\Lambda)\right)}{\sin\left(\pi/2^{P}\right)} \right)^2 - 1.
\]

(9)

The above equation shows that for a fixed number of phase bits \( P \), the NSR is an increasing function of the number of spectral lines \( \Lambda \), which results in higher noise level. For a fixed number of spectral lines \( \Lambda \), the NSR is a decreasing function of the number of phase bits, \( P \). Therefore, a larger number of phase bits leads to a lower noise level.

Accordingly, the upper and lower bounds of the phase truncation noise can be found by assuming an infinite number of spurs (\( \Lambda = \infty \), i.e., \( \text{GCD}(FCW, 2^{N-P}) \ll 2^{N-P} \)) and one spur (\( \Lambda = 1 \), i.e., \( \text{GCD}(FCW, 2^{N-P}) = 2^{N-P-1} \)), respectively, as follows:

\[
\begin{aligned}
\text{NSR}_{\text{max}} &\approx \frac{\pi^2}{3} \cdot 2^{-2P} \;\Rightarrow\; -6.02 \cdot P + 5.17 \text{ dB}, & \text{if } \Lambda = \infty \\
\text{NSR}_{\text{min}} &\approx \frac{\pi^2}{4} \cdot 2^{-2P} \;\Rightarrow\; -6.02 \cdot P + 3.92 \text{ dB}, & \text{if } \Lambda = 1 \\
\text{NSR} &= 0, & \text{if } \text{GCD}(FCW, 2^{N-P}) = 2^{N-P}
\end{aligned}
\]

(10)

where a large number of phase bits \( P \) is assumed. Comparing these bounds with the spur magnitudes in (8), we conclude that:

1) if there is only one spur, the spur power reaches its maximum while the resultant NSR is at its minimum;
2) if there are an infinite number of spurs, the spur power is at its minimum while the resultant NSR is at its maximum.
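The Python sketch below (function name hypothetical) evaluates the NSR expression and its limits:

$$\text{NSR} = \left[\frac{2\Lambda \sin\left(\pi/(2^{P+1}\Lambda)\right)}{\sin(\pi/2^P)}\right]^2 - 1$$

The expression assumes that the $2\Lambda = 2^{N-P}/\text{GCD}(FCW, 2^{N-P})$ distinct truncation errors occur equally often. The checks confirm two trends: the NSR grows with the number of spectral lines and falls with the number of phase bits.

```python
import math


def truncation_nsr_db(fcw: int, n_bits: int, p_bits: int) -> float:
    """Phase-truncation NSR in dB, assuming the 2*Lambda error values are equally likely."""
    r_mod = 1 << (n_bits - p_bits)
    m = r_mod // math.gcd(fcw, r_mod)     # number of distinct E_P values, m = 2 * Lambda
    if m == 1:
        return -math.inf                   # no truncation error at all
    a = math.pi * 2.0 ** -p_bits           # pi / 2^P
    ratio = m * math.sin(a / m) / math.sin(a)
    return 10 * math.log10(ratio ** 2 - 1)


p = 8
print(round(truncation_nsr_db(128, 16, p), 2))  # Lambda = 1: close to -6.02*P + 3.92
print(round(truncation_nsr_db(1, 24, p), 2))    # Lambda large: close to -6.02*P + 5.17
```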

D. Quantization Noise Due to Finite Phase and Amplitude Resolutions

As shown, the discrete phase accumulation process with finite phase bits generates spurious tones that are worsened by the phase truncation. Unlike phase truncation, a finite amplitude word length generates random quantization noise. The integrated quantization-noise-to-carrier ratio due to a finite word length $D$ is given by $\frac{2}{3} \cdot 2^{-2D}$, or $(-6.02 \cdot D - 1.76)$ dB. Considering the quantization noise due to both the finite phase bits and the finite amplitude bits, the worst-case spur magnitude at the DDFS output is given by

$$\xi_{\text{max}}[\text{dBc}] = 10 \log \left( \frac{\pi^2}{4} \cdot 2^{-2P} + \frac{2}{3} \cdot 2^{-2D} \right). \quad (11)$$

Note that, for $P = D$, the phase truncation error causes peak output spurs that are generally 3.92 dB + 1.76 dB ≈ 5.7 dB above the quantization noise floor due to the finite amplitude bits. Therefore, we need to choose $P = D + 1$ for DDFS designs so that the phase spurs and the amplitude noise floor reach about the same quantization noise level.
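The design rule $P = D + 1$ can be checked by evaluating the two terms of (11) separately. In this Python sketch (function names hypothetical), the phase spur term sits about 5.7 dB above the amplitude noise term for $P = D$. For $P = D + 1$, the gap shrinks to a fraction of a decibel.

```python
import math


def phase_spur_db(p_bits: int) -> float:
    """Worst-case phase-truncation spur term of eq. (11), in dBc."""
    return 10 * math.log10(math.pi ** 2 / 4 * 2.0 ** (-2 * p_bits))


def amplitude_noise_db(d_bits: int) -> float:
    """Amplitude quantization noise term of eq. (11), in dBc."""
    return 10 * math.log10(2 / 3 * 2.0 ** (-2 * d_bits))


def worst_spur_total_dbc(p_bits: int, d_bits: int) -> float:
    """Eq. (11): both contributions added as powers, in dBc."""
    total = 10 ** (phase_spur_db(p_bits) / 10) + 10 ** (amplitude_noise_db(d_bits) / 10)
    return 10 * math.log10(total)


d = 12
for p in (d, d + 1, d + 2):
    gap = phase_spur_db(p) - amplitude_noise_db(d)   # phase term relative to amplitude term
    print(p, round(gap, 2), round(worst_spur_total_dbc(p, d), 2))
```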

III. DDFS Architecture Using $\Delta\Sigma$ Noise Shaping to Remove Phase Truncation Error

It was shown that the phase truncation process associated with the conventional DDFS architecture introduces quantization error and spurs. To avoid aliasing during data conversion, the synthesized frequency is required to be smaller than one-half of the DDFS clock frequency. Thus, a DDFS is inherently oversampled. This allows noise-shaping techniques to shift the phase quantization error to a higher frequency band, where the noise can eventually be removed by the de-glitch filter after the DAC.

This work proposes novel $\Delta\Sigma$ modulators that can be used to reduce spurs. $\Delta\Sigma$ modulation can be implemented in either the frequency domain or the phase domain of a DDFS. Frequency domain $\Delta\Sigma$ modulation offers two advantages:

1) an increased dynamic range, because its input is constant;
2) a reduced accumulator size, because the frequency control word is truncated before accumulation.

However, the truncation error of the frequency word is accumulated in the phase domain, which is the drawback of frequency domain $\Delta\Sigma$ noise shaping.

The proposed phase domain noise-shaping technique can be used to either increase the DDFS resolution for high performance applications or to reduce the ROM size for low cost applications. Using the $\Delta\Sigma$ interpolator to remove the phase truncation error, we can build a larger accumulator (e.g., $n > 32$ bits) to achieve finer resolution with low quantization noise. Alternatively, without degrading the output spectral purity, we can truncate even more phase bits to address a much smaller ROM size.

A. DDFS Architecture Using Phase Domain $\Delta\Sigma$ Noise Shapers

Various $\Delta\Sigma$ topologies can be used to reduce the phase truncation errors in the proposed DDFS architecture. In Fig. 2(a), we illustrate the presented DDFS architecture using a phase domain feedback $\Delta \Sigma$ modulator. The detailed implementation of the feedback $\Delta \Sigma$ modulator will be presented in Section IV. Fig. 2(b) illustrates an alternative DDFS architecture using a feedforward $\Delta \Sigma$ modulator. The output of the feedforward $\Delta \Sigma$ modulator can have up to $N$ bits, whereas the output of the feedback $\Delta \Sigma$ modulator can have any number of bits $P \in [1, N]$. Therefore, the feedback $\Delta \Sigma$ modulator is more suitable when a large number of bits, $N + 1 - P$, are discarded.

**Fig. 2.** Proposed DDFS architecture with a $k$th order phase domain $\Delta\Sigma$ noise shaper to reduce phase truncation error. (a) DDFS with phase domain feedback $\Delta\Sigma$ noise shaper (implemented in this work). (b) DDFS with phase domain feedforward $\Delta\Sigma$ noise shaper.

Analyzing the feedback $\Delta \Sigma$ modulator shown in Fig. 2(a), we obtain the noise-shaped phase word $\hat{\Phi}$ in the $z$-domain as

$$
\hat{\Phi}(z) = \Phi(z) + E_P(z)\left[1 - (1 - z^{-1})^k\right] - E_P(z) = \Phi(z) - E_P(z)\,(1 - z^{-1})^k
$$

(12)

where $\Phi(z)$ is the original phase word before noise shaping and $E_P(z)$ is the truncated phase error. It can be seen that the phase error $E_P$ is high-pass filtered by the $\Delta \Sigma$ interpolator before the phase-to-amplitude conversion via the look-up table. Consider both phase domain feedback and feedforward $\Delta \Sigma$ noise shapers with the noise transfer function $(1 - z^{-1})^k$. The $k$th-order difference of $E_P[n]$ is the time-domain form of $E_P(z)(1 - z^{-1})^k$, so the resulting NCO output can be expressed as

$$
S[n] = \sin \left( 2\pi \, \frac{\Phi[n] - \sum_{i=0}^{k} (-1)^i \binom{k}{i} E_P[n-i]}{2^N} \right)
$$

(13)

B. ROM Size Reduction Using $\Delta \Sigma$ Noise Shaping

Without degrading the output SNR, $\Delta \Sigma$ noise shaping can also be used to reduce the ROM size, which often occupies the majority of the DDFS area. In an $N$-bit sampled system, if the quantizer has $2^N$ quantization levels equally spaced by $\Delta$, then the maximum peak-to-peak amplitude is given by

$$
u_{\text{max}} = (2^N - 1)\Delta. 
$$

(14)

If the signal is sinusoidal, its normalized power can be calculated as

$$
P_S = \frac{1}{8} (2^N - 1)^2 \Delta^2. 
$$

(15)

For an oversampled system with a $k$th-order $\Delta \Sigma$ modulator, the in-band quantization noise power is given by [7]

$$
P_N = \frac{\Delta^2}{12} \frac{\pi^{2k}}{2k+1} \left( \frac{1}{\text{OSR}} \right)^{2k+1} 
$$

(16)

where the oversampling ratio (OSR) is assumed to be larger than 2. Thus, for a $k$th-order $\Delta \Sigma$ modulator, the in-band noise falls by $3(2k+1)$ dB for every doubling of the oversampling ratio, resulting in $k + 0.5$ extra bits of resolution. Adding $\Delta \Sigma$ feedback therefore strengthens the oversampling effect. On the other hand, the $\Delta \Sigma$ modulator does not greatly reduce the noise if the OSR is not sufficiently high, and $\Delta \Sigma$ noise shaping increases the noise when the OSR is less than about 2.

The performance improvement due to oversampling and $\Delta \Sigma$ noise shaping can be characterized by the effective number of bits, that is, the quantizer word length that yields the same SNR over a fixed bandwidth. Using the signal power (15) and the noise power (16), the SNR with oversampling and noise shaping can be expressed as

$$
\text{SNR} = 10 \log_{10} \left( \frac{P_S}{P_N} \right) 
\approx 10 \log_{10} \left( \frac{3}{2}\, 2^{2N}\, \text{OSR}^{2k+1} \left( \frac{\pi^{2k}}{2k+1} \right)^{-1} \right).
$$

(17)

Further manipulation of the above expression yields

$$
\text{SNR} = 6.02 N + 1.76 + 3.01(2k+1) \log_2 \text{OSR} - 10 \log_{10} \left( \frac{\pi^{2k}}{2k+1} \right).
$$

(18)

The lowered quantization noise due to oversampling and noise shaping leads to an increased effective number of quantizer bits

$$
N_{\text{effective}} \approx N + \frac{2k+1}{2} \log_2 \text{OSR} - 1.66 \log_{10} \left( \frac{\pi^{2k}}{2k+1} \right).
$$

(19)
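Equations (16) to (19) can be evaluated directly. The Python sketch below (function names hypothetical) tabulates the SNR and the effective number of bits for a fourth-order shaper. The formulas assume a full-scale sinusoid and OSR > 2, so the numbers are idealized bounds rather than predictions for a fabricated chip.

```python
import math


def delta_sigma_snr_db(n_bits: int, order: int, osr: float) -> float:
    """Eq. (18): SNR of an N-bit quantizer with k-th order noise shaping, in dB."""
    k = order
    shaping_penalty = 10 * math.log10(math.pi ** (2 * k) / (2 * k + 1))
    return 6.02 * n_bits + 1.76 + 10 * (2 * k + 1) * math.log10(osr) - shaping_penalty


def effective_bits(n_bits: int, order: int, osr: float) -> float:
    """Eq. (19): the same SNR expressed as an equivalent number of quantizer bits."""
    k = order
    return (n_bits + (2 * k + 1) / 2 * math.log2(osr)
            - (10 / 6.02) * math.log10(math.pi ** (2 * k) / (2 * k + 1)))


# Fourth-order shaping: each doubling of OSR buys about 3.01*(2k+1) dB, i.e. k + 0.5 bits
for osr in (4, 8, 16):
    print(osr, round(delta_sigma_snr_db(8, 4, osr), 1), round(effective_bits(8, 4, osr), 2))
```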
modulator can have a single-bit or multi-bit output depending on the desired quantization noise floor. The multi-loop architecture normally has a single-bit output per accumulator and the order of the $\Delta\Sigma$ interpolator is equal to the number of accumulators. For DDFS applications, the number of phase bits at the output of the $\Delta\Sigma$ interpolator is determined by the affordable ROM size and the system SFDR requirement. It is obviously inconvenient to use multi-loop $\Delta\Sigma$ topology in DDFS. We propose a generic single-loop high-order $\Delta\Sigma$ architecture, which can provide multi-bit output for various phase truncation requirements. In this section, we present the implementation of a generic high-order pipelined single-stage $\Delta\Sigma$ modulator used in the proposed DDFS.

Conceptually, if we can insert a block with transfer function $H(z) = 1 - H_m(z)$ in an accumulator as shown in Fig. 3, the accumulator output becomes

$$Y(z) = X(z) + A(z)H(z) - A(z) = X(z) - E(z)H_m(z)$$

(21)

where $Y(z)$ is taken from the $p$ MSBs of the adder output $B(z)$, and $A(z)$ is formed by the remaining $(n+1-p)$ LSBs of the adder output. It is evident that the input signal $X(z)$ is not affected by the modulator, while the quantization noise $E(z)$, which is the truncated word $A(z)$, is filtered by the noise transfer function (NTF) $H_m(z)$. If the NTF $H_m(z)$ is the high-pass transfer function $(1-z^{-1})^k$, the feedback transfer function is $H(z) = 1 - (1-z^{-1})^k$. The single-stage modulator is then equivalent to a multistage noise-shaping (MASH) modulator with $Y(z) = X(z) - E(z)(1-z^{-1})^k$.

If the input frequency word $X(z)$ has $n$ bits, $B(z)$ should have $(n+1)$ bits to preserve the carry-out, and $A(z)H(z)$ cannot exceed $n$ bits. The modulator output $Y(z)$ can have any number of bits, offering flexibility in choosing the number of output bits. However, the maximum number of bits for $A(z)H(z)$ should be carefully calculated to prevent overflow of the adder.
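The relation $Y(z) = X(z) - E(z)(1 - z^{-1})^k$ can be checked with a behavioral Python model of the single-stage modulator in Fig. 3. This is a sketch under the following assumptions, not the pipelined circuit of this work:

- words are integers, and Python's unbounded integers are used, so the adder word-length and overflow issues discussed above are not modeled;
- $q = n + 1 - p$ LSBs are truncated;
- the function name `single_stage_modulator` is hypothetical.

```python
from math import comb


def single_stage_modulator(x: list[int], order: int, q_bits: int) -> tuple[list[int], list[int]]:
    """Behavioral model of the single-stage modulator with H(z) = 1 - (1 - z^-1)^k.

    Returns (y, a): y[n] holds the retained MSBs and a[n] the q_bits truncated LSBs,
    so that y[n] * 2**q_bits == x[n] - ((1 - z^-1)^k a)[n].
    """
    # Taps of H(z) for z^-1 ... z^-k, e.g. k = 4 gives [4, -6, 4, -1]
    taps = [(-1) ** (i + 1) * comb(order, i) for i in range(1, order + 1)]
    history = [0] * order                      # a[n-1], a[n-2], ..., a[n-k]
    y, a = [], []
    for xn in x:
        b = xn + sum(t * h for t, h in zip(taps, history))  # adder output B
        yn = b >> q_bits                       # floor: keep the MSBs
        an = b - (yn << q_bits)                # truncated LSBs, 0 <= an < 2**q_bits
        y.append(yn)
        a.append(an)
        history = [an] + history[:-1]
    return y, a


# Constant input 37 with 4 truncated LSBs: the MSB stream averages to about 37 / 16
y, a = single_stage_modulator([37] * 4096, order=4, q_bits=4)
print(sum(y) / len(y))
```

Because the truncated word is only fed back through an FIR filter, the model stays bounded for any input. The low-frequency content of $y$ tracks $x/2^q$, while the truncation error is pushed toward high frequencies.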

The above conceptual single-stage $\Delta\Sigma$ modulator was first introduced for fractional-$N$ frequency synthesis in [11] and was chosen in this work to implement the phase domain $\Delta\Sigma$ modulator for DDFS applications. For $k = 2$, $H(z) = 2z^{-1} - z^{-2}$, which is simple to implement. For $k = 3$, $H(z) = z^{-1}(3 - 3z^{-1} + z^{-2})$. Implementation of the third-order single-stage $\Delta\Sigma$ modulator is given in Fig. 4. Multiplication by 3 is implemented using a left shift operation (×2) followed by an addition, namely, $3z^{-1} = 2z^{-1} + z^{-1}$. If $n + 3 - p < n$, sign extension is performed by extending the MSB of the $(n+3-p)$-bit word to obtain an $n$-bit word. Conversely, the condition $n + 3 - p > n$ must be avoided to prevent the first adder with $(n+1)$-bit output from losing the overflow bits. Hence, the minimum number of output bits of the third-order modulator is 3. In other words, the number of output bits of the single-stage $\Delta\Sigma$ accumulator should be equal to or larger than the modulator order $k$.

Implementations of the fourth-order [11] and fifth-order single-stage $\Delta\Sigma$ modulators are illustrated in Figs. 5 and 6, respectively. The fourth-order single-stage $\Delta\Sigma$ modulator uses the transfer function $H(z) = 4z^{-1} - 6z^{-2} + 4z^{-3} - z^{-4}$, while the fifth-order single-stage $\Delta\Sigma$ modulator has the transfer function $H(z) = 5z^{-1} - 10z^{-2} + 10z^{-3} - 5z^{-4} + z^{-5}$. To speed up the circuits, these transfer functions are implemented in a pipelined manner. To avoid using multipliers, which can be area and speed bottlenecks, the transfer function $H(z)$ is manipulated such that only shift and add operations are involved.
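The Python sketch below illustrates the multiplier-free evaluation of $H(z)$. It replaces each coefficient by shifts and additions ($4 = 2^2$, $6 = 4 + 2$, $5 = 4 + 1$, $10 = 8 + 2$). This particular decomposition and the function names are assumptions for illustration. The sketch is combinational and does not reproduce the pipelined arrangement of Figs. 5 and 6.

```python
import random


def times5(v: int) -> int:
    return (v << 2) + v                 # 5v = 4v + v


def times10(v: int) -> int:
    return (v << 3) + (v << 1)          # 10v = 8v + 2v


def h4_shift_add(a1: int, a2: int, a3: int, a4: int) -> int:
    """H(z) = 4z^-1 - 6z^-2 + 4z^-3 - z^-4 using shifts and adds only.

    a1..a4 are the delayed truncated words a[n-1]..a[n-4].
    """
    six_a2 = (a2 << 2) + (a2 << 1)      # 6a = 4a + 2a
    return (a1 << 2) - six_a2 + (a3 << 2) - a4


def h5_shift_add(a1: int, a2: int, a3: int, a4: int, a5: int) -> int:
    """H(z) = 5z^-1 - 10z^-2 + 10z^-3 - 5z^-4 + z^-5 using shifts and adds only."""
    return times5(a1) - times10(a2) + times10(a3) - times5(a4) + a5


# Compare against direct multiplication for random 9-bit truncated words
rng = random.Random(1)
words = [rng.randrange(0, 1 << 9) for _ in range(5)]
print(h4_shift_add(*words[:4]), h5_shift_add(*words))
```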

The fourth-order single-stage $\Delta\Sigma$ modulator shown in Fig. 5 was the architecture implemented in this work. The speed of the single-stage $\Delta\Sigma$ modulator topology is limited by the delay times associated with the additions to calculate the transfer function $H(z)$. The higher the order of the $\Delta\Sigma$ modulator, the longer the delay.

To verify the concept, we simulated the single-stage $\Delta\Sigma$ noise shaper in MATLAB, together with a conventional DDFS for comparison. Fig. 7(a) plots the spectrum after the phase truncation in a conventional DDFS, while Fig. 7(b) shows the spectrum for the proposed DDFS architecture using a fourth-order $\Delta\Sigma$ noise shaper. Fig. 7(b) clearly demonstrates the high-pass noise-shaping effect of the fourth-order $\Delta\Sigma$ interpolator with an 80-dB/dec slope. In the simulation, the number of accumulator bits is $n = 18$, the number of phase bits is $P = 12$, and the oversampling ratio of the DDFS is 300.
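The 80-dB/dec slope follows from the magnitude of the noise transfer function, $|1 - e^{-j\omega}|^k = (2\sin(\omega/2))^k$, which rises by $20k$ dB per decade at low frequencies. The Python sketch below (function name hypothetical) evaluates it for $k = 4$. It also shows that the shaped phase error is attenuated only below $f_{\text{clk}}/6$, where $2\sin(\pi f/f_{\text{clk}}) < 1$, and is amplified above it. That amplified high-frequency noise is what the de-glitch filter must remove.

```python
import math


def ntf_gain_db(order: int, f_norm: float) -> float:
    """|(1 - z^-1)^k| at z = exp(j*2*pi*f_norm), with f_norm = f / f_clk, in dB."""
    return 20 * order * math.log10(2 * math.sin(math.pi * f_norm))


slope = ntf_gain_db(4, 1e-3) - ntf_gain_db(4, 1e-4)   # gain change over one decade
print(round(slope, 2))                                 # close to 80 dB/dec for k = 4
print(round(ntf_gain_db(4, 1 / 600), 1))               # band edge for an OSR of 300
```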

V. IMPLEMENTATION OF THE 12-BIT CURRENT-STEERING DAC

Current-steering DACs are based on an array of matched current cells organized in unary encoded or binary weighted elements that are steered to the DAC output depending on the digital input code. A segmented architecture is used to combine high conversion rate and high resolution as shown in Fig. 8. In this architecture, the LSBs steer binary weighted current sources, while the MSBs are thermometer encoded and steer a unary current source array.
A. DAC Architecture

Consider an \( N \)-bit current-steering segmented DAC with a unit current source \( I \), where \( N = N_1 + N_2 \). The \( N_1 \) MSBs control \( 2^{N_1} - 1 \) equal current sources, each with value \( 2^{N_2} I \). The \( N_2 \) LSBs control \( N_2 \) binary weighted current sources with values \( I, 2I, \ldots, 2^{N_2-1} I \). A simple estimate for the integral nonlinearity (INL) is found by adding the variances of \( 2^{N} - 1 \) uncorrelated unit current sources \([12]\). A one-sigma confidence value for the INL is given by

\[
\text{INL} \approx \sqrt{2^{N} \left( \frac{\sigma}{I} \right)^2} \text{LSB}
\]

(22)

where \( \sigma/I \) is the relative standard deviation of the unit current source. Note that the INL is independent of the segmentation used and depends only on the resolution and the unit-source matching. The worst differential nonlinearity (DNL) occurs at the transition from the binary weighted LSBs to the unary-decoded MSBs. A one-sigma confidence value for the DNL is given by

\[
\text{DNL} \approx \sqrt{2^{N_2} \left( \frac{\sigma}{I} \right)^2} \text{LSB}.
\]

(23)

The INL-related yield specification imposes an upper limit on the allowed mismatch of the unit current source. This constraint sets a minimum gate area \( WL \) for the current source transistor, given by \([13]\)

\[
WL = \frac{I^2}{2\sigma^2} \left[ A_\beta^2 + \frac{4A_{VT}^2}{(V_{GS} - V_T)^2} \right]
\]

(24)

where \( A_{\beta} \) and \( A_{VT} \) are technology mismatch parameters, and \( (V_{GS} - V_T) \) is the gate overdrive voltage of the current source transistor. To achieve good DNL and INL specifications, the number of bits implemented in the binary weighted part of the DAC has to be small. For every extra bit implemented in the unary-decoded part, however, the number of control lines needed to select the current sources doubles, and the decoding logic complexity increases significantly. Equally important, the area used by the decoding inside the matrix increases, and consequently the process and electrical systematic errors become more difficult to compensate. A direct consequence is often a reduction in the maximum operating speed. In addition, the area occupied by interconnections inside the decoding circuit quickly increases.
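The sizing relations can be turned into a quick budget calculation. The Python sketch below uses three relations:

- the variance-summation estimate INL ≈ √(2^N)·σ/I LSB stated in the text;
- the DNL estimate (23);
- the area relation (24).

The mismatch parameters A_β = 1%·µm and A_VT = 10 mV·µm and the 0.5-V overdrive are hypothetical placeholder values, not data for the 0.35-µm process used in this work. The binary segment of N_2 = 4 bits corresponds to a 12-bit DAC with 8 unary MSBs.

```python
import math


def inl_sigma_lsb(n_bits: int, sigma_rel: float) -> float:
    """One-sigma INL in LSB from summing the variances of about 2^N unit sources."""
    return math.sqrt(2 ** n_bits) * sigma_rel


def dnl_sigma_lsb(n_binary_bits: int, sigma_rel: float) -> float:
    """Eq. (23): one-sigma DNL at the binary-to-unary transition, in LSB."""
    return math.sqrt(2 ** n_binary_bits) * sigma_rel


def min_gate_area_um2(sigma_rel: float, a_beta_um: float, a_vt_v_um: float, v_ov: float) -> float:
    """Eq. (24): WL = [A_beta^2 + 4 A_VT^2 / V_ov^2] / (2 (sigma/I)^2), in um^2."""
    return (a_beta_um ** 2 + 4 * a_vt_v_um ** 2 / v_ov ** 2) / (2 * sigma_rel ** 2)


N, N2 = 12, 4
sigma_rel = 0.5 / math.sqrt(2 ** N)          # unit-cell sigma/I for a 0.5-LSB one-sigma INL
print(sigma_rel, dnl_sigma_lsb(N2, sigma_rel))
# Hypothetical mismatch parameters: A_beta = 1 %*um, A_VT = 10 mV*um, 0.5-V overdrive
print(min_gate_area_um2(sigma_rel, a_beta_um=0.01, a_vt_v_um=0.01, v_ov=0.5))
```

As (24) indicates, a larger overdrive reduces the $A_{VT}$ term and hence the required gate area.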

B. DAC Static Performance

In the unary-decoded matrix, it is difficult to make current sources identical due to issues such as layout mismatches, insufficient output impedance of the current source and switch, edge effects, thermal gradients, doping gradients, oxide thickness variations, and variations of the supply and reference voltages. The proposed 12-bit segmented current-steering DAC employs the quad quadrantal (\( Q^2 \)) switching scheme [14] to minimize the degradation of integral linearity caused by mismatches in the current sources. The switching sequence of the unit current cells in the matrix for the 8 MSBs is illustrated in Fig. 9. The 256 current sources are divided into 16 centro-symmetric regions, and the 16 current sources within each region are again arranged centro-symmetrically. Since the 16 current sources in each region do not have exactly the same residual error, a small second-order residue remains. By "random walking" through the 256 current sources, this residual error is randomized rather than accumulated, hence the name \( Q^2 \) Random Walk switching scheme. Because of the segmentation, only 255 of the 256 unit current sources are required for the DAC function; the remaining one is used in the biasing circuit.
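A toy one-dimensional Python model shows why the switching order matters. A linear gradient across 16 unit cells accumulates into a large INL when the cells are switched on sequentially. It largely cancels when each cell is switched together with its mirror image. This is only a simplified illustration of the centro-symmetric principle, assuming a purely linear gradient and hypothetical helper names. It is not the two-dimensional \( Q^2 \) Random Walk sequence of Fig. 9.

```python
def max_abs_inl(order: list[int], errors: list[float]) -> float:
    """Largest |accumulated error| as unit cells are switched on in the given order."""
    total, worst = 0.0, 0.0
    for idx in order:
        total += errors[idx]
        worst = max(worst, abs(total))
    return worst


cells = 16
gradient = [i - (cells - 1) / 2 for i in range(cells)]   # linear gradient, zero mean (a.u.)
sequential = list(range(cells))
# Centro-symmetric order: pair each cell with its mirror image (0, 15, 1, 14, ...)
symmetric = [i for pair in zip(range(cells // 2), range(cells - 1, cells // 2 - 1, -1))
             for i in pair]
print(max_abs_inl(sequential, gradient), max_abs_inl(symmetric, gradient))
```

In this toy case, the sequential order accumulates a worst-case error of 32 units, while the mirrored order limits it to a single cell's error of 7.5 units. The randomized walk described above addresses the residue that remains after such first-order cancellation.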

C. DAC Dynamic Performance

The dynamic performance of the current-steering DAC is limited by three factors:

1) voltage fluctuation at the output nodes of the current sources due to improper switch timing;
2) digital signal feedthrough from the current switches to the output via their gate-drain capacitance;
3) imperfect synchronization of the control signals of the switching transistors.

Fig. 10 illustrates the unit current cell of the current-steering DAC, where the parasitic capacitance \( C_P \) is indicated. The unit current cell consists of the pMOS switching transistors \( M_{S1} \) and \( M_{S2} \) and the pMOS current source transistor \( M_{C} \). Using pMOS devices can reduce the high-frequency noise coupled through the common substrate in an N-well process. During switching, the discharging and charging of the parasitic capacitance \( C_P \) deteriorate the dynamic performance of the DAC. To prevent the two switching transistors from being simultaneously in the off-state, even for a short period of time, the cross-point of the control signals needs to be carefully adjusted. Note that \( I_C = I_1 + I_2 + I_{CP} \). If both pMOS switching transistors are in the on-state, their currents can be approximately expressed as

\[
\begin{aligned}
I_1 &= K(V_{GS1} - V_T)^2 \\
I_2 &= K(V_{GS2} - V_T)^2
\end{aligned}
\]

where \( K = (\mu C_{ox}/2)(W/L) \) is the device transconductance parameter. When the switching control signals are at the cross-over point, \( V_{GS1} = V_{GS2} = V_{GS(CP)} \), and, neglecting \( I_{CP} \), each switch carries half of the cell current, \( I_1 = I_2 = I_C/2 \). Thus, we obtain

\[ V_{GS(CP)} = V_T + \sqrt{\frac{I_C}{2K}} \]

When one pMOS switching transistor is in the on-state and the other is in the off-state, the conducting switch carries the full current \( I_C \), and we have

\[ V_{GS(ON)} = V_T + \sqrt{\frac{I_C}{K}} \]
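The two square-law relations can be evaluated numerically to see how far the switch gate-source voltage moves between the crossover and fully switched states. The sketch assumes, as in the derivation, that \( I_{CP} \) is negligible so each switch carries \( I_C/2 \) at crossover; the values of \( K \) and \( V_T \), the 16-LSB unit-cell current, and the helper `switch_vgs` are illustrative assumptions, not data from this design.

```python
import math


def switch_vgs(i_c, k, v_t):
    """Square-law |V_GS| of a pMOS switch at crossover and when fully on.

    Crossover: both switches conduct and (neglecting I_CP) each carries I_C / 2.
    Fully on: a single switch carries the whole cell current I_C.
    """
    vgs_cp = v_t + math.sqrt(i_c / (2 * k))
    vgs_on = v_t + math.sqrt(i_c / k)
    return vgs_cp, vgs_on


# Illustrative values only; K, V_T and the cell current are assumptions.
i_unit = 20e-3 * 16 / 4095   # one MSB unit cell (16 LSBs) of a 20-mA, 12-bit DAC, ~78 uA
K = 1e-3                     # assumed (mu*Cox/2)(W/L), A/V^2
V_T = 0.6                    # assumed threshold-voltage magnitude, V

vgs_cp, vgs_on = switch_vgs(i_unit, K, V_T)
delta_v = vgs_on - vgs_cp    # equals sqrt(I_C/K) * (1 - 1/sqrt(2))
print(f"V_GS(CP) = {vgs_cp:.3f} V, V_GS(ON) = {vgs_on:.3f} V, dV = {delta_v * 1e3:.1f} mV")
```

Under these assumptions the difference \( V_{GS(ON)} - V_{GS(CP)} = \sqrt{I_C/K}\,(1 - 1/\sqrt{2}) \) is roughly 80 mV; if the cross-point of the control signals is not adjusted, the common-source node moves by a comparable amount and \( C_P \) is charged and discharged on every transition.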

To minimize the feed-through to the output, the drain of the switching transistors is further isolated from the outputs by adding the cascode transistors shown in Fig. 10. In order to minimize the skew between the row and column select signals, the DAC employs extra digital latches right before the unit current cells to synchronize the digital inputs.

VI. MEASURED RESULTS

The proposed DDFS with a fourth-order phase domain $\Delta\Sigma$ modulator was implemented in a 0.35-$\mu$m CMOS technology with two poly and four metal layers [18]. A 16-bit accumulator is designed, and its 8 MSBs are used to address the look-up ROM. The 12-bit current-steering DAC is integrated on chip to convert the NCO output to an analog signal. For 12-bit amplitude resolution, a conventional DDFS without a $\Delta\Sigma$ modulator should use at least 12 phase bits, which requires a look-up ROM of $2^{12} \times 12$ bits. As discussed in Section III, the fourth-order $\Delta\Sigma$ noise shaper effectively reduces the required number of phase bits by 4. Thus, only 8 phase bits are used to address the ROM, which reduces the ROM size by a factor of $2^4 = 16$ compared with a conventional DDFS without a $\Delta\Sigma$ modulator. Note that this reduction comes solely from $\Delta\Sigma$ noise shaping; ROM compression algorithms could reduce the ROM size further.
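The ROM-size arithmetic can be checked with a short sketch that also builds the reduced table. It assumes a full-period sine table with offset-binary 12-bit codes; the text does not state how the table is coded or whether quarter-wave symmetry is used, so `sine_rom` and `rom_bits` are hypothetical helpers.

```python
import numpy as np


def rom_bits(phase_bits, amp_bits=12):
    """Storage of a full-period sine ROM: 2^phase_bits words of amp_bits each."""
    return (1 << phase_bits) * amp_bits


def sine_rom(phase_bits, amp_bits=12):
    """Full-period sine table in offset binary (assumed coding)."""
    n = np.arange(1 << phase_bits)
    half = 1 << (amp_bits - 1)                       # 2048 for 12 bits
    codes = np.round((half - 1) * np.sin(2 * np.pi * n / (1 << phase_bits)))
    return codes.astype(int) + half                  # shift to 1 .. 4095


conventional = rom_bits(12)   # 4096 words x 12 bits
reduced = rom_bits(8)         # 256 words x 12 bits
print(conventional, reduced, conventional // reduced)

rom = sine_rom(8)             # 256-entry reduced table
```

The sketch prints 49152, 3072, and 16: at the same 12-bit amplitude resolution, the 8-bit-phase table is 16 times smaller than the 12-bit-phase table.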

The proposed pipelined single-stage fourth-order $\Delta\Sigma$ interpolator is illustrated in Fig. 5. The architecture filters the truncated $(n + 1 - p)$ LSBs, labeled “A” in the figures, with $1 - (1 - z^{-1})^4$, so that the truncation error is shaped by the fourth-order high-pass noise transfer function $(1 - z^{-1})^4$. The output phase word of the $P$ MSBs is used as the look-up table address. To avoid multipliers, the function is expanded so that only shift-and-add operations are involved: $1 - (1 - z^{-1})^4 = 4z^{-1} - 6z^{-2} + 4z^{-3} - z^{-4}$, where a left shift by 2 bits multiplies by 4 and a left shift by 1 bit multiplies by 2 (so that $6 = 4 + 2$). The pipelined fourth-order $\Delta\Sigma$ interpolator has 16 input bits and 8 truncated phase bits.
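A bit-accurate behavioural sketch of the multiplier-free feedback is given below. It assumes an error-feedback form in which the truncated LSBs are filtered by \(1 - (1 - z^{-1})^4 = 4z^{-1} - 6z^{-2} + 4z^{-3} - z^{-4}\) and added back to the phase before truncation; the pipeline registers of Fig. 5 are not modeled, and `phase_accumulator`, `dsm_truncate`, and the frequency control word 1500 are illustrative choices.

```python
def phase_accumulator(fcw, n_samples, n_bits=16):
    """n-bit phase accumulator: returns the phase word before each update."""
    acc, out = 0, []
    for _ in range(n_samples):
        out.append(acc)
        acc = (acc + fcw) & ((1 << n_bits) - 1)
    return out


def dsm_truncate(phases, n_bits=16, p_bits=8):
    """Error-feedback truncation of n-bit phases to p-bit ROM addresses.

    Assumed structure: the truncated LSBs e[n] are fed back through
    F(z) = 4z^-1 - 6z^-2 + 4z^-3 - z^-4 = 1 - (1 - z^-1)^4, so the
    truncation error seen at the address is shaped by (1 - z^-1)^4.
    """
    shift = n_bits - p_bits
    e1 = e2 = e3 = e4 = 0                        # e[n-1] ... e[n-4]
    addresses, outputs, errors = [], [], []
    for x in phases:
        # Shift-and-add only: 4e = e << 2 and 6e = (e << 2) + (e << 1)
        fb = (e1 << 2) - ((e2 << 2) + (e2 << 1)) + (e3 << 2) - e4
        w = x + fb
        p = w >> shift                           # floor(w / 2^shift), also for w < 0
        e = w - (p << shift)                     # truncated LSBs, in [0, 2^shift)
        addresses.append(p & ((1 << p_bits) - 1))  # wrap to the ROM address range
        outputs.append(p << shift)               # truncated phase on the n-bit scale
        errors.append(e)
        e1, e2, e3, e4 = e, e1, e2, e3
    return addresses, outputs, errors


phases = phase_accumulator(fcw=1500, n_samples=4096)
addr, y, e = dsm_truncate(phases)
```

Because \(y[n] = x[n] - (1 - z^{-1})^4 e[n]\) holds sample by sample, the truncation error at the ROM address is pushed toward half the clock rate, while the multiplications by 4 and 6 need only shifts and adds.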

The digital portion of the DDFS, including the accumulator, the sine ROM, and the $\Delta\Sigma$ noise shaper, was designed using digital synthesis, while the DAC followed an analog design flow. During layout of the DAC current-source matrix, the Cadence SKILL language was used to automate the sorting and routing of the unit current sources, which greatly improved design efficiency. The digital encoder was placed at the top of the chip, far away from and well shielded from the sensitive analog parts. Separate power supplies were used for different parts of the DDFS circuit to reduce noise coupling through the supply to the sensitive analog blocks, and isolation rings of substrate contacts were placed around sensitive blocks. In the few places where digital signals cross sensitive analog lines, a cleanly biased metal line is used as a shield. The DDFS clock is distributed through a clock tree to ensure low skew, and clock buffers driving the digital encoder and the analog latches in the switch array were added to guarantee synchronization. The clock tree is routed on the top metal layer, which has the lowest resistance.

The die photo of the fabricated CMOS DDFS prototype chip is shown in Fig. 11. The total die area is $2.2 \times 2$ mm$^2$, in which the DDFS active core area is 1.11 mm$^2$ including the DAC, and the rest of the die area is used for pads and ESD diodes. The 16-bit phase accumulator and the fourth-order $\Delta\Sigma$ modulator occupy $0.3 \times 0.2$ mm$^2$ die area. The $2^8 \times 12$-bit ROM occupies only $0.3 \times 0.3$ mm$^2$, which would be 16 times larger without the $\Delta\Sigma$ noise shaper. In a conventional DDFS, the ROM normally takes the majority of the die area, whereas the ROM takes only a small portion of the total area in this DDFS implementation, which clearly demonstrates the advantage of using high-order $\Delta\Sigma$ noise shaping in DDFS designs.

Table I compares the active area and power consumption of several published CMOS DDFS designs with those of this work. Employing the proposed phase domain fourth-order $\Delta\Sigma$ interpolator, we achieved 12 effective phase bits using a small ROM with an area of 0.09 mm$^2$. Note that the NCO implemented in [1] does not include a DAC. Considering that this work includes a 12-bit DAC, the total active area of the implemented DDFS is relatively small compared with prior-art CMOS DDFS designs.

![Image](https://example.com/image1)

**Fig. 10.** Cascode current switching cell.

![Image](https://example.com/image2)

**Fig. 11.** Die photo of the DDFS prototype.

**TABLE I** Comparison of CMOS DDFS Designs

<table>
<thead>
<tr>
<th>Technology</th>
<th>[1]</th>
<th>[15]</th>
<th>[16]</th>
<th>[17]</th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>ROM phase</td>
<td>16 bits</td>
<td>12 bits</td>
<td>19 bits</td>
<td>9 bits</td>
<td>8 bits</td>
</tr>
<tr>
<td>ROM amplitude</td>
<td>14 bits</td>
<td>10 bits</td>
<td>16 bits</td>
<td>8 bits</td>
<td>12 bits</td>
</tr>
<tr>
<td>Active area [mm$^2$]</td>
<td>0.12</td>
<td>3.9</td>
<td>12</td>
<td>0.9</td>
<td>1.1</td>
</tr>
<tr>
<td>Power [W]</td>
<td>0.1</td>
<td>0.6</td>
<td>1.4</td>
<td>0.01</td>
<td>0.2</td>
</tr>
</tbody>
</table>
The 12-bit current-steering DAC occupies 0.6 × 1.6 mm² and consumes 82 mW. The total power consumption of the DDFS chip is 200 mW from a 3.3-V power supply. The DAC uses an external 50-Ω pull-down resistor to achieve a maximum single-ended analog output swing of 1 V with a 20-mA output current. The DAC was measured at clock frequencies up to 300 MHz, limited by the test equipment. Fig. 12 shows the measured DAC output spectrum for an 8-MHz output signal at a sampling rate of 300 MHz. The measured SFDR is about 60 dB, corresponding to about 10 effective bits at the maximum clock frequency.
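The quoted DAC figures can be cross-checked with simple arithmetic. The sketch assumes that the active core consists only of the DAC, the accumulator with the ΔΣ modulator (0.3 × 0.2 mm²), and the ROM (0.3 × 0.3 mm²) described for the die photo, and it uses the rough 6.02-dB-per-bit rule of thumb to convert SFDR into effective bits; `sfdr_to_bits` is a hypothetical helper.

```python
def sfdr_to_bits(sfdr_db):
    """Rule-of-thumb effective bits from SFDR, assuming ~6.02 dB per bit."""
    return sfdr_db / 6.02


r_load = 50.0            # external pull-down resistor, ohms
i_full_scale = 20e-3     # full-scale output current, A
swing = r_load * i_full_scale                      # single-ended swing, V

# Active-area budget in mm^2 (block dimensions quoted in the text)
areas = {"dac": 0.6 * 1.6, "accumulator_dsm": 0.3 * 0.2, "rom": 0.3 * 0.3}
core_area = sum(areas.values())

print(f"swing = {swing:.2f} V, core area = {core_area:.2f} mm^2, "
      f"bits ~ {sfdr_to_bits(60.0):.1f}, DAC power share = {82 / 200:.0%}")
```

With these assumptions the block areas add up to the 1.11-mm² active core, 50 Ω × 20 mA gives the 1-V swing, and 60 dB of SFDR corresponds to about 10 bits; the DAC accounts for about 41% of the 200-mW total.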

Fig. 13 shows the measured output spectrum after the DAC and before the de-glitch filter. Without the DAC, the fourth-order $\Delta\Sigma$ interpolator has a 60-dB/dec high-pass noise-shaping slope; after the DAC, the slope becomes 40 dB/dec because of the additional pole added by the DAC. Because of the DAC's sample-and-hold effect, the attenuation grows toward $f_{\text{clk}}$, so frequencies above $0.5 f_{\text{clk}}$ are attenuated more than those below, and the two sides of the noise-shaping curve are asymmetric. The $\Delta\Sigma$ noise shaper moves the close-in phase-truncation spurs to a higher frequency band, where they can be easily removed by the LPF.
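The asymmetry about 0.5 f_clk follows from the zero-order-hold response of the DAC, whose magnitude is |sinc(f/f_clk)|. The sketch below evaluates it at frequencies placed symmetrically about 0.5 f_clk; it assumes an ideal rectangular hold, ignores the noise-shaping slopes themselves, and `zoh_db` is a hypothetical helper.

```python
import numpy as np


def zoh_db(f_over_fclk):
    """Zero-order-hold (sample-and-hold) DAC response |sinc(f/f_clk)| in dB."""
    # np.sinc is the normalized sinc, sin(pi x) / (pi x)
    return 20 * np.log10(np.abs(np.sinc(f_over_fclk)))


for offset in (0.1, 0.25, 0.4):
    below, above = zoh_db(0.5 - offset), zoh_db(0.5 + offset)
    print(f"{0.5 - offset:.2f} f_clk: {below:6.2f} dB   "
          f"{0.5 + offset:.2f} f_clk: {above:6.2f} dB")
```

Each frequency above 0.5 f_clk is attenuated more than its mirror image below, for example by about 19 dB at 0.9 f_clk versus about 0.14 dB at 0.1 f_clk.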

Fig. 14 gives the measured DDFS output spectra after a fifth-order Butterworth LPF with a corner frequency of 5 kHz. The synthesized frequency is $f_{\text{CWS}} = (15/2^{16}) \times 20$ MHz $\approx 4.578$ kHz with a clock frequency of 20 MHz. For this test, the oversampling ratio is 2184. When the fourth-order $\Delta\Sigma$ noise shaper is turned off, the measured SFDR is about 58 dBc, as shown in Fig. 14(a). With the fourth-order $\Delta\Sigma$ noise shaper enabled, the DDFS provides a clean output with an SFDR of more than 78 dBc, as shown in Fig. 14(b). This demonstrates an SFDR improvement of about 20 dB using the proposed $\Delta\Sigma$ noise-shaping scheme for DDFS applications.
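The test frequency and oversampling ratio can be reproduced from the accumulator relation f_out = FCW · f_clk / 2^16. This sketch assumes that the factor 15 in the expression above is the frequency control word of the 16-bit accumulator and that the oversampling ratio is defined as f_clk / (2 f_out); under those assumptions the result, about 4.578 kHz, is consistent with the 5-kHz filter corner and with the quoted ratio of 2184. The helper `ddfs_output_frequency` is hypothetical.

```python
def ddfs_output_frequency(fcw, f_clk, acc_bits=16):
    """Phase-accumulator DDFS output frequency: f_out = FCW * f_clk / 2^N."""
    return fcw * f_clk / (1 << acc_bits)


f_clk = 20e6
f_out = ddfs_output_frequency(15, f_clk)       # FCW = 15 (assumed reading)
step = ddfs_output_frequency(1, f_clk)         # frequency resolution per FCW LSB
osr = f_clk / (2 * f_out)                      # assumed OSR definition
print(f"f_out = {f_out / 1e3:.3f} kHz, step = {step:.2f} Hz, OSR = {osr:.1f}")
```

The same relation gives a frequency step of f_clk/2^16 ≈ 305 Hz at a 20-MHz clock.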

**VII. CONCLUSION**

This paper presents a CMOS direct digital frequency synthesizer with a 16-bit accumulator, a fourth-order phase domain single-stage $\Delta \Sigma$ noise shaper, and a 300-MS/s 12-bit current-steering DAC using a $Q^2$ Random Walk switching scheme. The proposed single-stage $\Delta \Sigma$ modulator can be implemented in both feedback and feedforward topologies and for any order. The use of $\Delta \Sigma$ noise shaping can effectively reduce the phase truncation error and the ROM size while maintaining a large SFDR and fine frequency resolution.

ACKNOWLEDGMENT

The authors would like to thank X. Geng for his contribution to the DAC design.

REFERENCES


Weining Ni received the B.S. and M.S. degrees in electrical engineering and control engineering from the China Petroleum University, Shandong, China, in 2000 and 2003, respectively. He is currently working toward the Ph.D. degree in microelectronics at the Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China. His research interests include high-speed analog and mixed-signal VLSI circuit design.

Richard C. Jaeger (F’86) was born in New York, NY, on September 2, 1944. He received the B.S. and M.E. degrees in electrical engineering in 1966 and the Ph.D. degree in 1969, all from the University of Florida, Gainesville.

From 1969 to 1979, he was with the IBM Corporation working on precision analog design, PL, microprocessor architecture and low temperature MOS device and circuit behavior. He holds three patents and received two Invention Achievement Awards from the IBM Corporation. In 1979, he joined Auburn University, Auburn, AL, where he is Distinguished University Professor in Electrical and Computer Engineering and served as Director of the Alabama Microelectronics Science and Technology Center from 1984 through 2000. During 2001–2004, he led implementation of Auburn University’s new Bachelor of Wireless Engineering degree program, an interdisciplinary effort of the ECE and CSSE departments. He has published over 200 technical papers and articles, and authored or co-authored *Introduction to Microelectronic Fabrication* (2E), *Microelectronic Circuit Design* (2E), and *Computerized Circuit Design Using SPICE Programs*.

Dr. Jaeger received the IEEE Education Society Jacob Millman/McGraw-Hill Award for outstanding textbook development in 1998 and received the IEEE Undergraduate Teaching Award in 2004. He was a member of the IEEE Solid-State Circuits Council from 1984 to 1991, serving the last two years as Council President. He is a past Editor of the IEEE *Journal of Solid-State Circuits*, a member of the Solid-State Circuits Society AdCom, and is now President of the Society. He was Program Chairman for the 1993 International Solid-State Circuits Conference, and Chairman of the 1990 VLSI Circuits Symposium. From 1980 to 1982 he served as founding Editor-in-Chief of *IEEE Micro*, and subsequently received an Outstanding Contribution Award from the IEEE Computer Society for development of that magazine. He was selected as one of the IEEE Computer Society’s “Golden Core” members and received an IEEE Third Millennium Medal. He was elected Fellow of the IEEE in 1986 and was appointed to the Distinguished University Professorship by Auburn University in 1990. He received the Birdsong Merit Teaching Award from the College of Engineering in 1991. In 1993, he was chosen as the Outstanding EE Faculty Member by the undergraduate students. In 1995, he was selected as the Distinguished Graduate Faculty Lecturer. He is a member of Sigma Xi, Phi Kappa Phi, Tau Beta Pi, Sigma Tau, a Licensed Professional Engineer, and was first listed in *Who’s Who in America* in 1990.