Computing Large DFTs Using Small FFTs

Rick Lyons●June 23, 2008●21 comments

Tips and Tricks

It is possible to compute N-point discrete Fourier transforms (DFTs) using radix-2 fast Fourier transforms (FFTs) whose sizes are less than N. For example, let's say the largest size FFT software routine you have available is a 1024-point FFT. With the following trick you can combine the results of multiple 1024-point FFTs to compute DFTs whose sizes are greater than 1024.

The simplest form of this idea is computing an N-point DFT using two N/2-point FFT operations. Here's how the trick works for computing a 16-point DFT, of a 16-sample x(n) input sequence, using two 8-point FFTs; we execute the following steps:

Perform an 8-point FFT on the x(n) samples where n = 0, 2, 4, ..., 14. We'll call those FFT results X₀(k).
Store two copies of X₀(k) in Memory Array 1 as shown in Figure 1.
Next we compute an 8-point FFT on the x(n) samples where n = 1, 3, 5, ..., 15. We call those FFT results X₁(k).
Store two copies of X₁(k) in Memory Array 3 in Figure 1.
In Memory Array 2 we have stored 16 samples of one cycle of the complex exponential e^-j2^π^m^/N, where N = 16, and 0 ≤ m ≤ 15.
Finally we compute our desired 16-point X(m) samples by performing the arithmetic shown in Figure 1 on the horizontal rows of the memory arrays. That is,

X(0) = X₀(0) + e^-j2^π^0/16·X₁(0),

X(1) = X₀(1) + e^-j2^π^1/16·X₁(1),

. . .

X(15) = X₀(7) + e^-j2^π^15/16·X₁(7).

The final desired X(m) DFT results are stored in Memory Array 4.

Figure 1: 16-point DFT using two 8-point FFTs.

We describe the above process, algebraically, as

X(k) = X₀(k) + e^-j2^π^k^/16 · X₁(k) (1)

and

X(k + 8) = X₀(k) + e^-j2^π^(k+8)/16 · X₁(k) (1')

for k in the range 0 ≤ k ≤ 7.

Notice that we did nothing to reduce the size of Memory Array 2 due to redundancies in the complex exponential sequence e^-j2^π^m^/N. As it turns out, for an N-point DFT, only N/4 complex values need be stored in Memory Array 2. The reason for this is because

(2)

which involves a simple sign change on e^-j2^π^m^/N. In addition,

(2')

which is merely swapping the real and imaginary parts of e^-j2^π^m^/N plus a sign change of the resulting imaginary part. So Eqs. (2) and (2') tell us that only the values e^-j2^π^m^/N for 0 ≤ m ≤ N/4-1 need be stored in Memory Array 2. With that reduced storage idea aside, to be clear regarding exactly what computations are needed for our 'multiple-FFTs' technique we leave Memory Array 2 unchanged from what is in Figure 1.

The neat part of this 'multiple-FFTs' scheme is that our DFT length, N, is not restricted to be an integer power of two. We can use computationally efficient radix-2 FFTs to compute DFTs whose lengths are any integer multiple of an integer power of two. For example, we can compute an N = 24-point DFT using three 8-point FFTs. To do so, we perform an 8-point FFT on the x(n) samples, where n = 0, 3, 6, ..., 21, to obtain X₀(k). Next we compute an 8-point FFT on the x(n) samples, where n = 1, 4, 7, ..., 22 to yield X₁(k). And then we perform an 8-point FFT on the x(n) samples, where n = 2, 5, 8, ..., 23, to obtain an X₂(k) sequence. Finally we compute our desired 24-point DFT results using

X(k) = X₀(k) + e^-j2^π^k^/24 · X₁(k) + e^-j2^π^(2k)/24 · X₂(k) (3)

X(k + 8) = X₀(k) + e^-j2^π^(k+8)/24 · X₁(k) + e^-j2^π^[2(k+8)]/24 · X₂(k) (3')

and

X(k + 16) = X₀(k) + e^-j2^π^(k+16)/24 · X₁(k) + e^-j2^π^[2(k+16)]/24 · X₂(k). (3'')

for k in the range 0 ≤ k ≤ 7. The memory-array depiction of this process is shown in Figure 2, with our final 24-point DFT results residing in Memory Array 6. Memory Array 2 contains N = 24 samples of one cycle of the complex exponential e^-j2^π^m^/24, where 0 ≤ m ≤ 23. Memory Array 4 contains 24 samples of two cycles of the complex exponential e^-j2^π^(2m)/24.

Figure 2 24-point DFT using three 8-point FFTs.

To conclude this blog, we mention that the larger the size of the FFTs the more computationally efficient is this 'multiple-FFTs' spectrum analysis technique. This behavior is illustrated in Figure 3 where we show the number of complex multiplies required by the 'multiple-FFTs' algorithm versus the desired DFT size (N). The top bold curve is the number of complex multiplies required by the standard (inefficient) DFT algorithm, and the bottom dashed curve is the number of complex multiplies required by a single N-point radix-2 FFT. The curves in the center of the figure show the number of complex multiplies required by the 'multiple-FFTs' algorithm when various FFT sizes (P) are used to compute an N-point DFT.

Figure 3 Number of complex multiplies versus N.

You might also like... (promoted content)

Register EARLY and Save - DSP for Wireless Communications

Comments

Comments
Write a Comment

Select to add a comment

[ - ]

Comment by DonGateley●March 9, 2014

For some reason, no matter how often or where the question is asked, no one ever posts the full solution to this problem, so here it is:

/////////////////////////////////////////////////////////////////////////////////////////////////////
function Y = fftx(X)
% Y = fftx(X)
%Extended fft by combining smaller fft's recursively.
%URL: http://www.mathworks.com/company/newsletters/articles/faster-finite-fourier-transforms-matlab.html
%Modified to end recursion at max fft size by Don Gateley

%Maximum allowed fft size (use whatever is appropriate for you.)
max = 512;

X = X(:);
n = length(X);

if(n <= max)
%Down to chunks and end of recursion. Use native fft() for chunks or less.
Y = fft(X);
else
% Recursive divide and conquer.
w = exp(2 * 1i * pi * (0: n / 2 - 1) / n)';
u = fftx(X(1: 2: n-1));
v = fftx(X(2: 2: n)) .* w;
Y = [u + v; u - v];
end
end
/////////////////////////////////////////////////////////////////////////////////////////////////////

And the inverse:

/////////////////////////////////////////////////////////////////////////////////////////////////////
function T = ifftx(X)
%ifftx() using fftx()
n = size(X, 1);
T = fftx(imag(X) + 1i * real(X));
T= imag(T) + 1i * real(T);
T = T / n;
end
/////////////////////////////////////////////////////////////////////////////////////////////////////

[ - ]

Comment by RamasamyJasmin●April 21, 2020

Hi,

Thanks for this article. It is really helpful for my application for computing larger FFT using split & combine.

Again, i need to perform inverse FFT with splitting the FFT data? Please let me know how to do this?

-Ram

[ - ]

Comment by Rick Lyons●April 21, 2020

Hello Ram.

If you need to compute inverse fast Fourier transforms (inverse FFTs) but you only have forward FFT software available to you, please see the following web page:

https://www.dsprelated.com/showarticle/800.php

[ - ]

Comment by RamasamyJasmin●April 22, 2020

Hi Rick Lyons,

Thanks for your reply. I will check the link for IFFT.

Actually, In my DSP application due to some constraints, I would like to perform IFFT(Computing small IFFT from Large one) similar to FFT(computing Large FFT using small FFT size).Kindly suggest how to handle this problem.

Regards,

Ram

[ - ]

Comment by kaushik l●June 28, 2008

nice blog but I didn't quite understand why an FFT of larger size makes the evaluation of DFT more computationaly efficient. Isnt a large FFT, say 16-point FFT obtained in turn from multiple 4-point FFTs?

[ - ]

Comment by Rick Lyons●June 28, 2008

Hi, Perhaps I didn't make my point very well. What the last part of the blog is trying to say is that if you want to perform a 24-pt DFT then it's better to use three 8-pt FFTs rather than use six 4-pt FFTs. [-Rick-]

[ - ]

Comment by franchouze●August 26, 2008

Hi, Is this approach still valid for 2D DFT by: -replacing 1D arrays by 2D arrays and taking one sample over n in each dimension (n is the number of FFTs used) -replacing the memory array 2 of figure 1 by the product of itself with its transpose to obtain a square array ? FranÃ§ois

[ - ]

Comment by Rick Lyons●August 26, 2008

Hello Francois, I'm sorry to say, I've no experience in 2-dimensional (2D) signal processing. But if your 2D processing involves performing 1D FFTs, then the above approach should work. Signal processing software modeling should answer your questions. Good Luck, [-Rick-]

[ - ]

Comment by dnevins●October 23, 2008

I'm not clear on how this works. It seems like a decimation in time variant but I don't see some of the parts. For instance, the first example shows that for X(0) you have the same result as a radix-2 butterfly computation. However, for the X(8) case you are missing the minus sign (I think). The larger cases have even more issues. Could you give me a hint as to what I'm missing?

[ - ]

Comment by Rick Lyons●October 24, 2008

Hello dnevins, Whew! You didn't give me enough information so that I can help. You wrote: "I'm not clear on how this works. It seems like a decimation in time variant but I don't see some of the parts." I'm not sure what you have in mind with the above sentence. "For instance, the first example shows that for X(0) you have the same result as a radix-2 butterfly computation." No, X(0) in Figure 1 is not the result of a single butterfly operation. X(0) is the first sample of an 8-point FFT. "However, for the X(8) case you are missing the minus sign (I think)." I don't know of any sign errors regarding sample X(0). I wonder,... where do you think there's a missing minus sign? "The larger cases have even more issues." I have no clue what the word "issues" means. "Could you give me a hint as to what I'm missing?" Given no additional information from you, there's little I can do. Dnevins, I suggest you model the 'multiple FFTs' algorithm with your favorite sig proc software. If you use MATLAB, then please check out the following code that I prepared for you. (It models the processing in Figure 1). Good Luck, [-Rick-] --------- clear, clc N = 16; % Number of points in FFT n = 0:N-1; Fs = 8000; Ts = 1/Fs; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Define the "Input" sequence %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Input = sin(2*pi*1000*n*Ts)+ 0.5*sin(2*pi*2000*n*Ts + 3*pi/4); figure(1) plot(Input,':bo') title('FFT Input sequence') grid on SpecTrue = fft(Input); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Unzip "Input" sequence to obtain decimated input sequences %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Input_0 = Input(1:2:end); % Starts with the x(0) sample Input_1 = Input(2:2:end); % Starts with the x(1) sample Spec_of_0 = fft(Input_0); Spec_of_1 = fft(Input_1); % Implement the algorithm Spec(1) = Spec_of_0(1) + exp(-j*2*pi*0/N)*Spec_of_1(1); Spec(2) = Spec_of_0(2) + exp(-j*2*pi*1/N)*Spec_of_1(2); Spec(3) = Spec_of_0(3) + exp(-j*2*pi*2/N)*Spec_of_1(3); Spec(4) = Spec_of_0(4) + exp(-j*2*pi*3/N)*Spec_of_1(4); Spec(5) = Spec_of_0(5) + exp(-j*2*pi*4/N)*Spec_of_1(5); Spec(6) = Spec_of_0(6) + exp(-j*2*pi*5/N)*Spec_of_1(6); Spec(7) = Spec_of_0(7) + exp(-j*2*pi*6/N)*Spec_of_1(7); Spec(8) = Spec_of_0(8) + exp(-j*2*pi*7/N)*Spec_of_1(8); Spec(9) = Spec_of_0(1) + exp(-j*2*pi*8/N)*Spec_of_1(1); Spec(10) = Spec_of_0(2) + exp(-j*2*pi*9/N)*Spec_of_1(2); Spec(11) = Spec_of_0(3) + exp(-j*2*pi*10/N)*Spec_of_1(3); Spec(12) = Spec_of_0(4) + exp(-j*2*pi*11/N)*Spec_of_1(4); Spec(13) = Spec_of_0(5) + exp(-j*2*pi*12/N)*Spec_of_1(5); Spec(14) = Spec_of_0(6) + exp(-j*2*pi*13/N)*Spec_of_1(6); Spec(15) = Spec_of_0(7) + exp(-j*2*pi*14/N)*Spec_of_1(7); Spec(16) = Spec_of_0(8) + exp(-j*2*pi*15/N)*Spec_of_1(8); % Plot results figure(2) subplot(2,1,1) plot(abs(SpecTrue),':bo', 'markersize', 4) hold on plot(abs(Spec),'rs', 'markersize', 10) hold off ylabel('Spec Mag'), grid on title('True spec = blue circles, Algo result = red squares') grid on subplot(2,1,2) plot(abs(SpecTrue -Spec),':bo') ylabel('Algo Error'), grid on

[ - ]

Comment by dnevins●October 24, 2008

Thank you very much for your rapid and excellent response! I will examine the code you gave me thoroughly. I'm sorry I wasn't clear earlier and I'll try again. For the first example the equation for X(0) is X(0) = X_0(0) + e-j2Ï0/16 * X1(0). This, as far as I can tell, looks just like a radix-2 butterfly for X(0). However, looking at X(8) = X_0(0) + e-j2Ï8/16 * X1(0). This, to me, looks like a radix-2 butterfly for X(8) but without the minus sign in front of the exponential term. Am I mistaken in my view of the trick? Lastly, I want to thank you for such an excellent trick. This will allow me to expand the use of the highly optimized FFT code supplied by many DSP manufacturers! BTW, your code runs under Octave as well as MATLAB.

[ - ]

Comment by Rick Lyons●October 25, 2008

Hi dnevins, Ah ha. Now I see what you mean. For the case of Figure 1, the lower half of Memory Array 2 can be replaced with the negative of the upper half of Memory Array 2. Because: exp(-j2p8/16) = -exp(-j2p0/16) exp(-j2p9/16) = -exp(-j2p1/16) exp(-j2p10/16) = -exp(-j2p2/16) ... exp(-j2p15/16) = -exp(-j2p7/16) I mentioned this notion in my Eq. (2). So, as far as I can tell dnevins, your interpretation of the Figure 1 process is correct. Thanks for telling me that this 'multiple FFTs' process is of use to you. I didn't invent (or discover) this process, and for the life of me I can't remember where I learned of this technique. Shame on me for not being able to give credit where credit is due. Regards, [-Rick-]

[ - ]

Comment by mahesh663●January 4, 2009

good very useful for me.

[ - ]

Comment by d_sundararajan●December 7, 2009

The advantage claimed is that non power-of-two length DFT can be computed using power-of-two length DFT, which is correct. It is to be noted that DFT of sequences of length, such as 24, 48, 96, etc., can be computed by PM radix-2 DFT algorithms [1] much more efficiently. It is also pointed out that the larger the length of the power-of-two DFT, the more efficient is the computation, which is also correct. This implies that the most efficient DFT algorithms are of sequence length that is a power-of-two, a well-known fact. Therefore, except in situations where a non power-of-two length DFT is absolutely essential, a power-of-two length DFT is the natural choice. As most signals to be analyzed are aperiodic, they can be zero-padded and the power-of-two length constraint is not difficult to comply in practice. [1] Sundararajan, D. (2001) Discrete Fourier Transform, Theory, Algorithms, and Applications, pp. 138-142, World Scientific, Singapore.

[ - ]

Comment by gdal●February 16, 2012

Very useful! I was wondering if a non-interleaved input version is also possible? thanks

[ - ]

Comment by Rick Lyons●February 19, 2012

Hello gdal, Assuming I understand your question, then my answer is no. I don't recall a way to compute large DFTs, using small FFTs, without processing interleaved time-domain samples. [-Rick-]

[ - ]

Comment by ben1512●March 16, 2014

Hi,
I'd like to ask if you know any way of combining two sequential FFTs into a larger one efficiently,
and how is it possible to do so.
Thanks,
Ben

[ - ]

Comment by MimSaad●May 30, 2017

As always great!

[ - ]

Comment by cjskipper●August 7, 2020

Thanks Rick, this is really useful.

Do you know if the theory behind this article has been published in a peer-reviewed journal or book? I would like to reference it in a scientific paper.

Thanks

[ - ]

Comment by Rick Lyons●August 7, 2020

Hi cjskipper. I don't remember how I learned, many years ago, the topic of this blog. I just now flipped through my collection of FFT-related journal papers. But, sadly, I didn't find any scientific paper discussing the computation of larger-sized FFTs using multiple smaller-sized FFTs. Sorry I couldn't be of more help to you.

If you discuss the topic of this blog in a journal paper, I believe it's perfectly reasonable to reference this engineering blog post of mine as:

[x] R.Lyons, "Computing Large DFTs Using Small FFTs", dsprelated.com, June 23, 2008. Available online: https://www.dsprelated.com/showarticle/63.php

[ - ]

Comment by cjskipper●August 7, 2020

Thank you Rick, I shall do as you suggest.

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Please login (on the right) if you already have an account on this platform.

Otherwise, please use this form to register (free) an join one of the largest online community for Electrical/Embedded/DSP/FPGA/ML engineers:

Choose a Username

E-Mail (Work, School or ieee)

First Name

Last Name

Employer

Job Title

Country

State

Password

Confirm Password

By checking this box, I agree with the terms of use and privacy policy By checking this box, I consent to receive occasional emails from the *Related sites and their partners. I understand that these emails will only contain relevant information and that I can unsubscribe at any time.

Computing Large DFTs Using Small FFTs

Sign in

You might also like...

About Rick Lyons

Books by Rick Lyons

Popular Posts by Rick Lyons

Blogs - Hall of Fame

Free PDF Downloads

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group