A Query about "Improvements to the Sliding DFT Algorithm"

Hi all!
My first post here! Long time lurker first time poster.
This one is kinda specifically directed at @RickLyons but happy for anyone to answer.
I am reading: https://www.researchgate.net/publication/352874961... and for specifically the topology of Figures 4(a) and 5(a) I have implemented them in code:
template < typename CF > struct SlidingNUDFT2 { // @param: freq frequency in hertz // @param: N DFT size in samples // @param: T inverse of sample rate __forceinline void setup(float freq, float N, float inv_N, float T) { const float k = freq * N * T; const float twopik = 2.f * PI * k; const float twopikinvN = twopik * inv_N; coeff1 = exp(-1if * twopik); // exp(-i2πk) coeff2 = exp(1if * twopikinvN); // exp(i2πk/N) cosCoeff = 2.f * cosf(twopikinvN); // 2cos(2πk/N) } __forceinline CF step(float xn, float xn_minus_N) { #if 0 // Figure 5(a) // need to contact authors, as this doesn't seem to work correctly.... // stage 1 - complex comb 1 const CF comb = (xn * coeff1) - xn_minus_N; // stage 2 - real resonator const CF rr = comb + (rrprev * cosCoeff) - rrprev2; // stage 3 - complex comb 2 const CF comb2 = (rr * coeff2) - rrprev; // state update rrprev2 = rrprev; rrprev = rr; return comb2; #else // Figure 4(a) // stage 1 - complex comb 1 const CF comb = (xn * coeff1) - xn_minus_N; // stage 2 - complex comb 2 const CF rr = (comb + rrprev) * coeff2; // state update rrprev = rr; return rr; #endif } float cosCoeff; CF rrprev = 0.f; CF rrprev2 = 0.f; CF coeff1, coeff2; };
And the Blackman-Harris filtered version:
template < typename CF > struct SlidingNUDFT2_Filtered { __forceinline void setup(float freq, float N, float inv_N, float T) { f[0].setup(freq, N, inv_N, T); f[1].setup(freq - 1.f / (N * T), N, inv_N, T); f[2].setup(freq + 1.f / (N * T), N, inv_N, T); f[3].setup(freq - 2.f / (N * T), N, inv_N, T); f[4].setup(freq + 2.f / (N * T), N, inv_N, T); f[5].setup(freq - 3.f / (N * T), N, inv_N, T); f[6].setup(freq + 3.f / (N * T), N, inv_N, T); } __forceinline CF step(float xn, float xn_minus_N) { return (1.f / 0.35875f) * (0.35875f * f[0].step(xn, xn_minus_N) - 0.5f * 0.48829f * (f[1].step(xn, xn_minus_N) + f[2].step(xn, xn_minus_N)) + 0.5f * 0.14128f * (f[3].step(xn, xn_minus_N) + f[4].step(xn, xn_minus_N)) - 0.5f * 0.01168f * (f[5].step(xn, xn_minus_N) + f[6].step(xn, xn_minus_N))); } SlidingNUDFT2<CF> f[7]; };
Where CF = std::complex.
You can see there is an issue with the result however, with the "marginally stable" version of 4(a) looking like this:
and the "guaranteed stable" version of 5(a) looking like this:
Note that the 5(a) version has artifacts on the bottom right that seem like a stability issue.
Note that in my code listing above for 5(a) I have swapped the order of the second complex filter and the real resonator, but that isn't an issue since all the operations are linear and hence order independent, and I got exactly the same issue in my results using the exact same flow order as the paper.
Any suggestions about what might be going wrong here are welcome.
Thank you,
Luke

Hello,
This looks very interesting but I can't be the only one who is a bit lost.
What is it you are doing here?
I don't know how to interpret these images.
Mark Napier

Hi Mark,
I am using the SlidingDFT to perform a log-space spectrogram of a short drum loop.
The image is a log-space 20Hz-20kHz up the vertical axis (which is precisely why the paper was so attractive to me, since the k value doesn't have to be an integer, you can do log-space graphs, which for audio is super useful), and time along the x axis.
The luminance of the greyscale image is the amplitude in dB from -120dB is black to 0dB is white.
I hope that helps.

Hello Luke.
I'm not able to say whether or not my "Non-integer k Sliding DFT" (Figure 5) is applicable to your application. With that being said, you can test your Figure 5 code by applying an impulse input (a single one-valued sample followed by more than N zero-valued samples) to your Figure 5 code. The correct impulse response is a complex sinusoid whose length is N samples.
The DFT of the correct impulse response (after padding it with lots of zeros) should have a sin(x)/x spectral magnitude with a peak value of N.
I'm not able to interpret your code but I must say your definition of variable 'k' troubles me. The variable k should be a dimensionless number between zero and integer N. It looks to me like your k is a value of time measured in seconds, which is not correct. I may be wrong. Luke, that's about all I can say to you at this point.
Good Luck.
[-Rick-]

Hi @RickLyons thanks so much for responding!
Here is what I get from an impulse input when I do a 64 sample SlidingDFT for a range of different k values:
Which looks right to me. Though I note with your "guaranteed stable" version, after the N samples, there is a bit of content left over, i.e.
61, 0.301741600037, -0.953390121460 62, 0.806765913963, -0.590871930122 63, 1.000000357628, -0.000000238419 64, 0.000000417233, 0.000000000000 65, 0.000000298023, 0.000000208616 66, 0.000000149012, 0.000000357628 67, -0.000000096858, 0.000000357628 68, -0.000000268221, 0.000000238419 69, -0.000000357628, 0.000000029802 70, -0.000000298023, -0.000000178814 71, -0.000000119209, -0.000000357628 72, 0.000000115484, -0.000000417233 73, 0.000000327826, -0.000000238419 74, 0.000000417233, 0.000000000000 75, 0.000000357628, 0.000000283122 76, 0.000000119209, 0.000000476837 77, -0.000000207685, 0.000000476837 78, -0.000000447035, 0.000000238419 79, -0.000000596046, -0.000000089407 80, -0.000000417233, -0.000000387430 81, -0.000000119209, -0.000000536442 82, 0.000000202679, -0.000000536442 83, 0.000000476837, -0.000000298023 84, 0.000000536442, 0.000000044703 85, 0.000000417233, 0.000000372529 86, 0.000000119209, 0.000000596046 87, -0.000000257045, 0.000000596046 88, -0.000000596046, 0.000000357628 89, -0.000000655651, -0.000000059605 90, -0.000000476837, -0.000000447035 91, -0.000000089407, -0.000000655651 92, 0.000000329688, -0.000000596046 93, 0.000000625849, -0.000000298023 94, 0.000000655651, 0.000000104308 95, 0.000000417233, 0.000000461936 96, 0.000000089407, 0.000000596046 97, -0.000000253320, 0.000000536442 98, -0.000000506639, 0.000000298023 99, -0.000000596046, -0.000000074506 100, -0.000000417233, -0.000000417233
This happens for both the "marginally stable" and "guaranteed stable" versions. Though I guess this is to be expected. Float isn't perfect precision.
As for the value of k, hopefully this interactive spreadsheet will clear up my use-case:
https://www.desmos.com/calculator/u2upssovok
This shows the absolute value of the transfer function plotted for equation 7 of your paper.
This shows that k, varying between 0 and N varies the position of a complex bandpass at a peak frequency of 2*pi*k/N. Therefore I am effectively setting k as the frequency at which to position the bandpass filter.
This allows me to arbitrarily position the SlidingDFT frequencies wherever I want them to be, thus allowing log-space SlidingDFT as a variant to the linear-space SlidingDFT typically used.
Is that not a correct interpretation? It certainly seems to be given the magnitude response.
I therefore basically interpret your paper as a re-interpretation of the SlidingDFT as a fractionally-settable complex band-pass filter.
Thanks, I hope that helps you understand my usage of your awesome paper!
Thank you,
Luke

Hello Luke. First, I take back what I wrote about your software definition of variable 'k'. My mistake. Your software definition for 'k' is fine.
That interactive spreadsheet demo is *beautifully* informative. Good job!
[-Rick-]

OK good to know thanks @RickLyons.
My main point of confusion is still why the guaranteed stable version begins to introduce noise in the low frequency SlidingDFT after a few hundred samples, whereas the marginally stable version doesn't.
I still cannot figure out if this is an issue with my implementation or the algorithm itself.
I'd be happy to post all my code and any more details in order to resolve it.
Thanks,
Luke