Instantaneous frequency direction of human speech

The phase delay of the imaginary (Q) cascade relative to the real (I) one doesn't stay constant. It changes, and the rate of that change is the instantaneous frequency.
Question: from the theory of IIR all-pass filters built on biquads, can the human speech phasor temporarily change its direction of rotation?
[1] Theodor A. Prosch, "A Minimalist Approximation of the Hilbert Transform", QEX, Sept/Oct 2012, pp. 25-31.
[2] Rick Lyons, "Quadrature Signals: Complex but Not Complicated", https://www.dsprelated.com/showarticle/192.php

Human speech, like any real signal, has positive- and negative-frequency components of equal magnitude and opposite phase (so equal positive and negative rotation components). It is only after we convert it to the analytic signal that it contains only positive-frequency components. The analytic signal for x(t) is given as:
$$x_a(t) = x(t) + j \hat{x}(t)$$
where \( \hat{x}(t)\) is the Hilbert transform of \(x(t)\).
This is approximated by the filters given, with the result that the outputs of the two filters are in a 90-degree phase relationship. Note that this result is achieved only over a portion of the band, so any frequency components outside that band won't form an analytic signal and therefore won't be one-sided in frequency. If the voice signal is band-limited to ensure all of its components lie in this band, and if we ignore the initial transient of the filter, then the output should be one-sided up to the suppression performance of the filter.
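As a quick numerical check of the one-sided claim, here is a sketch using SciPy's FFT-based `hilbert` (an assumption on my part: it stands in for an ideal phase splitter, without the band-edge limitations discussed above):

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000                             # sample rate in Hz (illustrative)
t = np.arange(fs) / fs                # one second of samples
x = np.cos(2 * np.pi * 440.0 * t)     # real test tone at 440 Hz

xa = hilbert(x)                       # analytic signal x + j*x_hat

X = np.fft.fft(xa)
pos = np.abs(X[1:len(X) // 2])        # positive-frequency bins
neg = np.abs(X[len(X) // 2 + 1:])     # negative-frequency bins

# for the analytic signal, negative-frequency content is negligible
print(neg.max() / pos.max())
```

With the real all-pass approximation of [1], the suppression of the negative frequencies would instead be finite and limited to the designed band, as the post explains.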

Hi, it's been a while since I thought about this. If anyone is interested I can try to recreate it more carefully.
Your equation suggests that x(t) is the original real signal and is also the real part of the analytic signal, while the imaginary part of the analytic signal is another real sequence, the Hilbert transform. You then suggest using two all-pass digital filters whose phase difference is a good approximation to 90 degrees over most of the band. I worked on that phase-splitter approach in the 1960s by translating an analog design, based on elliptic functions, into a digital equivalent: two all-pass filters. The two filters, which I'll call A(z) and B(z), are recursive, and the total number of poles (and also of zeroes) is N, where N is an integer large enough to make the phase-difference error small enough. Because these are all-pass, the poles and zeroes are all reciprocals of one another, and they are all real.
But many years later I wanted to have ONE all-pass filter so that x_n could simply be taken directly and the sampled Hilbert approximation could be computed. The idea is to pass x_n through a different all-pass filter with transfer function A(z)B(1/z). It's the same A(z) and B(z) as in the original all-pass pair.
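Filling in the step that makes this equivalence work: because B(z) has real coefficients, evaluating B(1/z) on the unit circle conjugates the frequency response,
$$B(e^{-j\omega}) = \overline{B(e^{j\omega})},$$
so its phase is \(-\theta_B(\omega)\) and its magnitude is still unity. The cascade is therefore all-pass with phase
$$\arg\left[A(e^{j\omega})\,B(e^{-j\omega})\right] = \theta_A(\omega) - \theta_B(\omega) \approx -90^\circ,$$
which is exactly the phase difference the original splitter pair was designed to approximate, now realized in a single path.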
The problem, of course, is that B(z) is stable and therefore B(1/z) is unstable.
But if the output of X(z)A(z) is time reversed, and sent through B(z), and the output is time reversed again, you get a very efficient implementation of what you wanted. The count of adds and multiplications is exactly the same as for the phase splitter approach.
The seeming problem is that time-reversing an indefinitely long signal cannot be achieved, let alone doing it twice. Then I discovered a fantastic way to do it practically, using a finite delay: break the input signal into blocks, time-reverse each block, filter the reversed block with B(z) (a real-coefficient, stable recursive filter, so that the overall operation realizes B(1/z)), and time-reverse again. Since the impulse response of B(z) dies out exponentially, each reversed-and-filtered block has only a short tail beyond the block length, after which the response fades to insignificance and can be dropped, so the filtered blocks are concatenated, with overlapping tails added together where they overlap.
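A minimal sketch of the block scheme (my own illustration, not the published implementation; the function name and the block/tail lengths are arbitrary):

```python
import numpy as np
from scipy.signal import lfilter

def anticausal_filter(b, a, x, block_len=256, tail_len=64):
    """Approximate y = x filtered by B(1/z), where B(z) = b(z)/a(z) is stable.

    Each block is time-reversed, run through the stable causal filter B(z),
    reversed back, and its exponentially decaying tail (reaching tail_len
    samples into the past) is overlap-added into the running output.
    """
    n = len(x)
    y = np.zeros(n + tail_len)               # extra room for the first tail
    for start in range(0, n, block_len):
        blk = x[start:start + block_len]
        rev = np.concatenate([blk[::-1], np.zeros(tail_len)])
        out = lfilter(b, a, rev)[::-1]       # anticausal response to this block
        y[start:start + len(out)] += out     # shifted by tail_len: overlap-add
    return y[tail_len:]                      # drop the tail before sample 0
```

For a finite recording the same result could be had by reversing the whole signal once; the block version trades that unbounded delay for a fixed one, at the cost of a truncation error on the order of the impulse response beyond tail_len.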
I wrote that up and published it with Leland Jackson, in IEEE Trans Circuits and Systems, too long ago for me to remember, but if anyone is interested I can recreate it from my saved records.

Very interesting Charlie! Thank you for the comment. It sounds like a very efficient Hilbert implementation. By the way, I am a big fan of your "coupled form" 2nd-order IIR which, as I'm sure you're aware, relates to this post: it turns out to be equivalent to an "analytic filter," in that it results from a single complex pole and zero with a real and an imaginary output.

Charlie, I actually tried to reach you unsuccessfully (it may have been over a year ago); would you mind emailing me at my email address: boschen at loglin dot com

Another thing I should mention. If you start with an analog signal that's real, the Nyquist sampling frequency is twice the highest frequency - because the bandwidth includes both positive and negative complex frequencies. But when you add together the signal and j times its Hilbert transform, you eliminate the negative frequencies so the bandwidth is half what it was. That means you can get away with computing every other sample - more precisely, you don't care about aliasing in the parts of the band where you don't have a good approximation to 90 degree phase so you can halve the Nyquist frequency for the part of the bandwidth you care about. Anyway, having to compute only every other sample is another saving.
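A sketch of the half-rate point (illustrative numbers, with the FFT-based `hilbert` again standing in for the filter pair): once the signal is analytic, keeping every other sample still represents the tone without aliasing.

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(4096) / fs
x = np.cos(2 * np.pi * 1000.0 * t)    # real 1 kHz tone sampled at 8 kHz

xa = hilbert(x)                       # analytic: spectrum now one-sided
xd = xa[::2]                          # every other sample (rate 4 kHz)

# estimate the frequency from the decimated phasor's rotation rate
phase = np.unwrap(np.angle(xd))
f_est = np.mean(np.diff(phase)) * (fs / 2) / (2 * np.pi)
print(f_est)    # still 1 kHz: the one-sided band fits in the halved rate
```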

Hi K Man,
From a higher-level perspective ... human speech is complex enough; in the presence of background noise, other human speakers, or various other sounds in the 1-3 kHz range, just about anything can happen, and your phasor tip might point in any instantaneous direction, not to mention change rapidly. If there is something specific you want to detect, I would ditch the "old school algorithm" approach and use CNNs.
These days speech recognition models like Whisper and Kaldi can handle anything. The only situation we've found where they fail is when we constrain them to run on small form-factor platforms (1 to 4 x86 or Arm cores), in that case they tend to produce sound-alike word errors, which can be rectified by post-processing with a small language model (SLM).
-Jeff
FYI, here is a sound-alike error example:
https://www.reddit.com/r/MachineLearning/comments/...

Thank you all for the comprehensive answers. My question is much simpler and very practical: if the phase shift between the I and Q channels is always decreasing, then the speech instantaneous frequency is always negative. When I add the RF carrier frequency and send the value to a DDS (Direct Digital Synthesis) chip, I get a lower-sideband (LSB) modulated signal. If instead I subtract the negative speech instantaneous frequency from the RF carrier frequency, I get an upper-sideband (USB) modulated signal. It all works great.
...until I found that sometimes the phase shift between the I and Q channels changes direction, temporarily increasing in the example above. The speech instantaneous frequency becomes positive, and it messes up the RF signal, because the opposite sideband appears.
It usually happens when the speech envelope is close to zero. I know that atan2(y, x) is undefined at x = 0, y = 0; that's taken care of. In my radio lab all I and Q signals are real, nothing imaginary, but I may be messing up the hardware... This is why I need to be sure that the human speech phasor from the IIR all-pass filters above always rotates in one direction. Can somebody confirm this, please?
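To answer the direction question directly: no, one-directional rotation is not guaranteed. Even a clean sum of two purely positive-frequency tones reverses direction exactly where its envelope dips, which matches what you observe. A small sketch (the tone frequencies and amplitudes are arbitrary illustrative choices):

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
# analytic signal built from two positive-frequency tones only
z = 1.00 * np.exp(2j * np.pi * 200 * t) + 0.95 * np.exp(2j * np.pi * 220 * t)

phase = np.unwrap(np.angle(z))
inst_freq = np.diff(phase) * fs / (2 * np.pi)   # instantaneous frequency, Hz
env = np.abs(z)[:-1]                            # envelope, aligned with diffs

print(inst_freq.min())                # dips well below 0 Hz...
print(env[np.argmin(inst_freq)])      # ...precisely where the envelope is small
```

Analytically, the minimum instantaneous frequency of a1*exp(j*w1*t) + a2*exp(j*w2*t) with a1 > a2 is w1 - (w2 - w1)*a2/(a1 - a2), about -180 Hz for the numbers above, so the closer the two amplitudes, the deeper the negative excursion. That is why the sideband flip shows up near the envelope nulls.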