Harmonic Distortion From FIR Filter

Hello Everybody!
I'm looking for somebody to help me out of my grief! I'm new to the wonders of DSP so I have dived in head first and made myself a custom PCB with an STM32F4 and an audio CODEC with the aim of teaching myself DSP for audio effects.
I've been having lots of fun but I'm now onto my first low pass FIR filter and I've hit a bit of a wall...
Up to this point I have had faithful reproductions of the audio out of my system, and I have also had success with some IIR filters, but my FIR filter is producing a crazy amount of harmonic distortion. I've stepped through my code and thought it through for 3 days now and I'm admitting defeat (it's now Sunday evening and I have got nowhere)!
Above is a screenshot of my distortion on a Spectrum Analyser. The blue trace is showing my 3KHz sine wave input (it's hard to see but it is in the middle), and the red trace is the resulting output
I have used the online tool - tfilter to generate my coefficients which I have seen many people do online without any issue. I feel as though it could be some sort of quantising error though as I can set the coefficients to a 1 followed by a trail of zeros and the distortion will disappear. I have also tried to rule out processor speed by requesting that the processor does the maths but then ultimately returns the unmodified input and this seems fine as well...
Here is the designed filter in tfilter.com and the measured response from my system
As you can see, there is some resemblance, but not all that much!
I have uploaded some relevant code snippets below. Apologies for the scrappy nature of the it; It has been torn apart and had all sorts of ideas applied to it. Hopefully it should still make sense! Any help would be greatly appreciated, and thanks in advance!
Main FIR code
double filter_taps[FIR_TOTAL_TAPS] = { -0.0022239156993465827, -0.004821451779131147, -0.005368363506383406, -0.0020290178408416144, 0.00329244685033691, 0.005057819765491018, 0.0006372855563732268, -0.005884731610419017, -0.006276193602732038, 0.0015562225216403382, 0.009590742597950844, 0.006958585615734854, -0.005660926342537551, -0.014253061180101586, -0.0060214494111808575, 0.012535149682218472, 0.019484637123576138, 0.001995296012309013, -0.023356622926686454, -0.024674467939037846, 0.007757304684799037, 0.040932174667143544, 0.029110931590741095, -0.030734591476373224, -0.07696206074138462, -0.032097029845201505, 0.11762745961397253, 0.29019377759355963, 0.3664840252007224, 0.29019377759355963, 0.11762745961397253, -0.032097029845201505, -0.07696206074138462, -0.030734591476373224, 0.029110931590741095, 0.040932174667143544, 0.007757304684799037, -0.024674467939037846, -0.023356622926686454, 0.001995296012309013, 0.019484637123576138, 0.012535149682218472, -0.0060214494111808575, -0.014253061180101586, -0.005660926342537551, 0.006958585615734854, 0.009590742597950844, 0.0015562225216403382, -0.006276193602732038, -0.005884731610419017, 0.0006372855563732268, 0.005057819765491018, 0.00329244685033691, -0.0020290178408416144, -0.005368363506383406, -0.004821451779131147, -0.0022239156993465827 // 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 }; //initialise filter variables void init_fir_filter(fir_filter_t *filter) { filter->total_taps = FIR_TOTAL_TAPS; //filter->coeffs = &filter_taps[0]; //point the pointer to our array of coefficients for(int x = 0; x == filter->total_taps - 1; x++ ) { filter->samples[x] = 0.0f; } filter->index = 0; } //process the signal float process_fir_filter(fir_filter_t *filter, float latest_sample) { double acc = 0.0f; static double debug = 0.0023f; // float *coefficients = filter->coeffs; uint8_t temp_index; //adjust the sample index as we are adding a new sample filter->index++; //make sure the index is not out of range by wrapping around! if(filter->index == filter->total_taps) { filter->index = 0; } //add the latest sample to the data array at the updated index filter->samples[filter->index] = latest_sample; //duplicate the index so we can manipulate it later temp_index = filter->index; for(int n = 0; n < FIR_TOTAL_TAPS; n++ ) { debug = filter_taps[n]; acc += filter_taps[n] * (double)filter->samples[temp_index]; if(temp_index > 0) { temp_index--; } else { temp_index = FIR_TOTAL_TAPS - 1; } } return (float)acc; }
Filter.h file
#ifndef INC_FIR_FILTER_H_ #define INC_FIR_FILTER_H_ #define FIR_TOTAL_TAPS 57 typedef struct { float samples[FIR_TOTAL_TAPS]; //array to hold all the historic input samples //float *coeffs; //pointer to the FIR filter coefficients uint8_t total_taps; //total taps of the filter uint8_t index; //value to keep track of the latest sample (so we can use ring buffering) } fir_filter_t; void init_fir_filter(fir_filter_t *filter); float process_fir_filter(fir_filter_t *filter, float latest_sample);
Function that calls the filter (from main.c)
void processData(void) { static float leftIn, leftOut; //, rightIn, rightOut; audio_update_lockout_flag = 1; for(uint8_t n = 0; n < (AUDIO_BUFFER_SIZE/2) - 1; n += 2) { //left channel data leftIn = INT16_TO_FLOAT * ((float) codecInBuff_p[n]); //leftIn = (float)codecInBuff_p[n]; if(leftIn > 1.0f) { leftIn -= 2.0f; //make sure data is within 1/-1 } //modify the data here leftOut = process_fir_filter(&anti_aliasing_filter, leftIn); // leftOut = leftIn; //debug //convert back to signed int and transfer to DAC (via pointer) codecOutBuff_p[n] = (int16_t) (FLOAT_TO_INT16 * leftOut); ////////////////////// //right channel data (not used in current hardware) /* rightIn = (float)codecInBuff_p[n+1]; if(rightIn > 1.0f) { rightIn -= 2.0f; //make sure data is within 1/-1 } //modify the data here rightOut = rightIn; //convert back to signed int and transfer to DAC (via pointer) codecOutBuff_p[n] = (int16_t) rightOut; */ } audio_update_lockout_flag = 0; audioDataReadyFlag = 0; return; }
#fir #filter #stm32 #audio #harmonic_distortion

Aside from numerous issues with your code, like not being consistent with the usage of floats and doubles, i.e.
double acc = 0.0f;
Should instead be:
double acc = 0.0;
To avoid making your compiler generate a bunch of pointless type-conversion code, there is one major potential issue here.
This is totally wrong (and totally unneccessary since float isn't limited in range like integers are):
if(leftIn > 1.0f) { leftIn -= 2.0f; //make sure data is within 1/-1 }
Remove this code, do your convolution, then at the END of your code do this:
codecOutBuff_p[n] = (int16_t) (FLOAT_TO_INT16 * clamp(leftOut, -1.f, 1.f);
Where clamp is a function defined as float clamp(float x, float min, float max) that if x > max returns max and if x < min returns min.
I hope that fixes it for you.

Hi etheory, thanks for your reply!
Yes, you have picked up some good points about the code. I had been messing around with the data types a lot as I was certain that this was my issue over the weekend, and it looks like I had made a mess of it during my frustration! I have gone through and changed everything back to floats as this is what I would prefer to use I think
About the scaler - yes, you are right, it doesn't make sense does it! I had taken that one from a tutorial I was following and it hadn't caused me issues in those projects so it had become part of the furniture to me. I have taken your suggestion on with the clamp function, but one reason why I wasn't suspicious of this code is due to the fact that my distortion remained relative, even for very low input amplitudes.
After making your good suggestions my issue still remained unfortunately.
The good news is, whilst formulating a reply to you it made me think about something...
It turns out that it actually was a processing speed issue after all! My input ring buffer was being updated with new samples before I could process the filter. For now I have dropped my sample rate from ~48K to ~32K and the issue has all but disappeared! I guess my test where I did the calcs but returned the input must have had some optimisation from the compiler that I missed, so that one is on me!
For anyone in the same pickle as me, here is how I finally realised what was going on. Below is the time domain of my filter, input in blue and output in red
After looking at it properly I noticed that it sort of looks like the data is being "shifted" around a bit. i.e. there is missing data.
I expect that there is something better I can do other than dropping the sample rate, but for now it at least shows me where the issue was
Here are the new outputs for anyone interested
There is still a bit of noise lurking around but perhaps that's expected with a digital system...?
Anyway, thanks so much for your help etheory! It was great to get a new pair of eyes on the problem :D
Edit - grammar mistakes

Wonderful news I'm glad you fixed it!