ags wrote:
This 0.5 second buffering could explain some of the delay I am seeing between the visual representation and the audio output. Though in my application it is not really important, it is annoying. I'm fairly new to the Java Sound API and have much to learn. Time for me to go back to the Javadoc.
Thanks for the great response. I think I've found the problem - but have more to resolve. First, the apparent solution to the immediate problem: I found that by decreasing the size of the buffer used by the TargetDataLine, I can reduce the delay dramatically. It seems that by default, the buffer is sized to hold 0.5 sec of audio data. While that might not be strictly true, it was for the few test cases I tried. It turns out that I had just about 0.5 sec of unexplainable delay. I am assuming that the TDL waits to fill (some portion of) its buffer before it can be read. At least that's how the behavior seems from my recent experience.
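For what it's worth, a minimal sketch of requesting a smaller TargetDataLine buffer at open time (the format, the 50 ms figure, and the helper name bufferBytesFor are illustrative, not from the thread):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

public class CaptureBufferSketch {

    // Bytes needed to hold 'seconds' of audio in the given format.
    static int bufferBytesFor(AudioFormat format, double seconds) {
        return (int) (format.getFrameRate() * seconds) * format.getFrameSize();
    }

    // Open a capture line with an explicit (small) buffer instead of the default.
    static TargetDataLine openLowLatencyLine(AudioFormat format, double seconds)
            throws LineUnavailableException {
        TargetDataLine line = AudioSystem.getTargetDataLine(format);
        line.open(format, bufferBytesFor(format, seconds)); // e.g. ~50 ms, not ~0.5 s
        return line;
    }

    public static void main(String[] args) {
        // 44.1 kHz, 16-bit, mono, signed, big-endian: 2 bytes per frame.
        AudioFormat format = new AudioFormat(44100f, 16, 1, true, true);
        System.out.println("50 ms buffer = " + bufferBytesFor(format, 0.05) + " bytes");
    }
}
```

The two-argument open(AudioFormat, int) is the hook for controlling how much audio the line accumulates before a read can complete.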
I am in a different situation than you in that I cannot intercept the audio stream (and then pass it on, delayed appropriately to sync the visual and audio streams). I'm at the end of the line of a split audio stream; one branch comes to me (live, streaming) and the other is sent to audio equipment (for immediate playback). So unless this changes (and that is unlikely) I will have to make sure everything I do, from reading through processing and graphical rendering, is fast enough that the delay is not noticeable compared to the other audio stream being played back by separate equipment.

I started with that architecture but my need to convert sound samples to doubles meant it was better if I first converted all sound samples to big-endian 2 byte signed values. I can then easily convert these two byte signed values to doubles and also pass them to the TDL with better control of playback (I need to pause and stop playback with minimal delay).
Because of that, I've had to make some compromises. My display is not meant to be diagnostic (not precise) - it's more for visual aesthetic than a measurement tool.

Ditto. I just display the audio data, its spectrum and amplitude in real time. Charts are never really precise and are just a visualisation.
I've gotten the FFT time down so small by reducing the sampling rate to 24kS/s (throwing away half the bandwidth) and I'm only using 256 points for the FFT.

I have just checked my performance charts and I can do a 256 point real FFT in around 10 microseconds - a factor of 500 slower than your 20 nanoseconds. Are you sure you mean 20 nanoseconds and not 20 microseconds?
Now that I have found some more processing time, I'll see how much I can raise that towards full bandwidth and resolution without unacceptable sync delay. If I could get to full bandwidth (44.1k or 48k) sampling and 10 Hz resolution that would be great.

From my limited experience I have found that 10 Hz resolution is OK for voice but not so good for music. I can't justify this but 5 Hz or smaller seems to be needed for orchestral music.
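As a rough sketch of that trade-off (the helper name is made up, and a power-of-two FFT is assumed): the bin spacing of an N-point FFT is sampleRate / N, so 10 Hz resolution at 44.1 kHz needs at least 4410 samples, i.e. an 8192-point transform.

```java
public class ResolutionSketch {

    // Smallest power-of-two FFT length whose bin spacing (sampleRate / n)
    // is no coarser than the requested resolution.
    static int fftSizeFor(double sampleRate, double resolutionHz) {
        int n = 1;
        while (sampleRate / n > resolutionHz) {
            n <<= 1;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("10 Hz @ 44.1 kHz needs " + fftSizeFor(44100, 10) + " points");
        System.out.println(" 5 Hz @ 44.1 kHz needs " + fftSizeFor(44100, 5) + " points");
    }
}
```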
You mention a few steps that I'd appreciate more detail on (if you please).

For the most part, with sound samples, the removal of the best straight line accomplishes little and I really only used it because of my experience in another life when I was using spectral analysis for less peaceful purposes. In essence it can reduce some of the apparent low frequency caused by the finite collection period. I have only noticed any effect when music has a load of drum content.
1) I've not heard of the best straight line removal step. Would you explain more? From the sequencing of the steps you provided, it looks like that's still in the time domain, is that correct? What does it accomplish?
2) Right now I'm just iterating through the array of (SIGNED_PCM) int samples and converting to doubles and copying into a new array. Do you know of a faster method?

I do much the same except that I first convert all sound samples regardless of format (ULaw, ALaw, single byte PCM etc) to 2 byte big-endian signed PCM and then convert these values to doubles.
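A minimal sketch of that last step - turning big-endian 2-byte signed PCM into doubles (mono assumed; the method name is illustrative):

```java
public class PcmSketch {

    // Convert big-endian 16-bit signed PCM bytes to doubles in [-1.0, 1.0).
    // Assumes mono, 2 bytes per sample.
    static double[] toDoubles(byte[] pcm) {
        double[] out = new double[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = pcm[2 * i];            // high byte carries the sign
            int lo = pcm[2 * i + 1] & 0xFF; // low byte masked to unsigned
            out[i] = ((hi << 8) | lo) / 32768.0;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] pcm = { (byte) 0x80, 0x00, 0x7F, (byte) 0xFF, 0x00, 0x00 };
        for (double d : toDoubles(pcm)) {
            System.out.println(d);
        }
    }
}
```

The masking of the low byte is the usual trap: forgetting the & 0xFF makes every sample with bit 7 of the low byte set come out wrong.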
3) I planned on using a Hamming window, but when the prototype delay became a problem, I focused elsewhere. I now see that the spectral output (from FFT) is very biased towards the very low frequency side.

Applying a window to the data will not change this bias since I believe that it is fundamental to music. Without using a window a single tone will present itself in the spectral estimates as significant values well away from the actual tone (a sin(2*pi*f/N)/f relationship). A window will concentrate this smear back into the neighbourhood of the single tone and limit the majority of the smear to the estimates immediately adjacent to the tone. Many years ago when I first worked on spectral estimation there was a load of literature detailing the advantages of one window over another. I did a project where I investigated the effect of windows and I showed that for the sort of application I was considering then a Chebyshev window was pretty much optimal, but that the difference between using a Chebyshev window and a Hamming window was small. This result seems to apply also to analysis of sound.
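The time-domain windowing step itself is small - a sketch using the textbook Hamming coefficients (the class and method names are illustrative):

```java
public class WindowSketch {

    // Textbook Hamming coefficients: w(i) = 0.54 - 0.46*cos(2*pi*i/(N-1)).
    static double[] hamming(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (n - 1));
        }
        return w;
    }

    // Multiply a frame of samples by a precomputed window, in place.
    static void apply(double[] frame, double[] window) {
        for (int i = 0; i < frame.length; i++) {
            frame[i] *= window[i];
        }
    }

    public static void main(String[] args) {
        double[] w = hamming(5);
        System.out.printf("w = [%.2f, %.2f, %.2f, %.2f, %.2f]%n",
                w[0], w[1], w[2], w[3], w[4]);
    }
}
```

Precomputing the coefficient array once per FFT size keeps the per-frame cost to one multiply per sample.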
Most music has more power in the lower bands, but this is more than I expected. I also realize I'm getting some aliasing since I'm sampling at 1/2 the Nyquist frequency - but I'd expect that effect to be not so concentrated in the low frequencies.

This will cause all frequencies between half your sample rate (your effective Nyquist frequency) and the Nyquist frequency of the original samples (half the original sample rate) to be folded down into the range up to your effective Nyquist frequency. Sub-sampling is normally a very very very bad idea without first performing some low pass filtering of the original samples.
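A small sketch of where folded energy lands (the helper name is made up): energy reflects about multiples of half the sample rate, so at 24 kS/s a 13 kHz tone appears just below 12 kHz, while a 20 kHz tone folds well down to 4 kHz.

```java
public class FoldingSketch {

    // Where a tone at f Hz appears after sampling at fs Hz without an
    // anti-aliasing filter: energy folds about multiples of fs/2.
    static double foldedFrequency(double f, double fs) {
        double r = f % fs;
        return (r <= fs / 2) ? r : fs - r;
    }

    public static void main(String[] args) {
        // Sampling at 24 kS/s (effective Nyquist 12 kHz):
        System.out.println("13 kHz -> " + foldedFrequency(13000, 24000) + " Hz");
        System.out.println("20 kHz -> " + foldedFrequency(20000, 24000) + " Hz");
    }
}
```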
My understanding is that the Hamming window will reduce the spectral amplitude but sharpen the bands (reducing spectral bleedthrough).

In essence, yes.
Should I also expect to see a reduction of the bias towards the low frequency bands?

It is not really a bias, and without windowing one spreads/smears energy from one tone/frequency equally to both sides of the tone and not just to low frequencies.
Do you use a Hamming window or other?

I allow my users to select the windowing but default to Hamming. If you are overly concerned with performance then Hanning (note the spelling difference) can be quicker than Hamming when implemented in the frequency domain.
4) For the step you call "compute the spectral density" - is that finding the modulus of the FFT real and imaginary values, or something more?

Pretty much.
Does that differ in some way from what you refer to in the next step as "spectral density estimates"?

I don't really understand what you are asking.
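The "modulus of the FFT real and imaginary values" step can be sketched in a few lines (names are illustrative):

```java
public class SpectrumSketch {

    // Magnitude of each FFT bin from its real and imaginary parts -
    // the "modulus" step discussed above.
    static double[] magnitudes(double[] re, double[] im) {
        double[] mag = new double[re.length];
        for (int i = 0; i < re.length; i++) {
            mag[i] = Math.hypot(re[i], im[i]); // sqrt(re^2 + im^2), overflow-safe
        }
        return mag;
    }

    public static void main(String[] args) {
        double[] mag = magnitudes(new double[] { 3, 0 }, new double[] { 4, 1 });
        System.out.println(mag[0] + ", " + mag[1]);
    }
}
```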
I'm not using any charting package, just java.awt.Graphics2D. I only require coarse visualization of roughly 3-5k "pixels" represented by simple Rectangles.

I first tried to use JFreeChart for this, but even though it is a brilliant library and I use it for most charting, it is a dog when displaying real time data, so I created a simple very fast Swing charting library without all the bells and whistles of JFreeChart but much much much faster.
From the content of your reply it seems you are very knowledgeable in this area.

Spectral estimation, yes; Java sound, no.
I understand the mathematics pretty well, but have not implemented something like this before. Thanks for your help.

You seem to have helped yourself.
sabre150 wrote:
For the most part, with sound samples, the removal of the best straight line accomplishes little and I really only used it because of my experience in another life when I was using spectral analysis for less peaceful purposes. In essence it can reduce some of the apparent low frequency caused by the finite collection period. I have only noticed any effect when music has a load of drum content.

Correct me if I'm wrong, but wouldn't it be more effective to run a high-pass filter over the data in the frequency domain to discard the low frequencies?
sabre150 wrote:
Without using a window a single tone will present itself in the spectral estimates as significant values well away from the actual tone (a sin(2*pi*f/N)/f relationship). A window will concentrate this smear back into the neighbourhood of the single tone and limit the majority of the smear to the estimates immediately adjacent to the tone.

Whenever you run a Fourier transform, it assumes that your data is periodic... so essentially what ends up happening is that your data gets tiled, and the frequency information gets distorted where the end wraps back to the beginning.
My limited experience with sound indicates that a window is essential. I have implemented Hamming, Hanning and Triangular but Hamming seems to be the best of these.
sabre150 wrote:
If you are overly concerned with performance then Hanning (note the spelling difference) can be quicker than Hamming when implemented in the frequency domain.

Hanning is a simpler window function so it can be calculated faster... in either domain.
ags wrote:
I understand the mathematics pretty well, but have not implemented something like this before. Thanks for your help.

sabre150 wrote:
You seem to have helped yourself.

He tends to do that if you don't watch him pretty carefully...
captfoss wrote:
Correct me if I'm wrong, but wouldn't it be more effective to run a high-pass filter over the data in the frequency domain to discard the low frequencies?

A straight line in the time domain actually has some higher frequency components, since it is assumed periodic and so looks like a sawtooth. Yes, you can get some advantage from a high-pass filter in the frequency domain, but for the work I used to do, where 1/f device noise was the problem, it was more effective to remove the best straight line. My limited knowledge of sound means I don't know which is the best approach for sound.
captfoss wrote:
Whenever you run a Fourier transform, it assumes that your data is periodic... so essentially what ends up happening is that your data gets tiled, and the frequency information gets distorted where the end wraps back to the beginning.

Which is what the windowing reduces, but at the cost of decreasing the resolution one can obtain for a given number of samples.
My limited experience with sound indicates that a window is essential. I have implemented Hamming, Hanning and Triangular but Hamming seems to be the best of these.
captfoss wrote:
To prevent what is called an "edge effect", the normal recommendation is to apply a hamming window, which essentially fades your data out towards the edges... thus where the edges meet, they meet in a faded way so the edge effects are drastically reduced.

Hanning in the time domain is 0.5 * (1 - cos(2*pi*i/N)), which in the frequency domain is just the addition and subtraction of neighbouring values, Y(n) = -X(n-1) + X(n) + X(n) - X(n+1), which can be performed without any multiplications. When I had to use fixed point arithmetic done in software this made a big difference, but today with floating point processors it makes little difference.
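That frequency-domain formulation can be sketched as a three-tap circular convolution over the FFT bins; the /4 scaling below restores the 0.5*(1 - cos) normalisation that the multiplication-free form above leaves out (the class and method names are illustrative):

```java
public class FreqHanningSketch {

    // Apply a Hanning window in the frequency domain by convolving
    // neighbouring FFT bins: Y(n) = (-X(n-1) + 2*X(n) - X(n+1)) / 4,
    // equivalent to multiplying the time samples by 0.5*(1 - cos(2*pi*i/N)).
    // Run it separately over the real and imaginary parts; indices wrap.
    static double[] hanningInFrequencyDomain(double[] x) {
        int n = x.length;
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double prev = x[(i - 1 + n) % n];
            double next = x[(i + 1) % n];
            y[i] = (-prev + 2 * x[i] - next) / 4.0;
        }
        return y;
    }

    public static void main(String[] args) {
        double[] y = hanningInFrequencyDomain(new double[] { 4, 0, 0, 0 });
        System.out.println(java.util.Arrays.toString(y));
    }
}
```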
sabre150 wrote:
I started with that architecture but my need to convert sound samples to doubles meant it was better if I first converted all sound samples to big-endian 2 byte signed values. I can then easily convert these two byte signed values to doubles and also pass them to the TDL with better control of playback (I need to pause and stop playback with minimal delay).

I'm not sure I'm following completely. Nonetheless, perhaps making the byte[] buffer associated with the SDL used for playback smaller will reduce the playback delay. Worth a try maybe.
sabre150 wrote:
I have just checked my performance charts and I can do a 256 point real FFT in around 10 microseconds - a factor of 500 slower than your 20 nanoseconds. Are you sure you mean 20 nanoseconds and not 20 microseconds?

Wow! What a stupid mistake on my part. 3 orders of magnitude off... I need to return all my certifications. I suppose I had the "System.nanoTime()" call in mind when I pulled that out of the hat. With normal loading, the 256 point FFT takes about 10uSec, and can take as long as 20uSec if the machine is very heavily loaded. I currently have it running in its own thread, with normal priority. It is a standard Cooley-Tukey FFT implementation that is pretty ubiquitous, without further flair added by me.
sabre150 wrote:
256 points at 25k samples per second will mean you collect and process about 0.01 seconds at a time so you cannot resolve to better than 100 Hz. If you then use a Hamming window your resolution will degenerate to about 140 Hz. I work with the full sample rate of the audio and allow the user to define the collection period. Experimentation shows that a collection period of 0.2 seconds, a Hamming data window and processing pairs of overlapping 0.2 second segments means I can resolve to better than 3 Hz, which seems to be an acceptable compromise between visual update response and resolution.

I'll look into the overlap to see the resultant display. I can see that by just holding two half-sized arrays (logically - they could be one array) and filling one then the other, I can avoid extra copying.
If you look at the bell shape of the Hamming window you will note that much of the sound is discarded. By overlapping segments by 50% one gets back much of this discarded data and improves the resolution.
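The 50% overlap described above can be sketched as a simple hop of half a frame (names are illustrative; a real implementation would window and transform each frame rather than collect them in a list):

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapSketch {

    // Split a stream into 50%-overlapping frames so samples attenuated at
    // the edges of one Hamming-windowed frame sit near the centre of the next.
    static List<double[]> overlappingFrames(double[] samples, int frameSize) {
        List<double[]> frames = new ArrayList<>();
        int hop = frameSize / 2; // 50% overlap
        for (int start = 0; start + frameSize <= samples.length; start += hop) {
            double[] frame = new double[frameSize];
            System.arraycopy(samples, start, frame, 0, frameSize);
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        System.out.println(overlappingFrames(new double[8], 4).size() + " frames");
    }
}
```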
That sounds like what I'm seeing - far too much low frequency bias - beyond what I would expect from "typical" music tending to have greater low-frequency content. (There is a lot of drum/percussion in the typical samples I'm using.) Would you point me to some information on how to use this technique? I can try it and discard it if not needed - but I will have learned something in either case, which I enjoy.
Yes, I understand folding. I'm thinking that typical music will have lower power in the higher frequencies (particularly above 12kHz, where folding is occurring in my current implementation). Folding is a good description; the (actual) bands just above 12kHz will fold to just below 12kHz in my output. It would be the (actual) sound at 20+kHz that gets folded down to add to the low-frequency bias. The just-above-midrange content would fold to actually make the low-frequency bias seem less, relative to the higher spectrum (not sure if I did a good job describing that clearly). So, I'm assuming what I'm seeing (that particular artifact) is not necessarily due to the folding caused by my half sampling rate. I agree that the lower sampling rate is not a good thing; it was (until now) a forced compromise. It's not obvious to me how I can add any anti-aliasing to avoid the folding without dealing with the full sample rate. I can't filter the audio before I get it, and once I get it the delay is a concern. I will try increasing my sampling rate to see what the delay looks like after the buffer size changes I've made.
sabre150 wrote:
If you are overly concerned with performance then Hanning (note the spelling difference) can be quicker than Hamming when implemented in the frequency domain.

Big question: I precalculate the twiddle factors for use in the FFT. Are there frequency-domain coefficients available representing the different windowing schemes that I could just multiply by the existing twiddle factors for what would effectively be a "free lunch"? That would be great, but those free lunches rarely seem to be real... The only way I know to implement windowing is by an additional step multiplying the time domain samples by the windowing values.
You used the term "spectral density" and then "spectral density estimates". When dealing with finite sampling periods I realize that the FFT results are estimates - not because of the FFT algorithm, which gives identical results to the naive DFT, but because of the finite sampling and the non-periodic content when the sample size is smaller than the sampling rate - but I wanted to be sure there isn't more meaning in what you've written than I am understanding.
ags wrote:
Wow! What a stupid mistake on my part. 3 orders of magnitude off... I need to return all my certifications.

:-)
ags wrote:
I currently have it running in its own thread, with normal priority. It is a standard Cooley-Tukey FFT implementation that is pretty ubiquitous, without further flair added by me.

I have three major background threads running in my application, none explicitly created by me: the implicit Swing event thread, the implicit audio playback thread provided by Java Sound, and the implicit main() thread in which I process all the data. When I started working with Java Sound I used separate threads for just about everything, but as I started to understand the architecture I realised that I didn't need more than these three threads since all my signal processing was synchronous with reading the sound samples. Your application may need extra threads, but be sparing when spawning new threads because the synchronisation may cost you more than you gain.
ags wrote:
If I could perform a 256 point FFT in 20 nSec (on a typical desktop), I'd be sending my resume to that organization in Norway that hands out the cool awards each year...

I don't think you will actually gain speed by doing this overlapping; copying is very fast compared to the other operations you are doing. In fact, since each transform is now twice as big, you will probably lose out.
I just use the standard 'least squares' approach. Since it is orthogonal to the rest of the signal processing it is easy enough to add, but don't get hung up on this. Remember that when it comes to sound frequency the ear is logarithmic, so one might do better to accumulate the spectral density estimates into sub-octaves so that one displays the sound energy in a sub-octave; then the dominance of the low frequency might disappear. I am currently working on this (using the musical note sub-octave of 1/12) but it is early days and I have no clear-cut advice to give.
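The standard least-squares detrend mentioned above can be sketched as follows - fit y = a + b*t over the frame and subtract it before the FFT (the class and method names are illustrative):

```java
public class DetrendSketch {

    // Remove the best straight line (least-squares fit) from a frame,
    // reducing apparent low-frequency content caused by the finite
    // collection period.
    static void removeBestStraightLine(double[] x) {
        int n = x.length;
        double meanT = (n - 1) / 2.0;
        double meanX = 0;
        for (double v : x) {
            meanX += v;
        }
        meanX /= n;
        double num = 0, den = 0;
        for (int t = 0; t < n; t++) {
            num += (t - meanT) * (x[t] - meanX);
            den += (t - meanT) * (t - meanT);
        }
        double b = num / den;      // slope
        double a = meanX - b * meanT; // intercept
        for (int t = 0; t < n; t++) {
            x[t] -= a + b * t;
        }
    }

    public static void main(String[] args) {
        double[] x = { 1, 3, 5, 7 }; // a perfect ramp detrends to all zeros
        removeBestStraightLine(x);
        System.out.println(java.util.Arrays.toString(x));
    }
}
```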
ags wrote:
Big question: I precalculate the twiddle factors for use in the FFT. Are there frequency-domain coefficients available representing the different windowing schemes that I could just multiply by the existing twiddle factors for what would effectively be a "free lunch"?

I don't think you can frig the FFT to include the windowing, though I have never considered it since I want to keep a separation of concerns and I doubt there would be more than a marginal time saving.
'Spectral density' is a property of an ensemble, and one estimates the 'spectral density' from a single member of that ensemble (a sound sample). I suspect this distinction is irrelevant to this discussion and can be ignored.
In the time domain both Hanning and Hamming can be implemented as an array of pre-computed multipliers, and once the array is computed the application of the window takes the same time for both.

But computing the array of precomputed multipliers can be done faster for a Hanning window than a Hamming... so it's still faster in both domains, just not "as faster" in the spatial domain ;-)
In the frequency domain one can make use of the simple additive implementation for the required convolution, so Hanning is slightly cheaper, BUT today's floating point processors make this micro-optimisation pretty much unnecessary.

AFAIK, people use Hamming windows for the same reason they use the US standard measurement system... they just do.