8 Replies. Latest reply: Mar 25, 2011 8:58 AM by captfoss

Does java.sound add delay to "live" audio streams (vs. native code)?

845492 Newbie
First, before things get off track: the question isn't about (nor am I implying) Java being slow.

I am developing an application that listens to a "live" audio feed (by "live", I mean it is not stored, but is being streamed in real time - as opposed to being a Clip or File that I can manage directly) and performs some processing, driving something like a visualization. I've measured (using eyes, ears, and a stopwatch) that there is a constant ~0.65 sec delay from the "sound event" to the "displayed event". I presumed it was my code being slow. It is unacceptably distracting for the intended use.

I've spent quite some time analyzing, refining, and restructuring my code, without significant improvement. I did a quick "instrumentation" of my code using System.nanoTime() for reference. I realize it's not dead-on accurate, but it should give a good idea of where time is being spent. The FFT is taking about 20 nsec. The graphical rendering (the "paint" method) takes about 1 usec. Loading the process down by throwing about 500 different operations at my processing code results in a total delay (including FFT, painting the window, and my processing) of less than 5 msec. That delay would be fine.

I was about to give up, thinking that I would not be able to succeed, when I found a (commercial) application that does something similar to what I'm doing. I can run it at the same time as my app, listening to the same audio line, and with a side-by-side visual comparison I see that the response of the other app is without noticeable delay compared to the audio source, but at least 0.5 sec "ahead" of my graphical output. I don't have source code for the other app, and I don't think it's written in Java (probably native Win32, I'm guessing). However, it does demonstrate that the goal is achievable.

At this point, there are two (java.sound) explanations I can come up with (and one Swing-related explanation):
1) To provide the very simple stream interface (just filling an array from a TargetDataLine and then processing it) java.sound is adding some pipeline delay in the audio stream.
2) I should be listening to some other Line (although there is no other Line that I can find that makes sense - unless I can listen directly to a Port).
3) The delay isn't actually in the audio section, but in the graphic rendering section. I realize that's beyond the scope of this sub-forum. All I can think of is that Swing is introducing an enormous delay between the time I request a redraw of my window (indirectly, through a <customJPanel>.repaint() call) and the time the window is actually repainted. I can measure the actual time required for my custom paint() method to complete, but not the latency between the time the repaint() call is made and the time the Swing thread dispatches that call to the actual paint() method which does the work.

I would appreciate any help others might be able to share with this. Thanks.

[Addendum: I just tried replacing the JPanel.repaint() call (deferred to the Swing EDT) with JPanel.paintImmediately() and there was no change in the delay.]
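
Regarding possibility 3: one way I could measure that dispatch latency directly would be to timestamp the repaint() request and read the timestamp inside the paint callback on the EDT. A minimal sketch of the idea (the class and field names are made up, not my actual code):

// Sketch only: measure the gap between a repaint() request made on the
// processing thread and the actual paint on the Swing EDT.
import java.awt.Graphics;
import javax.swing.JPanel;

class VisualizerPanel extends JPanel {

    private volatile long repaintRequestedAt;    // written by the processing thread

    void requestFrame() {
        repaintRequestedAt = System.nanoTime();
        repaint();                               // queued, coalesced and dispatched by the EDT
    }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        long dispatchLatencyUs = (System.nanoTime() - repaintRequestedAt) / 1000;
        // ... draw the visualization, then log dispatchLatencyUs somewhere cheap
    }
}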

Edited by: ags on Mar 23, 2011 4:14 PM

  • 1. Re: Does java.sound add delay to "live" audio streams (vs. native code)?
    sabre150 Expert
    I do something similar to you but I don't see your problem; if anything I see the reverse. The stages I go through are -

    1) Collect a buffer of sampled data from the line. Typically I collect 0.2 seconds of data at a time so that I can resolve to about 5 Hz.
    2) Process the samples - convert the samples to doubles, remove the best straight line, apply a window, perform the FFT, compute the spectral density.
    3) Create the graphical plot data model from the spectral density estimates.
    4) Create the graphical plot data model for the time samples.
    5) 'swing' the plot data models (a 'swing' buffer approach is essential here) and send the repaint() to get the charts to be re-drawn.
    6) Write the buffer of audio samples to the output line.

    All this is done in a single separate thread but note that the last thing I do is write the audio samples to the output line. The charting is very fast but is still the slowest single part of the process. Which charting package are you using? I don't have any real timing for this but I find that if anything the graphical charting display leads the audio output by about 0.2 seconds. In my application this is acceptable since I'm reading audio files and not taking a live feed.

    Your FFT seems very, very fast; I don't know how many points you are transforming, but 20 nanoseconds seems very, very short for any number of points that would make sense to me. Is this being done in Java or are you using JNI to invoke FFTW? I use my own Java mixed radix FFT that can do an 8000 point real FFT in about 0.5 msec - a long way from 20 nanoseconds but not the limiting factor in my application!

    The point of all this is that you might want to change the position in the processing at which you write the samples to the output line.

    P.S. Don't send more sound samples to the output line than you are processing at a time.
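
    To make the flow concrete, a minimal sketch of that loop (illustrative only - the helper names, the plot model and the two lines are placeholders, not my actual code):

        // Sketch of the six steps above; all names are placeholders.
        byte[] block = new byte[bytesPerBlock];                 // e.g. ~0.2 s of audio
        while (running) {
            int n = targetLine.read(block, 0, block.length);    // 1) collect a block of samples
            double[] samples = toDoubles(block, n);             // 2) convert to doubles,
            removeBestStraightLine(samples);                    //    remove the best straight line,
            applyWindow(samples, window);                       //    apply the window,
            double[] density = spectralDensity(fft(samples));   //    FFT and spectral density
            plotModel.swapIn(density, samples);                 // 3-5) update the 'swing' buffered plot models
            chartPanel.repaint();                               //      and request a redraw
            sourceLine.write(block, 0, n);                      // 6) write the same block to the output line last
        }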
  • 2. Re: Does java.sound add delay to "live" audio streams (vs. native code)?
    845492 Newbie
    Thanks for the great response. I think I've found the problem - but have more to resolve. First, the apparent solution to the immediate problem: I found that by decreasing the size of the buffer used by the TargetDataLine, I can reduce the delay dramatically. It seems that by default, the buffer is sized to hold 0.5 sec of audio data. While that might not be strictly true, it was for the few test cases I tried. It turns out that I had just about 0.5 sec of unexplainable delay. I am assuming that the TDL waits to fill (some portion of) its buffer before it can be read. At least that's how the behavior seems from my recent experience.
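
    For anyone else who hits this, a sketch of the change (illustrative only - the format and the ~50 ms figure are examples, not my app's actual values):

        // Sketch: open the capture line with an explicit, small buffer instead of the default.
        import javax.sound.sampled.AudioFormat;
        import javax.sound.sampled.AudioSystem;
        import javax.sound.sampled.LineUnavailableException;
        import javax.sound.sampled.TargetDataLine;

        public class LowLatencyCapture {
            public static void main(String[] args) throws LineUnavailableException {
                AudioFormat fmt = new AudioFormat(44100f, 16, 1, true, true); // 16-bit signed, big-endian, mono
                TargetDataLine line = AudioSystem.getTargetDataLine(fmt);

                // Ask for roughly 50 ms of internal buffering instead of the default (~0.5 s in my case).
                int bufferBytes = (int) (fmt.getFrameRate() * 0.05) * fmt.getFrameSize();
                line.open(fmt, bufferBytes);
                line.start();

                byte[] buf = new byte[line.getBufferSize() / 2];   // read half the buffer at a time
                int n = line.read(buf, 0, buf.length);             // returns after ~25 ms of audio
                System.out.println("read " + n + " bytes with buffer " + line.getBufferSize());
                line.close();
            }
        }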

    I am in a different situation than you in that I cannot intercept the audio stream (and then pass it on, delayed appropriately to sync the visual and audio streams). I'm at the end-of-the-line of a split audio stream; one branch comes to me (live, streaming) and the other is sent to audio equipment (for immediate playback). So unless this changes (and that is unlikely) I will have to make sure everything I do from reading, through processing and graphical rendering is fast enough that the delay is not noticeable compared to the other audio stream being played back by separate equipment.

    Because of that, I've had to make some compromises. My display is not meant to be diagnostic (not precise) - it's more for visual aesthetic than a measurement tool. I've gotten the FFT time down so small by reducing the sampling rate to 24kS/s (throwing away half the bandwidth) and I'm only using 256 points for the FFT. Now that I have found some more processing time, I'll see how much I can raise that towards full bandwidth & resolution without unacceptable sync delay. If I could get to full bandwidth (44.1k or 48k) sampling and 10Hz resolution that would be great.

    You mention a few steps that I'd appreciate more detail on (if you please).

    1) I've not heard of the best straight line removal step. Would you explain more? From the sequencing of the steps you provided, it looks like that's still in the time domain, is that correct? What does it accomplish?

    2) Right now I'm just iterating through the array of (SIGNED_PCM) int samples and converting to doubles and copying into a new array. Do you know of a faster method?

    3) I planned on using a Hamming window, but when the prototype delay became a problem, I focused elsewhere. I now see that the spectral output (from FFT) is very biased towards the very low frequency side. Most music has more power in the lower bands, but this is more than I expected. I also realize I'm getting some aliasing since I'm sampling at 1/2 the Nyquist frequency - but I'd expect that effect to be not so concentrated in the low frequencies. My understanding is that the Hamming window will reduce the spectral amplitude but sharpen the bands (reducing spectral bleedthrough). Should I also expect to see a reduction of the bias towards the low frequency bands? Do you use a Hamming window or other?

    4) For the step you call "compute the spectral density" - is that finding the modulus of the FFT real and imaginary values, or something more? Does that differ in some way from what you refer to in the next step as "spectral density estimates"?

    I'm not using any charting package, just java.awt.Graphics2D. I only require coarse visualization of roughly 3-5k "pixels" represented by simple Rectangles.

    From the content of your reply it seems you are very knowledgeable in this area. I understand the mathematics pretty well, but have not implemented something like this before. Thanks for your help.
  • 3. Re: Does java.sound add delay to "live" audio streams (vs. native code)?
    sabre150 Expert
    ags wrote:
    Thanks for the great response. I think I've found the problem - but have more to resolve. First, the apparent solution to the immediate problem: I found that by decreasing the size of the buffer used by the TargetDataLine, I can reduce the delay dramatically. It seems that by default, the buffer is sized to hold 0.5 sec of audio data. While that might not be strictly true, it was for the few test cases I tried. It turns out that I had just about 0.5 sec of unexplainable delay. I am assuming that the TDL waits to fill (some portion of) its buffer before it can be read. At least that's how the behavior seems from my recent experience.
    This 0.5 second buffering could explain some of the delay I am seeing between the visual representation and the audio output. Though in my application it is not really important, it is annoying. I'm fairly new to the Java sound API and have much to learn. Time for me to go back to the Javadoc.

    >
    I am in a different situation than you in that I cannot intercept the audio stream (and then pass it on, delayed appropriately to sync the visual and audio streams). I'm at the end-of-the-line of a split audio stream; one branch comes to me (live, streaming) and the other is sent to audio equipment (for immediate playback). So unless this changes (and that is unlikely) I will have to make sure everything I do from reading, through processing and graphical rendering is fast enough that the delay is not noticeable compared to the other audio stream being played back by separate equipment.
    I started with that architecture but my need to convert sound samples to doubles meant it was better if I first converted all sound samples to big-endian 2 byte signed values. I can then easily convert these two byte signed values to doubles and also pass them to the TDL with better control of playback (I need to pause and stop playback with minimal delay).


    >
    Because of that, I've had to make some compromises. My display is not meant to be diagnostic (not precise) - it's more for visual aesthetic than a measurement tool.
    Ditto. I just display in real time the audio data, its spectrum and amplitude. Charts are never really precise and are just a visualisation.
    I've gotten the FFT time down so small by reducing the sampling rate to 24kS/s (throwing away half the bandwidth) and I'm only using 256 points for the FFT.
    I have just checked my performance charts and I can do a 256 point real FFT in around 10 micro seconds - a factor of 500 slower than your 20 nano-seconds. Are you sure you mean 20 nano seconds and not 20 micro seconds?

    256 points at 25k samples per second will mean you collect and process about 0.01 seconds at a time so you cannot resolve to better than 100 Hz. If you then use a Hamming window your resolution will degenerate to about 140 Hz. I work with the full sample rate of the audio and allow the user to define the collection period. Experimentation shows that a collection period of 0.2 seconds, a Hamming data window and processing pairs of overlapping 0.2 second segments means I can resolve to better than 3 Hz, which seems to be an acceptable compromise between visual update response and resolution.

    If you look at the bell shape of the Hamming window you will note that much of the sound is discarded. By overlapping segments by 50% one gets back much of this discarded data and improves the resolution.
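
    A sketch of the 50% overlap (illustrative names, not my actual code): keep the previous half-segment around and prepend it to each new half-segment before windowing and transforming.

        // Sketch of 50% segment overlap.
        class OverlapBuffer {
            private final double[] previousHalf;

            OverlapBuffer(int segmentLength) {
                previousHalf = new double[segmentLength / 2];
            }

            // newSamples.length must equal segmentLength / 2
            double[] nextSegment(double[] newSamples) {
                double[] segment = new double[previousHalf.length * 2];
                System.arraycopy(previousHalf, 0, segment, 0, previousHalf.length);
                System.arraycopy(newSamples, 0, segment, previousHalf.length, previousHalf.length);
                System.arraycopy(newSamples, 0, previousHalf, 0, previousHalf.length);
                return segment;                    // window and FFT this full-length segment
            }
        }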
    Now that I have found some more processing time, I'll see how much I can raise that towards full bandwidth & resolution without unacceptable sync delay. If I could get to full bandwidth (44.1k or 48k) sampling and 10Hz resolution that would be great.
    From my limited experience I have found that 10 Hz resolution is OK for voice but not so good for music. I can't justify this but 5 Hz or smaller seems to be needed for orchestral music.

    >
    You mention a few steps that I'd appreciate more detail on (if you please).

    1) I've not heard of the best straight line removal step. Would you explain more? From the sequencing of the steps you provided, it looks like that's still in the time domain, is that correct? What does it accomplish?
    For the most part, with sound samples, the removal of the best straight line accomplishes little and I really only used it because of my experience in another life when I was using spectral analysis for less peaceful purposes. In essence it can reduce some of the apparent low frequency caused by the finite collection period. I have only noticed any effect when music has a load of drum content.

    >
    2) Right now I'm just iterating through the array of (SIGNED_PCM) int samples and converting to doubles and copying into a new array. Do you know of a faster method?
    I do much the same except that I first convert all sound samples regardless of format (ULaw, ALaw, single byte PCM etc) to 2 byte big-endian signed PCM and then convert these values to doubles.
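
    A sketch of that last step, assuming 16-bit big-endian signed PCM (the method name is just illustrative):

        // Sketch: big-endian 16-bit signed PCM bytes -> doubles in [-1.0, 1.0).
        static double[] toDoubles(byte[] pcm, int validBytes) {
            double[] out = new double[validBytes / 2];
            for (int i = 0; i < out.length; i++) {
                int hi = pcm[2 * i];                // high byte; sign kept by int promotion
                int lo = pcm[2 * i + 1] & 0xFF;     // low byte, treated as unsigned
                out[i] = ((hi << 8) | lo) / 32768.0;
            }
            return out;
        }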

    >
    3) I planned on using a Hamming window, but when the prototype delay became a problem, I focused elsewhere. I now see that the spectral output (from FFT) is very biased towards the very low frequency side.
    Applying a window to the data will not change this bias since I believe that it is fundamental to music. Without using a window a single tone will present itself in the spectral estimates as significant values well away from the actual tone (a sin(2*pi*f/N)/f relationship). A window will concentrate this smear back into the neighbourhood of the single tone and limit the majority of the smear to the estimates immediately adjacent to the tone. Many years ago when I first worked on spectral estimation there was a load of literature detailing the advantages of one window over another. I did a project where I investigated the effect of windows and I showed that for the sort of application I was considering then, a Chebyshev window was pretty much optimal but that the difference between using a Chebyshev window and a Hamming window was small. This result seems to apply also to analysis of sound.

    My limited experience with sound indicates that a window is essential. I have implemented Hamming, Hanning and Triangular but Hamming seems to be the best of these.
    Most music has more power in the lower bands, but this is more than I expected. I also realize I'm getting some aliasing since I'm sampling at 1/2 the Nyquist frequency - but I'd expect that effect to be not so concentrated in the low frequencies.
    This will cause all frequencies between half your sample rate (your effective Nyquist frequency) and the Nyquist frequency of the original samples (half the original sample rate) to be folded down into the range up to your effective Nyquist frequency. Sub-sampling is normally a very very very bad idea without first performing some low pass filtering of the original samples.
    My understanding is that the Hamming window will reduce the spectral amplitude but sharpen the bands (reducing spectral bleedthrough).
    In essence yes.
    Should I also expect to see a reduction of the bias towards the low frequency bands?
    It is not really a bias and without windowing one spreads/smears energy from one tone/frequency equally to both sides of the tone and not just to low frequencies.
    Do you use a Hamming window or other?
    I allow my users to select the windowing but default to Hamming. If you are overly concerned with performance then Hanning (note the spelling difference) can be quicker than Hamming when implemented in the frequency domain.

    >
    4) For the step you call "compute the spectral density" - is that finding the modulus of the FFT real and imaginary values, or something more?
    Pretty much.
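    In code, the "modulus" step amounts to something like this (a sketch, assuming the FFT returns separate real and imaginary arrays):

        // Sketch: spectral density estimate as the modulus of each FFT bin.
        static double[] magnitudes(double[] re, double[] im) {
            double[] mag = new double[re.length];
            for (int i = 0; i < re.length; i++) {
                mag[i] = Math.sqrt(re[i] * re[i] + im[i] * im[i]);
            }
            return mag;
        }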
    Does that differ in some way from what you refer to in the next step as "spectral density estimates"?
    I don't really understand what you are asking.

    >
    I'm not using any charting package, just java.awt.Graphics2D. I only require coarse visualization of roughly 3-5k "pixels" represented by simple Rectangles.
    I first tried to use JFreeChart for this but even though it is a brilliant library and I use it for most charting, it is a dog when displaying real-time data, so I created a simple, very fast Swing charting library without all the bells and whistles of JFreeChart but much, much faster.

    >
    From the content of your reply it seems you are very knowledgeable in this area.
    Spectral estimation - yes - but Java sound - no -.
    I understand the mathematics pretty well, but have not implemented something like this before. Thanks for your help.
    You seem to have helped yourself.
  • 4. Re: Does java.sound add delay to "live" audio streams (vs. native code)?
    captfoss Pro
    1) I've not heard of the best straight line removal step. Would you explain more? From the sequencing of the steps you provided, it looks like that's still in the time domain, is that correct? What does it accomplish?
    For the most part, with sound samples, the removal of the best straight line accomplishes little and I really only used it because of my experience in another life when I was using spectral analysis for less peaceful purposes. In essence it can reduce some of the apparent low frequency caused by the finite collection period. I have only noticed any effect when music has a load of drum content.
    Correct me if I'm wrong, but wouldn't it be more effective to run a high-pass filter over the data in the frequency domain to discard the low frequencies?
    3) I planned on using a Hamming window, but when the prototype delay became a problem, I focused elsewhere. I now see that the spectral output (from FFT) is very biased towards the very low frequency side.
    Applying a window to the data will not change this bias since I believe that it is fundamental to music. Without using a window a single tone will present itself in the spectral estimates as significant values well away from the actual tone (a sin(2*pi*f/N)/f relationship). A window will concentrate this smear back into the neighbourhood of the single tone and limit the majority of the smear to the estimates immediately adjacent to the tone. Many years ago when I first worked on spectral estimation there was a load of literature detailing the advantages of one window over another. I did a project where I investigated the effect of windows and I showed that for the sort of application I was considering then, a Chebyshev window was pretty much optimal but that the difference between using a Chebyshev window and a Hamming window was small. This result seems to apply also to analysis of sound.

    My limited experience with sound indicates that a window is essential. I have implemented Hamming, Hanning and Triangular but Hamming seems to be the best of these.
    Whenever you run a Fourier transform, it assumes that your data is periodic... so essentially what ends up happening is that your data gets tiled, and the frequency information gets distorted where the end wraps back to the beginning.

    To prevent what is called an "edge effect", the normal recommendation is to apply a hamming window, which essentially fades your data out towards the edges... thus where the edges meet, they meet in a faded way so the edge effects are drastically reduced.
    Do you use a Hamming window or other?
    I allow my users to select the windowing but default to Hamming. If you are overly concerned with performance then Hanning (note the spelling difference) can be quicker than Hamming when implemented in the frequency domain.
    Hanning is a simpler window function so it can be calculated faster... in either domain.
    I understand the mathematics pretty well, but have not implemented something like this before. Thanks for your help.
    You seem to have helped yourself.
    He tends to do that if you don't watch him pretty carefully...
  • 5. Re: Does java.sound add delay to "live" audio streams (vs. native code)?
    sabre150 Expert
    captfoss wrote:
    1) I've not heard of the best straight line removal step. Would you explain more? From the sequencing of the steps you provided, it looks like that's still in the time domain, is that correct? What does it accomplish?
    For the most part, with sound samples, the removal of the best straight line accomplishes little and I really only used it because of my experience in another life when I was using spectral analysis for less peaceful purposes. In essence it can reduce some of the apparent low frequency caused by the finite collection period. I have only noticed any effect when music has a load of drum content.
    Correct me if I'm wrong, but wouldn't it be more effective to run a high-pass filter over the data in the frequency domain to discard the low frequencies?
    A straight line in the time domain actually has some higher frequency components since it is assumed periodic and so looks like a sawtooth. Yes, you can get some advantage from a high pass filter in the frequency domain, but for the work I used to do, where 1/f device noise was the problem, it was more effective to remove the best straight line. My limited knowledge of sound means I don't know which is the best approach for sound.

    >
    3) I planned on using a Hamming window, but when the prototype delay became a problem, I focused elsewhere. I now see that the spectral output (from FFT) is very biased towards the very low frequency side.
    Applying a window to the data will not change this bias since I believe that it is fundamental to music. Without using a window a single tone will present itself in the spectral estimates as significant values well away from the actual tone (a sin(2*pi*f/N)/f relationship). A window will concentrate this smear back into the neighbourhood of the single tone and limit the majority of the smear to the estimates immediately adjacent to the tone. Many years ago when I first worked on spectral estimation there was a load of literature detailing the advantages of one window over another. I did a project where I investigated the effect of windows and I showed that for the sort of application I was considering then, a Chebyshev window was pretty much optimal but that the difference between using a Chebyshev window and a Hamming window was small. This result seems to apply also to analysis of sound.

    My limited experience with sound indicates that a window is essential. I have implemented Hamming, Hanning and Triangular but Hamming seems to be the best of these.
    Whenever you run a Fourier transform, it assumes that your data is periodic... so essentially what ends up happening is that your data gets tiled, and the frequency information gets distorted where the end wraps back to the beginning.
    Which is what the windowing reduces but at the cost of decreasing the resolution one can obtain for a given number of samples.

    >
    To prevent what is called an "edge effect", the normal recommendation is to apply a hamming window, which essentially fades your data out towards the edges... thus where the edges meet, they meet in a faded way so the edge effects are drastically reduced.
    Do you use a Hamming window or other?
    I allow my users to select the windowing but default to Hamming. If you are overly concerned with performance then Hanning (note the spelling difference) can be quicker than Hamming when implemented in the frequency domain.
    Hanning is a simpler window function so it can be calculated faster... in either domain.
    Hanning in the time domain is 0.5 * (1 - cos(2*pi*i/N)), which in the frequency domain is just the addition and subtraction of neighbouring values, Y(n) = -X(n-1) + X(n) + X(n) - X(n+1), which can be performed without any multiplications. When I had to use fixed point arithmetic done in software this made a big difference, but today with floating point processors it makes little difference.

    Hamming in the time domain is 0.53836 - 0.46164 * cos(2*pi*i/N), which in the frequency domain is Y(n) = -0.4054 * X(n-1) + 0.9846 * X(n) - 0.4054 * X(n+1), which needs some multiplications.

    In the time domain both Hanning and Hamming can be implemented as an array of pre-computed multipliers and once the array is computed the application of the window takes the same time for both.
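
    A sketch of the pre-computed-multiplier approach, using the Hamming coefficients quoted above (the Hanning version differs only in the two constants):

        // Sketch: precompute the window once, then apply it with one multiply per sample.
        static double[] hammingWindow(int n) {
            double[] w = new double[n];
            for (int i = 0; i < n; i++) {
                w[i] = 0.53836 - 0.46164 * Math.cos(2 * Math.PI * i / n);
            }
            return w;
        }

        static void applyWindow(double[] samples, double[] window) {
            for (int i = 0; i < samples.length; i++) {
                samples[i] *= window[i];
            }
        }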

    In the frequency domain one can make use of the simple additive implementation for the required convolution so Hanning is slightly cheaper BUT today's floating point processors make this micro optimisation pretty much unnecessary.
  • 6. Re: Does java.sound add delay to "live" audio streams (vs. native code)?
    845492 Newbie
    sabre150 wrote:
    I started with that architecture but my need to convert sound samples to doubles meant it was better if I first converted all sound samples to big-endian 2 byte signed values. I can then easily convert these two byte signed values to doubles and also pass them to the TDL with better control of playback (I need to pause and stop playback with minimal delay).
    I'm not sure I'm following completely. Nonetheless, perhaps making the byte[] buffer associated with the SDL used for playback smaller will reduce the playback delay. Worth a try maybe.

    >
    I've gotten the FFT time down so small by reducing the sampling rate to 24kS/s (throwing away half the bandwidth) and I'm only using 256 points for the FFT.
    I have just checked my performance charts and I can do a 256 point real FFT in around 10 micro seconds - a factor of 500 slower than your 20 nano-seconds. Are you sure you mean 20 nano seconds and not 20 micro seconds?
    Wow! What a stupid mistake on my part. 3 orders of magnitude off... I need to return all my certifications. I suppose I had the "System.nanoTime()" call in mind when I pulled that out of the hat. With normal loading, the 256 point FFT takes about 10 usec, and can take as long as 20 usec if the machine is very heavily loaded. I currently have it running in its own thread, with normal priority. It is a standard Cooley-Tukey FFT implementation that is pretty ubiquitous, without further flair added by me.

    If I could perform a 256 point FFT in 20 nSec (on a typical desktop), I'd be sending my resume to that organization in Norway that hands out the cool awards each year...
    256 points at 25k samples per second will mean you collect and process about 0.01 seconds at a time so you cannot resolve to better than 100 Hz. If you then use a Hamming window your resolution will degenerate to about 140 Hz. I work with the full sample rate of the audio and allow the user to define the collection period. Experimentation shows that a collection period of 0.2 seconds, a Hamming data window and processing pairs of overlapping 0.2 second segments means I can resolve to better than 3 Hz, which seems to be an acceptable compromise between visual update response and resolution.

    If you look at the bell shape of the Hamming window you will note that much of the sound is discarded. By overlapping segments by 50% one gets back much of this discarded data and improves the resolution.
    I'll look into the overlap to see the resultant display. I can see that by just holding two half-sized arrays (logically - they could be one array) and filling one then the other, I can avoid extra copying.
    1) I've not heard of the best straight line removal step. Would you explain more? From the sequencing of the steps you provided, it looks like that's still in the time domain, is that correct? What does it accomplish?
    For the most part, with sound samples, the removal of the best straight line accomplishes little and I really only used it because of my experience in another life when I was using spectral analysis for less peaceful purposes. In essence it can reduce some of the apparent low frequency caused by the finite collection period. I have only noticed any effect when music has a load of drum content.
    That sounds like what I'm seeing - far too much low frequency bias - beyond what I would expect from "typical" music tending to have greater low-frequency content. (There is a lot of drum/percussion in the typical samples I'm using.) Would you point me to some information on how to use this technique? I can try it and discard it if not needed - but I will have learned something in either case, which I enjoy.
    3) I planned on using a Hamming window, but when the prototype delay became a problem, I focused elsewhere. I now see that the spectral output (from FFT) is very biased towards the very low frequency side.
    Applying a window to the data will not change this bias since I believe that it is fundamental to music. Without using a window a single tone will present itself in the spectral estimates as significant values well away from the actual tone (a sin(2*pi*f/N)/f relationship). A window will concentrate this smear back into the neighbourhood of the single tone and limit the majority of the smear to the estimates immediately adjacent to the tone. Many years ago when I first worked on spectral estimation there was a load of literature detailing the advantages of one window over another. I did a project where I investigated the effect of windows and I showed that for the sort of application I was considering then, a Chebyshev window was pretty much optimal but that the difference between using a Chebyshev window and a Hamming window was small. This result seems to apply also to analysis of sound.

    My limited experience with sound indicates that a window is essential. I have implemented Hamming, Hanning and Triangular but Hamming seems to be the best of these.
    Most music has more power in the lower bands, but this is more than I expected. I also realize I'm getting some aliasing since I'm sampling at 1/2 the Nyquist frequency - but I'd expect that effect to be not so concentrated in the low frequencies.
    This will cause all frequencies between half your sample rate (your effective Nyquist frequency) and the Nyquist frequency of the original samples (half the original sample rate) to be folded down into the range up to your effective Nyquist frequency. Sub-sampling is normally a very very very bad idea without first performing some low pass filtering of the original samples.
    Yes, I understand folding. I'm thinking that typical music will have lower power in the higher frequencies (particularly above 12kHz, where folding is occurring in my current implementation). Folding is a good description; the (actual) bands just above 12kHz will fold to just below 12kHz in my output. It would be the (actual) sound at 20+kHz that gets folded down to add to the low-frequency bias. The just-above-midrange content would fold to actually make the low-frequency bias seem less, relative to the higher spectrum (not sure if I did a good job describing that clearly). So, I'm assuming what I'm seeing (that particular artifact) is not necessarily due to the folding caused by my half-sampling-rate. I agree that the lower sampling rate is not a good thing; it was (until now) a forced compromise. It's not obvious to me how I can add any anti-aliasing to avoid the folding without dealing with the full sample rate. I can't filter the audio before I get it, and once I get it the delay is a concern. I will try increasing my sampling rate to see what the delay looks like after the buffer size changes I've made.
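
    One thing I could try (a sketch of my own, with made-up names) is to capture at the full rate and decimate by two after a crude low-pass - here just an average of adjacent sample pairs; a proper FIR low-pass would reject the folded band far better:

        // Crude sketch: average adjacent pairs (a weak low-pass), then keep one sample per pair.
        static double[] decimateByTwo(double[] fullRate) {
            double[] halfRate = new double[fullRate.length / 2];
            for (int i = 0; i < halfRate.length; i++) {
                halfRate[i] = 0.5 * (fullRate[2 * i] + fullRate[2 * i + 1]);
            }
            return halfRate;
        }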

    >
    I allow my users to select the windowing but default to Hamming. If you are overly concerned with performance then Hanning (note the spelling difference) can be quicker than Hamming when implemented in the frequency domain.
    Big question: I precalculate the twiddle factors for use in the FFT. Are there frequency-domain coefficients available representing the different windowing schemes that I could just multiply by the existing twiddle factors for what would effectively be a "free lunch"? That would be great, but those free lunches rarely seem to be real... The only way I know to implement windowing is by an additional step multiplying the time domain samples by the windowing values.

    >>
    4) For the step you call "compute the spectral density" - is that finding the modulus of the FFT real and imaginary values, or something more?
    Pretty much.
    Does that differ in some way from what you refer to in the next step as "spectral density estimates"?
    I don't really understand what you are asking.
    You used the term "spectral density" and then "spectral density estimates". When dealing with finite sampling periods I realize that the FFT results (not because of the FFT algorithm, which has identical results to the naive DFT - but because of the finite sampling and non-periodic content when sample size < sampling rate) are estimates, but I wanted to be sure there isn't more meaning in what you've written than I am understanding.

    Thanks again for the responses.

    Edited by: ags on Mar 24, 2011 12:36 PM
    That's what I get for being sidetracked after starting a reply at 7 AM. Well, the content is still on-topic, sorry for being late to the party, though.
  • 7. Re: Does java.sound add delay to "live" audio streams (vs. native code)?
    sabre150 Expert
    >

    <snip>

    >>
    I've gotten the FFT time down so small by reducing the sampling rate to 24kS/s (throwing away half the bandwidth) and I'm only using 256 points for the FFT.
    I have just checked my performance charts and I can do a 256 point real FFT in around 10 micro seconds - a factor of 500 slower than your 20 nano-seconds. Are you sure you mean 20 nano seconds and not 20 micro seconds?
    Wow! What a stupid mistake on my part. 3 orders of magnitude off... I need to return all my certifications. I suppose I had the "System.nanoTime()" call in mind when I pulled that out of the hat. With normal loading, the 256 point FFT takes about 10uSec, and can take as long as 20uSec if the machine is very heavily loaded.
    :-)

    I use my own mixed radix 2,3 and 5 implementation developed over many many many years. Over these years I 'tuned' the implementation trying to get it faster but I have stopped since I was losing out to Moore's Law. The only optimisations I now make are to pre-compute as much as possible and to use the fact that since it is conjugate symmetric a transform of N real data points can be converted to a transform of N/2 complex data points with N/2 extra add-multiply pairs. This can save a factor of 2 over just converting the N real points to N complex points by using an array of zeros for the imaginary parts.

    Why mixed radix? A radix 2 implementation allows one to transform 2,4,8,16,32,64,128... points. A Mixed radix 2,3 and 5 implementation allows one to transform 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27,30, 32, 36, 40, 45, 48, 50, 54, 60, 64, 72, 75, 80, 81, 90, 96, 100, 108, 120, 125, 128, ... . Much less restrictive. I know that there is now a technique to allow one to transform an arbitrary N point DFT into a sequence of mixed radix 2 transforms but I haven't yet found time to investigate this. MATLAB, which I think uses FFTW behind the scenes, seems to use this.
    I currently have it running in its own thread, with normal priority. It is a standard Cooley-Tukey FFT implementation that is pretty ubiquitous, without further flair added by me.
    I have three major background threads running in my application; none explicitly created by me. The implicit Swing event thread, the implicit audio playback thread provided by Java sound and then the implicit main() thread in which I process all the data. When I started working with Java sound I used separate threads for just about everything but as I started to understand the architecture I realised that I didn't need more than these three threads since all my signal processing was synchronous with reading the sound samples. Your application may need extra threads but be sparing when spawning new threads because the synchronisation may cost you more than you gain.

    >
    If I could perform a 256 point FFT in 20 nSec (on a typical desktop), I'd be sending my resume to that organization in Norway that hands out the cool awards each year...
    256 points at 25k samples per second will mean you collect and process about 0.01 seconds at a time so you cannot resolve to better than 100 Hz. If you then use a Hamming window your resolution will degenerate to about 140 Hz. I work with the full sample rate of the audio and allow the user to define the collection period. Experimentation shows that a collection period of 0.2 seconds, a Hamming data window and processing pairs of overlapping 0.2 second segments means I can resolve to better than 3 Hz, which seems to be an acceptable compromise between visual update response and resolution.

    If you look at the bell shape of the Hamming window you will note that much of the sound is discarded. By overlapping segments by 50% one gets back much of this discarded data and improves the resolution.
    I'll look into the overlap to see the resultant display. I can see that by just holding two half-sized arrays (logically - they could be one array) and filling one then the other, I can avoid extra copying.
    I don't think you will actually gain speed by doing this overlapping; copying is very fast compared to the other operations you are doing. In fact, since each transform is now twice as big, you will probably lose out.

    >
    1) I've not heard of the best straight line removal step. Would you explain more? From the sequencing of the steps you provided, it looks like that's still in the time domain, is that correct? What does it accomplish?
    For the most part, with sound samples, the removal of the best straight line accomplishes little and I really only used it because of my experience in another life when I was using spectral analysis for less peaceful purposes. In essence it can reduce some of the apparent low frequency caused by the finite collection period. I have only noticed any effect when music has a load of drum content.
    That sounds like what I'm seeing - far too much low frequency bias - beyond what I would expect from "typical" music tending to have greater low-frequency content. (There is a lot of drum/percussion in the typical samples I'm using.) Would you point me to some information on how to use this technique? I can try it and discard it if not needed - but I will have learned something in either case, which I enjoy.
    I just use the standard 'least squares' approach. Since it is orthogonal to the rest of the signal processing it is easy enough to add, but don't get hung up on this. Remember that when it comes to sound frequency the ear is logarithmic, so one might do better to accumulate the spectral density estimates into sub-octaves so that one displays the sound energy in a sub-octave, and then the dominance of the low frequency might disappear. I am currently working on this (using the musical note sub-octave of 1/12) but it is early days and I have no clear-cut advice to give.
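
    For reference, a sketch of the least-squares detrend (illustrative only): fit a + b*i to the block and subtract it before windowing.

        // Sketch: fit a + b*i by least squares and subtract it from the samples.
        static void removeBestStraightLine(double[] x) {
            int n = x.length;
            double sumI = 0, sumX = 0, sumIX = 0, sumII = 0;
            for (int i = 0; i < n; i++) {
                sumI += i;
                sumX += x[i];
                sumIX += i * x[i];
                sumII += (double) i * i;
            }
            double b = (n * sumIX - sumI * sumX) / (n * sumII - sumI * sumI);
            double a = (sumX - b * sumI) / n;
            for (int i = 0; i < n; i++) {
                x[i] -= a + b * i;
            }
        }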

    One area I have just started to look at is that of factoring into the spectral density display the response of an average ear. This does not appear to be straightforward but my main problem is finding a definitive guide. I suspect that when I can apply this the apparent dominance of the low frequency will diminish.

    >
    3) I planned on using a Hamming window, but when the prototype delay became a problem, I focused elsewhere. I now see that the spectral output (from FFT) is very biased towards the very low frequency side.
    Applying a window to the data will not change this bias since I believe that it is fundamental to music. Without using a window a single tone will present itself in the spectral estimates as significant values well away from the actual tone (a sin(2*pi*f/N)/f relationship). A window will concentrate this smear back into the neighbourhood of the single tone and limit the majority of the smear to the estimates immediately adjacent to the tone. Many years ago when I first worked on spectral estimation there was a load of literature detailing the advantages of one window over another. I did a project where I investigated the effect of windows and I showed that for the sort of application I was considering then, a Chebyshev window was pretty much optimal but that the difference between using a Chebyshev window and a Hamming window was small. This result seems to apply also to analysis of sound.

    My limited experience with sound indicates that a window is essential. I have implemented Hamming, Hanning and Triangular but Hamming seems to be the best of these.
    Most music has more power in the lower bands, but this is more than I expected. I also realize I'm getting some aliasing since I'm sampling at 1/2 the Nyquist frequency - but I'd expect that effect to be not so concentrated in the low frequencies.
    This will cause all frequencies between half your sample rate (your effective Nyquist frequency) and the Nyquist frequency of the original samples (half the original sample rate) to be folded down into the range up to your effective Nyquist frequency. Sub-sampling is normally a very very very bad idea without first performing some low pass filtering of the original samples.
    Yes, I understand folding. I'm thinking that typical music will have lower power in the higher frequencies (particularly above 12kHz, where folding is occurring in my current implementation). Folding is a good description; the (actual) bands just above 12kHz will fold to just below 12kHz in my output. It would be the (actual) sound at 20+kHz that gets folded down to add to the low-frequency bias. The just-above-midrange content would fold to actually make the low-frequency bias seem less, relative to the higher spectrum (not sure if I did a good job describing that clearly). So, I'm assuming what I'm seeing (that particular artifact) is not necessarily due to the folding caused by my half-sampling-rate. I agree that the lower sampling rate is not a good thing; it was (until now) a forced compromise. It's not obvious to me how I can add any anti-aliasing to avoid the folding without dealing with the full sample rate. I can't filter the audio before I get it, and once I get it the delay is a concern. I will try increasing my sampling rate to see what the delay looks like after the buffer size changes I've made.
    I allow my users to select the windowing but default to Hamming. If you are overly concerned with performance then Hanning (note the spelling difference) can be quicker than Hamming when implemented in the frequency domain.
    Big question: I precalculate the twiddle factors for use in the FFT. Are there frequency-domain coefficients available representing the different windowing schemes that I could just multiply by the existing twiddle factors for what would effectively be a "free lunch"? That would be great, but those free lunches rarely seem to be real... The only way I know to implement windowing is by an additional step multiplying the time domain samples by the windowing values.
    I don't think you can frig the FFT to include the windowing, though I have never really considered it since I want to keep a separation of concerns, and I doubt there would be more than a marginal time saving.

    >
    >>>
    4) For the step you call "compute the spectral density" - is that finding the modulus of the FFT real and imaginary values, or something more?
    Pretty much.
    Does that differ in some way from what you refer to in the next step as "spectral density estimates"?
    I don't really understand what you are asking.
    You used the term "spectral density" and then "spectral density estimates". When dealing with finite sampling periods I realize that the FFT results (not because of the FFT algorithm, which has identical results to the naive DFT - but because of the finite sampling and non-periodic content when sample size < sampling rate) are estimates, but I wanted to be sure there isn't more meaning in what you've written than I am understanding.
    'spectral density' is a property of an ensemble and one estimates the 'spectral density' from a single member of that ensemble (a sound sample). I suspect this distinction is irrelevant to this discussion and can be ignored.
  • 8. Re: Does java.sound add delay to "live" audio streams (vs. native code)?
    captfoss Pro
    In the time domain both Hanning and Hamming can be implemented as an array of pre-computed multipliers and once the array is computed the application of the window takes the same time for both.
    But computing the array of precomputed multipliers can be done faster for a Hanning window than a Hamming... so it's still faster in both domains, just not "as faster" in the time domain ;-)
    In the frequency domain one can make use of the simple additive implementation for the required convolution so Hanning is slightly cheaper BUT today's floating point processors make this micro optimisation pretty much unnecessary.
    AFAIK, people use Hamming windows for the same reason they use the US standard measurement system... they just do.
