This discussion is archived
15 Replies Latest reply: Sep 27, 2012 1:33 AM by EJP

DatagramPacket content

898586 Newbie
I'm wondering whether, and if so how, one can determine whether a DatagramPacket carrying voice holds any useful payload, or whether it simply holds TargetDataLine input captured while the person was not actively speaking, so that in human terms the packet can be considered "empty" even though it is otherwise the same as packets with material in them.
  • 1. Re: DatagramPacket content
    sabre150 Expert
    Has to be a subjective assessment! I can't see why this would be necessary in the normal way of things, but I would just check whether the amplitude of ALL samples is less than some value.
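A minimal sketch of the check suggested above, assuming the payload is 16-bit signed little-endian mono PCM starting at offset zero (the thread never states the audio format). The class name `SilenceCheck`, the method `isSilent`, and the threshold parameter are hypothetical names introduced for illustration.

```java
/** Hypothetical helper; assumes 16-bit signed little-endian PCM (an assumption, not stated in the thread). */
final class SilenceCheck {
    private SilenceCheck() {}

    /**
     * Returns true when every complete 16-bit sample in the first
     * {@code length} bytes of {@code pcm} has an absolute value below {@code threshold}.
     */
    static boolean isSilent(byte[] pcm, int length, int threshold) {
        for (int i = 0; i + 1 < length; i += 2) {
            // The low byte is treated as unsigned; the high byte carries the sign.
            int sample = (pcm[i] & 0xFF) | (pcm[i + 1] << 8);
            if (Math.abs(sample) >= threshold) {
                return false; // one audible sample is enough to keep the packet
            }
        }
        return true;
    }
}
```

A sender could then call something like `SilenceCheck.isSilent(packet.getData(), packet.getLength(), 500)` before sending; the value 500 is only a placeholder that would have to be tuned per microphone.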
  • 2. Re: DatagramPacket content
    898586 Newbie
    sabre150 wrote:
    Has to be a subjective assessment! I can't see why this would be necessary in the normal way of things, but I would just check whether the amplitude of ALL samples is less than some value.
    I'm honoured to see you commenting in my question, as I know you have a big background in this area. OTOH I'm somewhat nervous that what is likely to be a less than informed approach to the whole question on my part, may not go down very well. :/

    Anyway . . . I thought that it would be a straightforward matter to measure the relative amplitudes of the packets, and simply drop those which did not meet 'some' threshold. Whilst a conversation with only two interlocutors is of very acceptable quality, a third entering the conversation trashes the continuity and understandability. (At times one machine will not output anything at all, whilst the other one renders a perfect audio stream, then, sometimes, both give out fragments of the conversation simultaneously, and so on and so on, but never a clear rendering between any two of the 3 machines at the same time).

    The class I put together to make the judgement on those levels inside the packets effectively 'samples' packets every now and again, as they are being sent, and takes an average. The sending routine then looks at the "actual" packet it is dealing with, and compares its amplitude to the one being obtained from the monitoring class, and sends the packet only if it meets that threshold. I thought this sounds close to what you are referring to above . . . but no doubt you mean something else . . . ?
  • 3. Re: DatagramPacket content
    sabre150 Expert
    895583 wrote:
    I'm honoured to see you commenting in my question, as I know you have a big background in this area. OTOH I'm somewhat nervous that what is likely to be a less than informed approach to the whole question on my part, may not go down very well. :/
    Please don't butter me up and denigrate yourself!

    I don't "have a big background in this area". I actually have quite a small background in sound and Java sound but have a strong background in general signal processing. In this instance you are concerned with neither signal processing nor Java sound, so I have to fall back on common sense.

    Common sense tells me that a packet is irrelevant if it contains little or no sound samples that can be heard by the listener and my first reply indicated how I would tackle this.

    >
    Anyway . . . I thought that it would be a straightforward matter to measure the relative amplitudes of the packets, and simply drop those which did not meet 'some' threshold.
    I don't see how the relative amplitude comes into this. I would have thought only the absolute amplitude matters. The ear may have some "automatic gain control" feature that makes this relative amplitude significant but as I said - I know little about sound (and the ear).
    Whilst a conversation with only two interlocutors is of very acceptable quality, a third entering the conversation trashes the continuity and understandability. (At times one machine will not output anything at all, whilst the other one renders a perfect audio stream, then, sometimes, both give out fragments of the conversation simultaneously, and so on and so on, but never a clear rendering between any two of the 3 machines at the same time).
    I suspect I am missing some background information that makes this understandable.

    >
    The class I put together to make the judgement on those levels inside the packets effectively 'samples' packets every now and again, as they are being sent, and takes an average. The sending routine then looks at the "actual" packet it is dealing with, and compares its amplitude to the one being obtained from the monitoring class, and sends the packet only if it meets that threshold.
    I still don't see how a comparison with other packets matters. Can you provide some reasoning for this approach?
    I thought this sounds close to what you are referring to above . . . but no doubt you mean something else . . . ?
    Definitely not what I was referring to since my thresholds are absolute. Of course I could have totally the wrong idea of what you are trying to achieve.

    Edited by: sabre150 on Sep 26, 2012 2:48 PM
  • 4. Re: DatagramPacket content
    898586 Newbie
    Please don't butter me up and denigrate yourself!
    Margarine instead?
    I don't "have a big background in this area".
    Sorry. Looked like that to me from your previous submissions on the forum. Genuinely sorry.
    Common sense tells me that a packet is irrelevant if it contains little or no sound samples that can be heard by the listener and my first reply indicated how I would tackle this.
    I agree only with the common sense part of the statement : that's what I thought, but common sense in this case is evidently either irrelevant, or is being mis-applied. By me. The idea of this question is partly to determine which.
    I don't see how the relative amplitude comes into this. I would have thought only the absolute amplitude matters. The ear may have some "automatic gain control" feature that makes this relative amplitude significant but as I said - I know little about sound (and the ear).
    The machines I test this on are of varying specs. The raw levels picked up by the mics, as printed to the screen, vary considerably. It would be next to useless to think that a global, absolute figure could ever cater for a range like that. Moreover, simply think of a person moving closer to the mic - this increases the volume. So does turning the volume up. I'm not anywhere near dealing with those issues yet, and the starting point has to be to control the inputs and keep them within tolerances, and try to get a steady throughput first on that basis.

    I suspect I am missing some background information that makes this understandable.
    That sounds like the same background that I am missing then.
    I still don't see how a comparison with other packets matters. Can you provide some reasoning for this approach?
    A comparison matters for the reasons I have given above. AND it also matters "locally", because I'm trying to decide what is speech, and what isn't.
    Definitely not what I was referring to since my thresholds are absolute. Of course I could have totally the wrong idea of what you are trying to achieve.
    You tell me how to obtain a global absolute, and I'll bite your hand off for it in eager gratitude.

    Edited by: 895583 on 26-Sep-2012 07:05
  • 5. Re: DatagramPacket content
    sabre150 Expert
    895583 wrote:
    You tell me how to obtain a global absolute, and I'll bite your hand off for it in eager gratitude.
    I'm talking 'absolute' in terms of the sample values and not a global absolute! One has no measure of global absolute unless one knows details of all the transformations (attenuation due to distance between source and microphone, microphone sound to electrical analogue, AtoD converter) that take place between a sound source and the sampler. Similarly on playback!
  • 6. Re: DatagramPacket content
    898586 Newbie
    That won't work. There always has to be a range of acceptable values. If a motorbike goes by the window, the algo would stop sending all subsequent voice packets because the max would just have elevated them out of reach.
  • 7. Re: DatagramPacket content
    sabre150 Expert
    895583 wrote:
    That won't work. There always has to be a range of acceptable values. If a motorbike goes by the window, the algo would stop sending all subsequent voice packets because the max would just have elevated them out of reach.
    You have totally lost me. What won't work? How does a motorbike going by the window stop the sending of subsequent voice packets?

    Are you trying to implement some form of automatic gain control?
  • 8. Re: DatagramPacket content
    898586 Newbie
    I'm lost too.

    Say I println() the values in the packets (given by taking an average of the bytes' contents). If I get a series of high-low-high - (meaning speech, no speech, speech) - if that last high was when the bike passed the window, then the next series of high-low-high better see the two new highs match the bike high, or else the new packets won't get sent.
  • 9. Re: DatagramPacket content
    sabre150 Expert
    895583 wrote:
    Say I println() the values in the packets (given by taking an average of the bytes' contents). If I get a series of high-low-high - (meaning speech, no speech, speech) - if that last high was when the bike passed the window, then the next series of high-low-high better see the two new highs match the bike high, or else the new packets won't get sent.
    From this I gather that you are still talking about comparing frame to frame! You do seem to be trying to implement some form of adaptive gain control system and from what I can see you need to compute these high-low values and use some form of filter to average them over a number of frames. This way the motorbike will not dominate unless it sits outside revving its engine for a long time.

    I would expect that the average should be taken over about 20 seconds worth of data but you would have to tune this value. A simple lag filter

    x(n+1) = alpha * x(n) + (1 - alpha) * u(n)    (alpha in [0,1); its value would depend on the frame length)

    would be the easiest to implement but you might find a moving average filter more understandable.
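The lag filter above can be sketched in Java as follows. This reads x(n) as the smoothed level and u(n) as the level measured for the newest frame, which is an interpretation of the formula rather than something the thread spells out. The class `LagFilter`, the helper `alphaFor`, and the choice to seed the filter with the first input are all assumptions made for illustration. The helper uses the standard first-order discretisation alpha = exp(-frameSeconds / tauSeconds), where tauSeconds is the desired averaging time constant.

```java
/** Hypothetical first-order lag (exponential smoothing) filter: x(n+1) = alpha * x(n) + (1 - alpha) * u(n). */
final class LagFilter {
    private final double alpha;
    private double state;
    private boolean primed;

    LagFilter(double alpha) {
        if (alpha < 0.0 || alpha >= 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1)");
        }
        this.alpha = alpha;
    }

    /** One possible way to pick alpha from the frame duration and a time constant, both in seconds. */
    static double alphaFor(double frameSeconds, double tauSeconds) {
        return Math.exp(-frameSeconds / tauSeconds);
    }

    /** Feeds one per-frame level u(n) and returns the updated smoothed level. */
    double update(double input) {
        if (!primed) {
            // Assumption: start from the first measurement instead of zero to avoid a long warm-up.
            state = input;
            primed = true;
        } else {
            state = alpha * state + (1.0 - alpha) * input;
        }
        return state;
    }

    /** Current smoothed level (0.0 before the first update). */
    double value() {
        return state;
    }
}
```

With an alpha close to 1, a single loud frame such as a passing motorbike moves the smoothed level only slightly, while a sustained change in level is followed gradually.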
  • 10. Re: DatagramPacket content
    898586 Newbie
    Would you think it worth looking at the chunk of code I use to do that assessment?
  • 11. Re: DatagramPacket content
    sabre150 Expert
    895583 wrote:
    Would you think it worth looking at the chunk of code I use to do that assessment?
    Only if you can explain in detail what the object of the "assessment" is, because unless it is some form of adaptive control of volume I am really lost.
  • 12. Re: DatagramPacket content
    898586 Newbie
    The assessment: each packet's 5120 bytes are summed and averaged to obtain a mean value which expresses the packet in one figure. So a 'typical' high value of this kind might be 70 when speech is taking place. When there is relative silence, that figure could be 14, 0, 8, 11, or some such ball-park figure. Based on this figure, the packet is either dropped or sent for consumption.

    In order to help assure variable valid high values, a separate thread monitors these figures to set the 'pass' level - in case, as I said before, for example, the speaker moves away from his mic, or some other stochastic enters the frame.
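One caveat on the per-packet figure described above: if the audio happens to be 16-bit signed PCM (the thread does not say), averaging the raw bytes mixes low and high bytes, and positive and negative values partly cancel, so the result may track loudness only loosely. The sketch below instead decodes the samples and averages their absolute values. It assumes 16-bit signed little-endian PCM; `PacketLevel` and `meanAbsolute` are hypothetical names, and this is not the code the poster used.

```java
/** Hypothetical per-packet level measure; assumes 16-bit signed little-endian PCM. */
final class PacketLevel {
    private PacketLevel() {}

    /** Mean absolute sample value over the first {@code length} bytes; 0.0 if there is no complete sample. */
    static double meanAbsolute(byte[] pcm, int length) {
        int samples = length / 2;
        if (samples == 0) {
            return 0.0;
        }
        long sum = 0;
        for (int i = 0; i < samples; i++) {
            // Rebuild the signed 16-bit sample from its low and high bytes.
            int sample = (pcm[2 * i] & 0xFF) | (pcm[2 * i + 1] << 8);
            sum += Math.abs(sample);
        }
        return (double) sum / samples;
    }
}
```

For example, a packet holding the two samples 100 and -100 gives a mean absolute level of 100.0, whereas the plain average of its four raw signed bytes is close to zero.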
  • 13. Re: DatagramPacket content
    sabre150 Expert
    I really don't understand the problem. You have a method of deciding whether or not to send a packet but you don't like the fact that some external sound can void the decision. Unless you can filter out the external sound (most unlikely) you can do nothing about this. Viewing the code will not help since the problem is not with the code but with your expectation as to what the code should be able to do.
  • 14. Re: DatagramPacket content
    898586 Newbie
    I see the problem as this :

    If I display (on each local machine) the byte values in a packet when only two people are speaking, I get a) very decent voice quality, b) a stream of numbers on the screen. The variation in those numbers reflects whether the person on the particular machine is speaking or not at any point. Low numbers are indicative of him not speaking. If I do the same thing as I've just described with 3 people in the conversation, a) suffers enormously, and the pattern of b) persists. So I introduced a thread that blocks transmission of low-value packets. This new thread has no noticeable effect on the 3-person conversation. I do not understand why that is so.