This discussion is archived
1 2 Previous Next 29 Replies Latest reply: Dec 16, 2010 4:45 PM by EJP Go to original post RSS
  • 15. Re: Collection Question in ArrayList
    824049 Newbie
    Currently Being Moderated
    Hi,

    I need only the 2nd highest. But one thing I forgot to mention is that , the student records can go up to 1 million of data and repetation of marks also should be there.

    Request all of you to suggest me something in programmetic way because its very difficult for me to understand and implement the words.

    Best Regards...
  • 16. Re: Collection Question in ArrayList
    818890 Newbie
    Currently Being Moderated
    Hi there,

    Might I suggest that you are unlikely to learn anything by just copy-pasting code.
  • 17. Re: Collection Question in ArrayList
    PhHein Guru Moderator
    Currently Being Moderated
    Kate Libby wrote:
    Hi there,

    Might I suggest that you are unlikely to learn anything by just copy-pasting code.
    He's also unlikey to be spoon fed here :)
  • 18. Re: Collection Question in ArrayList
    EJP Guru
    Currently Being Moderated
    Thanks for ur quick response.Cud u plz give ur code
    I've named the class and I've also named one of the methods you have to call. That plus the Javadoc should be quite sufficient. Could you please make the effort to write in standard English without the ambiguous abbreviations. Your employers and customers won't put up with that and neither will I.
    millions
    If you really have millions of data items, all bets are off: I would certainly use a database rather than trying to do it all in memory with any data structure.
    as for me its very difficult to understand the vocabulary.
    So learn it. If you can't understand the standard techical vocabulary yet, you need to make more of an effort. This is what it will be like for the rest of your career. And specifically I suggest you get used to the idea that nobody will do your work for you, at least not unless you pay them the money you're being paid to do it. And if this is a course assignment, the idea is that you learn something amd exhibit that understanding in your assignmments. You wouldn't want to hand in someone else's work as your own, would you?
  • 19. Re: Collection Question in ArrayList
    796440 Guru
    Currently Being Moderated
    EJP wrote:
    But we're not processing N elements.
    How do you figure? We have to process every element to know its rank before we can find out which ones are in the first two ranks. Our last two elements in the original unsorted list could be the two with the second highest score, and we can't know that until we've added them all to the PQ, which means we've done N number of insertions at a cost of log(N) each--i.e. O(N log(N)).

    Unless the PQ doesn't determine the ordering until we extract an item from it? But that would mean we'd have to look at each element every time we extract one, to determine which one is next, so the overall operation becomes O(N * M).
  • 20. Re: Collection Question in ArrayList
    796440 Guru
    Currently Being Moderated
    EJP wrote:
    In any case we are talking about small amounts of data.
    and in any case the OP asked for the optimal solution. Personally I would probably just use Collections.sort() as jverd suggested, as I personally wouldn't care about the optimal solution over a small dataset. But that doesn't answer the OP's question.
    And of course, OPs always ask the right question. ;-)
  • 21. Re: Collection Question in ArrayList
    796440 Guru
    Currently Being Moderated
    821046 wrote:
    Hi,

    Thanks for ur immediate reply. Cud u plz give ur code as for me its very difficult to understand the vocabulary.
    Speaking of hard to understand, when you write using non-words like "plz" and "ur" and "cud", it makes you look like an idiot and makes it very annoying to read your posts.

    I'm not going to repeat everything I said while trying to guess at what you're having trouble understanding. If you need a clarification, be more specific about what's giving you difficulty.
  • 22. Re: Collection Question in ArrayList
    796440 Guru
    Currently Being Moderated
    jverd wrote:
    EJP wrote:
    But we're not processing N elements.
    How do you figure? We have to process every element to know its rank before we can find out which ones are in the first two ranks. Our last two elements in the original unsorted list could be the two with the second highest score, and we can't know that until we've added them all to the PQ, which means we've done N number of insertions at a cost of log(N) each--i.e. O(N log(N)).

    Unless the PQ doesn't determine the ordering until we extract an item from it? But that would mean we'd have to look at each element every time we extract one, to determine which one is next, so the overall operation becomes O(N * M).
    So I wrote a little test. I don't want to post it here yet because I don't want to spoonfeed the OP. Here are the results though.

    For 1 million students, the sort approach took around 320 ms, and the PQ approach took around 190 ms.
    For 10 million students, the sort approach took around 3.3 sec. and the PQ approach took around 2.0 sec.

    These numbers are averages over several runs in one VM execution, with each run doing 30 passes. The times didn't deviate much from the averages, except that the very first run always took significantly longer, presumably because the vacuum tubes in the VM were warming up.

    Obviously there's a certain lack of statistical rigor here, but these numbers to suggest a couple of things to me:

    1. The PQ approach is consistently faster the the sorting approach by a more or less constant factor.

    2. Both approaches appear to grow at about the same rate.

    3. The raw numbers and the performance difference would not be enough for me to use the PQ approach unless that particular operation became a bottleneck, AND the improvement by that factor of 2:3 or so would make a difference in the big picture.

    4. Trying to determine any more concretely the best approach for the OP without actual requirements and use cases is futile speculation at best.
  • 23. Re: Collection Question in ArrayList
    EJP Guru
    Currently Being Moderated
    It doesn't sound like you guys know how a PriorityQueue works. Stop me if you've heard this before, but a PriorityQueue is a 'priority heap', which is not a memory-management-type heap but a binary-tree structure:

    1. A heap does a partial ordering when you insert, such that
    heap[i] <= heap[i*2] <= heap[i*2+1]
    for every i. This process takes O(log N) time per insertion, i.e. O(N log N) over the entire dataset, but because it's only doing a partial ordering it is much faster than a full O(N log N) sort (lower constant factor).

    This guarantees that the lowest element is at the first position (or the highest, depending on your comparator). It doesn't say anything about where the next item is except that you can find it in O(log N) time.

    2. When you remove that element, it then does another process to ensure that the constraint above holds again. This is also O(log N) per removal, with a low constant factor, and again it is much faster than a full O(N log N) sort.

    In the OP's situation, he has to do the partial ordering part over all N elements, but he only has to do the removal/re-partial ordering part over the first M elements that satisfy his constraint, plus 1, where M <= N.

    So the algorithmic complexity is O(N log N) with a low constant factor plus O(M log N), where M <= N, again with a low constant factor.

    How this stacks up against say Collections.sort(), which is also O(N log N), comes down to how the constant factors stack up, quality of implementation, etc.
  • 24. Re: Collection Question in ArrayList
    796440 Guru
    Currently Being Moderated
    EJP wrote:
    It doesn't sound like you guys know how a PriorityQueue works. Stop me if you've heard this before, but a PriorityQueue is a 'priority heap', which is not a memory-management-type heap but a binary-tree structure:

    1. A heap does a partial ordering when you insert, such that
    heap[i] <= heap[i*2] <= heap[i*2+1]
    for every i. This process takes O(log N) time per insertion, i.e. O(N log N) over the entire dataset,
    I'm with you so far. That's what I stated initially, except you added some detail.
    but because it's only doing a partial ordering it is much faster than a full O(N log N) sort (lower constant factor).
    Here's where you lose me. How is this any faster? We have to insert the entire data set, so it's the O(N log N) I stated previously and you confirmed just above.

    Or are you saying that it is still O(N log N), but just with a lower constant?
    2. When you remove that element, it then does another process to ensure that the constraint above holds again. This is also O(log N) per removal, with a low constant factor, and again it is much faster than a full O(N log N) sort.
    This part I was not aware of. Or forgot. I thought all ordering came at insertion time.

    In the OP's situation, he has to do the partial ordering part over all N elements, but he only has to do the removal/re-partial ordering part over the first M elements that satisfy his constraint, plus 1, where M <= N.

    So the algorithmic complexity is O(N log N) with a low constant factor plus O(M log N), where M <= N, again with a low constant factor.
    Okay, I'll buy that, and it's consistent with what little experimental data I have.

    Thanks for clarifying.

    Edited by: jverd on Dec 16, 2010 3:20 PM
  • 25. Re: Collection Question in ArrayList
    EJP Guru
    Currently Being Moderated
    Or are you saying that it is still O(N log N), but just with a lower constant?
    That's exactly what I did say. 'Lower constant factor'.
  • 26. Re: Collection Question in ArrayList
    796440 Guru
    Currently Being Moderated
    EJP wrote:
    Or are you saying that it is still O(N log N), but just with a lower constant?
    That's exactly what I did say. 'Lower constant factor'.
    Yes, you did say "lower constant factor." However, you did not say (until later in the post) that it is still O(N log N). This phrase tripped me up:
    because it's only doing a partial ordering it is much faster than a full O(N log N) sort (lower constant factor).
    The "faster than a full O(N log N)" at first sounded to me like you were saying that it's not a "full" O(N log N). I don't know what a "full" O(N log N) is vs. any other kind, but you seemed to be contrasting this approach with O(N log N), which in turn implies that this approach is not in fact O(N log N), but some O(something less). Possibly in addition to having a smaller constant? I don't know. It wasn't clear to me. That's why I asked.

    I just wanted clarification that you weren't actually saying that it was O(less than N log N), but only that the constant was lower. I got that confirmation later in your post, but didn't bother to go back and retract the question.
  • 27. Re: Collection Question in ArrayList
    EJP Guru
    Currently Being Moderated
    Yes, you did say "lower constant factor." However, you did not say (until later in the post) that it is still O(N log N).
    Bzzt. I quote: 'This process takes O(log N) time per insertion, i.e. O(N log N) over the entire dataset, but because it's only doing a partial ordering it is much faster than a full O(N log N) sort (lower constant factor).'.
    I don't know what a "full" O(N log N) is
    Nor do I. I was talking about a full sort.
  • 28. Re: Collection Question in ArrayList
    796440 Guru
    Currently Being Moderated
    EJP wrote:
    Yes, you did say "lower constant factor." However, you did not say (until later in the post) that it is still O(N log N).
    Bzzt. I quote: 'This process takes O(log N) time per insertion, i.e. O(N log N) over the entire dataset, but because it's only doing a partial ordering it is much faster than a full O(N log N) sort (lower constant factor).'.

    Note also that I said 'full O(N log N) sort'.
    Yes, I am perfectly aware of what you said. I noticed and understood "O(N log N) over the entire dataset," but I also noticed that it was immediately followed by "but because it's only doing a partial ordering it is much faster than a full O(N log N)".

    I initially considered that the contrast presented in the "but" part might have meant that, in toto, you were saying that it would be O(N log N) over the entire dataset if we had to touch every element, but because we're doing a partial ordering, we wouldn't touch every element, and hence your approach would be O(less than N log N). (The presence or absence or the word "sort" there has no bearing on my interpreting it that way--or rather, considering that that might be a valid interpretation.)

    I realized later that that was not what you meant.

    So what I did not see spelled out clearly and unequivocally was "The entire operation is still O(N log N), but what makes it faster is that the constants are smaller," and it is that which I was simply trying to get confirmed.

    Please consider the possibility that I did not know what was in your head when you wrote it, that I am coming from a position of less knowledge than you about both the data structure in question and about exactly what you were trying to convey, that I was trying to understand what it all meant, that the understanding was tenuous, and that hence I was asking for clarification/confirmation.

    I can't believe I'm sitting here justifying my confusion.
  • 29. Re: Collection Question in ArrayList
    796440 Guru
    Currently Being Moderated
    EJP wrote:
    I don't know what a "full" O(N log N) is
    Nor do I. I was talking about a full sort.
    Ah. "full" modifies "sort," not "O(N log N)." That makes much more sense.
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points