Efficiently loading data from large pcap file into TableView

Banzai Member Posts: 4
edited Apr 24, 2017 4:16AM in JavaFX 2.0 and Later

Hi there!

I'm working on a bit of software to visualize packets from a .pcap file in a TableView, very similar to Wireshark (columns for source ip, destination ip, protocol, etc.).

Try 1:

At first I used this tutorial: http://code.makery.ch/library/javafx-8-tutorial/part1/

That way I simply loaded all data into an ObservableList and bound it to the TableView. But as I work with pcap files of 100 MB and more, that ate a lot of RAM (~1.25 GB), so I scrapped it.

Try 2:

Then I scanned the pcap file for the starting byte of each packet using a MappedByteBuffer, so I have an ArrayList containing the start offset of each packet. I adjusted the property getters to navigate to that offset and decode from there. So instead of simply reading a string from an object, each getter reads from the file, meaning the actual ObservableList contains just a bunch of objects that hold nothing more than a number, which I use to identify the packet. This worked nicely for visualizing the data and used very little RAM, but filtering doesn't work anymore. As described in the guide, I wrap the ObservableList in a FilteredList, which is then wrapped in a SortedList. When I type something into the text area I use for filtering, the app freezes and usually crashes after a few minutes, sometimes taking Eclipse with it. Note that this only happens with a lot of rows in the TableView.
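
For readers following along, here is a minimal sketch of the kind of offset scan described above. It is not the poster's code: the class and method names (`PcapOffsetIndex`, `map`, `scan`) are hypothetical, and it assumes the classic pcap layout (a 24-byte file header followed by records that each start with a 16-byte header whose third field is the captured length). Only the microsecond magic number is handled, and a single mapping is limited to 2 GB.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: records where each packet record starts in a classic pcap file. */
final class PcapOffsetIndex {
    private static final int FILE_HEADER_LEN = 24;   // classic pcap file header
    private static final int RECORD_HEADER_LEN = 16; // ts_sec, ts_usec, incl_len, orig_len

    /** Maps the whole file read-only; the mapping stays valid after the channel is closed. */
    static ByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /** Returns the byte offset of every complete record header in the buffer. */
    static List<Integer> scan(ByteBuffer pcap) {
        ByteBuffer buf = pcap.duplicate(); // leave the caller's position untouched
        buf.order(ByteOrder.BIG_ENDIAN);
        int magic = buf.getInt(0);
        if (magic == 0xD4C3B2A1) {
            // Magic appears byte-swapped, so the header fields are little-endian.
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (magic != 0xA1B2C3D4) {
            throw new IllegalArgumentException("not a classic (microsecond) pcap file");
        }
        List<Integer> offsets = new ArrayList<>();
        int pos = FILE_HEADER_LEN;
        while (pos + RECORD_HEADER_LEN <= buf.limit()) {
            int inclLen = buf.getInt(pos + 8); // number of captured bytes after the header
            if (inclLen < 0 || pos + RECORD_HEADER_LEN + (long) inclLen > buf.limit()) {
                break; // truncated or corrupt record: stop indexing here
            }
            offsets.add(pos);
            pos += RECORD_HEADER_LEN + inclLen;
        }
        return offsets;
    }
}
```

Each table row then only needs to hold one of these offsets; decoding the columns from the mapped buffer can happen when a cell is actually displayed.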

Try 3:

Similar to Try 2, but I made custom cell factories for each column which call the decoding methods. I'm still trying things here, but right now the TableView stays completely empty.

What do you think is the best approach for what I am trying to do? If you want to see specific parts of my code or need further explanation, do tell me. I wasn't sure what part of the code to post as it's a lot.

Thank you!


Best Answer

  • jsmith
    jsmith Member Posts: 2,856
    edited Apr 21, 2017 6:58PM Answer ✓

    ObservableList is an interface, so you could create your own concrete implementation that is backed by something like a Google Guava Iterable rather than the default concrete in-memory collection of an ArrayList.  The Iterable-based implementation would load the data currently visible to the user on demand.  Unfortunately, this kind of approach would likely be difficult to implement, especially for sorting and filtering, as classes such as FilteredList are concrete and based upon ObservableListBase. I assume that those implementations expect access to all of the data in memory (I could be wrong there), which is what I think you are trying to avoid.

    Google's definition of an Iterable seems similar to what you want; it's just a shame that the idea isn't natively implemented in the core JavaFX classes:

    Whenever possible, Guava prefers to provide utilities accepting an Iterable rather than a Collection. Here at Google, it's not out of the ordinary to encounter a "collection" that isn't actually stored in main memory, but is being gathered from a database, or from another data center, and can't support operations like size() without actually grabbing all of the elements.

    Pagination sample:

    JavaFX TableView Paginator - Stack Overflow

    Though I understand that isn't what you are looking for either...

    So, I think you are probably on the right track with your Try #2 approach of having an ObservableList representing just an index to every table row, and having the cell value factories extract the values for each cell on demand from the file.  Care would still need to be taken to ensure that the file access is efficient (perhaps the Guava libraries could help there, perhaps not, I'm not sure as I haven't used them a lot).  But if it is working OK without sorting and filtering, then maybe the performance of what you already have is fine and you don't need to invest a lot of extra work there.
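
One way to keep on-demand decoding cheap while the user scrolls back and forth is to cache the most recently decoded rows. The following is only a sketch under assumptions: `RowCache` is a hypothetical name, the decoder function stands in for whatever code reads one packet from the file, and the cache is meant to be used from a single thread (for example the JavaFX application thread).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical cache: decodes a row on first access and keeps the most recently used rows. */
final class RowCache<R> {
    private final IntFunction<R> decoder; // e.g. reads and decodes one packet from the file
    private final Map<Integer, R> cache;

    RowCache(int capacity, IntFunction<R> decoder) {
        this.decoder = decoder;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<Integer, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, R> eldest) {
                return size() > capacity; // evict the least recently used row
            }
        };
    }

    /** Returns the decoded row, decoding it only if it is not cached. Not thread-safe. */
    R get(int rowIndex) {
        R row = cache.get(rowIndex);
        if (row == null) {
            row = decoder.apply(rowIndex);
            cache.put(rowIndex, row);
        }
        return row;
    }
}
```

A cell value factory could then ask the cache for the row behind an index instead of decoding the packet once per column. A capacity of a few hundred rows would cover what a TableView shows at once, though the right size is something to measure.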

    What will cause issues, as you have found, is sorting and filtering.  100 MB files are just a lot of data to sort and filter.  You will probably need concurrency facilities to do it efficiently, especially if you don't want to load everything into main memory.  The JavaFX concurrency libraries may help.  The built-in filtering and sorting facilities via FilteredList probably won't help, as I think they assume everything is available in memory, so you will likely need to implement your own logic to handle sorting and filtering concurrently with the UI thread, so that the UI thread is not blocked and main memory consumption remains low (probably a non-trivial task).
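
As a rough illustration of filtering off the UI thread, the sketch below evaluates a predicate for every row index on a worker thread and completes with the indices that matched. It uses only `java.util.concurrent` so that it runs without JavaFX; the names `BackgroundFilter` and `filterAsync` are hypothetical, and the predicate stands in for whatever per-packet check the application needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.IntPredicate;

/** Hypothetical helper: runs a row filter on a worker thread instead of the UI thread. */
final class BackgroundFilter {
    /** Tests every row index on the given executor and completes with the matching indices. */
    static CompletableFuture<List<Integer>> filterAsync(
            int rowCount, IntPredicate matches, Executor worker) {
        return CompletableFuture.supplyAsync(() -> {
            List<Integer> hits = new ArrayList<>();
            for (int i = 0; i < rowCount; i++) {
                if (matches.test(i)) { // may read from the file; runs off the UI thread
                    hits.add(i);
                }
            }
            return hits;
        }, worker);
    }
}
```

In a JavaFX application the result would still have to be handed back to the application thread before touching the table, for example with `Platform.runLater`, or by wrapping the loop in a `javafx.concurrent.Task` and reading its value in the success handler. Only the list of matching indices is kept in memory, not the decoded packets.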

    If sorting and filtering is a big, important feature and you can run a separate server to support it, then it might even be worth investing time into importing the pcap data into Elasticsearch and using that as the back end for sorting and filtering (also a non-trivial task).


Answers

  • bouye-JavaNet
    bouye-JavaNet Member Posts: 394 Silver Badge
    edited Apr 19, 2017 8:09PM

    You may want to take a paged approach, as you would when querying a large DB, doing a Google search, or browsing an online shop/catalog. Obviously, having a single ObservableList with the whole content in it is not an option in this case.

  • Banzai
    Banzai Member Posts: 4
    edited Apr 20, 2017 2:23AM

    I'm sorry, but could you elaborate? How would I go about paging my pcap file? I'm a little out of my depth here.

  • bouye-JavaNet
    bouye-JavaNet Member Posts: 394 Silver Badge
    edited Apr 20, 2017 7:41AM

    Is there a way for you to know the number of objects in your file without reading it fully? And to extract a subset of the objects from the file as well?

    If so, display the first, let's say, 100 values in the TableView, and provide a facility (buttons) for the user to switch to the next set of 100 objects, etc. Knowing the total number of objects and the number of objects per view will allow you to compute the number of pages to display.
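
The page arithmetic described here can be sketched as follows. The names `Paging`, `pageCount` and `pageRange` are made up for illustration, and pages are assumed to be numbered from zero.

```java
/** Hypothetical helper for the page arithmetic; pages are zero-based. */
final class Paging {
    /** Number of pages needed to show totalRows rows, pageSize rows at a time. */
    static int pageCount(int totalRows, int pageSize) {
        return (totalRows + pageSize - 1) / pageSize; // integer ceiling division
    }

    /** Returns {fromInclusive, toExclusive} row indices for the given page. */
    static int[] pageRange(int page, int totalRows, int pageSize) {
        int from = Math.min(page * pageSize, totalRows);
        int to = Math.min(from + pageSize, totalRows);
        return new int[] {from, to};
    }
}
```

With 250 rows and 100 rows per page this gives three pages, and the last page covers rows 200 up to (but not including) 250.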

    Filtering, ordering and searching will require more backend work though, especially if you have to do it on the whole set of objects.

    Also in #2 you mention the app freezing when accessing values in the file. Do you use Service/Task for background job or are you doing the extraction directly in the JavaFX thread?

  • Banzai
    Banzai Member Posts: 4
    edited Apr 21, 2017 6:11AM

    No. I don't think I can know the number of packets without scanning it completely, as some packets are longer than others etc.

    I played around with the idea of only showing 100 rows at a time, but ultimately decided against it. The user needs to be able to scroll back and forth very quickly for comparisons and it's much easier to use when they're all in one place.

    I don't know much about threads. I didn't manually open a new one to handle the filtering if that's what you're asking.

  • Banzai
    Banzai Member Posts: 4
    edited Apr 24, 2017 4:16AM

    Thank you, some very good suggestions. It especially helps to hear that something I've tried (try #2) isn't completely dumb, as I'm really working with things here that go quite a bit beyond what I've ever done before.

    I'll keep the options for filtering in mind, but for now I had a new idea which I want to try: Since the scanning of the file for packet starting points is fairly quick, I'll create an option window for the user to enter all filtering criteria and then rescan the file while only remembering the starting points of packets matching the criteria. Sounds somewhat simple to do in my head for now.
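
A sketch of what such a filtered rescan might look like, under stated assumptions: the file is a little-endian classic pcap (24-byte file header, 16-byte record headers, magic number not checked here), the whole file is available as one `ByteBuffer`, and the criteria are expressed as a predicate over the captured bytes of one packet. `FilteredPcapScan` and `matchingOffsets` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical rescan: remembers only the offsets of packets that match the criteria. */
final class FilteredPcapScan {
    static List<Integer> matchingOffsets(ByteBuffer pcap, Predicate<ByteBuffer> criteria) {
        // Assumption: little-endian classic pcap; the magic number is not verified.
        ByteBuffer buf = pcap.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        List<Integer> offsets = new ArrayList<>();
        int pos = 24; // skip the file header
        while (pos + 16 <= buf.limit()) {
            int inclLen = buf.getInt(pos + 8); // captured length of this packet
            int dataStart = pos + 16;
            if (inclLen < 0 || (long) dataStart + inclLen > buf.limit()) {
                break; // truncated or corrupt record
            }
            // Give the predicate a view of just this packet's bytes.
            ByteBuffer packet = buf.duplicate();
            packet.limit(dataStart + inclLen);
            packet.position(dataStart);
            if (criteria.test(packet.slice())) {
                offsets.add(pos);
            }
            pos = dataStart + inclLen;
        }
        return offsets;
    }
}
```

The resulting list can back the table in the same way as the full offset list, so no FilteredList wrapper is needed for these criteria. For large files the rescan would ideally run on a background thread.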

    ps: This is my first thread on this site, is there an "Accept as Answer" button I'm just not seeing?

    edit: nevermind, I thought "Correct answer" meant that I could literally correct the answer.

    Thank you all!

This discussion has been closed.