This discussion is archived
4 Replies. Latest reply: Apr 3, 2013 1:06 PM by jschellSomeoneStoleMyAlias

How to store webpage content into sqlserver

1000390 Newbie
Hi all,

I would like to thank everyone who helps out here.

I need some help, and I would be very happy if you can help me. I am sure you will, since the question I am about to ask is nothing for the seniors here. My question is this:

I have a page at www dot abc dot com with some content on it, and I want to store this content in SQL Server.

I heard that this can be done using DOM. Can you help me with this? If not with DOM, please suggest other possible ways.

Thanks a bunch in advance. Please help me as soon as possible, as this is very urgent for my project.

Sowmya
  • 1. Re: How to store webpage content into sqlserver
    gimbal2 Guru
    997387 wrote:
    I heard that this can be done using DOM. Can you help me with this? If not with DOM, please suggest other possible ways.
    Using DOM you can parse (X)HTML data into a tree structure that you can then query for information; however, if the page is badly formatted HTML, even that may not work. DOM is not a magic tool for ripping a page off the internet: you'll have to do plenty of programming to load the content, interpret it, and then fetch all the additional resources the page links to (images, JavaScript files, stylesheets). There are pre-built tools that can do it for you, but that's as much as I am going to say on this subject; this is not a common business requirement. For all I know, you want to use this for illegal purposes.
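As a rough illustration of the DOM route, the following Java sketch parses a small, well-formed XHTML string with the JDK's built-in `DocumentBuilder` and queries the resulting tree with XPath. It assumes the markup is well-formed XML without a DOCTYPE; real-world HTML often is not, in which case the parser throws an exception, which is exactly the limitation mentioned in the reply. The class and method names (`XhtmlDomQuery`, `parse`, `query`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XhtmlDomQuery {

    // Parses well-formed XHTML into a DOM tree. Badly formed HTML makes the
    // parser throw an exception instead of returning a tree.
    public static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the trimmed text of every node matched by the XPath expression.
    public static List<String> query(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>Demo</title></head>"
                + "<body><p>First</p><p>Second</p></body></html>";
        Document doc = parse(page);
        System.out.println(query(doc, "//p")); // prints [First, Second]
    }
}
```

For pages that are not well-formed XML, a lenient HTML parser would be needed before any DOM querying is possible.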
  • 2. Re: How to store webpage content into sqlserver
    aksarben Journeyer
    How you approach this depends on what kind of "content" you mean. If you're referring to the HTML, simply store it as a string (I'm not familiar with SQL Server, but I assume it can handle variable-length strings). On the other hand, if "content" means images, videos, sound files, etc., then just store them as binary objects (again, assuming your database lets you do that).
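A minimal JDBC sketch of these two storage options, assuming SQL Server's `NVARCHAR(MAX)` type for the HTML string and `VARBINARY(MAX)` for binary files. The table and column names (`web_page`, `web_resource`, `url`, `html`, `data`) and the class `PageStore` are hypothetical, and obtaining the `Connection` (for example via `DriverManager.getConnection` with a SQL Server JDBC driver on the classpath) is left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PageStore {

    // Hypothetical tables, e.g. in SQL Server:
    //   CREATE TABLE web_page (url NVARCHAR(2000) NOT NULL, html NVARCHAR(MAX));
    //   CREATE TABLE web_resource (url NVARCHAR(2000) NOT NULL, data VARBINARY(MAX));
    public static final String INSERT_PAGE = "INSERT INTO web_page (url, html) VALUES (?, ?)";
    public static final String INSERT_RESOURCE = "INSERT INTO web_resource (url, data) VALUES (?, ?)";

    // Stores the HTML markup as an ordinary variable-length string.
    public static void storeHtml(Connection conn, String url, String html) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_PAGE)) {
            ps.setString(1, url);
            ps.setString(2, html);
            ps.executeUpdate();
        }
    }

    // Stores an image, video, sound file, etc. as raw bytes in a binary column.
    public static void storeBinary(Connection conn, String url, byte[] data) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_RESOURCE)) {
            ps.setString(1, url);
            ps.setBytes(2, data);
            ps.executeUpdate();
        }
    }
}
```

Using `PreparedStatement` parameters rather than string concatenation keeps arbitrary HTML (quotes included) from breaking the SQL.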

    But as the other poster noted, this is a very odd requirement. What are you really trying to accomplish? And what does this have to do with Java? Web pages aren't tied to a programming language.
  • 3. Re: How to store webpage content into sqlserver
    1000815 Newbie
    Hi

    A simple solution to your problem would be to read the HTML page as plain text (a String) and save it in your database in that same format. Finally, to get the HTML page back, just print this text (String) onto an HTML page, the same way you print any other information from the database. You could also search for tips such as "Convert a string of HTML into DOM objects with jQuery".

    Goodbye!
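Reading the page back, as suggested in this reply, is the reverse of saving it. A minimal JDBC sketch, assuming a hypothetical table `web_page(url, html)` whose `html` column holds the markup as a string; `PageLoader` and `loadHtml` are hypothetical names. The returned string could then be written unchanged to whatever produces the output (a servlet response, a file, and so on).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

public class PageLoader {

    // Hypothetical table: web_page (url NVARCHAR(2000), html NVARCHAR(MAX))
    public static final String SELECT_PAGE = "SELECT html FROM web_page WHERE url = ?";

    // Returns the stored HTML for a URL, or an empty Optional if no row matches
    // (or the stored value is NULL).
    public static Optional<String> loadHtml(Connection conn, String url) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SELECT_PAGE)) {
            ps.setString(1, url);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? Optional.ofNullable(rs.getString(1)) : Optional.empty();
            }
        }
    }
}
```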
  • 4. Re: How to store webpage content into sqlserver
    jschellSomeoneStoleMyAlias Expert
    997387 wrote:
    Hi all,

    I would like to thank everyone who helps out here.

    I need some help, and I would be very happy if you can help me. I am sure you will, since the question I am about to ask is nothing for the seniors here. My question is this:

    I have a page at www dot abc dot com with some content on it, and I want to store this content in SQL Server.

    I heard that this can be done using DOM. Can you help me with this? If not with DOM, please suggest other possible ways.
    You CANNOT store "www dot abc dot com" anywhere, because that is a site, not data.

    So FIRST you must get the data from that site. How you do that depends on the site and the data you want. This might be a trivial step or it might be very, very complicated. This step has nothing to do with storing anything.
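For the simplest case, where the "data" is just the HTML of a single page, this first step might look like the following sketch. It assumes Java 11 or later (for `java.net.http.HttpClient`) and that a plain HTTP GET returns the content you want; pages that build their content with JavaScript, require a login, or forbid automated access need a different approach. `PageFetcher` and `fetchHtml` are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class PageFetcher {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL) // follow ordinary redirects
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Downloads the raw HTML of a single URL. This fetches only the page itself,
    // not the images, scripts or stylesheets it references.
    public static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```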

    If the site is VERY simple or you have NO need to store images, then you might end up with a single HTML string which can be stored. However, that is very unlikely. If not, then you must retrieve all of the different parts from the site and determine how you want to store those parts. A simple mechanism would be to store the tree (because an extracted HTML site can be represented that way).
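To retrieve "all of the different parts", you first need to know which parts a page references. The sketch below assumes the page has already been fetched as well-formed XHTML; it walks the DOM tree for `img` and `script` `src` attributes and `link` `href` attributes, and resolves relative links against the page URL with `java.net.URI.resolve`. The tag list is illustrative rather than exhaustive (it ignores CSS `url(...)` references, `srcset`, and so on), and `ResourceLinks`/`findResources` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ResourceLinks {

    // Tag/attribute pairs that commonly point at extra parts of a page.
    private static final String[][] LINK_ATTRIBUTES = {
        {"img", "src"}, {"script", "src"}, {"link", "href"}
    };

    // Parses well-formed XHTML and returns the absolute URLs of the images,
    // scripts and <link> targets (such as stylesheets) it references.
    public static List<URI> findResources(String xhtml, URI pageUrl) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        List<URI> result = new ArrayList<>();
        for (String[] pair : LINK_ATTRIBUTES) {
            NodeList elements = doc.getElementsByTagName(pair[0]);
            for (int i = 0; i < elements.getLength(); i++) {
                // getAttribute returns "" when the attribute is missing.
                String value = ((Element) elements.item(i)).getAttribute(pair[1]);
                if (!value.isEmpty()) {
                    // Relative links are resolved against the page's own URL.
                    result.add(pageUrl.resolve(value));
                }
            }
        }
        return result;
    }
}
```

Each returned URL would then have to be downloaded and stored on its own, which is why a full page quickly becomes a tree of parts rather than a single string.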

    SECOND, you determine how you want to store it. Unless the page is very simple or you extract almost nothing from it, you would need either to create a significant data structure in the database or to store the entire tree as a blob. Doing that requires that you convert the tree into a single binary format and then store it. You can search for examples of storing and retrieving blobs.
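One way to convert the tree into "a single binary format" is to serialize it back to XML and store the resulting bytes. The sketch below assumes XML serialization is acceptable as that format and uses the JDK's identity `Transformer`; the bytes could then be bound to a `VARBINARY(MAX)` column with `PreparedStatement.setBytes`. `TreeBlob`, `toBytes` and `fromBytes` are hypothetical names, and the tree only contains references to images, so the image files themselves would still need to be stored as their own blobs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class TreeBlob {

    // Flattens a DOM tree into one byte array of UTF-8 encoded XML,
    // suitable for storing in a single binary column.
    public static byte[] toBytes(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Rebuilds the DOM tree from bytes previously produced by toBytes.
    public static Document fromBytes(byte[] blob) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(blob));
    }
}
```

Storing the tree this way is simple but opaque to SQL queries; the alternative mentioned in the reply, a dedicated table structure, lets the database query individual parts at the cost of more design work.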

    Finally, you have presumably researched exactly what legal right you have to download the contents of that site and store them. Even if you own the site, there might be limitations on the content (such as pictures) that prohibit you from storing it.
