2 Replies Latest reply: Apr 17, 2009 3:33 AM by 800282

    String parsing question

    807588
      Ok, I have this code:
      {code}
      import java.io.BufferedReader;
      import java.io.IOException;
      import java.io.InputStreamReader;
      import java.net.MalformedURLException;
      import java.net.URL;
      import java.util.HashMap;
      
      
      public class YouttubeDownloader {
      
           /**
            * @param args
            */
           public static void main(String[] args) {
                // TODO Auto-generated method stub
                URL yahoo = null;
                try {
                     yahoo = new URL(
                               "http://www.youtube.com/watch?v=LhJAuE51CUE");
                } catch (MalformedURLException e1) {
                     // TODO Auto-generated catch block
                     e1.printStackTrace();
                }
                BufferedReader in = null;
                try {
                     in = new BufferedReader(new InputStreamReader(
                               yahoo.openStream()));
                } catch (IOException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
                }
      
                String inputLine;
                HashMap<String, String> map = null;
                boolean hdAvailable = false;
                try {
                     while ((inputLine = in.readLine()) != null){
                          if(inputLine.contains("var swfArgs")){
                               map = parseLine(inputLine);
                          }else if( inputLine.contains("var isHDAvailable =")){
                               String temp = inputLine;
                               temp = temp.replace("var isHDAvailable =", "");
                               temp = temp.replace(";", "");
                               temp = temp.replace(" " , "");
                               log(temp);
                          }
                     }
                } catch (IOException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
                }
           }
      
           private static HashMap<String, String> parseLine(String inputLine) {
                // TODO Auto-generated method stub
                HashMap<String, String> output = new HashMap<String, String>();
                String input = inputLine;
                input = input.replace("var swfArgs = {", "");
                input = input.replace("};", "");
                input = input.replaceAll(" ", "");
                String[] split = input.split(",");
                String t = "";
                for(String map : split){
                     String key = null;
                     String value = null;
                     map = map.replaceAll("\"", "");
                     log(map);
                     String[] keyAndValue = map.split(":");
                     try {
                          if(!keyAndValue[1].equals("")){
                          output.put(keyAndValue[0], keyAndValue[1]);
                          }else{
                               t = t + keyAndValue[0] +",";
                          }
                     } catch (java.lang.ArrayIndexOutOfBoundsException e) {
                          // TODO Auto-generated catch block
                          t = t + keyAndValue[0] +",";
                     }
                }
                if(t.endsWith(",")){
                     StringBuffer sb = new StringBuffer(t);
                     log(""+sb.lastIndexOf(","));
                     t = sb.deleteCharAt(sb.lastIndexOf(",")).toString();
                     log(t);
                }
                return output;
           }
      
           private static void log(String map) {
                // TODO Auto-generated method stub
                System.out.println(map);
           }
           
      }{code}
      It reads the page line by line and checks whether each line contains "var swfArgs" or "var isHDAvailable =". If it sees "var swfArgs", it strips the "var swfArgs = {" prefix and the "};" suffix, splits what is left on each comma, and then splits each piece on ":".
      A piece normally looks like:
      {code}"video_id": "LhJAuE51CUE"{code}
      The key is the first part, such as:
      {code}"video_id"{code}
      and the value is the second:
      {code}"LhJAuE51CUE"{code}
      But since I split on "," and one of the pairs looks like:
      {code}"fmt_map": "18/512000/9/0/115,34/0/9/0/115,5/0/7/0/0"{code}
      I get an error, because the value itself contains commas.
      
      *My question is, how can I work around that? Please help!*
      
      Edited by: CButz on Apr 16, 2009 10:27 PM

        • 1. Re: String parsing question
          807588
          Sounds like now's a good time for you to learn [Regular Expressions|http://java.sun.com/docs/books/tutorial/essential/regex/].
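As a rough illustration of what a regex-based approach could look like here: instead of splitting on commas, match each "key": value pair as a whole, so commas inside a quoted value never come into play. This is only a minimal sketch, assuming every key is double-quoted and every value is either a double-quoted string without escaped quotes or a bare token such as a number. The class name SwfArgsRegexSketch, the parse helper, and the sample line are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; not part of the original post.
class SwfArgsRegexSketch {

    // Matches one "key": value pair. The value is either a quoted string
    // (which may contain commas) or a bare token such as a number.
    // Assumption: no escaped quotes inside keys or values.
    private static final Pattern PAIR =
            Pattern.compile("\"([^\"]*)\"\\s*:\\s*(?:\"([^\"]*)\"|([^,}\\s]+))");

    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            // group(2) is set for quoted values, group(3) for bare ones
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample line assembled from the fragments quoted in the question.
        String line = "var swfArgs = {\"video_id\": \"LhJAuE51CUE\", "
                + "\"fmt_map\": \"18/512000/9/0/115,34/0/9/0/115,5/0/7/0/0\"};";
        for (Map.Entry<String, String> e : parse(line).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Because the pattern consumes a whole quoted value at once, the fmt_map entry comes back in one piece. It would still need adjusting if the page ever emits escaped quotes or nested braces.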
          • 2. Re: String parsing question
            800282
            Use at own risk:
            {code}
            // needs java.net.URL and java.util.Scanner
            Scanner htmlFile = new Scanner(new URL("http://www.youtube.com/watch?v=LhJAuE51CUE").openStream());
            String match;
            while((match = htmlFile.findWithinHorizon("(?si)var\\s++(?:swfArgs|isHDAvailable)\\s++=\\s++\\{.*?};", 0)) != null) {
              // strip everything up to the opening brace, and the trailing "};"
              match = match.replaceAll("^[^{]++\\{|};$", "");
              // split on a comma that precedes a quoted key, or on a colon followed by whitespace
              String[] keysValues = match.split(",\\s++(?=\")|:\\s++");
              System.out.println(match);
              for(int i = 1; i < keysValues.length; i+=2) {
                System.out.printf("%-30s -> %s\n", keysValues[i-1], keysValues[i]);
              }
            }{code}

            (not properly tested!)
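To try the split from the snippet above without hitting the network, here is a small self-contained sketch that runs the same two regexes against a hard-coded line. The class name LookaheadSplitCheck, the splitPairs helper, and the sample line (assembled from the fragments quoted in the question) are assumptions for illustration, not the real page content.

```java
// Hypothetical class name; not part of the original reply.
class LookaheadSplitCheck {

    static String[] splitPairs(String line) {
        // Same two steps as the reply: drop the "var ... = {" wrapper and the
        // trailing "};", then split on a comma + whitespace that is followed
        // by a quote (start of the next key) or on a colon + whitespace.
        String body = line.replaceAll("^[^{]++\\{|};$", "");
        return body.split(",\\s++(?=\")|:\\s++");
    }

    public static void main(String[] args) {
        // Assumed sample input; the real page may format this differently.
        String line = "var swfArgs = {\"video_id\": \"LhJAuE51CUE\", "
                + "\"fmt_map\": \"18/512000/9/0/115,34/0/9/0/115,5/0/7/0/0\"};";
        String[] kv = splitPairs(line);
        // Even indexes hold keys, odd indexes hold values (quotes retained).
        for (int i = 1; i < kv.length; i += 2) {
            System.out.printf("%-30s -> %s%n", kv[i - 1], kv[i]);
        }
    }
}
```

The commas inside the fmt_map value are not followed by whitespace and a quote, so they are left alone. That also means the split depends on the page putting whitespace after each separating comma and colon; if that formatting changes, the pattern needs to change with it.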