This discussion is archived
4 Replies Latest reply: Jun 30, 2009 11:23 PM by 807588

Unicode string search problem

807588 Newbie
Hi,
I am new to Java. My problem is with a Unicode string. I have a Unicode file in which I am searching for a few words or a sentence. The search string is defined as
String StringToBeSearch = "Because the Agent software is already";
When I open the file and compare the search string with the file content, the search does not succeed. If I open the Unicode text file in an editor, the text is there.
When I print every line of the text file, I see that every letter is followed by an extra space; for example, the word “Because” is printed as “B e c a u s e”. Hence the search fails.

public static boolean ReadFileAndSearchString()
{
    String FileName = "Install.txt";
    String StringToBeSearch = "Because the Agent software is already";

    boolean found = false;

    File file = new File(FileName);
    FileInputStream fis = null;
    BufferedInputStream bis = null;
    DataInputStream dis = null;

    try {
        fis = new FileInputStream(file);

        // Here BufferedInputStream is added for fast reading.
        bis = new BufferedInputStream(fis);
        dis = new DataInputStream(bis);

        // dis.available() returns 0 if the file does not have more lines.
        while (dis.available() != 0) {

            // this statement reads the line from the file and print it to
            // the console.
            String line = dis.readLine();
            System.out.println(line);

            int ret = line.indexOf(StringToBeSearch);
            if ( ret >= 0 )
            {
                System.out.println("Got The text");
                found = true;
                break;
            }
        }

        // dispose all the resources after using them.
        fis.close();
        bis.close();
        dis.close();

    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

    return found;
}

I also tried the following code

FileInputStream fis = new FileInputStream(FileName);
InputStreamReader isr = new InputStreamReader(fis, "UTF8");

but it did not work.
  • 1. Re: Unicode string search problem
    807588 Newbie
    Hi,

    I ran your code and it works fine on my machine.

    Can you explain more precisely what the problem actually is?

    I ran this code to search for your keyword in my file.

    I got the message "Got The text".

    Regards,
  • 2. Re: Unicode string search problem
    807588 Newbie
    Hi,
    Thanks for your reply. We are searching in a Unicode file, and this forum does not allow attachments, so I am unable to provide the exact file. The code works fine with an ASCII file.
    The problem occurs only with the Unicode file: each letter read from it comes through as two characters (the letter plus one extra space). If you can provide your email address, I can send you the text file.
  • 3. Re: Unicode string search problem
    807588 Newbie
    It sounds to me like you need to use a BufferedReader and to specify the character encoding of the file as UTF-16. You can do that using
    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream(FileName), "utf16"));
    and then read the lines from the BufferedReader. There is no need for the DataInputStream.

    Also, don't use available(). It tells you nothing useful here. Just loop, reading lines until you have reached your target or end-of-file, which is signified by the readLine() method returning null.
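
    The suggestion above could be sketched roughly as follows. This is only a minimal illustration, not the original poster's final code: the class name Utf16Search and the method containsText are hypothetical, it assumes Java 7 or later (for try-with-resources and StandardCharsets), and it assumes the file starts with a byte-order mark, as files saved as "Unicode" by Windows tools usually do. Without a byte-order mark, Java's UTF-16 decoder assumes big-endian.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf16Search {

    // Hypothetical helper: returns true if any line of the UTF-16 file contains target.
    static boolean containsText(String fileName, String target) throws IOException {
        // The reader decodes bytes as UTF-16, so each letter becomes one char.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_16))) {
            String line;
            // readLine() returns null at end-of-file; available() is not needed.
            while ((line = reader.readLine()) != null) {
                if (line.contains(target)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Build a small UTF-16 sample file (with byte-order mark) so the sketch runs on its own.
        File sample = File.createTempFile("Install", ".txt");
        sample.deleteOnExit();
        String content = "Setup log\r\nBecause the Agent software is already installed\r\n";
        Files.write(sample.toPath(), content.getBytes(StandardCharsets.UTF_16));

        boolean found = containsText(sample.getPath(), "Because the Agent software is already");
        System.out.println(found ? "Got The text" : "Text not found");
    }
}
```

    To search the real file, one would pass "Install.txt" instead of the temporary sample file.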
  • 4. Re: Unicode string search problem
    807588 Newbie
    Thanks. It works. I had tested with UTF-8 but not with UTF-16.
    Thanks again.
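
    For readers wondering why the UTF-8 attempt failed while UTF-16 worked, the sketch below may help. It assumes the file was little-endian UTF-16, which is typical for Windows but is not confirmed anywhere in this thread. The class EncodingMismatchDemo and the helper bytesAsChars are hypothetical names; the helper imitates how DataInputStream.readLine() turns each byte into one character.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: maps every byte to one char, as DataInputStream.readLine() does.
    static String bytesAsChars(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Because";
        // UTF-16LE stores each of these letters as two bytes: the letter, then a zero byte.
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);

        String byteWise = bytesAsChars(utf16);                      // letters separated by NUL chars
        String asUtf8 = new String(utf16, StandardCharsets.UTF_8);  // NUL bytes survive as NUL chars
        String asUtf16 = new String(utf16, StandardCharsets.UTF_16LE);

        System.out.println(byteWise.length());        // 14
        System.out.println(byteWise.contains(text));  // false
        System.out.println(asUtf8.contains(text));    // false
        System.out.println(asUtf16.equals(text));     // true
    }
}
```

    The zero bytes between the letters are what showed up as the "extra space" in the printed output, and they are also why indexOf() could not find the search string.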