How to verify whether a string is a proper English word?

user575089 Member Posts: 466
edited Jan 5, 2017 3:19PM in Java Programming

Hello,

I have a CSV file with almost 300,000 strings in one column.

I want to read those strings and check which of them are proper, meaningful English words that appear in a dictionary.

Could you please tell me how to verify that a string is a proper, meaningful English word found in a dictionary? I'm stuck at this part.

Basically, I want to throw out the garbage strings and list only the proper, meaningful English words.

Thanks

Answers

  • Unknown
    edited Dec 29, 2016 4:24PM
    Could you please tell how to verify if a string is a proper meaningful English word and available in dictionary. I'm stuck at this part.

    You already know the answer to that - you just said it yourself.

    Look the word up in a dictionary. Obtain or create a dictionary of words that YOU accept as 'proper meaningful English words', then look up each string to see whether it is in there.

    Not sure how you expect any other answer.
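
The lookup described in this answer can be sketched with a hash set. This is a minimal illustration rather than code from the thread: the class name `WordFilter` and its methods are assumptions, and in practice the set would be filled from whatever word list you decide to treat as authoritative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: keeps the accepted words in a HashSet,
// so each lookup is a constant-time operation on average.
class WordFilter {
    private final Set<String> dictionary = new HashSet<>();

    WordFilter(Collection<String> acceptedWords) {
        for (String w : acceptedWords) {
            dictionary.add(normalize(w));
        }
    }

    // Trim and lower-case so that " Apple " and "apple" compare equal.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    boolean isWord(String candidate) {
        return candidate != null && dictionary.contains(normalize(candidate));
    }

    // Returns only the candidates found in the dictionary, in input order.
    List<String> keepWords(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (isWord(c)) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

Whether matching should ignore case, as it does here, depends on the dictionary you choose; proper nouns are one case where it might matter.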

  • user575089 Member Posts: 466
    edited Dec 29, 2016 10:01PM

    >>>>> Look the word up in a dictionary. Obtain/create a dictionary of words that YOU accept as 'proper meaningful english word' and then look up each word to see if it is in there.

    Is there an English dictionary API in Java?

  • Unknown
    edited Dec 29, 2016 11:11PM

    No. You either need to create your own or find one on the internet.

    Even then, you need to take the required language(s) into account, as well as any technical words that might not be in a standard dictionary.
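
One way to account for technical vocabulary is to merge a supplementary list into the base dictionary before doing any lookups. A small sketch, in which `VocabularyBuilder` and `build` are names made up for illustration:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class VocabularyBuilder {
    // Combines a general word list with domain-specific terms into one lookup set.
    // Everything is lower-cased, so lookups should lower-case the candidate too.
    static Set<String> build(Collection<String> baseWords, Collection<String> domainTerms) {
        Set<String> vocabulary = new HashSet<>();
        for (String w : baseWords) {
            vocabulary.add(w.toLowerCase(Locale.ROOT));
        }
        for (String t : domainTerms) {
            vocabulary.add(t.toLowerCase(Locale.ROOT));
        }
        return vocabulary;
    }
}
```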

  • rpc1 Member Posts: 1,503
    edited Dec 30, 2016 12:41AM

    I think you could try a spell checker. It does the same thing: it checks that a word exists and is spelled correctly.

    Best spell checking api for Java - Stack Overflow

    Regards,

    Dmitry

  • Radu Viorel-Oracle Member Posts: 6
    edited Jan 5, 2017 4:18AM

    Another alternative would be to use an online dictionary or a dictionary downloaded locally.

    You could then write your own dictionary checker.

    Something like this:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class CheckWord {

        // Returns true if any line of the dictionary file contains the given text.
        // Note: indexOf is a case-sensitive substring match, so fragments of longer words also pass.
        public boolean wordCheck(String word) {
            // try-with-resources closes the reader on every path, including the early return
            try (BufferedReader in = new BufferedReader(new FileReader("C:/Users/Ezi/Downloads/pg29765.txt"))) {
                String str;
                while ((str = in.readLine()) != null) {
                    if (str.indexOf(word) != -1) {
                        return true;
                    }
                }
            } catch (IOException e) {
                // An unreadable file is treated as "word not found"
            }
            return false;
        }
    }

    "C:/Users/Ezi/Downloads/pg29765.txt" is the location of the dictionary file on my Windows machine. The file comes from:

    http://www.gutenberg.org/cache/epub/29765/pg29765.txt

    import java.util.Scanner;

    public class MainClass {

        public static void main(String[] args) {
            // Read one word from standard input and look it up
            Scanner scan = new Scanner(System.in);
            String word = scan.nextLine();
            CheckWord check = new CheckWord();
            if (check.wordCheck(word)) {
                System.out.println(word + " is an English word");
            } else {
                System.out.println(word + " is NOT an English word");
            }
        }
    }

    Hope this helps,

    Cheers

    Radu
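
The `CheckWord` class in the answer above reopens and rescans the dictionary file for every word, which may get slow for a few hundred thousand lookups. A possible variation, sketched here under assumptions, reads the text once, splits each line into alphabetic tokens, and stores them in a set so that each later check is an exact match. `DictionaryLoader` and `load` are hypothetical names, and treating every token in the file as a valid word (including words that appear only inside definitions or licence text) is a deliberate simplification.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class DictionaryLoader {
    // Reads the whole source once and collects every alphabetic token, lower-cased.
    static Set<String> load(Reader source) {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Split on runs of non-letters; a leading separator yields an empty token.
                for (String token : line.split("[^A-Za-z]+")) {
                    if (!token.isEmpty()) {
                        words.add(token.toLowerCase(Locale.ROOT));
                    }
                }
            }
        } catch (IOException e) {
            // Surface read failures instead of silently reporting "not a word".
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

With a file on disk, `DictionaryLoader.load(new FileReader(path))` would build the set once; after that, `words.contains(candidate.toLowerCase(Locale.ROOT))` replaces the per-word file scan.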

  • morgalr Member Posts: 457
    edited Jan 5, 2017 3:19PM

    It's pretty much as everyone is saying. I worked on this problem about 25 years ago: you just parse out the words you want to check and check them against a spell checker or dictionary. The problems you are going to run into are: what do you want to accept as an authoritative dictionary, and do you want to check alternate (near-miss) spellings?

    The reason I worked on the problem was a game that I was playing, a word jumble. It was pretty straightforward to generate the K! permutations of the letters for each of the N jumbles and run them against the spell checker. With 300K words, though, you may hit speed problems from the per-call overhead, so consider a spell checker that you can keep in local memory.

    Also, do you want or need to review the rejected words? That is an entirely different problem: how do you check the words that are not in the working dictionary?
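
For the near-miss question, one common approach (not something the thread prescribes) is to compare each rejected string with the dictionary by Levenshtein edit distance and offer close entries for manual review. The sketch below uses the hypothetical names `NearMiss`, `distance`, and `suggest`. It scans the whole dictionary for each query, which is probably acceptable for a review pass over the rejects but too slow to run on every input string.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class NearMiss {
    // Dynamic-programming Levenshtein distance, keeping only two rows of the table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Dictionary words within maxDistance edits of the rejected string, in iteration order.
    static List<String> suggest(String rejected, Collection<String> dictionary, int maxDistance) {
        List<String> close = new ArrayList<>();
        for (String word : dictionary) {
            if (distance(rejected, word) <= maxDistance) {
                close.add(word);
            }
        }
        return close;
    }
}
```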

This discussion has been closed.