potty

6 posts

What is Java?

Java is an high-level programming language created by James Gosling from Sun Microsystems on 1991, released on 1995. Nowadays, Oracle Corporation has the steermanship for Java. Its license is GNU General Public License (GPL).

Is executed on more than 850 millions of devices on all over the world. Java applications are compiled to bytecode that can run on any Java Virtual Machine (JVM), regardless of the computer architecture. According to TIOBE Software, Java is the second most used programming language.

As stated by Sun Microsystems, there were five primary principles in the creation of Java:

  1. Simple, object-oriented and familiar.
  2. Robust and secure.
  3. Architecture-neutral and portable.
  4. High performance.
  5. Interpreted, threaded and dynamic.

Versions

These are the major releases of Java with their respective dates:

                                            
JDK 1.001/21/1996
JDK 1.102/19/1997
J2SE 1.212/08/1998
J2SE 1.305/08/2000
J2SE 1.402/06/2002
J2SE 5.009/30/2004
Java SE 612/11/2006
Java SE 707/28/2011
Java SE 803/18/2014

Since the release of JDK 5.0, Oracle changed the version numbering scheme for minor releases:

  • Limited Update Releases: Numbered in multiples of 20. These represent the release of new functionalities.
  • Critical Patch Updates: Continue to use odd numbers. It include fixes for security vulnerabilities.

Distributions

Java, as many programming languages, has suffered changes along its history. Today, Java have three main distributions with their particular features: 

J2SE

Java 2 Standard Edition. Oriented to the development of client-server applications. Does not include the technologies for web applications. It is the base for the other two distributions and is the most used platform.

J2EE

Java 2 Enterprise Edition. Oriented to the enterprises and the consolidation of their information systems. Do include the technologies for web applications (server and client side). Its base is J2SE.

J2ME

Java 2 Micro Edition. Oriented to small mobile devices (phones, tablets). Does not include the technologies for web applications. Its base is J2SE but it have extra features.

The following diagram shows the relation between each distribution:

JDK, JRE, JVM

These are the most confusing terms for Java new programmers. The following diagram explains:

JDK

Java Development Kit: Contains all the tools needed to compile Java source files and the JRE to execute them. You will need it if you want to write your own programs, compile and execute them. There are multiple versions, e.g. Eclipse Java Compiler, GNU Compiler for Java, OpenJDK, Oracle Java SDK.

JRE

Java Runtime Environment: Contains the JVM, class libraries and supporting files. It does not contain any development tools. You will need it if you want to execute any Java program.

JVM

Java Virtual Machine: Provides a platform-independent way of executing code. It is important to mention that the JVM itself is platform-dependent. It interprets the bytecode generated by the Java Compiler depending upon the underlying operating system and hardware combinations. There are many JVM, e.g. JRockit, JInitiator, IcedTea.

Development Process with Java

The source files are plain text documents. The programmer can take advantage of an Integrated Development Environment (IDE) for programming, e.g. Eclipse or Netbeans. At some point the programmer calls the Java compiler (

javac
). It creates the bytecode instructions. The compiled version is stored as a class file and can be executed by the Java Virtual Machine (using or not classpaths).

 

Syntax

The Java syntax is similar to C++ but it combines the syntax for multiple programming paradigms although it was built almost as an object-oriented programming language. In Java, everything is contained within class and everything is an object, except for primitive data types (performance is the main reason). Java is case sensitive but it does have naming conventions.

Test: Hello World

public class HelloWorld {

    public static void main(String[] args) {
        System.out.println("Hello, World");
    }

}

References

  1. TOBIE Software. TOBIE Index for January 2014. Link.
  2. Oracle Corporation. Java Platform Standard Edition 7 Documentation. Link.
  3. Oracle Corporation. Java SE - Change in Numbering Scheme. Link.

Introduction to Class Diagram

In any project's planning phase, the programmer should describe the behaviour of the different objects that are required to satisfy business use cases. However objects can't emerge from nothing, so you need a template that describe the type of objects that the system will have. As a result, you will need to show a diagram with all the classes involved and their relationships. This diagram is called Class Diagram.

According to Wikipedia, a Class Diagram "in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among objects". Although this definition is pretty clear, you have to understand that Class Diagram are part of the system model's Logical View.

On the following image you will see its relation with the other project views:

Let's Remember: OOP Basic Concepts

A class is a blueprint from which objects are generated. It includes the attributes and methods of that object. Now, in the real-life, we can find many individual objects of the same type. Let's say a car. There may be a lot of cars, but all with the same basic components. In this case, the concept of the car is the class (that can belong to a package) and the many cars in the world are instances of that class. This analogy applies to almost anything.

If you like to learn more about object-oriented programming concepts, I invite you to read my previous post.

Representing Packages in UML

In UML, a package is drawn by a folder with a tab. The name of the package is written inside the folder or in the tab. See this on the following diagram:

A package organize UML elements like classes and other packages. The following diagrams show the proper way to represent them:

Finally, it's common to see a deeply nested packages. For this case, you can use an alternate notation that can ease the work. You can represent the previous diagram with the alternate notation as follows:

Recommendation: The name of the package should be in lower case. In small projects it should be simple and meaningful, but in complex projects the name should be subdivided and specific.

Representing Classes in UML

In UML, a class is drawn by a three sections rectangle. The top section includes the name of the class, the middle section include the attributes and the bottom section include the methods. Moreover, there are four different ways of showing classes using the UML notation:

  • Class Name + Attributes + Methods
  • Class Name + Attributes
  • Class Name + Methods
  • Class Name

The name of the class should follow the CamelCase pattern. Also, try to use nouns because it represents a real-life object.

Visibility

To apply encapsulation, you have to limit the access to the attributes, methods and classes. This is achieved through visibility. There are four types of visibility. The following diagram represents an overview of them:

Public

Public the weakest visibility characteristic. Its notation is the plus symbol (+). Declaring a class attribute or method as public will make it accessible directly by any class.

Private

Private is the strongest visibility characteristic. Its notation is the minus symbol (-). Declaring a class attribute or method as private will make it only accessible for the class itself.

Protected

Protected is one the neutral visibility characteristic. Its notation is the hash symbol (#). Declaring a class attribute or method as protected will make it more visible than private attributes and methods, but less visible than public ones. In other words, protected elements can be accessed by a class that inherits from your class whether it is in the same package or not.

Package

Package is the other neutral visibility characteristic. Its notation is the tilde symbol (~). Declaring a class attribute or method as package will make it visible only to any class in the same package. It doesn't matter if other class from other packages inherits from a class within the main package.

Recommendations

  • Private is the most useful when you have an attribute or method that you want it to be independent from the whole system.
  • Attributes should always be private and only in extreme cases be protected, never be public.
  • Protected is crucial if you want allow access to an attribute or method in the base class without exposing the attribute or method to the whole system.
  • If you want to reuse methods between classes but not expose it to the whole system then you will need the Package visibiilty.
  • Protected and Package visibilities are not the same.

Attributes

An attribute can be represented on a class diagram by placing them inside the middle section of the class box or by association with another class. It is important that the attribute has its visibility characteristic, proper name and data type. See the following diagram for more details:

The name of the attribute should be in mixed case. The names should represent the value of what it represents. Avoid using short names.

Methods

A method can be represented on a class diagram by placing them inside the bottom section of the class box. It is important that the method has its visibility characteristic, proper name, parentheses for parameters and return type. See the following diagram for more details:

The parameters are used to specify the required input to allow the method to complete its job. Its notation is the following:name : datatype. If you are not using thevoid return type, you have to specify at least one parameter. In the scenario of multiple parameters, you should separate them with a comma.

The return type is specified after a colon at the end of the operation's signature. The name of the method should be in mixed case. Use verbs to describe what the method does.

Representing Static Elements in UML

In object-oriented programming the non-static attributes and methods are associated with each individual object that is created from a class. In the other hand, static attributes and methods are associated with the class. This allows to shared its values among all the objects of that class type.

An attribute or method is made static in UML by underlining it. See this on the following diagram:

Representing Class Relationships in UML

It is a fact that classes coexist and work together through different releationships. The strength of a class relationship depends on how involved is each class with the other: tightly coupled or loosely coupled. The following diagram shows the five different type of class relationships:

Dependency 

Declares that a class needs to know about another class to use objects of that class. Its representation is a dashed arrow.

Association 

Means that a class will contain a reference to an object of another class in the form of an attribute. Its representation is the straight line, the association name and the attributes (or Class) involved. An example is "Student --take--> Course".

Aggregation 

Indicate that a class owns but may share objects of another class. Its representation is by an empty diamond, a straight line, the aggregation name and the attribute shared. An example is "Person ?---- Address".

Composition 

It work similar as Aggregation but in a stronger way. Its representation is by a filled diamond, a straight line, the aggregation name and the attribute shared. An example is "Person -----? Name".

Generalization 

It is also called Inheritance. Used to describe a class that is a type of another class. Its representation is by an empty triangle and a straight line. An example is "Mammal -----? Dog".

Representing Abstract Classes in UML

An abstract class is a class that is extend by other classes. Unlike concrete classes, abstract classes can't be instantiated and only have methods. The main purpose of it is to have common code to use in subclasses. Its notation is similar as the Class but without attributes and the text is in italics.

Representing Interfaces in UML

An interface is a collection of operations that do not have corresponding implementation methods. Are much safer to use because they avoid the problems related to multiple inheritance. Think of it llike a contract.

In UML, an interface is drawn by a two sections rectangle. The top section includes the name of the interface with a guillemet (

Why do we need to program?

Technology is part of our life. The world is changing daily and everything is getting automated. Learn to program is the creative way we can take our ideas to the next level and express solutions to society. By designing programs, we learn several abilities like critical reading, analytical thinking and create synthesis. The programmer defines the problem, plans a solution, codes the program, test the proposal and documents the features. But we can't program all the solutions with the same method, that's why programming paradigms appears.

Overview of Programming Paradigms

According to Vasappanavara, a programming paradigm "it is the manner in which programming elements such as functions, objects and variables are exploited to produce the desired output". It is important to understand that programming paradigms are not programming languages.

The following are typical examples of programming paradigms (according to Bhave):

 

What is Object-oriented Programming (OOP)?

The object-oriented is a programming paradigm where the program logic and data are weaved. As stated by Phil Ballard, it is a way of conceptualizing a program's data into discrete "things" referred to as objects, each having its own properties and methods.

Let's see an example. Suppose your friend is a bank manager and he wants you to help improving their system. The first object you might design is the general-purpose

Account
. The 
Account
object has properties and methods. For each client your friend's bank have, you would have to create an 
Account
object.

 

Characteristics

As follows the most important features of object-oriented programming:

  • Encapsulation. Capability to hide data and instructions inside an object.
  • Inheritance. Ability to create one object from another.
  • Polymorphism. The design of new classes is based on a single class.
  • Message Passing. It is the way objects communicate with each other.
  • Garbage Collection. Automatic memory management that destroys objects that are no longer in use by the program.

Benefits

As follows some benefits of using object-oriented programming:

  • Re-usability. You can write a program using a previous developed code.
  • Code Sharing. You are able to standardize the way your are programming with your colleagues.
  • Rapid Modeling. You can prototype the classes and their interaction through a diagram.

Drawbacks

As follows the disadvantages of using object-oriented programming:

  • Size. Are larger than other programs, consuming more memory and disk space.
  • Effort. Require a lot of work to create, including the diagramming on the planning phase and the coding on the execution phase.

Basic Concepts

Class
Basic template that specifies the properties and behaviours of something (real life or abstract).

Object
Particular instance of a class that responds consequently to events.

Attribute
Characteristics of the class. Often called instance variables.

Method
Algorithm associate to an class that represent a thing that the object does.

Subclass
Class based on another class.

Inheritance
Process where the subclass gets the attributes and methods from its parent class.

Interface
Specific template that enforces certain attributes and methods of a class.

Package
Namespace that organizes a set of related classes and interfaces.

Event
Alert the application when there is a state change of the object.

References

  1. Vasappanavara, R. Object-oriented Programming Using C++ and Java. Chapter 1. Object-oriented Programming Basics. Section 1.3. Programming Paradigms. Pearson Education. India. May, 2011.
  2. Bhave, M. Object Oriented Programming with C++, Second Edition. Chapter 4. Object Orientation: An Introduction. Section 4.1. Programming Paradigms. Pearson Education. India. May, 2012.
  3. Ballard, P. Sams Teach Yourself JavaScript in 24 Hours, Fifth Edition. Hour 7. What is Object Oriented Programming (OOP)?. Pearson Education, Inc. United States. November, 2012.

Boyer-Moore (BM)

Overview

According to Wikipedia, the Boyer-Moore algorithm "is an efficient string searching algorithm that is the standard benchmark for practical string search literature. It was developed by Robert S. Boyer and J Strother Moore in 1977. The algorithm preprocesses the string being searched for (the pattern), but not the string being searched in (the text). It is thus well-suited for applications in which the text does not persist across multiple searches. The Boyer-Moore algorithm uses information gathered during the preprocess step to skip sections of the text, resulting in a lower constant factor than many other string algorithms. In general, the algorithm runs faster as the pattern length increases". 
The BM algorithm have two shifting functions: bad character rule (occurrence shift) and the good suffix rule (matching shift): 
  • The bad character rule keeps information about how pattern matches against shifts of itself. This information is to avoid useless shifts of the pattern.
  • Case #1. If mismatch is within pattern.
    Case #2. If mismatch is not within pattern.
  • The good suffix rule uses the prefix function, the pattern and the text as inputs to find the occurrence of the pattern within the text and returns the number of shifts after the first occurrence.
In conclusion, the BM algorithm has the following characteristics: 
  • The algorithm searches for a class of patterns given by a pattern description instead of searching for a fixed pattern.
  • The algorithm compares from right to left.
  • The algorithm preprocesses the pattern.
  • The preprocessing phase time efficiency is ?(m+?).
  • The searching phase time efficiency is ?(m*n).
  • The algorithm accomplish at most 3n text character comparisons.

Advantages

  • Most efficient algorithm to search for a pattern inside a string.
  • It is used in many search and replace operations in text editors.

Disadvantages

  • Not simple.
  • The preprocessing for the good suffix rule is difficult to understand and implement.

Java Code

public class BoyerMoore {
    private final int BASE;
    private int[] occurrence;
    private String pattern;

    public BoyerMoore(String pattern) {
        this.BASE = 256;
        this.pattern = pattern;

        occurrence = new int[BASE];
        for (int c = 0; c < BASE; c++)
            occurrence[c] = -1;
        for (int j = 0; j < pattern.length(); j++)
            occurrence[pattern.charAt(j)] = j;
    }

    public int search(String text) {
        int n = text.length();
        int m = pattern.length();
        int skip;
        for (int i = 0; i <= n - m; i += skip) {
            skip = 0;
            for (int j = m-1; j >= 0; j--) {
                if (pattern.charAt(j) != text.charAt(i+j)) {
                    skip = Math.max(1, j - occurrence[text.charAt(i+j)]);
                    break;
                }
            }
            if (skip == 0) return i;
        }
        return n;
    }
}


public class Test {

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit amet";
        String pattern = "ipsum";
        BoyerMoore bm = new BoyerMoore(pattern);
        
        int first_occur_position = bm.search(text);
        System.out.println("The text '" + pattern + "' is first found after the " 
                                    + first_occur_position + " position.");
    }

}

References

  1. Charras, Christian & Lecroq, Thierry.Exact String Matching Algorithms. Rouen University. France. 1997. Link.
  2. Wikipedia, the free encyclopedia (User: Watcher).Boyer-Moore string search algorithm. May 28, 2004.Link.
  3. Bustos, Benjam

Knuth-Morris-Pratt (KMP)

Overview

According to Wikipedia, the Knuth-Morris-Pratt algorithm "searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters".

Being 'A' the portion of the pattern that fits with the text, 'B' is part of the text and 'f' is the length of 'A'. The KMP algorithm moves the pattern according to the characters of the pattern that fits with the text. The main components of the KPM algorithm are the prefix function (?) and the KMP matcher:

  • The prefix fuction (?) keeps information about how pattern matches against shifts of itself. This information is to avoid useless shifts of the pattern.
  • The KMP matcher uses the prefix function, the pattern and the text as inputs to find the occurrence of the pattern within the text and returns the number of shifts after the first occurrence.

In conclusion, the KMP algorithm has the following characteristics:

  • The algorithm compares from left to right.
  • The algorithm preprocess the pattern.
  • The preprocessing phase time efficiency is ?(m).
  • The searching phase time efficiency is ?(m+n).
  • The algorithm accomplish at most 2n - 1 comparisons.

Advantages

  • Fast.
  • Simple.
  • It is good for processing large files.
  • Avoid comparisons with elements of 'B' that have previously been involved in comparison with some element of the pattern.

Disadvantages

  • Chances of mismatch increases with big sized alphabets.

Java Code

public class KnuthMorrisPratt { public int[] prekmp(String pattern) { int[] next = new int[pattern.length()]; int i=0, j=-1; next[0]=-1; while (i=0 && pattern.charAt(i)!=pattern.charAt(j)) j = next[j]; i++; j++; next[i] = j; } return next; } public int kmp(String text, String pattern) { int[] next = prekmp(pattern); int i=0, j=0; while (i=0 && text.charAt(i)!=pattern.charAt(j)) j = next[j]; i++; j++; if (j==pattern.length()) return i-pattern.length(); } return -1; } } public class Test { public static void main(String[] args) { KnuthMorrisPratt k = new KnuthMorrisPratt(); String text = "Lorem ipsum dolor sit amet"; String pattern = "ipsum"; int first_occur_position = k.kmp(text, pattern); System.out.println("The text '" + pattern + "' is first found on the " + first_occur_position + " position."); } }

References

  1. Charras, Christian & Lecroq, Thierry. Exact String Matching Algorithms. Rouen University. France. 1997. Link.
  2. Wikipedia, the free encyclopedia (User: Almi). Knuth

String searching algorithm (SSA)

According to Wikipedia, string searching algorithms are "an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text". Some importantconcepts that you need to understand before reading this entry are pattern, string and alphabet
  • Pattern: Set of elements that represent a predictable manner.
  • Alphabet: Set of symbols. For example human alphabet (A~Z), binary alphabet (1,0) and DNA alphabet (A,C,G,T).
  • String: Finite set of elements chose from an alphabet.
There are multiple categories of SSA. They basically depend on the number of patterns you chose. For example, there is single pattern, finite set of patterns and infinite set of patterns algorithms. The most common single pattern SSA are: 
  • Brute-force searching algorithm.
  • Knuth-Morris-Pratt algorithm.
  • Boyer-Moore algorithm.
In this entry (and next two entries) I will expose all the occurrences of a pattern within a text using a single pattern Java code (tested using Oracle JDK and OpenJDK). For convention, I usedm as pattern size and nas text size

Brute-force searching algorithm (BFSA)

Overview

According to Wikipedia brute-force search is "a trivial but very general problem-solving technique that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem's statement". 
The BFSA aligns the first position of the pattern with the first position of the text and the algorithm compares their characters one by one until all pattern characters are found or a mismatch is detected. In conclusion, the BFSA has the following characteristics: 
  • The pattern is not preprocessed.
  • The algorithm compares from left to right character by character.
  • The worst time efficiency is ?(mn) comparisons.
  • The algorithm returns the first occurrence of the pattern.

Advantages

  • Simple.
  • Rapid approach.
  • It is widely used.

Disadvantages

  • Too slow.
  • Not scalable.

Java Code

public class BruteForceSearch {

    private char[] text;
    private char[] pattern;
    private int n;
    private int m;
    
    public void setString(String t, String p) {
        this.text = t.toCharArray();
        this.pattern = p.toCharArray();
        this.n = t.length();
        this.m = p.length();
    }
    
    public int search() {
        for (int i = 0; i < n-m; i++) {
            int j = 0;
            while (j < m && text[i+j] == pattern[j]) {
                j++;
            }
            if (j == m) return i;
        }
        return -1;
    }
}

class Test {

    public static void main(String[] args) {
        BruteForceSearch bfs = new BruteForceSearch();
        String text = "Lorem ipsum dolor sit amet";
        String pattern = "ipsum";
        
        bfs.setString(text, pattern);
        int first_occur_position = bfs.search();
        System.out.println("The text '" + pattern + "' is first found after the " + first_occur_position + " position.");
    }

}

References

  1. Charras, Christian & Lecroq, Thierry.Exact String Matching Algorithms. Rouen University. France. 1997. Link.
  2. Wikipedia, the free encyclopedia (User: Stevertigo).Brute-force search. October 13, 2002. Link.
  3. Wikipedia, the free encyclopedia (User: Kragen).String searching algorithm. October 30, 2001. Link.
  4. Levitin, Anany. Introduction to the Design & Analysis of Algorithms. Pearson Addison-Wesley. Third Chapter. Second Edition. 2007.

Filter Blog

By date: