DAX Sample Programs

Version 1


    We have developed a set of C, Java, and Python programs that demonstrate how developers can use the vector library APIs to access the Data Analytics Accelerator (DAX) functionality of Oracle's SPARC M7 processor in a "platform-neutral" way.

     

    Note: The code for these sample programs is available once you register at the SWiSdev.oracle.com/DAX site. After you register, log in to your account and create your own DAX Developer VM. Then change to the /opt/DAXSamplePrograms directory.

     

    The key component of the APIs is an object type called vector, which is very similar to a data frame in the R programming language; basically, it is an ordered set of values. Using the vector library APIs, a programmer can load data into memory and then perform operations such as extracting a subset based on filtering criteria, finding the 90th-percentile value, or finding outliers in the data set. For more details about vectors and the operations that can be performed on them, see vector.h, which is located in the same directory as the sample programs.
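
To make the kinds of operations mentioned above concrete, here is a minimal sketch in plain Python. It does not use the vector library (whose actual function names are defined in vector.h and are not reproduced here); `select_where` and `percentile` are hypothetical helper names, and the nearest-rank percentile definition is an assumption.

```python
import math

def select_where(values, predicate):
    """Return the ordered subset of values that satisfy predicate."""
    return [v for v in values if predicate(v)]

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the data is less than or equal to it."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

data = [12, 7, 3, 25, 8, 19, 4, 30, 15, 10]
print(select_where(data, lambda v: v > 10))  # [12, 25, 19, 30, 15]
print(percentile(data, 90))                  # 25
```

On a SPARC M7 system, the vector library would be expected to hand scans like these to DAX; the sketch only shows the logical result.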

     

    The following sections describe the sample programs.

     

    Key-Value Pair Search

     

    Description

     

    This program reads an input file in comma-separated values (CSV) format. The first field is an integer (the key) and the second field is a string (the value). Using the vector library API, the program converts the input data into two vectors, one for the keys and another for the values. It then accepts a key (integer), finds the matching value (string) using the vector library API, and prints the key and value. The program is available in C and Python.
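
The flow described above can be sketched in plain Python as two parallel columns and a scan over the key column. This is an illustration only, not the shipped sample: `load_columns` and `find_values` are hypothetical names, ordinary lists stand in for vectors, and the inline sample data is made up for the example.

```python
import csv
import io

def load_columns(csv_text):
    """Split 'key,value' rows into two parallel lists (stand-ins for vectors)."""
    keys, values = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        keys.append(int(row[0]))
        values.append(row[1])
    return keys, values

def find_values(keys, values, wanted):
    """Return every value whose key equals wanted, in file order."""
    return [v for k, v in zip(keys, values) if k == wanted]

sample = "1,apple\n2,banana\n3,cherry\n2,blueberry\n"
keys, values = load_columns(sample)
for value in find_values(keys, values, 2):
    print(2, value)
```

Keeping keys and values in separate columns means the search touches only the integer column, which is the kind of scan a columnar accelerator is suited to.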

     

    Complex Key-Value Pair Search

     

    Description

     

    This program reads an input file, which is in CSV format. The first five fields are integers (keys) and the sixth field is a string (value). The program converts the input data into six vectors—five for the keys and another for the value—using the vector library API.

     

    The program accepts five key (integer) ranges, for example, 5–10 as one range, 15–20 as another, and so on. It uses the vector library API to find every value (string) for which at least four of the five key fields fall within the given ranges, and then prints those values.

     

    The number of ranges and the required number of matches can both be varied, so the program can test multiple ranges and accept rows that satisfy only a subset of them. The program is available in C.
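
The "at least four of five ranges" rule can be sketched as follows. The sample itself is written in C against the vector library; this plain-Python version uses hypothetical names (`count_hits`, `search`), assumes the ranges are inclusive at both ends, and uses made-up data.

```python
def count_hits(row_keys, ranges):
    """Count how many keys fall inside their corresponding inclusive range."""
    return sum(lo <= k <= hi for k, (lo, hi) in zip(row_keys, ranges))

def search(key_columns, values, ranges, min_hits):
    """Return the values of rows where at least min_hits keys are in range."""
    matches = []
    for i, value in enumerate(values):
        row_keys = [column[i] for column in key_columns]
        if count_hits(row_keys, ranges) >= min_hits:
            matches.append(value)
    return matches

# Five key columns (one list per key field) and one value column.
key_columns = [
    [5, 1, 7],
    [16, 2, 40],
    [25, 3, 26],
    [35, 4, 36],
    [99, 5, 46],
]
values = ["alpha", "beta", "gamma"]
ranges = [(5, 10), (15, 20), (25, 30), (35, 40), (45, 50)]

print(search(key_columns, values, ranges, min_hits=4))  # ['alpha', 'gamma']
```

Both the number of ranges and `min_hits` are ordinary parameters here, mirroring the statement that the number of ranges and matches can be varied.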

     

    Top-N Algorithm in Java

     

    Description

     

    This program reads an input file that contains N values, where N is user-defined and can be as large as 10 million. It finds the 1,000 largest values and prints them. Three versions are supported: integer, floating-point, and double values. The program uses the vector library API.

     

    Top-N Algorithm in Python

     

    Description

     

    This program reads a data file that contains between 1 million and 10 million numbers, calculates the largest N numbers (where N is user-defined), and displays them. Integers, doubles, and floating point numbers are supported. The program uses the vector library API.
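
As a point of reference for what the top-N samples compute, here is a minimal sketch that uses the Python standard library rather than the vector library. The one-number-per-line input layout and the helper names `parse_numbers` and `top_n` are assumptions made for illustration.

```python
import heapq

def parse_numbers(lines, convert=int):
    """Yield one number per non-blank line; pass convert=float for real values."""
    return (convert(line) for line in lines if line.strip())

def top_n(numbers, n):
    """Return the n largest numbers in descending order."""
    return heapq.nlargest(n, numbers)

lines = ["5\n", "1\n", "\n", "9\n", "3\n", "7\n"]
print(top_n(parse_numbers(lines), 3))  # [9, 7, 5]
```

`heapq.nlargest` keeps only n candidates in memory at a time, so the input can be streamed from a file object instead of being loaded whole.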

     

    Outlier Detection in Java

     

    Description

     

    This program reads an input file that contains 10 million numbers. It first calculates the median, and then it prints the list of numbers that are either N times larger than the median or N times smaller than the median, where N is a user-defined integer input parameter. Three versions are supplied: one that accepts integers, one that accepts floating-point numbers, and one that accepts doubles.
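
The median-based test can be sketched as below. The description leaves the exact thresholds open, so this sketch assumes that "N times larger" means greater than median × N and "N times smaller" means less than median ÷ N, and that the data is positive. `find_outliers` is a hypothetical name, and the code uses the Python standard library, not the vector library.

```python
import statistics

def find_outliers(numbers, n):
    """Return values above median * n or below median / n (positive data assumed)."""
    median = statistics.median(numbers)
    high = median * n
    low = median / n
    return [x for x in numbers if x > high or x < low]

data = [10, 12, 9, 11, 100, 1, 10]
print(find_outliers(data, 3))  # [100, 1]
```

With a median of 10 and N = 3, the thresholds are 30 and about 3.33, so only 100 and 1 are reported.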

     

    Cube Building with Apache Spark

     

    Description

     

    This program reads an input file that has 1 million points and builds a cube that has 1,000 cells. Each cell in the cube is defined by a range in each of the three dimensions. Each cell contains the number of points that belong to that cell.
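
The counting step can be illustrated without Spark. The sketch below assumes the 1,000 cells come from 10 equal-width ranges per dimension and that coordinates lie in [0, 1]; neither detail is stated above, and `build_cube` is a hypothetical name.

```python
from collections import Counter

def build_cube(points, bins=10, lo=0.0, hi=1.0):
    """Count points per cell; a cell is identified by its (i, j, k) bin indices."""
    width = (hi - lo) / bins
    cube = Counter()
    for point in points:
        # Clamp so that a coordinate equal to hi lands in the last bin.
        cell = tuple(min(int((c - lo) / width), bins - 1) for c in point)
        cube[cell] += 1
    return cube

points = [(0.05, 0.05, 0.05), (0.07, 0.01, 0.09), (0.95, 0.55, 0.25)]
cube = build_cube(points)
print(cube[(0, 0, 0)])  # 2
print(cube[(9, 5, 2)])  # 1
```

Only non-empty cells are stored; a cell that is absent from the counter has a count of zero.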

     

    Approximate K-Nearest Neighbors Classification

     

    Description

     

    This program computes the class membership of a test data point using the "k closest" training data points. Both training and test data sets are read in as input files. k is an integer input parameter. The program prints the computed labels of all the points in the test data set.
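
For orientation, here is a brute-force k-nearest-neighbors classifier in plain Python. Note that it is an exact search with Euclidean distance and majority voting, not the approximate method the sample implements, whose details are not described above; `classify` and the toy data are hypothetical.

```python
import math
from collections import Counter

def classify(train_points, train_labels, query, k):
    """Label query by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
train_labels = ["a", "a", "a", "b", "b", "b"]

for test_point in [(0.5, 0.5), (5.5, 5.5)]:
    print(test_point, classify(train_points, train_labels, test_point, k=3))
```

The exact search compares each test point with every training point; approximate methods trade some accuracy to avoid that full scan.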