This content has been marked as final. Show 1 reply
For purposes of this example, records are extracted and loaded into a fact staging table W_AR_XACT_FS. Foreign keys corresponding to the dimension tables are then added to the records, and the transformed records are then loaded into fact table W_AR_XACT_F.
For this example, the fact table is assumed to have 8 associated dimension tables whose foreign keys are to be inserted into the records in the fact staging table prior to the records being loaded into the fact table. According to the processing depicted in FIG. 2, the dimension tables are sorted based upon the number of records in the tables. Table A shown below depicts the dimension tables sorted in ascending order based upon the number of records (in thousands) in the tables.
Sorted Dimension Tables
Dimension table No of records (′000)
After iteratively outer joining dimension tables with the fact staging table, starting with one with the smallest number of records according to the processing depicted in FIG. 2, it is determined that the first six tables are suitable for outer joins with the fact staging table while the last two tables are not suitable. Lookup transformations may be used to add foreign keys for the last two dimension tables. The final SQL query for joining the fact staging table W_AR_XACT_FS with the first six dimension tables is as follows:
SELECT /* USE_HASH(W_ORG_D
LEFT OUTER JOIN W_ORG_D
ON W_AR_XACT_FS.CUSTOMER_ID = W_ORG_D.INTEGRATION_ID
AND W_AR_XACT_FS.DATASOURCE_NUM_ID = W_ORG_D.DATASOURCE_NUM_ID
AND W_AR_XACT_FS.POSTED_ON_DT >= W_ORG_D.EFFECTIVE_FROM_DT
AND W_AR_XACT_FS.POSTED_ON_DT < W_ORG_D.EFFECTIVE_TO_DT
LEFT OUTER JOIN W_SUPPLIER_D
ON W_AR_XACT_FS.SUPPLIER_ID = W_SUPPLIER_D.INTEGRATION_ID
AND W_AR_XACT_FS.DATASOURCE_NUM ID = W_SUPPLIER_D.DATASOURCE_NUM_ID
AND W_AR_XACT_FS.POSTED_ON_DT >= W_SUPPLIER_D.EFFECTIVE_FROM_DT
AND W_AR_XACT_FS.POSTED_ON_DT < W_SUPPLIER_D.EFFECTIVE_TO_DT
LEFT OUTER JOIN W_SUPPLIER_ACCOUNT_D
ON W_AR_XACT_FS.SUPPLIER_ACCOUNT_ID = W_SUPPLIER_ACCOUNT_D.INTEGRATION_ID
AND W_AR_XACT_FS.DATASOURCE_NUM_ID = W_SUPPLIER_ACCOUNT_D.DATASOURCE_NUM_ID
AND W_AR_XACT_FS.POSTED_ON_DT >= W_SUPPLIER_ACCOUNT_D.EFFECTIVE_FROM_DT
AND W_AR_XACT_FS.POSTED_ON_DT < W_SUPPLIER_ACCOUNT_D.EFFECTIVE_TO_DT
LEFT OUTER JOIN W_PRODUCT_D
ON W_AR_XACT_FS.PRODUCT_ID = W_PRODUCT_D.INTEGRATION_ID
AND W_AR_XACT_FS.DATASOURCE_NUM_ID = W_PRODUCT_D.DATASOURCE_NUM_ID
AND W_AR_XACT_FS.POSTED_ON_DT >= W_PRODUCT_D.EFFECTIVE_FROM_DT
AND W_AR_XACT_FS.POSTED_ON_DT < W_PRODUCT_D.EFFECTIVE_TO_DT
LEFT OUTER JOIN W_SALES_PRODUCT_D
ON W_AR_XACT_FS.SALES_PRODUCT_ID = W_SALES_PRODUCT_D.INTEGRATION_ID
AND W_AR_XACT_FS.DATASOURCE_NUM_ID = W_SALES_PRODUCT_D.DATASOURCE_NUM_ID
AND W_AR_XACT_FS.POSTED_ON_DT >= W_SALES_PRODUCT_D.EFFECTIVE_FROM_DT
AND W_AR_XACT_FS.POSTED_ON_DT < W_SALES_PRODUCT_D.EFFECTIVE_TO_DT
LEFT OUTER JOIN W_INVENTORY_PRODUCT_D
ON W_AR_XACT_FS.INVENTORY_PRODUCT_ID = W_INVENTORY_PRODUCT_D.INTEGRATION_ID
AND W_AR_XACT_FS.DATASOURCE_NUM_ID = W_INVENTORY_PRODUCT_D.DATASOURCE_NUM_ID
AND W_AR_XACT_FS.POSTED_ON_DT >= W_INVENTORY_PRODUCT_D.EFFECTIVE_FROM_DT
ANDW_AR_XACT_FS.POSTED_ON_DT < W_INVENTORY_PRODUCT_D.EFFECTIVE_TO_DT
As can be seen from the above query, the fact staging table W_AR_XACT_FS is outer joined (using LEFT OUTER JOIN) with six dimension tables, namely, W_ORG_D, W_SUPPLIER_D, W_SUPPLIER_ACCOUNT_D, W_PRODUCT_D, W_SALES_PRODUCT_D, and W_INVENTORY_PRODUCT_D.
The plan for the final query is shown below in Table B. As can be seen from Table B, a hash join is used to implement the outer joins.
Id Operation Name Rows Bytes TempSpc Cost (% CPU) Time
0 SELECT STATEMENT 15M 11G 2410K (1) 09:22:24
* 1 HASH JOIN RIGHT OUTER 15M 11G 2410K (1) 09:22:24
2 TABLE ACCESS FULL W_PRODUCT_D 10020 420K 81 (3) 00:00:02
* 3 HASH JOIN RIGHT OUTER 15M 11G 127M 2410K (1) 09:22:21
4 TABLE ACCESS FULL W_INVENTORY_PRODUCT_D 2618K 97M 13724 (2) 00:03:13
* 5 HASH JOIN RIGHT OUTER 15M 10G 127M 1920K (1) 07:28:06
6 TABLE ACCESS FULL W_SALES_PRODUCT_D 2617K 97M 11408 (2) 00:02:40
* 7 HASH JOIN RIGHT OUTER 15M 10G 23M 1456K (1) 05:39:56
8 TABLE ACCESS FULL W_SUPPLIER_ACCOUNT_D 493K 17M 2187 (2) 00:00:31
* 9 HASH JOIN RIGHT OUTER 15M 9757M 23M 1029K (1) 04:00:14
10 TABLE ACCESS FULL W_SUPPLIER_D 493K 17M 2700 (2) 00:00:38
* 11 HASH JOIN RIGHT OUTER 15M 9227M 76M 624K (1) 02:25:41
12 TABLE ACCESS FULL W_ORG_D 1715K 57M 24909 (3) 00:05:49
13 TABLE ACCESS FULL W_AR_XACT_FS 15M 8727M 216K (2) 00:50:25
FIG. 3 is a simplified block diagram of a processing system 300 that may be used to practice an embodiment of the present invention. System 300 may be part of an ETL tier or may also be part of a data warehouse system and may be used to execute SQL queries and lookup transformations. As shown in FIG. 3, system 300 includes a processor 302 that communicates with a number of peripheral devices via a bus subsystem 304 . These peripheral devices may include a memory subsystem 306 , input devices 308 , output devices 310 , and a network interface subsystem 312 . Bus subsystem 304 provides a mechanism for letting the various components and subsystems of system 300 communicate with each other as intended. Although bus subsystem 304 is shown schematically as a single bus, alternative embodiments of the bus subsystem may utilize multiple busses.
Processor 302 is configured to perform processing performed by system 300 . For example, processor 302 may be configured to execute programs, code, or instructions to perform transformation and load operations according to an embodiment of the present invention. Processor 302 may also control other subsystems or devices.
Input devices 308 enable a user such as a data warehouse administrator or an ETL process operator to interact with and provide information to system 300 . Input devices 308 may include wired or wireless devices such as a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touchscreen incorporated into the display, audio input devices such as voice recognition systems, microphones, and other types of input devices. In general, an input device may refer to any device or mechanism that may be used for inputting information to system 300 . Input devices 308 typically allow a user to select objects, icons, text and the like that appear on a monitor via a command such as a click of a button or the like.
Output devices 310 may include wired or wireless devices such as a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices, etc. Examples of a display subsystem include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, etc. In general, an output device may refer to any device or mechanism for outputting information from system 300 . For example, the trace files may be viewed by a user using an output device.
Network interface subsystem 312 provides an interface to other computer systems, and networks. Network interface subsystem 312 serves as an interface for receiving data from other sources and for transmitting data to other sources from system 300 . Embodiments of network interface subsystem 312 include an Ethernet card, a modem (telephone, satellite, cable, ISDN, etc.), (asynchronous) digital subscriber line (DSL) units, FireWire interface, USB interface, and the like. For example, subsystem 312 may be coupled to a computer network, to a FireWire bus, or the like. In other embodiments, network interface subsystem 312 may be physically integrated on the motherboard of system 300 , and may be a software program, such as soft DSL, or the like.
Memory subsystem 306 may be configured to store the basic programming and data constructs that provide the functionality of the present invention. For example, a computer program or software code modules (or instructions) implementing the functionality of the present invention may be stored in memory 306 . These software modules may be executed by processor(s) 302 . Memory 306 may also provide a repository for storing data used in accordance with the present invention. For example, memory 306 may store a fact staging table and or a data warehouse.
Memory 306 may include a number of memories including a main random access memory (RAM) for storage of instructions and data during program execution and a read only memory (ROM) in which fixed instructions are stored. RAM is generally semiconductor-based memory that can be read and written by processor 302 . The storage locations can be accessed in any order. RAM is generally understood to refer to volatile memory that can be written to as well as read. For example, the hash tables used for the outer join operation corresponding to data from a dimension table may be stored in a RAM. There are various different types of RAM.
Memory 306 may also comprise subsystems that provide persistent (non-volatile) storage and caching for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a Compact Disk Read Only Memory (CD-ROM) drive, an optical drive such as a DVD, removable media cartridges, flash memory, and other like storage media.
Processing system 300 can be of various types including a personal computer, a portable computer, a workstation, a network computer, a mainframe, a kiosk, or any other data processing system. Due to the ever-changing nature of computers and networks, the description of system 300 depicted in FIG. 3 is intended only as a specific example for purposes of illustrating the preferred embodiment of a processing system. Many other configurations having more or fewer components than the system depicted in FIG. 3 are possible. For example, the processing system may be a desktop computer, portable computer, rack-mounted or tablet configuration. Additionally, the processing system may be a series of networked computers. Further, the use of different micro processors is contemplated, such as Pentium™ or Itanium™ microprocessors; Opteron™ or AthlonXP™ microprocessors from Advanced Micro Devices, Inc., and the like. Further, use of different types of operating systems is contemplated, such as Windows®, WindowsXP®, WindowsNT®, or the like from Microsoft Corporation, Solaris from Sun Microsystems, LINUX, UNIX, and the like.
Although specific embodiments of the invention have been described, various modifications, alterations, alternative constructions, and equivalents are also encompassed within the scope of the invention. Embodiments of the present invention are not restricted to operation within certain specific data processing environments, but are free to operate within a plurality of data processing environments. Additionally, although embodiments of the present invention has been described using a particular series of transactions and steps, it should be apparent to those skilled in the art that the scope of the present invention is not limited to the described series of transactions and steps.
Further, while embodiments of the present invention has been described using a particular combination of hardware and software, it should be recognized that other combinations of hardware and software are also within the scope of the present invention. Embodiments of the present invention may be implemented only in hardware, or only in software, or using combinations thereof.
The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that additions, subtractions, deletions, and other modifications and changes may be made thereunto without departing from the broader spirit and scope as set forth in the claims.