The basic requirement is to a) check new and changed records in Siebel whenever the user saves them in the GUI, and b) check new and updated records brought into Siebel via an interface, using a business service that accesses a web service.
First, I tried to use the web service test tool for the existing EntityMatch service. As I don't know how to fill in all the fields, I did not get any result. I tried to fill "entityid" with "account" and "name" with an account name that I recently created in Siebel: no results. Hence I do not know whether EDQ is unaware of the newly created record or my web service request is wrong.
An additional requirement is to configure and fine-tune the duplicate check. At the moment, I have no idea how to create a real-time duplicate check. Usually, I would need the real-time data and some kind of existing data to check new records against. I would then need to set up some rules.
If I look at the Match - Entity service, there are two data sources, the "Working Data Reader" (real time), and the "Reference Data Reader", which is for batch jobs only. Are the "working data" all records available in the stateless system? Or those which belong to the actual real time web service request?
Hence, I cannot imagine where the reference data comes from.
The "Match Entities" step in the "Match - Entity" process seems to allow the fine-tuning of the matching rules, that's fine so far.
If it is a stateless system, and there are, say, 50,000 accounts in Siebel, how does the system know about those 50,000 accounts after a server restart? It cannot just contact Siebel, because I did not provide it with any access to the Siebel database or any Siebel web service.
The CDS matching services have reference data ports wired in for reference data matching, but these are not used for EDQ attached to Siebel. They are used when the match process is used for data enrichment.
With Siebel, EDQ *does not hold* a copy of the customer data. Siebel sends records to EDQ, so EDQ does not need to access or attempt to maintain a synchronized copy of the 50,000 records.
I recommend reading the EDQ online help topic on Real-Time Matching - that may help to clarify things. Also read Siebel's DQ Administration Guide. Siebel has a DQ interface with defined attribute mappings, which are then mapped into the EDQ web services. When enabled, EDQ is called whenever a record is added or modified in Siebel to check if it is a duplicate. The driving record is passed across, along with any candidates that share a key value, in a single web service request. EDQ then returns any matches to Siebel. No data is persisted in EDQ here. All of this is pre-defined - you can change it if you need to, but you do not need to create web services from scratch.
Note that for this to work really well you need Siebel 126.96.36.199 or later as this allows the use of EDQ to generate the keys used for candidate selection. Without this, candidate selection will be too rudimentary for really effective matching.
Also, if you want to test the services using the web service tester that is fine, but you need to add multiple records per request, and set the 'candidate' input to 0 for the driving record, and 1 for each candidate. Candidates are not compared with each other, but are compared against the driving record. See the CDS Business Services guide for a full guide to the interfaces.
When integrated with Siebel, the connector handles stamping the records in this way.
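To make the request layout a bit more concrete, here is a minimal Python sketch of how the records in one test request could be stamped. It only illustrates the 0/1 convention described above. The function name and the `id` and `name` fields are placeholders I made up for illustration; only the `candidate` input comes from the description here, and the real attribute list is in the CDS Business Services guide.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, int]]


def build_match_request(driving: Record, candidates: List[Record]) -> List[Record]:
    """Assemble the record list for a single matching request (hypothetical helper)."""
    # The driving record is stamped candidate=0.
    records: List[Record] = [{**driving, "candidate": 0}]
    # Each candidate is stamped candidate=1. Candidates are compared against
    # the driving record only, never with each other.
    records.extend({**c, "candidate": 1} for c in candidates)
    return records


# Placeholder data: one new record plus two existing records that share a key.
request = build_match_request(
    {"id": "NEW-1", "name": "Acme Corp"},
    [
        {"id": "1-AAA", "name": "Acme Corporation"},
        {"id": "1-BBB", "name": "ACME Corp."},
    ],
)
for rec in request:
    print(rec)
```

A request containing only the driving record has nothing to compare against, which would explain an empty result in the tester.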
Sorry, I did not find the mentioned documentation. Do you mean the Director help, "Advanced Features" > "Real-time Matching" and > "Matching Concept Guide"?
According to that one (paragraph "Real time duplicate prevention" > "Matching"), Siebel sends possible matching keys to EDQ whenever a new account is being created. This explanation matches yours.
I think I got that one now. The approach of giving Siebel the task of doing the actual matching work seemed a little unorthodox to me, and also to the rest of the team here. Originally we intended to use a different solution (not from Oracle), so it's probably just very surprising that Siebel can do that job, although a specialist solution is used.
However, it is very important for us to understand how it works. Would you agree with my flowchart?
This is basically correct, though Siebel sends records through, not keys. The keys are simply used to select candidates.
With Siebel 188.8.131.52 and later there is an additional step not shown above where the EDQ key generation service is called for the driving record before selecting candidates.
Note that Siebel does *not* do the matching work and has no matching logic defined. (The slight exception to this is that in older versions of Siebel, key generation and thus candidate selection are done using Siebel logic; this is the way that their Universal DQ interface was initially done, but it is flawed because it does not allow multiple keys per record, nor complex keys that can enable cross-script matching, phonetic matching and so on.)
If EDQ is used for key generation as well (where the keys used for candidate selection are generated using an EDQ service), Siebel is only involved in querying its own database and sending these records across. As the data changes quickly, it is essential to keep keys in line with the committed data, so it makes much more sense to store them alongside the data than to store them in the DQ engine, which cannot control the transactions.
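As a rough illustration of why several keys per record help candidate selection, here is a small Python sketch. The two key rules (a name prefix and a consonant skeleton) are invented for this example and are far cruder than anything the EDQ key generation service produces; the function names and the in-memory dictionary standing in for the Siebel table are assumptions too.

```python
from typing import Dict, List, Set


def generate_keys(name: str) -> Set[str]:
    """Toy key generation: returns more than one key per record."""
    cleaned = "".join(ch for ch in name.upper() if ch.isalnum() or ch.isspace())
    tokens = cleaned.split()
    keys: Set[str] = set()
    if tokens:
        # Key 1: first four characters of the first word.
        keys.add("P:" + tokens[0][:4])
        # Key 2: consonant skeleton of the whole name, a crude stand-in
        # for a phonetic key.
        skeleton = "".join(ch for ch in "".join(tokens) if ch not in "AEIOU")
        keys.add("S:" + skeleton[:6])
    return keys


def select_candidates(driving_keys: Set[str], stored: Dict[str, Set[str]]) -> List[str]:
    """Return ids of stored records that share at least one key with the driving record."""
    return sorted(rid for rid, keys in stored.items() if driving_keys & keys)


# Keys are stored next to the data, as they would be alongside the Siebel rows.
names = {"1": "Acme Corporation", "2": "Acm Corp", "3": "Globex Ltd"}
stored_keys = {rid: generate_keys(n) for rid, n in names.items()}

print(select_candidates(generate_keys("Acme Corp"), stored_keys))
```

In this toy data, record 1 is picked up only through the prefix key and record 2 only through the skeleton key, so a single key per record would miss one of them.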
I can assure you that this architecture is not at all unorthodox.
Well, it's not really right with regard to when the key gen service is called. It has to be called before candidate selection in order to populate the keys onto the driving record. The key values are not committed to the database at this time because we don't know yet if the record will be added or updated.
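The ordering can be sketched in a few lines of Python. This is only a model of the sequence described here, not Siebel or EDQ code: `check_then_save`, the callables passed into it, and the list standing in for the database are all hypothetical.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


def check_then_save(
    record: Dict[str, str],
    store: List[Row],
    key_gen: Callable[[Dict[str, str]], Set[str]],
    match: Callable[[Dict[str, str], List[Row]], List[Row]],
    confirm_save: Callable[[List[Row]], bool],
) -> bool:
    # Step 1: generate keys for the driving record. They live in memory only.
    keys = key_gen(record)
    # Step 2: select candidates that share at least one key.
    candidates = [row for row in store if keys & row["keys"]]
    # Step 3: the matching service compares the driving record with the candidates.
    matches = match(record, candidates)
    # Step 4: only if the save goes ahead are the keys committed with the record.
    if confirm_save(matches):
        store.append({**record, "keys": keys})
        return True
    return False


def toy_key_gen(rec: Dict[str, str]) -> Set[str]:
    return {rec["name"][:4].upper()}


def toy_match(driving: Dict[str, str], cands: List[Row]) -> List[Row]:
    return [c for c in cands if str(c["name"]).lower() == driving["name"].lower()]


def save_if_no_match(matches: List[Row]) -> bool:
    return not matches


store: List[Row] = [{"name": "Acme Corp", "keys": {"ACME"}}]
saved_dup = check_then_save({"name": "acme corp"}, store, toy_key_gen, toy_match, save_if_no_match)
saved_new = check_then_save({"name": "Globex Ltd"}, store, toy_key_gen, toy_match, save_if_no_match)
print(saved_dup, saved_new, len(store))
```

The point of the sketch is that nothing is written in steps 1 to 3; the keys reach the store only together with the record in step 4.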
There is a diagram in Section 6 of the Oracle® Enterprise Data Quality Business Services Guide.