Using the CAS file system crawler, we crawled the file system and created a Record Store instance containing millions of records. We have also ingested those records into an EID data source. However, since the file system is part of a document management system, we join records from a database to the records in the CAS Record Store instance.
We need to re-ingest some of these records into the data source, so we want to conditionally retrieve records from the Record Store based on records in the database.
Can anyone tell me whether conditional retrieval of records is possible with a CAS Record Store instance?
IIRC, the CAS 3.0 Record Store only supports random access by one or more record IDs, and those IDs are unlikely to be the key you join on in your database. You also can't do a filtered scan of the whole store.
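To make that access constraint concrete, here is a minimal sketch using a hypothetical in-memory stand-in (`RecordStoreStub`, `get_by_ids`, `scan` and the `doc_key` field are all made up for illustration; this is not the CAS API). Lookup by record ID is direct, but looking up by any other attribute, such as a database join key, falls back to reading every record and filtering on the client.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


class RecordStoreStub:
    """Hypothetical stand-in for a store that offers only random access by
    record ID or a full sequential scan (no server-side filter)."""

    def __init__(self, records: Iterable[Record], id_field: str = "id") -> None:
        self._by_id = {r[id_field]: r for r in records}

    def get_by_ids(self, ids: Iterable[str]) -> List[Record]:
        # Random access: only useful when the caller already knows the IDs.
        return [self._by_id[i] for i in ids if i in self._by_id]

    def scan(self) -> Iterator[Record]:
        # Full read of every record; any filtering must happen client-side.
        return iter(self._by_id.values())


def find_by_attribute(store: RecordStoreStub, field: str, value: str) -> List[Record]:
    # No filtered scan available, so a non-ID lookup reads the whole store.
    return [r for r in store.scan() if r.get(field) == value]
```

For example, `store.get_by_ids(["r2"])` returns one record immediately, while `find_by_attribute(store, "doc_key", "A")` touches every record in the store.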
There are a couple of ways you could tackle this, depending on how often you need to refresh, how many records you are refreshing, and so on. Assuming your Record Store is large, it may not be practical to read it all and build a hash table keyed on the join key. Instead, you could stream through the entire Record Store and, for each record, perform a lookup against your database (which should have an index on the join columns), with the query returning zero rows when the record does not need updating. This approach is least attractive when the Record Store takes a very long time to scan and only a very small number of records need updating.
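The scan-and-lookup approach can be sketched roughly as follows. This is a minimal illustration only: records are modeled as plain dicts from any iterable (standing in for a Record Store read), SQLite stands in for the real database, and the `documents` table, its `doc_key`/`needs_reingest` columns and the `make_demo_db` helper are hypothetical names chosen for the example.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]


def make_demo_db(rows: List[Tuple[str, int]]) -> sqlite3.Connection:
    # Hypothetical schema: one row per document, flagged if it needs re-ingesting.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_key TEXT, needs_reingest INTEGER NOT NULL)")
    # Index on the lookup columns so each per-record query is cheap.
    conn.execute("CREATE INDEX idx_documents_key ON documents (doc_key, needs_reingest)")
    conn.executemany("INSERT INTO documents VALUES (?, ?)", rows)
    conn.commit()
    return conn


def records_needing_update(
    records: Iterable[Record], conn: sqlite3.Connection, key_field: str = "doc_key"
) -> Iterator[Record]:
    """Stream every record and keep only those the database says to re-ingest."""
    cur = conn.cursor()
    for rec in records:
        row = cur.execute(
            "SELECT 1 FROM documents WHERE doc_key = ? AND needs_reingest = 1 LIMIT 1",
            (rec.get(key_field),),
        ).fetchone()
        # Zero rows back means this record does not need updating.
        if row is not None:
            yield rec
```

Because `records_needing_update` is a generator, the store is read once in a single streaming pass and nothing is held in memory beyond the current record; the cost is one indexed query per record, which is why this pays off poorly when the scan is slow and the update set is tiny.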