I am new to Endeca.
My question is:
In traditional BI, there is a warehouse into which incremental loads from source systems (structured data) are carried out at a regular frequency.
Now that Endeca can access data from social media, is the concept of incremental load still valid for social media? I mean, is that how data is loaded from social media as well?
Can someone help me understand the concept?
In my experience, social media in Endeca is dealt with in a transient manner, where the social media chatter is consumed in a sort of time-boxed "rolling window". Thus, as new tweets/posts/comments are added daily, older ones are "falling off the end". Removing older posts is a good practice that keeps your Endeca Server lean so it continues to perform well.
To accomplish this, you would simply load new social media records daily (or at whatever cadence you find appropriate), then use the delete records feature of the product to remove older records. Branchbird blogged about this new delete records capability recently here: http://branchbird.com/blog/oracle-endeca-updating-deleting-data/
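To make the rolling window concrete, here is a minimal sketch in plain Python. It does not call any Endeca API: the `index` dict stands in for the Endeca data store, and the function name, the record fields (`id`, `posted_at`) and the 30-day default are all assumptions made for illustration. In a real deployment the add and delete steps would go through the product's own load and delete records features.

```python
from datetime import datetime, timedelta


def apply_rolling_window(index, new_records, now, window_days=30):
    """Add new records, then drop anything older than the window.

    index       -- dict mapping record id -> record (hypothetical stand-in
                   for the Endeca data store)
    new_records -- iterable of dicts with 'id' and 'posted_at' (datetime)
    now         -- the reference time for this load
    Returns the sorted list of ids that fell off the end of the window.
    """
    # Incremental step: add (or overwrite) the newly collected posts.
    for rec in new_records:
        index[rec["id"]] = rec

    # Pruning step: anything posted before the cutoff is removed.
    cutoff = now - timedelta(days=window_days)
    expired = sorted(rid for rid, rec in index.items()
                     if rec["posted_at"] < cutoff)
    for rid in expired:
        del index[rid]
    return expired


# Example: one old post already loaded, one new post arriving today.
now = datetime(2013, 6, 30)
index = {"t1": {"id": "t1", "posted_at": datetime(2013, 5, 1)}}
todays_posts = [{"id": "t2", "posted_at": datetime(2013, 6, 29)}]
removed = apply_rolling_window(index, todays_posts, now, window_days=30)
```

So the load is still incremental in the sense that only new posts are added each run; the difference from a warehouse is that every run also removes whatever has aged out of the window.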
With this approach, your concern may quickly become losing the older social media data. Since Endeca should always be used as a system of reference rather than a system of record, you will need to establish a separate system of record for your social media feeds. Hadoop's HDFS is a great answer for storing the raw social feeds, ensuring you keep all social media data going back as far as you need.
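As a rough illustration of the "archive first, then load" idea, the sketch below appends each raw post to a per-day JSON Lines file before anything is pushed into Endeca. A local directory stands in for HDFS here, and the function name, the file layout and the `posted_at` ISO-8601 string field are assumptions for the example, not anything prescribed by Endeca or Hadoop.

```python
import json
from pathlib import Path


def archive_raw_posts(posts, archive_root):
    """Append raw posts to one JSON Lines file per posting day.

    posts        -- iterable of dicts; each has 'posted_at' as an ISO-8601
                    string such as '2013-06-29T10:15:00' (assumed field)
    archive_root -- directory acting as the system of record (a local
                    stand-in for an HDFS path)
    Returns a dict mapping day -> number of posts written for that day.
    """
    root = Path(archive_root)
    root.mkdir(parents=True, exist_ok=True)
    written = {}
    for post in posts:
        day = post["posted_at"][:10]  # 'YYYY-MM-DD' prefix of the timestamp
        path = root / f"{day}.jsonl"
        # Append-only: older days are never rewritten or deleted here.
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(post) + "\n")
        written[day] = written.get(day, 0) + 1
    return written
```

Because the archive is append-only and partitioned by day, posts that have been deleted from Endeca can be reloaded later by reading back the files for the days you need.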