Combining relational databases with data management in Hadoop
In many application scenarios, it makes sense to combine conventional relational databases with NoSQL databases and big data technologies. A two-track solution keeps the entire data pool manageable: one part of the data remains in a relational, transactional database, while another part is migrated to a Hadoop-based, distributed data store.
Application example: Website log data
In most instances, transactional applications such as online marketplaces also produce unstructured data. A typical example is log data, which accumulates from many individual components and must remain searchable. Hadoop can store very large data volumes that grow daily, even over long periods of time. Combined with a search platform such as Apache Solr, this data can be searched and aggregated quickly.
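As a rough illustration of searching and aggregating log data, the following sketch builds a Solr select request that searches log messages and counts hits per component using faceting. The host, the collection name "weblogs" and the field names "message", "component" and "timestamp" are assumptions made for this example; an actual index would use its own schema.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: host, port and the collection name "weblogs" are assumed.
SOLR_SELECT = "http://localhost:8983/solr/weblogs/select"


def build_log_query(text, component_field="component", days=7):
    """Build a Solr select URL that searches log messages and
    counts the matching entries per component (facet counts)."""
    params = [
        ("q", f"message:{text}"),                       # full-text search in the assumed "message" field
        ("fq", f"timestamp:[NOW-{days}DAYS TO NOW]"),   # restrict to the last N days
        ("rows", "10"),                                 # return only the first 10 matching entries
        ("facet", "true"),                              # enable faceting (aggregation)
        ("facet.field", component_field),               # count hits per component
        ("wt", "json"),                                 # ask for a JSON response
    ]
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = build_log_query("timeout")
print(url)
```

The sketch only constructs the request; sending it requires a running Solr instance with a matching collection.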
Application example: Data Warehouse
Data warehousing is another area where big data technologies are currently attracting attention. The license, support and hardware costs of modern RDBMS software are high. In addition, implementing “cubes” for evaluation is very complex and has to be repeated for every new aggregate. Hadoop-based systems are increasingly seen as more cost-efficient alternatives, whether as a partial or a complete replacement. In this context, we work with open source technologies such as Presto and Jasper Reports. Presto is a distributed SQL query engine developed by Facebook. Its distinguishing feature is that it can query Hadoop-based data pools via Hive, as well as Cassandra, relational databases and proprietary data stores. Jasper Reports, which is based on Java, can be used to generate professional reports.
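The idea of one SQL query spanning several data sources can be sketched locally. The example below is an analogy only, not Presto itself: it uses SQLite's ATTACH DATABASE to emulate two separate sources, one standing in for the relational side and one for the Hive side, and computes an aggregate at query time instead of from a prebuilt cube. All table and column names are hypothetical. In Presto, tables from different sources are addressed in a similar way, as catalog.schema.table.

```python
import sqlite3

# Analogy only: two SQLite databases play the role of two separate data sources.
conn = sqlite3.connect(":memory:")                    # "main" stands in for the relational database
conn.execute("ATTACH DATABASE ':memory:' AS hive")    # "hive" stands in for the Hadoop-based pool

# Hypothetical tables: customer master data on the relational side,
# high-volume page views on the Hadoop side.
conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.execute("CREATE TABLE hive.page_views (customer_id INTEGER, url TEXT)")

conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)",
    [(1, "DE"), (2, "FR"), (3, "DE")],
)
conn.executemany(
    "INSERT INTO hive.page_views VALUES (?, ?)",
    [(1, "/home"), (1, "/cart"), (2, "/home"), (3, "/home")],
)

# One query joins both sources and aggregates on the fly.
QUERY = """
SELECT c.country, COUNT(*) AS views
FROM hive.page_views AS v
JOIN main.customers AS c ON c.id = v.customer_id
GROUP BY c.country
ORDER BY c.country
"""
views_by_country = conn.execute(QUERY).fetchall()
print(views_by_country)
```

A new aggregate, for example views per URL, only needs a different GROUP BY clause rather than a newly built cube, which is the flexibility the paragraph above refers to.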