Guardium Big Data Intelligence Architecture

Guardium Big Data Intelligence is a system for storing, managing, and providing access to data produced by the IBM InfoSphere Guardium Database Activity Monitoring (DAM) system (referred to simply as Guardium in the documents that follow).

Guardium Big Data Intelligence is a Big Data system that uses the SonarW NoSQL Data Warehouse to store data extracted from Guardium collectors. Guardium Big Data Intelligence allows you to store large amounts of Guardium data in one place, eliminating the need for complex aggregation processes and letting you centralize data from hundreds of collectors, covering long periods of time. Because the data is stored in a best-of-breed data warehouse, reports and analytics run fast and the data can be used for multiple purposes.

Guardium Big Data Intelligence includes the following components:

  • The SonarW NoSQL Data Warehouse.
  • The SonarCollector ETL layer and specific Guardium ETL algorithms.
  • The Guardium Big Data Intelligence GUI.
  • The SonarK discovery GUI (based on Kibana).
  • SonarSQL, providing SQL access to Guardium data stored within SonarW.
  • JSON Studio, providing a GUI for advanced analytic query building and visualization.
[Figure: Guardium Big Data Intelligence architecture (_images/sonarg_arch.png)]

Guardium Big Data Intelligence is a software package that is installed on a RHEL Linux server. It can run on a physical server or as a virtual machine, and it can be installed as the only application on the server or co-located with other applications. However, due to the nature of its Big Data workloads, Guardium Big Data Intelligence is a resource-intensive application and consumes all resources available to it - compute, memory, and I/O. It is therefore recommended to run Guardium Big Data Intelligence on its own dedicated server.

Guardium Big Data Intelligence receives data from Guardium collectors through an SCP process of compressed extraction files. The collectors produce these files, and the mechanism is supported for Guardium versions 9.x and 10.x. If you are running version 9.5 collectors, you need to install IBM data extraction patch 609 (or a later cumulative patch). Consult your Guardium Big Data Intelligence account manager for the precise IBM patch required. Guardium 10 has built-in support for producing these extract files.

Data coming from Guardium collectors is copied to the Guardium Big Data Intelligence server, where it is processed using a Guardium-specific ETL process before being inserted into SonarW. When you configure data extraction from Guardium collectors, you specify a hostname to which the extract files should be copied. This host can be the Guardium Big Data Intelligence host or a separate host that serves as a staging area for the extract files (from which the Guardium Big Data Intelligence ETL process copies the files). It is recommended that the collectors copy the files directly to the Guardium Big Data Intelligence server to avoid an additional, unnecessary copy.

Collectors produce and copy files on an hourly basis. The Guardium Big Data Intelligence ETL process runs continuously and ingests these extract files on an ongoing basis. Data is therefore available in Guardium Big Data Intelligence with a lag of no more than ~60-75 minutes.
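The worst-case lag follows from simple arithmetic: a record logged just after an extract window opens waits the full hour for the file to be produced, then waits for transfer and ingestion. A minimal sketch of this reasoning (the ~15-minute transfer/ETL figure is an illustrative assumption, not a measured value):

```python
# Worst-case availability lag for hourly extract files: an event logged
# right after a window opens waits the full window, then the file must
# be copied and ingested by the continuously running ETL process.

EXTRACT_WINDOW_MIN = 60   # collectors produce extract files hourly
TRANSFER_ETL_MIN = 15     # assumed copy + ingest time (illustrative)

def worst_case_lag_minutes(window=EXTRACT_WINDOW_MIN, etl=TRANSFER_ETL_MIN):
    """Upper bound on how stale data can be before it appears in SonarW."""
    return window + etl

print(worst_case_lag_minutes())  # 75 - the upper end of the ~60-75 minute range
```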

Once the data is in SonarW, various tools provide access to the Guardium data. These include the Guardium Big Data Intelligence custom-built reporting layer; JSON Studio, for building queries, reports, and visualizations directly over the Guardium data; a Web Services layer; and a SQL layer. All of these are installed on the Guardium Big Data Intelligence server as part of the Guardium Big Data Intelligence installer.

System Sizing

A single Guardium Big Data Intelligence node is typically used for up to 30TB of compressed Guardium data. You can store more than 30TB on a single node and reporting times may still be reasonable, but you can also cluster multiple Guardium Big Data Intelligence nodes to provide faster response times. Consult your Guardium Big Data Intelligence account manager for additional sizing guidelines.
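As a rough planning aid, the 30TB-per-node guideline can be turned into a node-count estimate. This is a sketch only; actual sizing should come from your account manager:

```python
import math

NODE_CAPACITY_TB = 30  # compressed Guardium data per node (guideline above)

def nodes_needed(total_compressed_tb):
    """Estimate how many Guardium Big Data Intelligence nodes a data set needs."""
    return max(1, math.ceil(total_compressed_tb / NODE_CAPACITY_TB))

print(nodes_needed(25))   # 1 - fits on a single node
print(nodes_needed(75))   # 3 - cluster for faster response times
```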

Each Guardium Big Data Intelligence node should have the following specs:

  • Two Intel Xeon processors, at least 6 cores per socket, at least 2.4GHz each.
  • At least 64GB of memory.
  • Either HDD or SSD drives. In both cases, and especially when using HDDs, the drives should be striped using RAID0 or RAID10. For example, if you choose SATA drives, create a single RAID array from at least four such disks to achieve read rates approaching 500MB/s. The system has been optimized to leverage low-cost SATA drives, making it possible to build a cost-effective large data store from inexpensive disks.
  • At least one SSD of size ~400GB used for temp storage for SonarW.
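The four-disk figure in the list above follows from aggregate-throughput arithmetic: striping reads across disks sums their individual rates. A sketch, assuming ~125MB/s sequential read per SATA drive (a typical figure, not a measured one):

```python
# RAID0/RAID10 striping serves sequential reads from all data disks in
# parallel, so aggregate throughput is roughly additive across disks.

SATA_READ_MBPS = 125  # assumed per-drive sequential read rate (illustrative)

def striped_read_mbps(num_data_disks, per_disk_mbps=SATA_READ_MBPS):
    """Approximate aggregate read rate of a striped array."""
    return num_data_disks * per_disk_mbps

print(striped_read_mbps(4))  # 500 - the ~500MB/s target cited above
```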

If you are deploying Guardium Big Data Intelligence on an Amazon AWS EC2 instance, it is recommended to use an m4.4xlarge instance, or an m4.10xlarge instance if workloads are expected to be very large. An io2 EBS volume provisioned with at least 10K-12K PIOPS is recommended, since this allows you to grow the volume as your data size grows with no changes to the Guardium Big Data Intelligence application or to RHEL. If you choose General Purpose EBS, you should use a RAID0 configuration.

When using virtual machines (VMs), the recommended minimum production configuration is 8 vCPUs and 64GB RAM. You must work with your VM administrator to provision enough IOPS, dependent on your loads and data volumes. A VM with 32GB RAM and 6 vCPUs can be used as a POC host, with the understanding that performance will not be optimal.

If you are deploying Guardium Big Data Intelligence on a machine that has between 96GB and 128GB of RAM, set the parameter block_allocation_size_percentage to 33. This takes advantage of the memory available. If you are deploying on a machine that has 128GB of RAM or more, set block_allocation_size_percentage to 50.
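The rule above amounts to a simple threshold function. A sketch of the selection logic only; the parameter itself is set in the SonarW configuration, and machines below 96GB keep the installer default (represented here as None):

```python
def block_allocation_size_percentage(ram_gb, default=None):
    """Pick the block_allocation_size_percentage value for a given RAM size.

    Follows the guidance above: 33 for machines with 96-128GB of RAM,
    50 for 128GB or more, otherwise leave the default unchanged.
    """
    if ram_gb >= 128:
        return 50
    if ram_gb >= 96:
        return 33
    return default  # below 96GB: keep the default setting

print(block_allocation_size_percentage(112))  # 33
print(block_allocation_size_percentage(256))  # 50
```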

Limits

A single Guardium Big Data Intelligence system maintains up to 10 trillion distinct sessions per collector. If a Guardium system feeds more than 10 trillion sessions, older sessions are deleted and the newer sessions are maintained.
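The retention behavior amounts to keeping the newest sessions up to the cap and discarding the oldest first. A minimal sketch of the policy (the data shape and tiny cap are illustrative only; the real limit applies per collector):

```python
def prune_sessions(sessions, cap):
    """Keep only the newest `cap` sessions, dropping the oldest first.

    `sessions` is a list of (timestamp, session_id) tuples - an
    illustrative stand-in for the per-collector session store.
    """
    newest_first = sorted(sessions, key=lambda s: s[0], reverse=True)
    return sorted(newest_first[:cap])  # entries beyond the cap are deleted

history = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
print(prune_sessions(history, cap=2))  # [(3, 'c'), (4, 'd')]
```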