Guardium Big Data Intelligence Application

To open the Guardium Big Data Intelligence GUI, navigate your browser to one of the following (assuming you used the default ports in the installation):

https://<your Guardium Big Data Intelligence hostname or ip>:8443/sonarg.xhtml


https://<your Guardium Big Data Intelligence hostname or ip>:8443


http://<your Guardium Big Data Intelligence hostname or ip>:8080

The Guardium Big Data Intelligence GUI has four sections:

  • A query GUI allowing you to search through the data using any set of conditions.
  • An operational GUI that allows you to view incoming data from Guardium collectors.
  • JSON Studio - a GUI that allows you to run arbitrarily sophisticated queries and analytics as well as build powerful visualizations using the Guardium data.
  • A set of predefined reports that you can use to view Guardium data. You can also distribute these forms to end-users who can run reports themselves in a self-service approach.

Guardium Big Data Intelligence Data Model

Guardium Big Data Intelligence maintains many collections for Guardium data (depending on Guardium version and which Guardium Big Data Intelligence data marts you configure). Some examples are:

  • Session - all session data; one document per database session captured by Guardium.
  • Instance - one document per activity per time period; referencing the session (and embedding some of the important session attributes) as well as the object-verb information.
  • Exception - one document per recorded exception along with session information and the SQL that caused the exception.
  • Full SQL - one document per request sent to the database including session data and the SQL itself; this data is only recorded by Guardium with a LOG FULL DETAILS audit policy.
  • Outlier data - when using Guardium V10 and up, outlier data (both summary and details) is also copied to Guardium Big Data Intelligence into two separate collections.
  • Group data - Group member data is copied from a Guardium system, usually from the CM (but possibly from any appliance).
  • DM Extract log - Data on records extracted from the Guardium appliances is copied from the Guardium appliances.

Each of these collections is central to running Guardium Big Data Intelligence queries and the GUI. Whether you use the custom Guardium Big Data Intelligence GUI or JSON Studio, processing begins with selecting one of these collections to query.

In addition to the data models specified above, any data housed within the Guardium system can be extracted into Guardium Big Data Intelligence using the Guardium Data Mart mechanism. For example, you are able to export STAP health metrics into the Guardium Big Data Intelligence warehouse and use this information as the basis for a STAP health performance dashboard.
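As a taste of what querying these collections looks like, here is a mongo-shell-style filter document over the session collection. This is an illustrative sketch only: "DB User Name" is a field name used elsewhere in this document, but the session-start field name ("Timestamp") is an assumption and may differ by version.

```javascript
// Illustrative filter document for sonargd.session. "DB User Name" is a
// field name used elsewhere in this document; "Timestamp" is an assumed
// name for the session-start field and may differ in your version.
const sessionFilter = {
  "DB User Name": { $regex: "^APP" },            // regex, as in the GUI filters
  "Timestamp": {
    $gte: new Date("2016-07-18T00:00:00Z"),      // from
    $lt:  new Date("2016-07-19T00:00:00Z")       // to
  }
};

// In the mongo shell this would be used as:
//   use sonargd
//   db.session.find(sessionFilter)
```

The same filter shape underlies both the predefined report filters and queries built in JSON Studio.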

These are the main databases in the system:

  • admin – Internal to app; maintains admin info
  • lmrm__ae – Internal to app; maintains data related to analytic engines and built-in app
  • lmrm__scheduler – Internal to app; maintains scheduling and job dispatch queues
  • lmrm__sonarg – Internal to app; main metadata database
  • sonar_log – Internal to app; maintains profiling data
  • sonargateway – Internal to app; maintains data used by ingestion layer
  • sonargd – Main database used for storing data related to Guardium and other security consoles

These are the main collections in the sonargd database (used for SonarG and GBDI):

  • classifier – Data related to classification of sensitive data scans
  • classifier_log – Log records related to invoked scans
  • databases_discovered – Data related to discovery of databases using scans such as nmap
  • datasources – Datasource data; datasources are used for classifier scans, VA scans etc.
  • db_error_text – Mapping between database error codes and error texts
  • discovered_instances – Data related to instances discovered through the agents
  • exception – Data related to errors and exceptions, such as failed logins and SQL errors, as reported by the database
  • full_sql – Data related to the SQL statements including the statement and bind values
  • gfiles, grdm, grdmrec and grdmcomplete – Tracking data on DM files processed by the ETL layer; internal to app
  • group_members and group_members_* - Group member data from each of the CMs
  • installed_patches – Data related to the patches installed on each of the appliances
  • installed_policy – Data related to policy rules installed on each of the appliances
  • instance – Query data at hourly granularity, with a count of how many times each such query (construct) occurred per session within the hour
  • outliers_list – Detailed data from Guardium outlier detection
  • outliers_summary – Summary data from Guardium outlier detection
  • policy_violations – Data related to each violation as fired by policy rules
  • session – Session-level data logged for each logon and logoff to a database. There is a separate record for logon and logoff.
  • stap_status – Data related to the status of STAP

For more information on data marts and data sets see the following DeveloperWorks article:


Guardium Big Data Intelligence users are created and managed by the Guardium Big Data Intelligence administrator using the SonarG User/Role Management Console. Using this console you can view users and roles, add roles to existing users, manage passwords, and create new users. All such console activity is audited, and passwords must conform to the password complexity configuration.

Password complexity configuration is kept in web.xml using the following parameters:

  • sonarg.pwdExpiration - interval in months requiring a password change (default is 2)
  • sonarg.pwdComplexity - required composition of password (default requires an upper, a lower, a digit and a special character)
  • sonarg.pwdLength - minimal length (default is 8)
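As a sketch, these parameters would appear in web.xml as servlet context parameters; the context-param placement shown here is an assumption about the file layout, and only the numeric defaults from the list above are illustrated.

```xml
<!-- Illustrative web.xml fragment; values shown are the documented defaults.
     The context-param structure is assumed and may differ per installation. -->
<context-param>
  <param-name>sonarg.pwdExpiration</param-name>
  <param-value>2</param-value>       <!-- months between forced changes -->
</context-param>
<context-param>
  <param-name>sonarg.pwdLength</param-name>
  <param-value>8</param-value>       <!-- minimal password length -->
</context-param>
```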

Users and passwords may also be created using the administrator shell (CLI). After logging in with the admin account, create the user using:

$ mongo <sonarg host>:27117/admin -uadmin -p
> db.createUser({
   user: "<username>",
   pwd: "s6p3rs3cr3t",
   roles: [ "sonarGUser" ]
})

Users can change passwords from the Studio (GUI) using the “Connected as” drop-down or using the CLI/command shell:

$ mongo <sonarg host>:27117/admin -u'<username>' -p
> db.changeUserPassword("<username>", "jsonar");

Note that it is not mandatory to use an email address as the username, but it is recommended. Functionality such as sign-on and notification relies on the username being an email address; if your username is not an email address you will not be able to review and sign off on your reports.


In addition to the main data collections mentioned above, you can create any data set within the Guardium Big Data Intelligence warehouse for enriching your reports and data.

One special collection is the group_members collection. This collection maintains group information and is part of the data extracts processed by Guardium Big Data Intelligence - i.e. you do not populate it within the Guardium Big Data Intelligence application but rather it serves as a copy of your existing Guardium group definitions. Usually these extracts will come directly from the CM since all appliances sync from the CM; however, you can choose to receive it from any Guardium appliance.

All predefined reports allow you to select data using either a regular expression or by selecting a group of USERS, OBJECTS or COMMANDS, or a combination of the two approaches.

For power users who use JSON Studio to design new queries and reports, a built-in operator called GUARDIUM_GROUPS is available from the operator pull-down in both the Finder and the Aggregation Builder, creating the appropriate query to retrieve group data as a subquery.

In addition, when working within JSON Studio you can use the shorthand of:

"$$LMRM_GG$<field name>$<group name>$"

Note that this must be placed as the RHS of an expression, so you would wrap it in a $or or a $and (even if there is only one group in the search). For example, to query all sessions where the DB User Name is within a group and the OS User Name is within a group use:

$and: ["$$LMRM_GG$DB User Name$Admin Users$", "$$LMRM_GG$OS User Name$Root Users$"]

Note also that group descriptions that have quotes within them cannot be used in predefined reports.

When you want to create a report that has a group as a parameter (i.e. letting the user select the group), create the query pipeline and, when publishing the URL, enter the group TYPE in the value (without double quotes) and check the As Group? checkbox, e.g.:


Supporting Multiple CMs

Guardium Big Data Intelligence supports bringing in data from collectors that belong to multiple distinct CM environments. The only configuration difference in such a case involves group members. Group members can differ across CMs, and group_members extract files arrive at Guardium Big Data Intelligence from each of the CMs. In order for all these possibly distinct definitions to be unified you need to edit /etc/sonar/sonargd.conf and add:

  - group_members

This will create a separate group_members_<hostname> collection for each CM as well as a view called group_members that is a union of all the group definitions. You can build queries that use members across all CMs or that refer to members of a group in a specific CM environment.
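Semantically, the unified view behaves like a union of the per-CM collections. In MongoDB 4.4+ such a union could be sketched with $unionWith; this illustrates the view's behavior, not necessarily how Guardium Big Data Intelligence actually defines it, and "cm1"/"cm2" are placeholder hostnames.

```javascript
// Illustrative union view over per-CM group member collections.
// Collection names follow the group_members_<hostname> pattern described
// above; "cm1" and "cm2" are placeholder hostnames.
const unionPipeline = [
  { $unionWith: { coll: "group_members_cm2" } }   // appended to group_members_cm1
];

// In the mongo shell (MongoDB 4.4+):
//   db.createView("group_members", "group_members_cm1", unionPipeline)
```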

Prebuilt Reports and Dashboards

Guardium Big Data Intelligence includes a set of prebuilt reports and dashboards that you can use as-is or as starting points for customized reports. The prebuilt reports can be viewed from the Predefined drop-down menu or from the Admin Tasks drop-down menu. Almost all reports can be viewed as an HTML report or downloaded as a CSV. Most reports require a from-to date range; many predefined data reports can also be run for the past 24 hours to avoid having to choose the date range.

Note: When you run a report, dates appear with a timezone next to them. When you generate a CSV, however, a date will look like 2016-07-18 00:05:02, and when you open the CSV in Excel it will be a date type. All CSV dates are always converted to UTC time even if the report shows a timezone: CSVs have no timezone context, and strings such as 2016-07-18 00:05:02Z or 2016-07-18 00:05:02EDT are not recognized by Excel as dates and would become plain strings. Hence the date is left as 2016-07-18 00:05:02, and it is always in UTC.

The current set of reports and dashboards includes:

Session Reports

  • Opened On: Reports on the database sessions that have a session-start date between a date range with a variety of filters (most filters are regular expressions for maximum flexibility with a default filter value of .* which matches anything).
  • Active On: Similar to the above but the date range filters for sessions that were open at that time (i.e. the session start was before the range end and session end was after the range start).
  • Currently Open: Displays sessions that have not yet been closed (and that match all other filters).
  • Long Running Sessions: Provides a report and a graphic timeline of sessions lasting more than a specified number of hours and starting within a specific timeframe.
  • Trusted Connections: Shows the list of trusted connections when using the Trusted Connections Profiling engine (in SAGE).
  • Signatures: Summarizes sessions based on client, server, user, type, and instance and provides counters per day+hour - if the same session signature appears many times within an hour a single record is produced with a counter. Data is available only when the SAGE signature engine is turned on.
  • Summary/Details: A dashboard that gives high-level information on various slices of the sessions and a detailed report.
  • New to Profile: Reports on the first time a connection appears in the connection profile (based on client, server, type, instance, ports and protocol). Data is available only when the SAGE profiling engine is turned on.
  • In Profile: Shows the entire connection profile as constructed by the SAGE profiling engine.
  • Sliding Window: Provides moving-average analytics that allow you to find deviations from normal patterns. Looks 5 weeks back, comparing each day of the week to its predecessor days of the week. The result shows any server where the last received session is not today (indicating that something in collection may be broken) as well as any significant reduction in the number of sessions compared to the previous weeks (same day of the week). Start with this report to discover collection issues (stopped completely or too few). Counting is normalized for the current hour (e.g. if run at 3am it will only count each day between midnight and 3am).

Query Reports

  • Queries: Shows queries run on the database servers. Filters are available for both session-level entities such as DB user name and source program as well as query-level filters such as commands and objects. Filters are available as regular expressions and as groups. Queries are aggregated per 1-hour period with counters. The failed-SQLs filter shows all cases where there are more failures than the number entered in the filter; a 0 means show both successful and failed queries.
  • Full SQL: Similar but based on audit records coming from a “log full details” policy action.
  • Summary/Details: A dashboard that gives high-level information on various slices of the queries and a detailed report.
  • Signatures: Shows unique query signatures (per day)
  • User Clusters: When you activate and setup the SAGE user clustering engine, provides the clusters of users based on the groups you select as indicative.
  • Sliding Window: Provides moving-average analytics that allow you to find deviations from normal patterns. Looks 5 weeks back, comparing the current day of the week to its predecessor days of the week.

Exceptions Reports

  • Exceptions: Displays exceptions/errors recorded (such as failed logins and SQL errors) based on multiple filtering criteria.
  • Failed Login Offenders: The top offenders generating failed logins.
  • SQL Error Offenders: The top offenders generating SQL errors along with codes.
  • Summary/Details: A dashboard that gives high-level information on various slices of the exceptions and a detailed report.
  • Signatures: Summarizes exceptions based on client, server, user, type and provides counters per day+hour - if the same exception signature appears many times within an hour a single record is produced with a counter. Data is available only when the SAGE signature engine is turned on.
  • New to Profile: Reports on the first time an exception appears in the profile (based on client, server, type etc.) Data is available only when the SAGE profiling engine is turned on.
  • In Profile: Shows the entire exception profile as constructed by the SAGE profiling engine.
  • Sliding Window: Provides moving-average analytics that allow you to find deviations from normal patterns. Looks 5 weeks back, comparing the current day of the week to its predecessor days of the week.

Violations Reports

  • Policy Violations: Displays policy violations recorded based on multiple filtering criteria.
  • Signatures: Summarizes violations based on client, server, user, etc. and provides counters per day+hour - if the same violation signature appears many times within an hour a single record is produced with a counter. Data is available only when the SAGE signature engine is turned on.
  • New to Profile: Reports on the first time a violation appears in the profile (based on client, server, etc.) Data is available only when the SAGE profiling engine is turned on.
  • In Profile: Shows the entire violation profile as constructed by the SAGE profiling engine.
  • Sliding Window: Provides moving-average analytics that allow you to find deviations from normal patterns. Looks 5 weeks back, comparing the current day of the week to its predecessor days of the week.

Review Reports

  • User Involvement Justifications: Report showing current Justify workflow events involving the logged-in user.
  • Pending Justifications: Report showing currently open Justify workflow events for the logged-in user.
  • User Review / Sign-Off: (pre v3) To-do list requiring sign-off when not using the Justify applications (allows signing and commenting at a report level). As of v3.1 these can and should be signed-off within the Justify application.

User Reports

  • Users: A Guardium Big Data Intelligence administrator can view the list of defined users and their roles.
  • Admin Review/Sign: A Guardium Big Data Intelligence administrator can view all reports / sign-off requests for all users and sign on their behalf.

Incoming Files Reports

  • File Ingest: Shows a list of ingested files by collector / type / time.
  • Processed Ingest: Shows a pivot table of ingested files for all collectors for the last 7 days. Each type shows how many files and for how many different hours data files exist.
  • Files Pivot: Shows a pivot table of files on disk for 7 days back from the selected time (regardless of ingestion). Directories include incoming, archive and audit or a .* for all. Complete is either 1 for “complete” files, 0 for data files or 2 for both.
  • Daily Incoming Data Over: Shows collectors that have sent in aggregate over X MBs of (uncompressed) data in a single day (looks only at the last 7 days). You can use this to set an email alert to notify you when a collector suddenly sends over 3GB (to catch a policy bug quickly before collectors’ disks fill up).
  • File Sets: Shows potential set gaps (among session, session end, exception and full_sql).
  • Expected Files: Shows a pivot with the last 7 days where each day shows which file sets may be incomplete (given a starting set).
  • Expected Files (Detailed): Similar to the above but with a detailed hourly breakdown.

Job Reports

  • Scheduled: Shows the defined schedules; date filter is when the job was first scheduled
  • Dispatcher: Shows what jobs were fired and what their current status is
  • Processed: Shows more details on jobs fired and processed

Workflow Reports

  • All Justifications: Shows all Justify events and transitions based on a variety of filters.
  • Pending Justifications: Shows open (pending) justifications, who they are assigned to, and what status they’re in.
  • Definitions: Report showing the workflow transition definitions.
  • Admin Review / Sign: (pre v3) To-do list allowing an admin to sign-off on behalf of a user when not using the Justify applications (allows signing and commenting at a report level). As of v3.1 these can and should be signed-off within the Justify application.

Policy Analysis

  • A heatmap that analyzes Full SQL recorded by Guardium filtered by timeframe and a set of collectors (regex) to show the number of records collected based on day and the policy rule.
  • Sliding window analysis for policy rules and collector name.

Run Times

  • Report: Shows run times of all running reports and queries.
  • Graph: The same information in graph format.

Guardium Security Operations Center (SOC) Dashboard

  • Guardium SOC Tab: Provides a high-level view of security events, including the number of reported policy violations in the past 24 hours, the number of violations for the top 10 database servers generating violations in the past 12 hours and how these violations are unfolding, the number of violations for the top 10 users generating policy violations in the past 12 hours, the number of exceptions/errors in the past 24 hours, the number of critical and major failed assessment tests in the past month (requires VA to be used), the current number of monitored servers, and the current number of monitored database instances.
  • Policy Violations Tab: Heatmap of policy violations for the last 100 hours broken down by hour and by rule type as well as a punchcard of the daily violations by rule for the past month.
  • Errors Tab: Average daily database exceptions reported in the past month as well as a heatmap of exceptions by day and type for the last month.
  • Vulnerabilities Tab: When VA is used, shows failed tests in last month by database type and category and by type of failure and database type along with the number of failed tests by severity and category.
  • Outliers Tab: When Guardium 10 outliers are used, shows the breakdown of flagged outliers on all collectors for the past week.

Appliances Dashboard

  • Disk Usage Tab: Heatmap showing relative disk usage in percentage across all collectors.
  • Max Load Tab: Heatmap showing relative maximum system load across all collectors.
  • Avg. Load Tab: Heatmap showing relative average system load across all collectors.
  • Snif Restarts Tab: Heatmap showing relative number of snif restarts across all collectors.
  • Guessed Tab: Heatmap showing relative number of guessed sessions across all collectors.
  • Collector Detailed Stats Tab: For a collector selected via a bind variable (as a regex), shows graphs for load, disk and rates over time.
  • Collector History Report Tab: For a collector selected via a bind variable (as a regex), shows all buff usage data over time.
  • Guardium Big Data Intelligence Data Feeds Tab: Shows information about the collectors sending data to Guardium Big Data Intelligence including gauges that count how many collectors have been sending data through the past week, relative data sizes sent by the different collectors as well as a breakdown by day and hour of incoming data sizes from all collectors.

Agents Dashboard

  • Active / Inactive S-TAPs Tab: Shows heatmaps for all STAPs as well as just STAPs that have been inactive within the selected time period. Provides a useful view of which STAPs have been inactive over time and for how long.
  • S-TAP Status Tab: For selected STAPs (filtered through dashboard bind variables) shows the active/inactive value. The S-TAP Status report at the bottom does not show all values but rather only values where a change occurs. A line at time X is shown only if the status for that STAP at time X-1 was different.
  • S-TAP - Collector Tab: Shows the associations between appliances and S-TAPs based on time/stap/collector.

DB 360 Dashboard

The DB 360 dashboard presents diverse data from various elements of a database security implementation, where all data is organized around the database itself. DAM data, VA data, scan data and more are all consolidated and presented for a particular server or servers. The DB 360 data requires the DB 360 SAGE engine to be turned on. The dashboard has a master Database Servers tab that shows data for all databases organized as pivot tables, as well as a number of detailed tabs showing data based on the selected line or on regular expressions selected for IP and/or hostname in Edit Parameters.

  • Database servers Tab: A summary view detailing overall coverage including a bullet chart showing progress on all fronts towards a coverage goal, trending over time, and a pivot table by server/type. The lower pivot table provides details by server and period outlining last ping time from agent, number of monitored sessions per period, last VA date and last classifier scan. Note that when you group ports (a parameter of the dashboard) you will not see the datasources that are mapped per instance.
  • Sessions Tab: Per selected server(s), shows users with the largest number of sessions, the number of sessions opened over time, and a boxplot of the number of client IPs, hostnames and programs used when connecting to the selected server(s).
  • Exceptions Tab: Heatmap for relative number of exceptions by date and type and a boxplot for users that generate exceptions (useful for visually seeing outliers).
  • Violations Tab: Heatmap for violations by user over time for the selected server(s)
  • Vulnerabilities Tab: Summary of VA data for the selected server(s) including results by type and failures by category as well as a detailed report on all results.
  • Data Classification Tab: Summary and details for classifier scans.
  • Instances Discovered Tab: Instances discovered through S-TAP discovery (when enabled) and databases discovered through port scanning (when enabled).
  • Datasources Tab: Datasource details related to the selected server(s).
  • Operations Tab: S-TAP and collector information monitoring selected server(s).

Outliers Dashboard

The Outliers Dashboard presents data flagged by the SAGE Machine Learning engine; this engine performs user behavior analysis and emits outliers. Outliers are flagged by comparing users to other users, users to a group of users categorized as similar (e.g. DBAs), and users to themselves across time. The data used includes connections, exceptions and policy violations. The dashboard presents all this data in both summary and detailed views.

  • Outlier Breakdown Tab: Provides a summary view that shows all outliers, of all types and dimensions.
  • Outlier Heatmaps Tab: Heatmaps showing outliers by user and by database over time.
  • Area-Based Visualizations Tab: Shows outliers using various area visualizations where the area expresses severity.
  • Outlier Data Tab: Provides a detailed list of outliers of all types sorted by the z-score (how extreme the outlier is). Drill down on the user name shows all sessions opened by this user to this datastore in the last 10 days and drill down on the datastore link shows all queries performed by this user on this datastore in the past 10 days.
  • User Span Outliers Tab: Box plot showing user span clusters and outliers.
  • Error/Exception Outliers Tab: Box plot showing exception-by-user clusters and outliers.

Monitoring Gaps Dashboard

Provides insight into monitoring gaps, based on analyzing what traffic was captured/monitored in the past day and comparing it to what was captured in the last 10 days. Also looks at STAP active/inactive data.

  • Monitored Gaps Tab: Lists all server/service name/network protocol combinations for which connections have dropped considerably (a parameter) in the last 24 hours compared to the last 10 days. The threshold parameter is a percentage. For example, setting it to 50 means you want to see all servers/protocols where the last day has dropped below 50% of the average of the last 10 days. Setting the value to zero means you want to see where such data simply stopped being collected at all.
  • STAP Gaps: Lists all STAPs showing green (active) and yet not showing captured connections.

Reduction and Timelines for Reports

Reports often include very detailed information. While many reports also provide aggregated information, Guardium Big Data Intelligence allows you to take any report, reduce it by any number of dimensions, and view the data as you would like. Click the Reduce button, then select which fields to aggregate by, which fields to omit, which fields to take a representative value from (one of the values), and which fields to combine into a list, and click Reduce Now. You will get a report with the reduced data set. You can perform multiple reductions by further reducing the reduced data set using a field subset. You can save, load and share reducer definitions.

Similarly, you can plot data on a timeline. Select which fields to display as a label, group, color and either select a single (start) date (the event will be plotted as a dot on the timeline) or a start/end date (in which case the event box will represent the duration). Timelines can also be saved and loaded.

Using JSON Studio and Using the Report Builder

The most robust tool in the system is JSON Studio (click the Analyze button). Using JSON Studio you can build any pipeline, analyze any data, and create any report, view or dashboard; for full details see the JSON Studio documentation. However, with great power comes a learning curve, and JSON Studio is a tool best used by advanced admins and developers.

For adding simple reports to the Predefined Reports section on the Guardium Big Data Intelligence home page, use the Report Builder (via the Add Report button). Using this tool, select a collection (data set). The metadata of the collection will be displayed, showing you what fields exist, what types the data has, and what the value frequencies are (if the screen comes up blank, click the Rebuild Metadata button; this may take a few seconds but will rebuild the metadata for all collections).

You can then begin to build your report. Select which fields should be displayed by the report by entering sequential numbers (starting with 1) in the Field Order in Report column. A zero, like an empty selection, means that the field will not appear on the report. You can also specify the field heading used in the report in the Field Header in Report column; leaving the latter empty will use the field name for the heading.

Each report has filtering criteria. To add a field for searching on, enter a sequence number in the Order in Submit Form column. Only strings, numbers and dates can be searched on. When you add a date or numeric field, the form will have a from-to pair of variables. When you add a string field you can determine whether the search is based on a regular expression, a perfect match or word containment; the default is regex.

Each user can only see/edit their own report definitions. When you are done building the report save it and add it to the custom menu for yourself or for all users and refresh the home page for it to take effect. You can also use the Preview button to test your new report.

Scheduling and Dispatching

Queries defined within the Guardium Big Data Intelligence UI can be scheduled for delivery using the SonarGateway scheduler. Similarly, predefined reports can be scheduled directly on the predefined report filter screen and JSON Studio queries can be scheduled.

A scheduled job is defined by selecting what the job contents will be (i.e. which query/report), what schedule to use, and how to deliver the results. The schedule is defined as a CRON string, and delivery can be via email, by SCP-ing the results to a host/directory, or as events sent over syslog. SMTP, syslog and SCP targets need to be configured in the dispatcher.conf file for delivery to occur.

Configuration processes differ slightly when scheduling a query from the Guardium Big Data Intelligence GUI or from JSON Studio vs. scheduling a predefined report. From the Guardium Big Data Intelligence GUI or JSON Studio, once you have a saved query that you want to schedule, click the Publish URL button. You do not need to enter your credentials, but you do enter any bind variables needed unless you plan to supply a bind variable collection (more on that in the next section). Click the Compute URL button and validate your URL, then click the Schedule Job button and select the initial day you want the query to run, or pick an existing scheduled job if you already have one that you wish to edit. Enter all the scheduling parameters, including the CRON string, and save.

Note that the time slot displayed on the calendar is computed from the CRON string and is displayed as the next fire time, regardless of which day you initially clicked on. The CRON schedule is displayed as UTC time adjusted to the timezone of your browser, but the schedule fires at the time of the Guardium Big Data Intelligence server. For example, if you select a CRON string of 0 30 23 then the scheduler will fire at 11:30pm on the Guardium Big Data Intelligence machine, and the slot used in your calendar will be 11:30pm UTC adjusted to your browser’s timezone. When you save the job definition the job is persisted and the schedule created.

To schedule a predefined report navigate to the Predefined Views page and click the button for the appropriate report you want to schedule. This opens the filter page for the report. Enter your credentials and click on the scheduler button. Select the scheduled time and job attributes and then click on the “Schedule” button in the form. This will validate the job and schedule it.

Very often you want to schedule a report that has a “from” and a “to” variable (e.g. to determine which activities should be included). Because you might want this report to run every day or every week or every month you should not hard-code the from/to times.

Instead, use the following syntax to ensure that the from/to time frames are generated when the job schedule fires. Build the query using a bind variable (e.g. called $$from and/or $$to) and then when defining the schedule on the API screen enter one of the following strings as bound to the from and/or to:

  • “$$LMRM_NOW” - will be the time when the scheduler fires.
  • “$$LMRM_NOW-4DAYS” - will be 4 days prior to when the scheduler fires.
  • “$$LMRM_NOW-7HOURS” - will be 7 hours prior to when the scheduler fires.

You can use any number and you can use a + or a - sign. If you want 1 week use 7DAYS and if you want a month use 31DAYS to ensure that all activity will be included even on longer months. Do not include spaces and remember that the bind variable must be within quotes.
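As a sketch of these expansion rules (an illustration only; the actual dispatcher implementation is internal to the product), the $$LMRM_NOW strings can be thought of as expanding like this when the schedule fires:

```python
import re
from datetime import datetime, timedelta

# Hypothetical expansion of "$$LMRM_NOW", "$$LMRM_NOW-4DAYS", "$$LMRM_NOW-7HOURS"
# etc.; note DAYS/HOURS (plural) and no spaces, matching the rules above.
PATTERN = re.compile(r'^\$\$LMRM_NOW(?:([+-])(\d+)(DAYS|HOURS))?$')

def expand(value, now=None):
    now = now or datetime.utcnow()
    m = PATTERN.match(value.strip('"'))  # the bind value is quoted in the GUI
    if not m:
        raise ValueError("not an LMRM_NOW expression: %r" % value)
    sign, num, unit = m.groups()
    if sign is None:
        return now
    delta = timedelta(**{unit.lower(): int(num)})
    return now + delta if sign == '+' else now - delta

now = datetime(2024, 1, 8, 12, 0)
print(expand('"$$LMRM_NOW-7DAYS"', now))  # 2024-01-01 12:00:00
```

Note that a string violating the rules (e.g. `$$LMRM_NOW-1DAY` or a missing `$$`) simply fails to match, which mirrors the common pitfalls listed below.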

As an example, if you want to schedule a report that sends all sessions received in the past 6 hours use a predefined report for session and enter “$$LMRM_NOW-6HOURS” (with the double quotes) in the from field and “$$LMRM_NOW” (with the double quotes) in the to field.

As another example, your custom query might have a variable called "$$from". In the Bind Variables section you can map from to "$$LMRM_NOW-1DAYS". When you save (and publish!) your query, click the Publish URL button, fill in the bind variable mapping as shown below, and click on Publish URL. Then click on Schedule Job and enter the scheduling information.


Common pitfalls:

  1. Using DAY instead of DAYS.
  2. Using __ instead of _
  3. Forgetting the $$
  4. Defining the LMRM_NOW in the query rather than as the bind variable mapping
  5. Forgetting the quotes

When delivering results by email you can choose between three options:

  • Notification - you will receive an email that results are ready and will need to login to Guardium Big Data Intelligence using the Review/Sign button on the predefined reports page.
  • Link - you will receive an email with a link. Click on the link to see the results. If you also want to sign the results login to Guardium Big Data Intelligence using the Review/Sign button on the predefined reports page.
  • Attachment - you will receive an email with an attachment (the PDF or CSV file). Open the attachment to see the results. If you also want to sign the results login to Guardium Big Data Intelligence using the Review/Sign button on the predefined reports page. Note that the dispatcher has a configurable limit for attachment size and attachments larger than this size will be omitted from the email.

Scheduled jobs are fired by the Tomcat server hosting the Guardium Big Data Intelligence applications. Note that if the Tomcat server is down and a schedule passes, it will not be fired when the server re-starts. Scheduled jobs are only fired when the scheduling Tomcat server is up.

When a schedule fires the scheduler adds a directive for the dispatcher. The dispatcher runs the report and creates the CSV or the PDF. These are then copied or sent based on the job definition. A deliverable may be sent to multiple users - each receiving a separate email. Note that you can deliver results by email even to people who do not have a user defined in the system - they are simply not able to sign receipt. For users to be able to receive and sign a result they must be defined as users in the system and their username must be their email address.

There are two ways to deliver results to different users:

1. You can upload a spreadsheet to define multiple runs, each with a different set of values assigned to a different email address. You then specify the collection you are using in the “Parameter collection for bind variables” field. For each line in this collection the dispatcher runs the report (each time with a different set of bound values) and emails the result to the specified user.

2. If you structure your pipeline/query/report in a way that one of the fields has an email address you can enter this field into “Field used for email target”. The dispatcher will auto-split the results into multiple reports and send each person only the lines that carry their own email address.
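The auto-split behavior described in option 2 can be sketched as follows (illustrative Python, not the actual dispatcher code; the field name owner_email is hypothetical):

```python
from collections import defaultdict

# Group result rows by the field named in "Field used for email target":
# each recipient gets only the rows carrying their own address.
def split_by_email(rows, email_field):
    per_recipient = defaultdict(list)
    for row in rows:
        per_recipient[row[email_field]].append(row)
    return dict(per_recipient)

rows = [
    {"DB User": "scott", "owner_email": "jane@example.com"},
    {"DB User": "hr_app", "owner_email": "joe@example.com"},
    {"DB User": "sys", "owner_email": "jane@example.com"},
]
reports = split_by_email(rows, "owner_email")
print(len(reports["jane@example.com"]))  # 2
```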

Users can sign receipt of the deliverables from the Review/Sign button on the predefined reports page. Admins can sign any other users’ reports’ deliverables on their behalf (for example when they are not available). The user who did the signing is recorded and can be viewed on the same report where the signing is done. Each deliverable can only be signed once.

Jobs create either PDFs, CSVs or both and when selecting BOTH, both will be delivered by email.

All CSVs and PDFs generated by the dispatcher as part of these scheduled jobs can be archived by the dispatcher. When configured, deliverables created per day are packaged into a single compressed and encrypted archive file and copied from the Guardium Big Data Intelligence host to a SCP target. Each one of these archive files also includes the sign-offs that occurred during that day as a CSV report. PDFs and CSVs are purged from the Guardium Big Data Intelligence based on a configured retention period (e.g. all CSVs and PDFs older than 30 days are purged). All these configurations are done by the Sonar admin by modifying dispatcher.conf in the sonarFinder install directory.

When sending email notifications the dispatcher has two modes. In the default mode each scheduled job produces a CSV or PDF and causes a separate email to be sent to the users. If you have multiple jobs that fire at the same time producing different deliverables (e.g. different tasks in a classic audit process) all deliverables can be combined into one email with a merged PDF or a single zip of CSVs. To enable aggregation of results add the following to the [dispatch] section of dispatcher.conf and restart the dispatcher service:

bulk_email_report = false


By default CSVs and PDFs will not contain more than 100M rows. Bypassing this limit is possible using the following command performed in the lmrm__sonarg database:

   db.<collection>.update(
      {app_name: "JsonStudio", username: "*"},
      {$set: {apiJSONReturnLimit: NumberInt(<new limit value>)}})

Note that generating very large reports takes time. The query work is very fast but the generation of the report itself can be slow (reading the query results, converting to a CSV and then converting into a PDF if required). CSV generation works at a rate of approximately 100K lines per minute and can vary with hardware. Here are example times for an extreme (and unrealistic) report generating a CSV and then converting it to a PDF for a query yielding 2.7 million rows on an AWS m4.4xlarge machine (the PDF has over 130,000 pages). It takes 120 seconds for the warehouse to compute the result set. Generating the CSV takes an additional 20 minutes, with a CSV file of size 355MB. Generating the PDF takes an additional 3 hours, with a PDF file of size 417MB. While lengthy, generating the CSVs and PDFs does not place heavy processing load on the servers. The PDF generation itself can take a very large amount of memory, so prefer not generating PDFs for hundreds of thousands of lines, or ensure that you have enough virtual memory. CSVs are more manageable and more efficient for very large data sets.
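Using the approximate 100K lines/minute rate quoted above, a rough generation-time estimate can be computed before scheduling a large job (a back-of-envelope sketch only; actual throughput varies with hardware):

```python
# Estimate CSV generation time from the ~100K lines/minute rate cited above.
def estimated_csv_minutes(rows, rate_per_minute=100_000):
    return rows / rate_per_minute

print(estimated_csv_minutes(2_700_000))  # 27.0
```

For the 2.7M-row example this predicts roughly 27 minutes, in the same ballpark as the 20 minutes measured on the m4.4xlarge machine.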

Using Bind Variables in Scheduled Jobs

Queries and reports generated by the scheduler/dispatcher support all bind variable features of Guardium Big Data Intelligence and of JSON Studio. However, when the dispatcher generates results it must have access to all the values for all bind variables or results cannot be generated.

There are three types of bindings that can occur:

  • When you define the schedule you can specify the bind values when you generate the URL.

  • If you use a built-in time-based bind value the dispatcher will fill the value in. This is useful, for example, when you run a report on a periodic basis (e.g. once a day, once a week, once a month) and each run should look at the previous timeframe. Use a built-in bind variable via the following syntax (as described in the previous section):

    $$LMRM_NOW-<some num>DAYS|HOURS (or +)

  • Use custom binding - this is the most flexible method and allows you to generate arbitrary bindings and multiple deliveries to receivers based on custom data and custom mappings.

In order to use custom binding you must generate binding data and save it in a collection in the lmrm__scheduler database. The publish screen allows you to generate a template for this in Excel - you can then fill in the values and use the JSON Studio Spreadsheets application to insert it into lmrm__scheduler (or use any application/driver that you are familiar with to insert the data into the database).

Bind value collections are simple to create and simple to understand. Each document (row) specifies a set of values used to bind the variables and a set of receivers. When the dispatcher runs it reads in this bind collection and generates one report/CSV for each document/row in the collection. It then delivers that report to the receivers.

As an example, suppose that you have a report parameterized by the DB User and you want each DBA to receive a report on their activity. You would create a bind collection that has a document/row per DBA where each document would just have two fields - the DB User name and the email of the DBA. When the dispatcher runs it will generate a report per DB User by using the value per document in the query thus generating activity filtered for that user. It will then email that PDF or CSV to the email specified in the second field of each such document.
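The DBA example can be sketched as follows (illustrative Python; the field names and addresses are hypothetical, and the email and copies values are arrays since the dispatcher expects array types for those fields):

```python
# Hypothetical bind collection for the DBA example: one document per DB User,
# plus the recipient's address.
bind_docs = [
    {"DB User": "scott", "email": ["jane@example.com"], "copies": []},
    {"DB User": "hr_app", "email": ["joe@example.com"], "copies": []},
]

# Conceptually, the dispatcher runs the published query once per document,
# binding $$DB User from that row, then mails the result to that row's email.
def runs(bind_docs):
    return [(doc["DB User"], doc["email"]) for doc in bind_docs]

for user, recipients in runs(bind_docs):
    print(user, recipients)
```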

You can also parameterize the subject, header and footer that will be used in email notification. Use the same syntax as you use with queries. For example, if your query and thus your binding collection has values for $$commandGroup then you can have the header, footer and subject defined as shown below:


Note that you cannot embed a $$LMRM_NOW in a custom binding - these must be a part of the published URL definition. The example below shows an API/report definition that has four bind variables - from, to, client and commandGroup. In the example, from and to are defined as a part of the published URL and the spreadsheet used to populate the bind var collection specifies the client and the commandGroup. It would not work to omit the $$LMRM_NOW definitions and instead add two more columns to the bind var collection / spreadsheet. Also note that the limit (or specifying -1 to remove the limit) must be done in the publish section.

_images/bind1.jpg _images/bind2.jpg

Important: The email and copies fields must be array types. If you are using the auto-generated spreadsheet, for example, and do not need any copies, make sure to enter []. If you want to email a single individual, enter their address as a one-element array (e.g. [“<email address>”]). The bind variable replacement values need to be in the appropriate format, as you would use bind variables in JSON Studio and Sonar Gateway. As an example, if the replacement value is a group name then, since a group name is a string, the value must be enclosed in double quotes - e.g. “Servers belonging to Jane”. Do not include single quotes within such string values.

Using Bind Variables for Defaults

When you build a query that is published as a custom report and want the user to select a date, put a value of $$LMRM_NOW-1DAYS or $$LMRM_NOW+1DAYS in the default field. Do not use double quotes. Note that this value must have +/-1 and not another number as it only serves to create the date picker widget. This is NOT the value that will be used for the query - the user will need to enter a date before submitting the query.

Using $$LMRM_NOW as Values

When you have an input value (as a user) that is a date on a form or as a bind variable of a dashboard you can use the $$LMRM_NOW+/-#DAYS syntax to pick the date that you desire (no double quotes).

$$LMRM_NOW Summary

There are many options to use relative dates - both as a user and as a designer; here is a summary:

  • As a user you may use the $$LMRM_NOW syntax for relative date values in forms and dashboards that expect dates; no quotes are needed.
  • As a user you may use the $$LMRM_NOW syntax for relative date values when creating a REST API that has date bind variables; double quotes are needed (e.g. “$$LMRM_NOW-2DAYS”).
  • Do not use the $$LMRM_NOW syntax when you schedule a job - these will not be bound. You cannot for example enter $$LMRM_NOW into a form of a query and schedule it. Instead, make the relative date part of the query and it will be bound when the query is invoked by the dispatcher. Another way to explain this is that you cannot parameterize the relativity of dates and have double binding (once from a bind variable to a $$LMRM_NOW and then from an $$LMRM_NOW to the actual time).
  • As a designer you may use the $$LMRM_NOW syntax for default values; no quotes are needed but only $$LMRM_NOW, $$LMRM_NOW+1DAYS and $$LMRM_NOW-1DAYS are supported.
  • As a designer you may use the $$LMRM_NOW within queries; double quotes are needed. These values will be replaced when the query executes - whether on the GUI or as part of a scheduled job.
  • As a dashboard designer you may use the $$LMRM_NOW syntax for default values of a dashboard bind variable; no quotes are needed. You can also use NOW-10DAYS, NOW-17HOURS and NOW, also without quotes.


Assume you have a report that you need to break down by two dimensions - servers and command. You have three groups of servers (G1, G3 and G4) representing three business units and you need to deliver one report for DDL commands, one for DML commands and one for administrative commands. The spreadsheet you create looks like:


In this case Jane owns the G1 servers, Joe owns the G3 servers and Josh owns the G4 servers. The report is parameterized by $$commandGroup and $$serverGroup - precisely the columns for which the spreadsheet binds values. This spreadsheet is imported into a collection in the lmrm__scheduler database - for example into a collection called mt2. Then, to schedule the report to run daily at 15:10 use:


Note that the name mt1 is the one used within the spreadsheet to identify the rows used for running the various reports, and that the collection name mt2 is specified at the bottom of the Job Details editor.

NOTE: Do not use $$LMRM_NOW variants as parameters - use them as bind values when you compute the URL. $$LMRM_NOW is expanded before the processing of the multiple parameter lines.

Field used for email target

In versions 3.x and up a scheduled job can specify a field name that is used for the email receiver. This is called “Field used for email target”. Use this when your pipeline that generates the data has a field with an email address. The dispatcher will run the pipeline and then generate multiple results, one per value of the specified field. An email will be sent per person with the lines for that email value only.

Automated Delivery to Apps, Support and Integration with Kafka

Jobs are usually created to generate PDFs, CSVs or both, and are normally delivered via email or SCP. There are three more options - Support, Justify and Kafka.

When you select the Support option a CSV will be generated and encrypted using jSonar’s public key. You may then either email it to jSonar support or upload it to them. Only jSonar support personnel that have access to jSonar’s private key will be able to decrypt the data.

When you select Justify the resulting data will be used to create events managed by the Justify application - as explained in the Justify documentation.

You can stream any Guardium Big Data Intelligence data to Kafka queues and topics. Guardium Big Data Intelligence serves as a Kafka producer. Any pipeline can be used for this but the pipeline must produce two strings - one field called key and one called value. Then, create the job selecting the Kafka type, enter the Kafka server/port location in the Kafka Servers field and a topic list in the Destinations to copy to field. You can publish to multiple topics by entering the topics separated by commas. For more information see kafka.
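Since the pipeline feeding a Kafka job must emit exactly two string fields named key and value, a quick sanity check of a candidate pipeline document might look like this (an illustrative sketch, not part of the product):

```python
# Verify that a pipeline output document satisfies the Kafka job contract:
# it must contain string fields "key" and "value".
def valid_kafka_doc(doc):
    return (set(doc) >= {"key", "value"}
            and isinstance(doc["key"], str)
            and isinstance(doc["value"], str))

print(valid_kafka_doc({"key": "session-1", "value": '{"db_user": "scott"}'}))  # True
print(valid_kafka_doc({"key": 7, "value": "x"}))  # False: key is not a string
```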


Any analytical query that exists within the system or that you build yourself can be used for generating alerts. Alerts are records delivered through syslog or over email as the result of running a query that returns a non-empty result set. Alerts usually embed a threshold value forming the alert basis; this determines whether or not the result set is empty and thus whether or not the alert gets sent.

Example 1

In this example let’s use an existing analytic query to form an alert. The query looks at the “spanning tree” of every database login in the system - computing the number of distinct server IPs, the number of distinct client IPs and the number of distinct client hostnames recorded for every DB User in the system. This is one of the built-in outlier detection visualizations, showing a box-and-whiskers plot for the user span distribution with the outliers:


Behind this visualization is an analytic query. To generate an alert based on this query and a threshold navigate to custom analytics:


Choose the aggregation link on the top right:


Load the analytic query from the pipeline drop down - in this case called box_u_sip…, and click on the Publish URL button:


Change the limit to -1 and set the threshold you desire (min_span) and click on Schedule Job:


Select how often you want the query/alert to run using the cron string (below set to once an hour) and specify who to deliver the CSV to. Make sure to uncheck the “Send also when empty results” - this is the essence of the alert:


Example 2

As another example, consider the always useful alert for inactive STAPs. Guardium Big Data Intelligence receives STAP information from all collectors and synchronizes it to determine when STAPs are truly inactive. There is a built-in aggregation pipeline for this called stap_inactives published by lmrm__sot. To create an alert based on this pipeline login to the Aggregation Builder under JSON Studio (Custom Analytics) and select the published pipeline from the Pipeline drop down. Then click on the Publish URL button. Change the Limit results to -1 or some large number to which you wish to limit results, and then enter values for the from and to bind variables. The values you place there depend on how frequently you choose to schedule the alert. For example, if you wish to schedule it on a daily basis then it makes sense to use “$$LMRM_NOW-1DAYS” for from and “$$LMRM_NOW” for to as shown below:


Click on the Compute URL button and then on the Schedule Job button. Enter the cron definition for the schedule and the desired receiver(s). Uncheck the “Send also when empty results” option so that you only receive the alert if an STAP has been down:


Using the Guardium Big Data Intelligence Operations Monitoring GUI

Every time a data extract file is loaded by the ETL/ingest system a record is inserted into the grdm collection of the form:

   "_id" : ObjectId("5538e92c13475d4c19dab1e7"),
   "n" : "collector1",
   "t" : ISODate("2015-02-01T05:00:00Z"),
   "v" : 424.4890015106648

n is the name of the collector (or the GID), t is the UTC time upon which the file was ingested and v is the data volume size in MBs. This collection allows you to monitor incoming data and view it on the monitoring dashboard. The monitoring dashboard shows a Cubism display where each collector has all incoming data sizes - the larger/greener the bar the more data came in. If a collector stops sending data then this will be highly visible in the dashboard.

Each extract file can potentially contain many data files. A more granular log of each processed file and how many records were processed is kept in grdmrec where each document looks like:

        "_id" : ObjectId("56302b7616ca4783b7a7c69e"),
        "N" : "",
        "Errs" : 0,
        "F" : "EXP_SESSION_LOG_20151027200000.1.csv",
        "T1" : ISODate("2015-10-28T01:57:10.158Z"),
        "Type" : "session",
        "Num" : 3447,
        "T2" : ISODate("2015-10-28T01:57:10.340Z")

Additionally, the system checks for incoming data and will send an email alert if data stops arriving from collectors. Guardium Big Data Intelligence looks at the average of data coming in per collector over the last 7 days; if this average is above a (configurable) threshold and no data has arrived from that collector in the course of the past X (configurable) hours (but data has arrived over the past 7 days), then an email is sent.
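The inactivity heuristic can be sketched as follows (illustrative Python; the 7-day averaging follows the description above, while the threshold and quiet-period defaults shown are assumptions, as the real values are configurable):

```python
from datetime import datetime, timedelta

# Alert on a collector whose 7-day average volume is above a threshold but
# which has sent nothing in the last `quiet_hours` hours.
def should_alert(ingest_events, now, avg_threshold_mb=1.0, quiet_hours=24):
    week = [e for e in ingest_events if e["t"] >= now - timedelta(days=7)]
    if not week:
        return False  # no data at all this week -> collector is not "active"
    avg_mb = sum(e["v"] for e in week) / 7.0  # average MB per day over 7 days
    latest = max(e["t"] for e in week)
    return avg_mb > avg_threshold_mb and latest < now - timedelta(hours=quiet_hours)

now = datetime(2024, 1, 8)
# Busy every day until two days ago, then silence -> should alert.
events = [{"t": now - timedelta(days=d), "v": 400.0} for d in range(2, 7)]
print(should_alert(events, now))  # True
```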

Note that an alert is sent only when no files or data arrive at all. The system counts every single file and every single record that arrives. You can set up all sorts of alerts that fire if the counts drop below a certain threshold or even drop under a percentage as compared with an average - all these alerts can be defined by building aggregation pipelines during or after the implementation phase.

The email will contain a subject line stating “SonarG notification: missing files from Guardium collectors” and the body will list all collectors for which data seems to be missing, for example:

The following collectors copied files in the past but have not in the past 24 hours:

When you know that collectors will undergo scheduled maintenance and you do not want false alerts to be sent you can record the timeframes during which alerts should not be sent per collector. Within the lmrm__sonarg database insert a document into lmrm__alert_exception of the form:

    "name" : "collector93",
    "from" : ISODate("2015-01-01T00:00:00Z"),
    "to" : ISODate("2016-01-01T00:00:00Z")

In this example no alerts will be sent regarding collector93 between Jan 1 2015 and Jan 1 2016.
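The suppression check amounts to testing whether the current time falls inside any recorded window for a collector. A sketch of that logic (illustrative only; the real check is internal to the product):

```python
from datetime import datetime

# Suppress alerts for a collector whose maintenance window, as recorded in
# lmrm__alert_exception, covers the current time.
def suppressed(exceptions, collector, now):
    return any(e["name"] == collector and e["from"] <= now <= e["to"]
               for e in exceptions)

windows = [{"name": "collector93",
            "from": datetime(2015, 1, 1),
            "to": datetime(2016, 1, 1)}]
print(suppressed(windows, "collector93", datetime(2015, 6, 15)))  # True
```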

Alerts can also be enabled for errors. Ingestion errors can be benign or can indicate a problem. You can check for errors using the File Ingest Report (one of the predefined reports) or you can set up an alert to be notified of errors. To enable notification change the following entry in the web.xml in the Tomcat server:


Setting the value to any integer other than -1 activates notifications. You will be notified if incoming files from any particular collector had more than X errors in the last hour where X is the threshold you configure in this entry. Set the value to 0 to be notified of any error. Note that the system will often recover from errors by itself - e.g. even when a problem occurs with a collector extracting files that error is often fixed by Guardium Big Data Intelligence upon the next batch of incoming files and therefore it is not normally necessary to activate this notification.

Errors are mostly due to misconfiguration of schedules on collectors. Guardium Big Data Intelligence will automatically recover erroneous files that come in the wrong order on the next cycle. If a more serious problem occurs on the collectors (such as a hard failure) and partial files exist on Guardium Big Data Intelligence these erroneous files will be moved to the audit directory. To attempt manual recovery of such files you can su to the sonargd user and:

$ cd /var/lib/sonargd/audit
$ for file in *.csv.rejects.rejects.rej;
do mv $file ../inprogress/${file%%.rejects.rejects.rej};
done

Using JSON Studio for Custom Analytics

Using JSON Studio you can build any queries, reports, analytics and visualizations on top of Guardium data and then publish them as APIs, reports, or graphs. For more information on how to use JSON Studio and for tutorials consult the Guardium Big Data Intelligence JSON Studio documentation here.

Using the Predefined Reports

The Guardium Big Data Intelligence predefined reports give you full access to all four collections of data using arbitrary filtering criteria without the need to login to one of the querying applications. There are eight categories of report templates able to generate any report and any data extraction that you need. Additionally, each category includes an HTML report as well as a CSV generator that produces the same report but as a CSV file that you can use in other applications. Each of the reports or CSVs can also be published - for example, you can produce a link and send it to an end-user of the data; upon receiving the email that user can click on the link and access the report data (after authentication).

In each data category two report templates are available - one giving you the data for the last 24 hours and one in which you can specify an arbitrary from-to date range. In each template you can filter any of the data dimensions or leave the default .* to match all audit events.

By default reports show the first 10,000 line items and CSV downloads include the first 100,000 line items. These default limits may be changed by editing <sonarFinder install dir>/jsonar/SonarFinder/WEB-INF/web.xml and modifying:


Links to predefined reports can be emailed using the “Email” link and they can be scheduled for delivery using the Schedule button.

In addition, for deployments of Guardium Big Data Intelligence receiving data from Guardium collectors version V10 and up, outlier data can be aggregated and viewed. There are four graphs and one report based on the standard outlier summary/detail data. All can be filtered by time and anomaly scores over a certain threshold. The four graphs are sorted by most-recent-first and display up to 1000 data points by default. Two graphs are based on the summary data. The first shows all outliers where the x-axis is time and the y-axis is the anomaly score; bubble size is the number of instances. The second shows a chord sunburst hierarchy with DB User as the first level of the hierarchy. There are two graphs and one details report for the detailed data. The first graph shows all anomalies with a time-based x-axis, an anomaly score on the y-axis and a grouping by an anomaly category. The second shows a chord diagram plotting DB User and instance that can help identify patterns of misuse by personnel or weak server configuration.

Note: Predefined reports make use of groups. By default only group descriptions that match the appropriate types are loaded into the selection pull-down. If you want other group descriptions to be loaded change the default setting in the <sonarFinder install dir>/jsonar/SonarFinder/WEB-INF/web.xml from:




and restart the Tomcat server.

Using the time picker


In the top right corner of your screen you can set the global time used by the Dashboards, Reports and the Analyzer.

You can set the global time by using any of the pre-defined options using the “Quick” section or by specifying an interval of time under “Relative” or “Absolute” sections.

You can control which areas are affected by the global time using the check-boxes:

  • “Apply to Reports”
  • “Apply to Dashboards”
  • “Apply to Studio”

Selected time in the time picker will impact:

  • The default time range for data to be retrieved in the dashboards
  • The default time range of reports
  • The default time set in bind variables used by the Analyzer (a.k.a. “Studio”)

Note: in order for the global time to be active for Dashboards & the Analyzer you need to set its usage for the first time:

Dashboards - for each dashboard you need to set the preference time at least once before the global time will take effect. To set the usage follow these steps:

- open the dashboard
- click on “options”
- click on “Edit parameters”
- Select a date in the “from” and “to” fields
- Click “Update Dashboard”

Analyzer - the “to” & “from” fields need to be defined as bind variables:

- Open the Analyzer in the aggregation view
- Open the “bind” window
- Make sure you have the “to” & “from” fields defined; if not, add them
- Set a time in the “to” & “from” fields
- Save the bind settings

Adding Custom Reports

You may add additional reports to the Guardium Big Data Intelligence home page. To do so, build the pipeline or query within JSON Studio (“analyze” from the Guardium Big Data Intelligence home page) and then follow these steps:

  1. Click the publish checkbox and save the query/pipeline.
  2. Click on the Publish URL button.
  3. Optionally, fill in defaults for your bind variables to make it easier for the user and specify a default limit if required. If you want to use the standard report or CSV limits enter REPLIMIT or CSVLIMIT for the limit value respectively. If you have a date field as a bind variable and you want to have the default be one day prior or one day after the time when the user runs the report use $$LMRM_NOW-1DAYS and $$LMRM_NOW+1DAYS (no quotes). Use these strings exactly as they are and note that other variants are not supported. You can use these either in the default or in the value; for user entry use this in the default. Note that the Gateway call will only be populated with these values when configured in the custom report drop down.
  4. Click on the Compute URL button.
  5. Click on the Add to Custom Reports button.
  6. Set a group and a label that will be used to organize the menus and click on the add button.

If you are logged in as the admin user you can add custom reports to all users’ menus.

Configuration Parameters

To configure the Guardium Big Data Intelligence GUI parameters edit <gui install dir>/jsonar/sonarFinder/WEB-INF/web.xml. All Guardium Big Data Intelligence parameters start with the sonarg string. Configuration includes:

  • Various email configuration parameters.

  • Port on which SonarW is listening (sonarg.dbport) - defaults to 27117.

  • Database name for metadata (sonarg.lmrmdb) - defaults to lmrm__sonarg. You can specify a different database and even a database on a different SonarW / Guardium Big Data Intelligence instance. This can be used to create a federated system where multiple Guardium Big Data Intelligence systems share metadata (e.g. when you have one Guardium Big Data Intelligence instance on-prem and one Guardium Big Data Intelligence in a public cloud). Note that even when you specify a remote/different metadata store you still must have a lmrm__sonarg database on the local SonarW instance as that database is still used for the following data:

    1. guardium_groups
    2. Operations monitoring metadata
    3. Predefined report definitions.
  • Scheduler URI - the scheduling subsystem needs to be able to write scheduled job information into the lmrm__scheduler database. Specify this URI to enable scheduling.

  • Report type - by default reports are generated using a Google Visualization Javascript library. This library needs to be accessed over the Internet. For Guardium Big Data Intelligence systems that cannot access the Internet an offline reporting feature is available. To make Guardium Big Data Intelligence use this library rather than access the Internet stop the sonarfinder service, add the following to the web.xml file and start the sonarfinder service:


When using offline reports the _id field is excluded since it is usually for internal-use only. When you are in JSON Studio the _id will show in the reports to help you design queries but when used from the Guardium Big Data Intelligence home page the report will omit the field. The same is true when you schedule a report for delivery. If you wish to override this for a Web service or a dispatched report so that the _id value is included, add forceId=1 to the “Add to URL” field when scheduling.

Developer Enhancements - Drill Downs

Depending on the visualization type, you can add drill downs from a visualization to underlying data as a report. The supported visualizations are heatmaps, multi-donuts and box plots when they are part of a dashboard. The predefined dashboards all have the appropriate drill downs configured.

If you develop new dashboards and new visualizations you can add drill downs using metadata. A drill down is a metadata document that connects an existing viz to a pipeline saved in the system; it is saved in the lmrm__drill_down collection in the lmrm__sonarg database, for example:

        "_id" : 22,
        "name" : "360_class_summary",
        "output" : "graph.360_class_summary_c",
        "published_by" : "lmrm__ae",
        "dd" : "name=360_class_dd&col=classifier&type=agg&output=report&published_by=
        lmrm__ae&bind.Server IP=__lmrm__extract_SIP&bind.Hostname=
        "bind" : {
                "SIP" : {
                        "url" : {
                                "Server IP" : 1
                "Hostname" : {
                        "url" : {
                                "Hostname" : 1
                "Category" : {
                        "value" : {
                                "string" : 1
                "Classification" : {
                        "value" : {
                                "string" : 2

The document specifies a drill down available on the viz called 360_class_summary_c, based on the 360_class_summary pipeline published by lmrm__ae. When a node in the viz is double-clicked, a report is opened using the 360_class_dd pipeline. Because a drill down might require values from the dashboard parameters as well as values in the node that was clicked, the bind section lets you specify either. A url entry refers to anything in the URL that forms the viz (hence including the dashboard parameters). A value entry is a value of the clicked node; the key (e.g. string above) is the operator applied to the value, and the value (1 or 2 above) is the position of the value required. For example, a node in a heatmap has three values per node: the value itself used to color the node, the value of the row, and the value of the column.
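The url/value binding rules can be illustrated with a small Python sketch. Everything here is a hypothetical illustration, not product code: resolve_bindings, the sample parameters, and the connection details in the closing comments are assumptions (the positional index is assumed to be 1-based):

```python
def resolve_bindings(bind, url_params, node_values):
    """Resolve drill-down report parameters as described above:
    a 'url' entry copies the named parameter from the URL that forms
    the viz (including dashboard parameters), while a 'value' entry
    applies an operator (e.g. 'string') to one of the clicked node's
    values, addressed by position (assumed 1-based here)."""
    resolved = {}
    for param, spec in bind.items():
        if "url" in spec:
            name = next(iter(spec["url"]))      # parameter name in the viz URL
            resolved[param] = url_params[name]
        elif "value" in spec:
            op, pos = next(iter(spec["value"].items()))
            val = node_values[pos - 1]          # pick the node value by position
            resolved[param] = str(val) if op == "string" else val
    return resolved

# Part of the bind section from the drill-down document shown above:
bind = {
    "SIP": {"url": {"Server IP": 1}},
    "Category": {"value": {"string": 1}},
    "Classification": {"value": {"string": 2}},
}

# A heatmap node carries three values (color, row, column); the
# sample values below are made up.
params = resolve_bindings(bind, {"Server IP": "10.0.0.5"}, ["PII", "Credit Card", 7])
print(params)  # {'SIP': '10.0.0.5', 'Category': 'PII', 'Classification': 'Credit Card'}

# To store a drill-down document itself you could use pymongo
# (host and port are assumptions):
# from pymongo import MongoClient
# client = MongoClient("mongodb://sonarg-host:27117")
# client["lmrm__sonarg"]["lmrm__drill_down"].insert_one(drill_down_doc)
```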

Supported operators for value mapping are:

  • string: Copy the value as a string - e.g. the DB user name or the server IP.
  • stringNonEncoded: Same but without URL encoding applied to the value.
  • datePlusHour: Works on a string of the form “2017-05-07T00” and generates a date.
  • datePlusHourOneHour: Works on a string of the form “2017-05-07T00” and generates a date which is one hour later.
  • dateOnly: Works on a string of the form “2017-05-07” and generates a date.
  • dateOnlyOneAdditionalDay: Works on a string of the form “2017-05-07” and generates a date one day later.
  • ipLookup: Gets a string which is either an IP or a hostname (based on the preferences of the user) and returns the IP.
  • beforeLast: The operator takes the form “beforeLast:”, “beforeLast@”, etc. Using “:” on the string “56098-SQLNQ075:729:ORACLE” returns 56098-SQLNQ075.
  • beforeLastTrim: As above, but trims any whitespace.
  • beforeLastEncoded: As above, but URL-encodes the string.
  • afterLast: Operator syntax as above. Using “:” on the string “56098-SQLNQ075:729:ORACLE” returns ORACLE.
  • afterLastTrim: As above, but trims any whitespace.
  • afterLastEncoded: As above, but URL-encodes the string.
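The date and string operators can be approximated in Python. This is a sketch based on the descriptions above, not the product's implementation, and edge-case behavior (for example, the exact split point used by beforeLast) may differ:

```python
from datetime import datetime, timedelta
from urllib.parse import quote

def date_plus_hour(s):
    """datePlusHour / dateOnly: parse '2017-05-07T00' or '2017-05-07'."""
    fmt = "%Y-%m-%dT%H" if "T" in s else "%Y-%m-%d"
    return datetime.strptime(s, fmt)

def date_plus_hour_one_hour(s):
    """datePlusHourOneHour: the parsed date, one hour later."""
    return date_plus_hour(s) + timedelta(hours=1)

def date_only_one_additional_day(s):
    """dateOnlyOneAdditionalDay: the parsed date, one day later."""
    return date_plus_hour(s) + timedelta(days=1)

def after_last(s, sep):
    """afterLast: the part after the last separator."""
    return s.rsplit(sep, 1)[-1]

def after_last_trim(s, sep):
    """afterLastTrim: as afterLast, with surrounding whitespace trimmed."""
    return after_last(s, sep).strip()

def after_last_encoded(s, sep):
    """afterLastEncoded: as afterLast, URL-encoded."""
    return quote(after_last(s, sep))

print(after_last("56098-SQLNQ075:729:ORACLE", ":"))   # ORACLE
print(date_plus_hour_one_hour("2017-05-07T00"))       # 2017-05-07 01:00:00
```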


  • Where on the Guardium Big Data Intelligence host is the Tomcat server hosting the Guardium Big Data Intelligence UI installed? - The default installation directory is <home dir for user installing SonarG>/sonarFinder
  • What does the Tomcat server include? - The Tomcat server runs JSON Studio, the Sonar Gateway (for Web services access to data), the Guardium Big Data Intelligence GUI and the scheduler.
  • What is the dispatcher? - The dispatcher is a daemon that processes jobs fired by the scheduler and is responsible for email and SCP-based delivery of results. It is a Python script running from the sonarFinder directory, configured using dispatcher.conf.
  • Why do I get a Web exception in my browser when I first access the Guardium Big Data Intelligence GUI? - All access to the Guardium Big Data Intelligence GUI is done over SSL. The Guardium Big Data Intelligence software ships with a self-signed certificate that ensures that the data is encrypted. However, this certificate is generic and not signed by a CA, so the browser cannot validate the authenticity of the endpoint, i.e. that it is indeed running on your server (whatever your domain is). If you wish to avoid this exception, obtain a signed certificate for your server from a CA, or self-sign one in a way that browsers within your organization can validate. You then add this certificate to the Tomcat keystore using standard Tomcat procedures.
  • How do I upgrade my Guardium Big Data Intelligence system? - All upgrades are done using the Guardium Big Data Intelligence installer (e.g. apt-get upgrade on Ubuntu). Upgrades for all internal components (e.g. Tomcat) are packaged within these upgrades, so you do not need to perform any other upgrades. The Guardium Big Data Intelligence upgrade does not upgrade the Linux operating system. Consult your jSonar contact before you perform a major version upgrade of the operating system where Guardium Big Data Intelligence is installed, to ensure that the operating system is supported.
  • Why are there Authorization failure events in SonarW’s log? - JSON Studio may be used by users with various authorizations. The first thing that the Studio does is try to list the databases in the instance. This allows it to know whether the user is a “normal” user or a privileged user, and to build the collection profile accordingly. For normal users this produces an authorization failure since they do not have privileges to list the available databases. This is normal behavior and can be ignored.