GBDI 4.1

Remote File Copies

In most cases, collectors copy extract files directly to the GBDI incoming directory for processing. GBDI also supports a deployment model in which collectors push their files to a third host and GBDI pulls the files from that host. This configuration is especially helpful when GBDI is installed in the cloud while the Guardium appliances are on-premises: rather than having every collector send data directly to the GBDI machine in the cloud, the collectors send data to a local "jump server" from which GBDI pulls the data. To configure GBDI to pull data from the jump server, add the server URIs to the remotes configuration parameter in sonargd.conf:

# List of remote file servers from which to read incoming files. The files
# will be read from the server when they are ready (have an appropriate
# '_COMPLETE' file), and moved to the 'archive' directory there right after
# reading. From then on local processing of the file proceeds as usual.
# List all the remotes you wish to read from, using full RFC 3986 URI
# syntax. Include the scheme (ssh, scp or sftp), username, password, host,
# port (if not 22), and the path to the incoming folder on the server.
# Example:
# remotes:
#   - sftp://user1:passw0rd@fileserver.domain.tld/full/path/to/incoming
#   - sftp://user2:mypasswd@fileserver2.domain.tld:1220/some/other/path/incoming
# For an empty list of remotes, use:
# remotes: []
remotes: []
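To sanity-check a remotes entry before adding it, the following Python sketch splits a URI into the components listed in the comments above (scheme, username, password, host, port and incoming path) using the standard library's urllib.parse.urlsplit. The helper name parse_remote is hypothetical and this is not GBDI's own parser; it simply assumes port 22 when the URI gives none, as the comments describe.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlsplit

# Schemes accepted according to the sonargd.conf comments.
ALLOWED_SCHEMES = {"ssh", "scp", "sftp"}
DEFAULT_SSH_PORT = 22


class Remote(NamedTuple):
    scheme: str
    username: Optional[str]
    password: Optional[str]
    host: str
    port: int
    path: str


def parse_remote(uri: str) -> Remote:
    """Split one 'remotes' entry into its RFC 3986 components (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host")
    if not parts.path.startswith("/"):
        raise ValueError("missing absolute path to the incoming folder")
    return Remote(
        scheme=parts.scheme,
        # Userinfo may be percent-encoded (e.g. '%40' for '@' in a password).
        username=unquote(parts.username) if parts.username else None,
        password=unquote(parts.password) if parts.password else None,
        host=parts.hostname,
        port=parts.port or DEFAULT_SSH_PORT,  # 22 when the URI omits a port
        path=parts.path,
    )


remotes = [
    "sftp://user1:passw0rd@fileserver.domain.tld/full/path/to/incoming",
    "sftp://user2:mypasswd@fileserver2.domain.tld:1220/some/other/path/incoming",
]
for r in map(parse_remote, remotes):
    print(r.host, r.port, r.username, r.path)
```

A password containing reserved characters such as @, : or / must be percent-encoded in the URI (for example, %40 for @); the sketch decodes it with unquote. For an alias-only entry such as sftp://g2/var/lib/sonargd/incoming, username and password come back as None, because those details are expected to come from an SSH config file instead.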

You can also use a standard SSH client configuration file and SSH keys to pull data without putting a username and password in the URI. For example, in sonargd.conf specify a remotes entry without a username or password, along with the location of the SSH config file:

remotes:
  - sftp://g2/var/lib/sonargd/incoming

ssh-config: /etc/sonar/ssh_config

Then add a standard SSH config entry for the host alias (g2 in this example) to that file:

host g2
  port 22
  user sonargd
  identityfile /etc/sonar/id_dsa
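Assuming GBDI honors this file the way a standard OpenSSH client does, the alias g2 in the URI is looked up in the ssh-config file, and the matching host block supplies the port, user and identity file. The following Python sketch illustrates that lookup for the example entry above. It is a deliberately simplified, hypothetical parser (exact alias match only; no wildcards, Match blocks, Include or keyword=value syntax), not the code GBDI uses.

```python
from typing import Dict

# The example entry from /etc/sonar/ssh_config shown above.
SSH_CONFIG = """\
host g2
  port 22
  user sonargd
  identityfile /etc/sonar/id_dsa
"""


def lookup_host(config_text: str, alias: str) -> Dict[str, str]:
    """Collect options from every 'host' block that lists alias.

    Simplified sketch: exact alias match only. As in ssh_config, the first
    value obtained for a keyword wins.
    """
    options: Dict[str, str] = {}
    in_block = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()  # ssh_config keywords are case-insensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "host":
            # A Host line may list several patterns separated by whitespace.
            in_block = alias in value.split()
            continue
        if in_block:
            options.setdefault(key, value)
    return options


opts = lookup_host(SSH_CONFIG, "g2")
# The URI sftp://g2/var/lib/sonargd/incoming carries no user or port,
# so these values come from the config entry instead.
print(opts["user"], opts["port"], opts["identityfile"])
```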

The remote-schedule option controls when, and how often, GBDI pulls data from the remote server. By default, GBDI pulls data from the on-premises server every five minutes; for more control, for example when traversing a public-facing network, use the remote-schedule section to change the default:

# Set the schedule for reading the files from the remote hosts.
# The format is given by 5 fields: min(ute), hour, day, month and day-of-week.
# min, hour, day and month are numbers.
# day-of-week can be given as a 3-letter day abbreviation (sun, mon, ...) or
# as a number (mon=1, sun=7).
# Each entry can be a single value (e.g. 2 or sun), a list (e.g. 2,5,19 or
# sun,thu), a step value (e.g. */5, meaning every 5), a range (e.g. 41-59),
# or a combination of all of the above (e.g. */17,12,41-59).
# Day of week can also specify an occurrence count, so that fri#2 is the
# second Friday of the month.
# If both day of week and day of month are restricted, the job will run when
# either matches. All the other fields must match together.
# If not all values are given, the missing values are taken from the default, or *.
# Note: * is a special character in YAML, so it needs quoting in this context.
# Default is every 5 minutes:
# remote-schedule:
#   min: '*/5'
#   hour: '*'
#   day: '*'
#   month: '*'
#   day-of-week: '*'
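To make the field rules above concrete, here is a small Python sketch that checks whether a timestamp matches a remote-schedule entry and lists the next few pull times. It follows the rules in the comments: lists, */N steps, ranges such as 41-59, day names with mon=1 and sun=7, fri#2, and the rule that a restricted day and day-of-week match when either one matches. It is not GBDI's scheduler; the function names are hypothetical, and it assumes that fields missing from a schedule fall back to the documented default (min: '*/5', the rest '*') and that steps count from each field's first value, as in standard cron.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterator

# First value of each field; '*/N' steps count from here (cron convention).
FIELD_START = {"min": 0, "hour": 0, "day": 1, "month": 1, "day-of-week": 1}
# Day names as in the comments above: mon=1 ... sun=7.
DOW_NAMES = {"mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6, "sun": 7}
# Documented default: every 5 minutes. ASSUMPTION: fields missing from a
# user schedule fall back to these values.
DEFAULTS = {"min": "*/5", "hour": "*", "day": "*", "month": "*", "day-of-week": "*"}


def _value(token: str, field: str) -> int:
    """Convert one token to an int, accepting day names for day-of-week."""
    token = token.strip().lower()
    if field == "day-of-week" and token in DOW_NAMES:
        return DOW_NAMES[token]
    return int(token)


def field_matches(expr, field: str, t: datetime) -> bool:
    """True if t satisfies one field expression such as '*/17,12,41-59'."""
    actual = {
        "min": t.minute,
        "hour": t.hour,
        "day": t.day,
        "month": t.month,
        "day-of-week": t.isoweekday(),  # Monday=1 ... Sunday=7
    }[field]
    for term in str(expr).split(","):  # str(): YAML may hand us ints
        term = term.strip()
        if field == "day-of-week" and "#" in term:
            # 'fri#2': right weekday AND its 2nd occurrence in the month.
            name, nth = term.split("#")
            if actual == _value(name, field) and (t.day - 1) // 7 + 1 == int(nth):
                return True
        elif term == "*":
            return True
        elif term.startswith("*/"):
            if (actual - FIELD_START[field]) % int(term[2:]) == 0:
                return True
        elif "-" in term:
            start, end = (_value(x, field) for x in term.split("-"))
            if start <= actual <= end:
                return True
        elif actual == _value(term, field):
            return True
    return False


def schedule_matches(schedule: Dict[str, str], t: datetime) -> bool:
    """True if t is a scheduled pull time for this (hypothetical) schedule dict."""
    s = {**DEFAULTS, **schedule}
    # min, hour and month must all match together.
    if not all(field_matches(s[f], f, t) for f in ("min", "hour", "month")):
        return False
    day_ok = field_matches(s["day"], "day", t)
    dow_ok = field_matches(s["day-of-week"], "day-of-week", t)
    if str(s["day"]) != "*" and str(s["day-of-week"]) != "*":
        return day_ok or dow_ok  # both restricted: either one is enough
    return day_ok and dow_ok


def next_runs(schedule: Dict[str, str], start: datetime, count: int,
              max_minutes: int = 366 * 24 * 60) -> Iterator[datetime]:
    """Yield up to count minute-aligned run times after start (brute-force scan)."""
    t = start.replace(second=0, microsecond=0)
    found = 0
    for _ in range(max_minutes):
        if found == count:
            return
        t += timedelta(minutes=1)
        if schedule_matches(schedule, t):
            found += 1
            yield t


# Hypothetical example: every 15 minutes, 08:00-17:59, Monday to Friday.
business_hours = {"min": "*/15", "hour": "8-17", "day-of-week": "mon-fri"}
for run in next_runs(business_hours, datetime(2024, 1, 5, 17, 50), 3):
    print(run)
```

Starting from Friday, 5 January 2024 at 17:50, the example skips the rest of Friday evening and the weekend and prints 08:00, 08:15 and 08:30 on Monday, 8 January. The minute-by-minute scan is chosen for clarity rather than speed; it is cheap enough for previewing a schedule.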