Pond
Author: Max Kellermann <mk@cm4all.com>
Pond is a volatile round-robin database for log messages. It receives log datagrams and keeps them around for a while, to allow its clients to query them.
Configuration
The file /etc/cm4all/pond/pond.conf configures several aspects
of this software:
a receiver is a datagram socket which binds to an address, optionally joins a multicast group, and receives log datagrams
the database stores these datagrams
a listener is a stream socket which binds to an address and accepts connections from clients, allowing them to query database contents
Example:
database {
  size "1G"
  #max_age "7 days"
}
receiver {
  bind "*"
  #v6only "yes"
  #multicast_group "ff02::dead:beef%br0"
  #interface "eth0"
}
listener {
  bind "*"
  #interface "eth0"
  #zeroconf_service "pond"
}
listener {
  bind "@pond"
}
listener {
  bind "/run/cm4all/pond/socket"
}
auto_clone "yes"
The last two listener blocks configure local sockets, the
first one with an abstract address, and the second one with a socket
path.
Global Options
auto_clone attempts to find another Pond server and clones its database. This requires Zeroconf and will delay startup until the clone is complete, which can take a very long time when the database is large.
database
The database block can contain the following settings:
- size specifies how much memory is allocated in total (in bytes; the suffixes k, M, G are supported). Internally, the database implements a circular buffer which evicts the oldest items if there is no more room for another item.
- max_age: if specified, then records older than this will be evicted even if there is still room in the buffer.
- per_site_message_rate_limit: if specified, then each site is rate-limited to this number of messages per second. Excess messages will be discarded silently. This affects only log datagrams which contain only a message (e.g. http_error).
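The eviction rules for size and max_age can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not Pond's implementation: the class name RingStore, its methods, and the accounting by payload length are assumptions made for illustration.

```python
import time
from collections import deque


class RingStore:
    """Toy model of a size-bounded, age-bounded record buffer (hypothetical)."""

    def __init__(self, size, max_age=None):
        self.size = size          # total capacity in bytes
        self.max_age = max_age    # seconds, or None for no age limit
        self.used = 0
        self.records = deque()    # (timestamp, payload), oldest first

    def _evict_oldest(self):
        _, payload = self.records.popleft()
        self.used -= len(payload)

    def expire(self, now):
        # Drop records older than max_age even if there is room left.
        if self.max_age is None:
            return
        while self.records and now - self.records[0][0] > self.max_age:
            self._evict_oldest()

    def append(self, payload, now=None):
        now = time.time() if now is None else now
        self.expire(now)
        if len(payload) > self.size:
            return False  # can never fit; this sketch simply rejects it
        # Evict the oldest records until the new one fits.
        while self.used + len(payload) > self.size:
            self._evict_oldest()
        self.records.append((now, payload))
        self.used += len(payload)
        return True
```

With a capacity of 10 bytes, appending three 4-byte records leaves only the two newest ones in the buffer; a later expire() call removes whatever has exceeded max_age.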
receiver
The receiver block can contain the following settings:
- bind: an address to bind to. May be the wildcard * or an IPv4/IPv6 address followed by a port. If you omit the port number, it will default to 5479. IPv6 addresses should be enclosed in square brackets to disambiguate the port separator. Local sockets start with a slash /, and abstract sockets start with the symbol @.
- v6only: if set to yes, then IPv4 support is disabled on an IPv6 listener. This is required to avoid the port conflict when you need an IPv4 listener with a different configuration (e.g. an IPv4 multicast group).
- multicast_group: join this multicast group, which allows receiving multicast commands. Value is a multicast IPv4/IPv6 address. IPv6 addresses may contain a scope identifier after a percent sign (%).
- interface: limit this listener to the given network interface.
listener
The listener block can contain the following settings:
- bind: an address to bind to. May be the wildcard * or an IPv4/IPv6 address followed by a port. If you omit the port number, it will default to 5480. IPv6 addresses should be enclosed in square brackets to disambiguate the port separator. Local sockets start with a slash /, and abstract sockets start with the symbol @.
- interface: limit this listener to the given network interface.
- zeroconf_service: if specified, then register this listener as Zeroconf service in the local Avahi daemon. This can be used by clients to discover Pond servers.
- zeroconf_domain (optional): the name of the Zeroconf domain.
- zeroconf_interface: publish the Zeroconf service only on the given interface.
- zeroconf_protocol (optional): publish only protocol inet or inet6.
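The bind syntax shared by receiver and listener blocks can be summarised with a short Python sketch. How Pond itself parses these values is not shown in this document, so the function name parse_bind, its return shape, and the treatment of an unbracketed IPv6 address as having no port are assumptions; only the default ports (5479 for receivers, 5480 for listeners) and the prefix rules come from the text above.

```python
def parse_bind(value, default_port):
    """Classify a bind value; returns a (kind, address, port) tuple."""
    if value.startswith("/"):
        # Local socket path.
        return ("local", value, None)
    if value.startswith("@"):
        # Abstract socket name (without the leading "@").
        return ("abstract", value[1:], None)
    if value.startswith("["):
        # Bracketed IPv6 address, optionally followed by ":port".
        host, _, rest = value[1:].partition("]")
        port = int(rest[1:]) if rest.startswith(":") else default_port
        return ("ip", host, port)
    if value.count(":") == 1:
        # Exactly one colon: IPv4 address (or wildcard) with a port.
        host, _, port = value.partition(":")
        return ("ip", host, int(port))
    # Wildcard, bare IPv4 address, or unbracketed IPv6 address: no port given.
    return ("ip", value, default_port)


RECEIVER_PORT = 5479
LISTENER_PORT = 5480

print(parse_bind("*", RECEIVER_PORT))
print(parse_bind("[::1]:1234", LISTENER_PORT))
print(parse_bind("@pond", LISTENER_PORT))
```

The brackets matter because an IPv6 address already contains colons; without them the sketch cannot tell a port separator from part of the address and falls back to the default port.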
@include
Include another file. Example:
@include "foo/bar.conf"
@include_optional "foo/may-not-exist.conf"
@include "wildcard/*.conf"
The second line silently ignores non-existing files.
The third line includes all files in the directory wildcard ending
with .conf.
The specified file name may be relative to the including file.
Client
The package cm4all-pond-client contains a very simple and
generic client which can be used to query logs.
Querying
Example:
cm4all-pond-client localhost query site=foo
cm4all-pond-client localhost query --follow
The first line queries all records of site “foo”. The second line enables “follow” mode, which means that the client receives a continuous live stream of records as they are received by the server, but no past entries are shown.
The first command-line argument specifies the Pond server to connect
to. This can be a numeric IPv4/IPv6 address, a DNS host name, a local
socket path (starting with /) or an abstract socket name
(starting with @). Additionally, a Zeroconf service name can
be used prefixed with “zeroconf/” (requires installing the
avahi-daemon package on all servers and clients).
The following command-line options are available:
- --follow
Follow the live stream of records as they are received by the server; no past entries are shown.
- --last
Show only the most recent record.
- --age-only
Show only the age of each record (in seconds before the current wall-clock time).
- --jsonl
Write JSON-Lines.
- --raw
Write raw LOG_RECORD packets to standard output instead of pretty-printing them as text lines.
- --gzip
Compress the output with gzip.
- --geoip
Look up all IP addresses in the GeoIP database and add a column at the end of each line specifying the country code (or “-” if the country is unknown). This requires the geoip-database package.
- --anonymize
Anonymize the client IP address by zeroing a portion at the end. This doesn’t work in “raw” mode and doesn’t affect IP addresses inside log messages.
- --track-visitors
Append a “visitor id” column: each visitor is assigned a unique (and opaque) identification string. This is useful in combination with --anonymize, because after anonymization, visitors cannot be identified anymore.
- --accumulate=FIELD,{top|more},COUNT
Count the number of requests for each value in the field FIELD and print a table with the counts of each.
Valid fields:
  - remote_host: the client IP address
  - host: the HTTP Host request header
  - site: the site name
Valid output types:
  - top: print only the top-most COUNT lines
  - more: print lines with a counter of at least COUNT
Examples:
  - --accumulate=site,top,10 prints the top-10 sites
  - --accumulate=remote_host,more,1000 prints client IP addresses that have sent at least 1000 requests
- --per-site=DIRECTORY
Instead of writing to standard output, create one file for each site in the specified directory. Existing files will be skipped.
- --per-site-file=FILENAME
Makes --per-site create a directory for each site and create this file in each of them.
- --per-site-nested
Makes --per-site create a nested tree of directories instead of having one flat directory entry per site.
- --host
Show the HTTP Host request header.
- --forwarded-to
Show the address of the server each request was forwarded to.
- --resolve-forwarded-to
Show the name of the server each request was forwarded to.
- --no-referer
Do not show the HTTP Referer request header.
- --no-agent
Do not show the HTTP User-Agent request header.
- --content-type
Show the HTTP Content-Type response header.
- --iso8601
Print the time stamp in ISO-8601 format.
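The two output types of --accumulate differ only in how rows are selected after counting. The following Python sketch models that selection; it is not the client's code. The function name accumulate, the representation of records as dictionaries, and the ordering of the "more" output by descending count are assumptions.

```python
from collections import Counter


def accumulate(records, field, mode, count):
    """Count records per value of `field` and select rows by `mode`."""
    counts = Counter(r[field] for r in records if field in r)
    if mode == "top":
        # Only the COUNT most frequent values.
        return counts.most_common(count)
    if mode == "more":
        # Every value seen at least COUNT times, most frequent first.
        return [(v, n) for v, n in counts.most_common() if n >= count]
    raise ValueError(f"unknown output type: {mode}")


records = [{"site": "a"}] * 3 + [{"site": "b"}] * 2 + [{"site": "c"}]
print(accumulate(records, "site", "top", 2))
print(accumulate(records, "site", "more", 3))
```

In other words, "top" bounds the number of rows while "more" bounds the counter value, so "more" can return anything from no rows to all of them.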
The following filters are available:
- type=TYPE shows only records of the specified type. Available types:
  - http_access: an HTTP request
  - http_error: an HTTP log message
  - submission: an email submission
  - ssh: a log message from an SSH server
  - job: a log message from a job process (e.g. Workshop)
  - history: a “history” event
- site=NAME shows only records of the specified site. Specify an empty site name to filter records with no site at all.
- group_site=COUNT[@SKIP] groups all result records by their “site” attribute, i.e. all records with the same site will be returned successively, followed by all records of the next site and so on. Only records for the first COUNT sites are returned, and the rest is ignored. The optional SKIP parameter may be used to skip a number of sites. This can be used to receive records for all sites incrementally, until the result is empty.
- host=NAME shows only records of the specified HTTP Host header. Specify an empty host to filter records with no host at all.
- uri=URI shows only records whose HTTP request URI is the specified string.
- uri-prefix=URI shows only records whose HTTP request URI starts with the specified string.
- generator=NAME shows only records with the specified “generator” value.
- since=ISO8601 shows only records since the given time stamp. See ISO8601 time stamps for details.
- until=ISO8601 shows only records until the given time stamp. See ISO8601 time stamps for details.
- time=ISO8601 is a shortcut for since=... and until=...
- date=YYYY-MM-DD is a shortcut which shows records on a certain date (according to the client’s time zone).
- today is a shortcut which shows records only of today.
- duration_longer=DURATION shows only records with a duration longer than the specified value. The value is a positive integer with one of the units us, ms, s, m, h, d. Example: duration_longer=500ms.
- status=STATUS[:END] shows only records with the specified status. If “END” is also given, then this is the open end of a range. Example: status=500:600 shows all server errors.
- method=METHOD shows only records with the specified HTTP method; multiple methods can be specified separated by commas.
- unsafe_method shows only records with an HTTP method that is “unsafe”, i.e. POST, PUT etc. (see RFC 9110, section 9.2.1).
- window=COUNT[@SKIP] selects a portion (window) of the result. It can limit the number of records and skip a number of records at the beginning.
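The status and window filters both take compact range-like arguments. The Python sketch below shows one way to read them: status=STATUS[:END] as a half-open range (END itself is excluded, which is why 500:600 covers exactly the 5xx codes) and window=COUNT[@SKIP] as a slice. The function names, the dictionary records, and applying the window after the other filters are assumptions, not a description of the server's code.

```python
def parse_status(spec):
    """Parse STATUS[:END] into a half-open range (start, end)."""
    start, sep, end = spec.partition(":")
    first = int(start)
    # Without END, match exactly one status code.
    return (first, int(end) if sep else first + 1)


def parse_window(spec):
    """Parse COUNT[@SKIP] into (count, skip)."""
    count, sep, skip = spec.partition("@")
    return (int(count), int(skip) if sep else 0)


def apply_filters(records, status=None, window=None):
    """Filter by status range first, then cut the window out of the result."""
    if status is not None:
        lo, hi = parse_status(status)
        records = [r for r in records if lo <= r["status"] < hi]
    if window is not None:
        count, skip = parse_window(window)
        records = records[skip:skip + count]
    return records


recs = [{"status": s} for s in (200, 404, 500, 503, 599, 600)]
print(apply_filters(recs, status="500:600"))
print(apply_filters(recs, window="2@1"))
```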
The client displays records in the standard one-line format by default. If standard output is connected to a datagram or seqpacket socket, then the log datagrams are sent in raw format instead.
ISO8601 time stamps
Examples of accepted ISO8601 time stamps:
- 2019-02-04T16:46:41Z
- 2019-02-04T16:46:41 (without time zone)
- 2019-02-04T16:46:41+02 (with time zone offset)
- 2019-02-04T16:46:41+0200 (with time zone offset)
- 2019-02-04T16:46:41+02:00 (with time zone offset)
- 2019-02-04T16:46 (seconds omitted)
- 2019-02-04T16 (minutes omitted)
- 2019-02-04 (time of day omitted)
- 20190204T164641 (without field separators)
Other than ISO8601, the following special tokens are understood:
- now is the current time stamp
- today is the current date in the local time zone
- yesterday is the previous date in the local time zone
- tomorrow is the next date in the local time zone
Additionally, time stamps can be specified as an offset relative to now:
- +30s is in 30 seconds
- -30s is 30 seconds ago
- -15 is 15 minutes ago
- -1h is one hour ago
- -1d is 24 hours ago
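To make the relative form concrete, here is a minimal Python sketch that resolves an offset token against a given "now". It covers only the suffixed forms with s, h and d that appear in the examples above; any other form is rejected rather than guessed at. The function name resolve_offset and the UNITS table are hypothetical and are not taken from the client.

```python
from datetime import datetime, timedelta, timezone

# Seconds per unit suffix; only suffixes from the examples are handled.
UNITS = {"s": 1, "h": 3600, "d": 86400}


def resolve_offset(token, now):
    """Turn a token such as "-30s" or "+1h" into an absolute time."""
    if len(token) < 3:
        raise ValueError(f"unsupported offset: {token!r}")
    sign, number, unit = token[0], token[1:-1], token[-1]
    if sign not in "+-" or unit not in UNITS or not number.isdigit():
        raise ValueError(f"unsupported offset: {token!r}")
    delta = timedelta(seconds=int(number) * UNITS[unit])
    return now + delta if sign == "+" else now - delta


now = datetime(2019, 2, 4, 16, 46, 41, tzinfo=timezone.utc)
print(resolve_offset("-1h", now).isoformat())  # 2019-02-04T15:46:41+00:00
```

Note that -1d is a fixed 24 hours here, matching the wording above, rather than "the same wall-clock time yesterday".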
Cloning
The command clone can be used to clone the contents of another
Pond server:
cm4all-pond-client @pond clone other.pond.server
This asks the local Pond server (listening on abstract socket
@pond) to download the whole database from the Pond daemon on
host other.pond.server.
The operation will run asynchronously, and the client will return
immediately; during the clone, the local Pond server will not accept
any new data on its receiver. It can be canceled at any time
by typing:
cm4all-pond-client @pond cancel
This command is experimental, and should not be used for regular operation. It may change or be removed at any time.
Injecting Data
The command inject reads LOG_RECORD packets from
standard input (possibly generated with --raw) and injects
them into the Pond server. The server will only allow this if the
client is local (connected with a local socket, not TCP) and
privileged. Example:
cm4all-pond-client pond.server.local query --raw ... |
cm4all-pond-client @pond inject
This example shows something that is similar to Cloning, but less
efficient, because all data now passes through the client, while
clone transfers data directly between the two Pond servers.
This command was implemented for development and debugging, and is not meant for production use.
Security
This software implements no access restrictions. Datagrams from anybody are inserted into the database, and all clients are allowed to access all data.
Due to the lack of access restrictions, this software should not be
accessible to processes which are not authorized to see all data.
Therefore, the Pond listener should not be mounted into
unprivileged jails/containers; instead, Passage should be used as a bridge from
unprivileged entities to the Pond client.