Skip to content

Architecture Guide

Disclaimer

The codebase has evolved very drastically over the last months. This caused side-effects such as unneccesary functions, complex logic here or there or obscure classes, which haven't been refactored yet. Also testing is quite low, since the software architecure changed so much all the time, that it was too much effort to keep it up in the unstable parts of the code. On the road to our first stable release, we will take care of these aspects step by step.

Copy-Plots

This module contains code to copy plots from multiple source directories to multiple target directories. When choosing a disk it takes multiple factors into account such as how mandy plots are already being copied onto that drive. Also the entire process is quite fault-tolerant to disconnects or other issues. While it is not perfect, it does it's job very well for us.

Monitoring

Chia Watchdog

The chia watchdog is the source of chia of data collection. It is also the oldest part of the code from a time before the monitoring layer was added. It consists of a single class ChiaWatchdog with additional class members. These classes are permanently modified by:

  • logfile changes
  • fetching data directly from chia via API calls
  • self-checks run periodically

This information is partially converted into protobuf messages and sent to the server. We don't send everything since some data is just for computations. For example we keep the signage point timestamps to check if a harvester missed a challenge but sending all that data is a waste. It makes more sense to simply send the amount of missed challenges and that's it.

Protobuf Protocol

The collected chia data needs to be sent from the monitoring clients to the monitoring server. The glue for this task is the communication protocol written in [Protobuf][protobuf]. This protocol defines two things: which data is sent from the monitored machines to the server but also what data is stored in the database. The protocol can be found in the folder protocol and the generated source code can be found under chia_tea.protobuf.generated. There is also a [GRPC][grpc] service definition, which defines the communication functions between server and client.

The module chia_tea.monitoring.data_collection converts the ChiaWatchdog data into protobuf messages. It also contains the code to collect hardware information but this will be seperated in the future.

Data Storage

The library uses sqlite3 to store monitoring data on disk. There are two types of tables:

There are two important tables:

  • Update Event Table
  • State Table

The update event table stores all update events ever received for a topic. For example the CpuInfoEvents table contains all cpu data for all machines. To get a fast and easy overview over the current status there is also a state table. In case of the cpu this is CpuInfo. The state table is modified according to the update events, thus if a DELETE event for example is received an entry will be deleted.

Table Schema

The state table uses solely machine_id as primary key to distinguish entries whereas events also use timestamp and event_type. In case an entry is repeatable in protobuf such as a harvester reporting multiple plots for a single machine, then an additional field id is used as part of the primary key to distinguish data entries.

Sqlite Schema Generation

The module chia_tea.protobuf.to_sqlite is the bridge between protobuf and sqlite. It generically defines functions to store or retrieve protobuf messages from sqlite. The code is generic so that almost all changes made to the proto files are automatically handled. While the code works well, it requires severe refactoring to be more understandable and usable.

Limitations are at the moment that enums as well as nested messages cannot be dealt with.

Throttling

The client only sends data if any changes occured. This is always the case for strongly varying metrics such as CPU usage. To avoid spamming the server and thus the database, the submission of such data is throttled in the client config under monitoring.client.send_update_every. We recommend to transmit such data no faster than once per minute. Important data such as harvester connectivity is not throttled at all and the event is transmitted once being captured.

Connection Security

The connection is secured by two certificates (.key and .cert). The filepaths are specified in the config.yml under monitoring.auth. While the monitoring server requires both files '.key' and '.cert', a client uses solely '.cert' to authenticate against the server.

Database Version Compatability

Since the database is generated from protobuf, changing the protocol may require to delete the old database since the tables are inconsistent. This will be addressed in the future.