# Model Training

SlinkyLayer turns raw exchange data into a deployable trading policy through a repeatable, five-step workflow. Each run is archived as a JSON spec plus content hashes, so any compliant node can reproduce the result exactly.

***

## Wizard Overview

<table><thead><tr><th width="156.93682861328125">Step</th><th width="250.1220703125">Required fields</th><th>Notes</th></tr></thead><tbody><tr><td>1) <strong>Market</strong></td><td><code>exchange, symbol, timeframe</code></td><td>Three bar sizes (5 m, 15 m, 60 m).</td></tr><tr><td>2) <strong>Split</strong></td><td><code>train_start, test_start</code></td><td>Presets: Short, Medium, Long; Custom allowed.</td></tr><tr><td>3) <strong>Reward</strong></td><td><code>reward_name, params</code></td><td>Default <code>return_minus_cost</code>; alternatives Sharpe-scaled, log-return, trade-count, win-loss.</td></tr><tr><td>4) <strong>Budget</strong></td><td><code>total_timesteps, n_envs</code></td><td>Auto = 10 × bars-in-train, clamped 2e5–2e6; presets Quick / Standard / Thorough.</td></tr><tr><td>5) <strong>Hyper-params</strong></td><td><code>algo, net_arch, lr, n_steps …</code></td><td>Hidden in Beginner mode; fully editable in Professional mode.</td></tr></tbody></table>
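The Auto budget rule from step 4 can be sketched as follows. This is an illustrative helper, not SlinkyLayer's actual code; the function name is an assumption.

```python
def auto_timesteps(bars_in_train: int) -> int:
    """Default training budget from step 4: 10x the number of bars
    in the train split, clamped to the range [2e5, 2e6]."""
    return int(min(max(10 * bars_in_train, 2e5), 2e6))
```

A one-hour-bar dataset with 50,000 training bars, for example, would default to 500,000 timesteps, while a very short dataset is lifted to the 200,000-step floor.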

Client-side checks enforce safe ranges. For PPO, for example, the rollout must split evenly into minibatches: `(n_steps × n_envs) mod batch_size = 0`.
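The PPO constraint above can be expressed as a simple client-side check. A minimal sketch; the function name and error message are assumptions, not the platform's actual validator.

```python
def validate_ppo_batch(n_steps: int, n_envs: int, batch_size: int) -> None:
    """Client-side range check for PPO: the rollout of
    n_steps * n_envs transitions must split evenly into
    minibatches of batch_size."""
    rollout = n_steps * n_envs
    if rollout % batch_size != 0:
        raise ValueError(
            f"batch_size={batch_size} must divide "
            f"n_steps * n_envs = {rollout}"
        )
```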

***

## Supported Algorithms

<table><thead><tr><th width="84.84442138671875">Algo</th><th width="128.6136474609375">Style</th><th width="156.041259765625">Memory</th><th>Typical use</th></tr></thead><tbody><tr><td>PPO</td><td>on-policy</td><td>rollout only</td><td>General workhorse; stable on parallel workers.</td></tr><tr><td>A2C</td><td>on-policy</td><td>rollout only</td><td>Minimal compute; rapid feedback.</td></tr><tr><td>SAC</td><td>off-policy</td><td>replay buffer</td><td>Sample-efficient; handles noisy returns.</td></tr><tr><td>TD3</td><td>off-policy</td><td>replay buffer</td><td>Deterministic, twin critics, action noise.</td></tr><tr><td>DDPG</td><td>off-policy</td><td>replay buffer</td><td>Legacy deterministic baseline.</td></tr></tbody></table>

All share an MLP policy; layer sizes (`net_arch`) and activation (`Tanh` or `ReLU`) are user-selectable.
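The table and the shared-MLP rule can be captured in a small lookup plus a policy-spec builder. Illustrative only: the dictionary and function names are assumptions, not SlinkyLayer's schema.

```python
# Algorithm properties as listed in the table above (sketch).
ALGOS = {
    "PPO":  {"style": "on-policy",  "memory": "rollout"},
    "A2C":  {"style": "on-policy",  "memory": "rollout"},
    "SAC":  {"style": "off-policy", "memory": "replay"},
    "TD3":  {"style": "off-policy", "memory": "replay"},
    "DDPG": {"style": "off-policy", "memory": "replay"},
}

def mlp_policy_spec(net_arch=(64, 64), activation="Tanh"):
    """Build the MLP policy spec shared by all algorithms;
    only Tanh and ReLU activations are user-selectable."""
    if activation not in ("Tanh", "ReLU"):
        raise ValueError("activation must be 'Tanh' or 'ReLU'")
    return {"net_arch": list(net_arch), "activation": activation}
```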

***

## Training Pipeline

```
rollout_pods  →  learner_GPU  →  checkpoint
      ↑               |               |
feature_table   metrics stream   back-test job
```

1. **Rollout pods** run `n_envs` vectorised environments, each stepping `n_steps` with the chosen fee, slippage, and short settings.
2. **Learner pod** computes gradients on GPU, applies Adam updates, and checkpoints every *k* updates.
3. **Back-test job** replays the frozen policy on the unseen test split; outputs equity curve, Sharpe, Sortino, max drawdown.
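The metrics the back-test job reports can be computed from the equity curve alone. A simplified sketch of the usual per-bar formulas (no annualisation, zero risk-free rate), not the platform's exact implementation.

```python
import math

def backtest_metrics(equity):
    """Sharpe, Sortino, and max drawdown from an equity curve
    (one value per bar of the test split)."""
    rets = [equity[i] / equity[i - 1] - 1 for i in range(1, len(equity))]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / len(rets)
    # Sortino penalises downside deviation only.
    dvar = sum(min(r, 0.0) ** 2 for r in rets) / len(rets)
    sharpe = mean / math.sqrt(var) if var else float("inf")
    sortino = mean / math.sqrt(dvar) if dvar else float("inf")
    # Max drawdown: largest peak-to-trough decline.
    peak, max_dd = equity[0], 0.0
    for v in equity:
        peak = max(peak, v)
        max_dd = max(max_dd, 1 - v / peak)
    return {"sharpe": sharpe, "sortino": sortino, "max_drawdown": max_dd}
```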

***

## Reproducibility and Forward Audit

**Artefacts stored per run**

| File               | Purpose                      |
| ------------------ | ---------------------------- |
| `config.json`      | Full wizard output.          |
| `data_hash.txt`    | SHA-256 of training candles. |
| `feature_hash.txt` | CID of indicator table.      |
| `checkpoint.pt`    | Final weights.               |
| `metrics.json`     | Risk metrics on test split.  |
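Hashes like `data_hash.txt` only make runs reproducible if every node computes them the same way. A sketch of two such helpers, assuming SHA-256 over raw bytes for data files and key-sorted JSON for the wizard config; the function names are illustrative.

```python
import hashlib
import json

def sha256_file(path: str) -> str:
    """SHA-256 of a file's raw bytes (e.g. the training candles),
    read in chunks so large files do not load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def config_hash(config: dict) -> str:
    """Deterministic hash of the wizard output: key-sorted, compact
    JSON so every node recomputes the same digest."""
    blob = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Sorting keys matters: two nodes that serialise the same config in different key orders would otherwise disagree on the hash.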

**Audit path**

Auditor nodes subscribe to the live signal stream only.

1. Verify arrival ≤ 5 s after candle close.
2. Recompute one-step reward using canonical price feed.
3. Sign segment hash; quorum of three identical receipts marks the segment verified on chain.
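The freshness and quorum checks above can be sketched as follows. Names, the receipt representation, and the quorum parameter default are assumptions for illustration.

```python
from collections import Counter

LATENCY_BUDGET_S = 5.0  # signals must arrive within 5 s of candle close

def is_fresh(arrival_ts: float, candle_close_ts: float) -> bool:
    """Step 1: verify the signal arrived within the latency budget."""
    delay = arrival_ts - candle_close_ts
    return 0.0 <= delay <= LATENCY_BUDGET_S

def segment_verified(receipt_hashes, quorum: int = 3) -> bool:
    """Step 3: a segment is marked verified on chain once a quorum
    of identical signed segment hashes has been received."""
    if not receipt_hashes:
        return False
    _, count = Counter(receipt_hashes).most_common(1)[0]
    return count >= quorum
```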

Auditors never receive private hyper-parameter sweeps, keeping intellectual property safe while giving traders cryptographic proof that signals are fresh and honest.
