Attacking-Lab Scoring Formula v1

A scoring formula designed by Attacking-Lab to address perceived shortcomings.

Summary

Each team's score is calculated from the offense, defense and SLA components of each of its services across all rounds played.

The checker returns one of three results for each service: up, down or recovering.

A service is considered up if all flags could be successfully deployed and retrieved, and all other checks were successful.
A service is considered down if any checks for the current round failed.
A service is considered recovering if the checks for the current round succeeded, but not all flags that were successfully deployed during the so-called retention period could be retrieved by the checker. In this case points are awarded relative to the ratio of flags which could be recovered (sla_ratio), as proposed in Tenet 7. The scoreboard shows which flags from the retention period are missing.
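
The three checker results above can be sketched as a small decision function. This is only an illustration under assumptions: the function name `classify` and its parameters are hypothetical and not part of the checker interface described here.

```python
from typing import Literal

CheckerResult = Literal["up", "recovering", "down"]


def classify(current_round_ok: bool, retrieved: int, expected: int) -> tuple[CheckerResult, float]:
    """Return (checker_result, sla_ratio) for one service.

    current_round_ok: hypothetical input, True if every check for the
        current round succeeded.
    retrieved: flags from the retention period the checker could retrieve.
    expected: flags from the retention period the checker tried to retrieve.
    """
    if not current_round_ok:
        # Any failed check in the current round means "down"; no SLA is awarded.
        return "down", 0.0
    ratio = retrieved / expected if expected else 1.0
    if retrieved == expected:
        return "up", ratio
    # Current checks passed, but some older flags are missing.
    return "recovering", ratio


print(classify(True, 5, 5))
print(classify(True, 3, 5))
print(classify(False, 5, 5))
```

Returning the ratio alongside the result keeps the partial-credit case explicit: only a recovering service has a ratio strictly between 0 and 1.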

The retention period should last at least the round-equivalent of 5 minutes, so that teams have enough time to recover from sudden flag submitter downtime and do not need to run their exploits every round.
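
As a worked example of the "round-equivalent of 5 minutes", the number of retention rounds can be derived from the round length. The helper name `retention_rounds_for` and the parameter `round_seconds` are assumptions made for this sketch.

```python
import math


def retention_rounds_for(round_seconds: float, minimum_seconds: float = 300.0) -> int:
    """Smallest number of rounds that covers at least `minimum_seconds`."""
    # Round up so the retention period is never shorter than the minimum.
    return max(1, math.ceil(minimum_seconds / round_seconds))


print(retention_rounds_for(60))   # 60-second rounds  -> 5
print(retention_rounds_for(120))  # 120-second rounds -> 3
print(retention_rounds_for(45))   # 45-second rounds  -> 7
```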

Additionally, every round should end with a 5-second checker hold during which no checker requests are active. This gives teams a time slot in which services can be restarted without downtime, and ensures that no team is disadvantaged by unfortunate scheduling of checker requests.

The following Python pseudo-code captures the team score calculation:

from dataclasses import dataclass
from typing import Literal

@dataclass
class CTFInfo:
    team_count: int # includes NOP team
    retention_rounds: int

@dataclass
class RoundStateFlagstore:
    lost: str | None # flag of the current round if stolen by any team
    active: list[str] # flags of this flagstore deployed in the retention period
    captures: list[str] # flags of this flagstore captured from other teams

@dataclass
class RoundStateService:
    flagstores: list[RoundStateFlagstore]
    checker_result: Literal["up", "recovering", "down"]

    @property
    def sla_max(self) -> int:
        return 2 * len(self.flagstores) + 1

@dataclass
class RoundState:
    services: list[RoundStateService]

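def score(rounds: list[RoundState], ctf: CTFInfo, captures: dict[str, int]) -> tuple[float, float, float]:
    # captures maps each flag to the number of teams that captured it
    attack = defense = sla = 0.0
    for rnd in rounds:
        for service in rnd.services:
            # one SLA point for a fully working service
            if service.checker_result == "up":
                sla += 1
            for flagstore in service.flagstores:
                # up to two SLA points per flagstore, scaled by retained flags
                sla_ratio = len(flagstore.active) / ctf.retention_rounds
                if service.checker_result != "down":
                    sla += 2 * sla_ratio

                # a lost flag costs more the more teams captured it
                if (flag := flagstore.lost) is not None:
                    defense -= (1 + captures[flag] / ctf.team_count) / 2

                # a captured flag is worth less the more teams captured it
                for flag in flagstore.captures:
                    attack += (1 + 1 / captures[flag]) / 2
    return (attack, defense, sla)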
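
To make the per-flag terms of the calculation above easier to inspect, they can be pulled out into two small functions. The names `attack_value` and `defense_value` are introduced only for this sketch; the expressions are the ones used in the pseudo-code.

```python
def attack_value(capture_count: int) -> float:
    """Points an attacker gains for a flag captured by `capture_count` teams."""
    return (1 + 1 / capture_count) / 2


def defense_value(capture_count: int, team_count: int) -> float:
    """Points (negative) a team loses for a flag captured by `capture_count` teams."""
    return -(1 + capture_count / team_count) / 2


# The attack value shrinks towards 0.5 as more teams capture the same flag.
print(attack_value(1))  # 1.0
print(attack_value(2))  # 0.75
print(attack_value(4))  # 0.625

# The defense penalty grows towards -1 as more teams capture the flag.
print(defense_value(1, 4))  # -0.625
print(defense_value(4, 4))  # -1.0
```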

Review

For every round in which a service is unavailable, the corresponding team loses SLA equal to sla_max. Additionally, since some flags could not be deployed, the team receives only partial SLA for the subsequent rounds in the retention period, at most (retention_rounds - 1) / retention_rounds * sla_max. Therefore, the total cost of a service becoming unavailable for n rounds is at least sla_max and at most n * sla_max + (retention_rounds - 1) / retention_rounds * 2 ≈ n * sla_max + 2, both of which are greater than the maximum relative gain of an attacker (len(flagstores) * 2).
To incentivize defense and reduce the relative cost of patching, defense points start at roughly -0.5 for a single attacker (exactly -(1 + 1 / team_count) / 2) and scale linearly with the number of captures, approaching -1 as the number of capturing teams approaches team_count.
When a service becomes unavailable due to patching, the lost points are only recovered relative to the unpatched state if the service is then attacked unsuccessfully for (at worst, with len(flagstores) = 1) (n * sla_max + 2) / (len(flagstores) / 2) - n = 5 * n + 4 rounds beyond the n rounds of downtime caused by the patch. Since patching should reasonably result in at most a few rounds of downtime (e.g. 2), the lost points can be recovered in only a few rounds of subsequent uptime (5 * 2 + 4 = 14). Additionally, the checker hold makes it feasible for valid patches to be deployed with zero downtime deterministically.
A captured flag's value scales inversely with the number of captures. This formula therefore shares a quirk with FaustCTF 2024 and similar scoring schemes: a team's attack score may decrease over time as more teams capture the same flags, which can confuse players. To mitigate this, the scoreboard displays both the expected and realized attack points.
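
The downtime arithmetic in this review can be checked with a short sketch. The function names are hypothetical; the expressions are copied from the text as written, including its rounded cost of n * sla_max + 2 and the assumed saving of at least 0.5 points per flagstore for every round without a lost flag.

```python
def sla_max(flagstores: int) -> int:
    """Maximum SLA a service can earn in one round."""
    return 2 * flagstores + 1


def max_relative_gain(flagstores: int) -> int:
    """Maximum relative gain of an attacker per round, as stated in the text."""
    return 2 * flagstores


def downtime_cost_upper_bound(n: int, flagstores: int, retention_rounds: int) -> float:
    """Upper bound from the text: n * sla_max + (R - 1) / R * 2."""
    return n * sla_max(flagstores) + (retention_rounds - 1) / retention_rounds * 2


def break_even_rounds(n: int, flagstores: int = 1) -> float:
    """Unattacked rounds needed beyond n rounds of downtime to recover the cost."""
    return (n * sla_max(flagstores) + 2) / (flagstores / 2) - n


print(downtime_cost_upper_bound(1, 1, 5))
print(break_even_rounds(1))  # 5 * 1 + 4 = 9.0
print(break_even_rounds(2))  # 5 * 2 + 4 = 14.0
```

With a single flagstore, `break_even_rounds(n)` reduces to 5 * n + 4, matching the worst case discussed above.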

Tenets

Total score MUST increase with more flags captured
Score increases with attack, which scales with flags captured.
Total score MUST decrease with more flags lost
Score decreases with defense, which scales with flags lost.
Flag value MUST diminish with more successful attacks
A flag's value scales inversely with the number of captures.
Perfect SLA MUST be worth more than any attacker's relative gain
The maximum points gained by any attack (flagstores * 2) is less than the minimum cost of downtime (sla_max = flagstores * 2 + 1).
The cost of downtime MUST NOT outweigh the benefits of patching
The cost of downtime due to patching can be recovered in a few subsequent rounds of prevented exploitation.
SLA SHOULD decrease fairly with every missing flag in the retention period
sla_ratio decreases linearly, by 1 / retention_rounds for every missing flag in the retention period.
Flag value SHOULD be calculated independent of its flagstore
Flag value does not depend on the number of flagstores in the service.