ECSC 2023

Scoring formula for the European Cyber Security Challenge 2023 A/D CTF, based on a paper by students of the Norwegian University of Science and Technology (NTNU) on scoring in Jeopardy and Attack-Defense competitions.

Summary

The paper analyzes many scoring formulas, including those used for FaustCTF 2024 and SaarCTF 2024, to collect requirements for fair scoring.

The checker returns one of three results for each service: up, recovering and down. The result is up if all SLA checks pass, and down if some SLA checks do not pass. A service is considered recovering if flags for one round in the retention period could not be recovered, but the latest round passed SLA checks.
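The classification above can be sketched as a small function. This is only an illustration: the parameter names `latest_sla_ok` and `retention_flags_ok` are hypothetical, and the sketch assumes the latest round's SLA checks and the retrieval of older flags are reported separately, which the text does not state.

```python
from typing import Literal

CheckerResult = Literal["up", "recovering", "down"]


def classify(latest_sla_ok: bool, retention_flags_ok: list[bool]) -> CheckerResult:
    """Map check outcomes to a checker result.

    latest_sla_ok: True if all SLA checks of the latest round passed (assumed input).
    retention_flags_ok: one entry per earlier round in the retention period,
        True if that round's flags could still be retrieved (assumed input).
    """
    if not latest_sla_ok:
        return "down"
    if not all(retention_flags_ok):
        # The latest round passed, but an older flag is missing.
        return "recovering"
    return "up"


print(classify(True, [True, True]))
print(classify(True, [True, False]))
print(classify(False, [True, True]))
```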

The following Python pseudocode captures the team score calculation:

from dataclasses import dataclass
from typing import Literal

BASE_ATK = 1
BASE_SLA = 1
BASE_DEF = 1

WEIGHT_ATK = 1
WEIGHT_DEF = 1
WEIGHT_RANK = 1

COST_MIN = 0
COST_MAX = 4 / 5

type CheckerResult = Literal["up", "recovering", "down"]

@dataclass
class RoundStateFlagstore:
    lost: str | None # this round's flag if any team stole it, else None
    captures: list[str] # flags of this flagstore captured by the team this round

@dataclass
class RoundStateService:
    flagstores: list[RoundStateFlagstore]
    checker_result: CheckerResult # checker result of the scored team
    team_results: list[CheckerResult] # checker results of all teams

@dataclass
class RoundState:
    services: list[RoundStateService]
    rank: int # inverse scoreboard position of the scored team
    ranks: dict[str, int] # inverse scoreboard position per team

def score(rounds: list[RoundState], owner: dict[str, str],
          captures: dict[str, int]):
    # owner maps each flag to the team it belongs to;
    # captures maps each flag to the number of teams that captured it.
    attack = defense = sla = 0
    for rnd in rounds:
        for service in rnd.services:
            num_teams = len(service.team_results)
            for flagstore in service.flagstores:
                for flag in flagstore.captures:
                    # flag value diminishes with the number of captures
                    attack += BASE_ATK + WEIGHT_DEF \
                        + WEIGHT_ATK / captures[flag]
                    victim_rank = rnd.ranks[owner[flag]]
                    if victim_rank < rnd.rank:
                        # cost grows with the rank gap to the victim
                        attack -= COST_MAX * ((rnd.rank - victim_rank) \
                            / num_teams) ** 2 + COST_MIN

                if service.checker_result != "down":
                    if (flag := flagstore.lost) is not None:
                        num_def = num_teams - captures[flag]
                        if num_def > 0:
                            defense += BASE_DEF + WEIGHT_DEF / num_def

            if service.checker_result == "up":
                sla += BASE_SLA + WEIGHT_DEF + WEIGHT_RANK
            elif service.checker_result == "recovering":
                sla += (BASE_SLA + WEIGHT_DEF + WEIGHT_RANK) / 2
    return (attack, defense, sla)
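To make the attack term easier to follow, the sketch below isolates the value of a single captured flag from the pseudocode above. The helper name `flag_value` and the example ranks and team count are hypothetical and chosen only for illustration.

```python
BASE_ATK = 1
WEIGHT_ATK = 1
WEIGHT_DEF = 1
COST_MIN = 0
COST_MAX = 4 / 5


def flag_value(num_captures: int, attacker_rank: int, victim_rank: int,
               num_teams: int) -> float:
    """Attack points one team receives for one captured flag."""
    value = BASE_ATK + WEIGHT_DEF + WEIGHT_ATK / num_captures
    if victim_rank < attacker_rank:
        # Quadratic cost in the rank gap, normalized by the team count.
        gap = (attacker_rank - victim_rank) / num_teams
        value -= COST_MAX * gap ** 2 + COST_MIN
    return value


# The value shrinks as more teams capture the same flag (equal ranks, no cost).
for n in (1, 2, 10):
    print(n, flag_value(n, attacker_rank=5, victim_rank=5, num_teams=10))

# Same flag, but the victim's rank value is 5 below the attacker's.
print(flag_value(1, attacker_rank=10, victim_rank=5, num_teams=10))
```

With the given constants, the rank cost is capped at COST_MAX + COST_MIN, so a captured flag never has a negative value.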

Review

Difficult to reason about
Scales defense with the number of teams that did not capture the flag rather than the number of attackers, which makes it sensitive to team counts inflated by non-playing teams.
When all teams are exploited in a service, no team loses defense points for that service (an odd but improbable edge case).
In the worst case, a team gains BASE_SLA + WEIGHT_DEF + WEIGHT_RANK + BASE_DEF * len(flagstores) per round (unless the team cannot bring the service out of the recovering state), and an attacker gains BASE_ATK + WEIGHT_DEF + WEIGHT_ATK / captures[flag]. For the given constants, therefore, the points gained from SLA always outweigh an attacker's relative gain.
The cost of downtime for n rounds is at least n * (BASE_SLA + WEIGHT_DEF + WEIGHT_RANK + BASE_DEF * len(flagstores)) and at most (n + (retention_rounds - 1) / 2) * (BASE_SLA + WEIGHT_DEF + WEIGHT_RANK + (BASE_DEF + WEIGHT_DEF) * len(flagstores)). The cost of not patching, on the other hand, is at most WEIGHT_DEF * len(flagstores).
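The downtime bounds in the last point can be evaluated directly. The following is a minimal sketch that only restates those expressions; the function names and the example values (3 rounds of downtime, 2 flagstores, a retention period of 5 rounds) are hypothetical.

```python
BASE_SLA = 1
BASE_DEF = 1
WEIGHT_DEF = 1
WEIGHT_RANK = 1


def downtime_cost_bounds(n: int, flagstores: int,
                         retention_rounds: int) -> tuple[float, float]:
    """Lower and upper bound on the cost of n rounds of downtime."""
    sla = BASE_SLA + WEIGHT_DEF + WEIGHT_RANK
    lower = n * (sla + BASE_DEF * flagstores)
    upper = (n + (retention_rounds - 1) / 2) \
        * (sla + (BASE_DEF + WEIGHT_DEF) * flagstores)
    return lower, upper


def unpatched_cost_bound(flagstores: int) -> float:
    """Upper bound on the cost of not patching, as stated in the review."""
    return WEIGHT_DEF * flagstores


lower, upper = downtime_cost_bounds(n=3, flagstores=2, retention_rounds=5)
print(lower, upper, unpatched_cost_bound(2))
```

For these example values the downtime cost lies between 15 and 35 points, against a bound of 2 for leaving the service unpatched.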

Tenets

Total score MUST increase with more flags captured
Attack points scale linearly with the number of flags captured.
Total score MUST decrease with more flags lost
Defense points scale non-linearly with the number of attackers.
Flag value MUST diminish with more successful attacks
Flag value scales inversely with the number of captures.
Perfect SLA MUST be worth more than any attacker's relative gain
For the given constants, the attacker's relative gain will always be less than the points awarded from SLA and BASE_DEF.
The cost of downtime MUST NOT outweigh the benefits of patching
For the given constants, recovering the SLA losses would take significantly more rounds than the number of rounds spent unavailable, which disincentivizes patching.
SLA SHOULD decrease fairly with every missing flag in the retention period
SLA does not decrease fairly with the number of missing flags in the retention period.
Flag value SHOULD be calculated independent of its flagstore
Flag value is not scaled to the number of flagstores, and is thus independent of the flagstore.
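The non-linear scaling of the defense term, and its sensitivity to the total team count noted in the review, can be seen by evaluating the term on its own. This is an illustrative sketch: `defense_term` is a hypothetical helper that mirrors the pseudocode, and the team counts are made up.

```python
BASE_DEF = 1
WEIGHT_DEF = 1


def defense_term(num_teams: int, num_attackers: int) -> float:
    """Defense term for one lost flag, as computed in the pseudocode."""
    num_def = num_teams - num_attackers
    if num_def <= 0:
        # The pseudocode adds nothing when every team captured the flag.
        return 0.0
    return BASE_DEF + WEIGHT_DEF / num_def


# 10 teams: the term changes non-linearly with the number of attackers.
for attackers in (1, 5, 9, 10):
    print(10, attackers, defense_term(10, attackers))

# 30 registered teams, 20 of them idle: the same 9 attackers yield a
# noticeably different term because num_def is inflated.
print(30, 9, defense_term(30, 9))
```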

The scoring formula was derived from the paper and its implementation in ECSC 2023.