ECSC 2023
Scoring formula for the European Cyber Security Challenge 2023 A/D CTF, based on a paper by students of the Norwegian University of Science and Technology (NTNU) on scoring in Jeopardy and Attack-Defense competitions.
Summary
The paper analyzes many different scoring formulas, including those used for FaustCTF 2024 and SaarCTF 2024, to collect requirements for fair scoring.
The checker returns one of three results for each service: up, recovering and down. The result is up if all SLA checks pass, and down if some SLA checks do not pass. A service is considered recovering if flags for one round in the retention period could not be recovered, but the latest round passed SLA checks.
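The three results can be read as a simple precedence rule. The sketch below is one possible interpretation, not the checker's actual implementation: the function name `classify` and its two inputs (`sla_ok` for the SLA checks of the latest round, `flags_recovered` for flag retrieval per round of the retention period) are assumptions made for illustration.

```python
from typing import Literal

CheckerResult = Literal["up", "recovering", "down"]


def classify(sla_ok: bool, flags_recovered: list[bool]) -> CheckerResult:
    """Hypothetical helper: map check outcomes to a checker result.

    sla_ok          -- True if every SLA check of the latest round passed
    flags_recovered -- one entry per round in the retention period, True if
                       the flag of that round could still be retrieved
    """
    if not sla_ok:
        return "down"        # some SLA check failed in the latest round
    if not all(flags_recovered):
        return "recovering"  # latest round passed, but an older flag is missing
    return "up"              # SLA checks passed and all retained flags are present


print(classify(True, [True, True, True]))   # up
print(classify(True, [True, False, True]))  # recovering
print(classify(False, [True, True, True]))  # down
```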
The following Python pseudo-code captures the team score calculation:
from dataclasses import dataclass
from typing import Literal

BASE_ATK = 1
BASE_SLA = 1
BASE_DEF = 1
WEIGHT_ATK = 1
WEIGHT_DEF = 1
WEIGHT_RANK = 1
COST_MIN = 0
COST_MAX = 4 / 5

CheckerResult = Literal["up", "recovering", "down"]


@dataclass
class RoundStateFlagstore:
    lost: str | None     # flag of the current round if stolen by any team
    captures: list[str]  # flags of this flagstore and round captured


@dataclass
class RoundStateService:
    flagstores: list[RoundStateFlagstore]
    checker_result: CheckerResult
    team_results: list[CheckerResult]  # its length is used as the team count


@dataclass
class RoundState:
    services: list[RoundStateService]
    rank: int  # inverse scoreboard position
    ranks: dict[str, int]


def score(rounds: list[RoundState], owner: dict[str, str],
          captures: dict[str, int]):
    # owner maps a flag to its owning team, captures to its capture count
    attack = defense = sla = 0
    for rnd in rounds:
        for service in rnd.services:
            num_teams = len(service.team_results)
            for flagstore in service.flagstores:
                for flag in flagstore.captures:
                    attack += BASE_ATK + WEIGHT_DEF \
                        + WEIGHT_ATK / captures[flag]
                    victim_rank = rnd.ranks[owner[flag]]
                    if victim_rank < rnd.rank:
                        # capturing from a lower-placed team costs points
                        gap = (rnd.rank - victim_rank) / num_teams
                        attack -= COST_MAX * gap ** 2 + COST_MIN
                if service.checker_result != "down":
                    if (flag := flagstore.lost) is not None:
                        num_def = num_teams - captures[flag]
                        if num_def > 0:
                            defense += BASE_DEF + WEIGHT_DEF / num_def
            if service.checker_result == "up":
                sla += BASE_SLA + WEIGHT_DEF + WEIGHT_RANK
            elif service.checker_result == "recovering":
                sla += (BASE_SLA + WEIGHT_DEF + WEIGHT_RANK) / 2
    return (attack, defense, sla)
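To make the attack term easier to follow, the per-flag gain can be isolated into a small function. This is only a sketch of the same arithmetic: the helper name `attack_gain` and the explicit `num_teams` parameter are assumptions for illustration, and the sample ranks are made up. Ranks are inverse scoreboard positions, so a larger rank means a better-placed team.

```python
BASE_ATK = 1
WEIGHT_ATK = 1
WEIGHT_DEF = 1
COST_MIN = 0
COST_MAX = 4 / 5


def attack_gain(num_captures: int, attacker_rank: int,
                victim_rank: int, num_teams: int) -> float:
    """Hypothetical helper: points one attacker earns for one captured flag."""
    # Fixed part plus a share that shrinks as more teams capture the flag.
    gain = BASE_ATK + WEIGHT_DEF + WEIGHT_ATK / num_captures
    if victim_rank < attacker_rank:
        # The victim is placed lower than the attacker: subtract a cost that
        # grows quadratically with the rank gap relative to the team count.
        gap = (attacker_rank - victim_rank) / num_teams
        gain -= COST_MAX * gap ** 2 + COST_MIN
    return gain


# Ten teams, a flag captured by two teams in total.
print(attack_gain(2, attacker_rank=8, victim_rank=3, num_teams=10))  # about 2.3
print(attack_gain(2, attacker_rank=3, victim_rank=8, num_teams=10))  # 2.5
```

Capturing from a higher-placed team carries no cost, while the cost for capturing from a lower-placed team never exceeds `COST_MAX + COST_MIN` because the relative rank gap stays below 1.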
Review
- Difficult to reason about
- Scales defense with the number of teams that did not capture the flag rather than with the number of attackers, so the result is skewed by team counts inflated with non-playing teams.
- When all teams are exploited in a service, no team loses defense points for that service (weird but improbable edge condition).
- In the worst case, a team gains BASE_SLA + WEIGHT_DEF + WEIGHT_RANK + BASE_DEF * len(flagstores) per round (unless the team cannot get the service out of the recovering state), and an attacker gains BASE_ATK + WEIGHT_DEF + WEIGHT_ATK / captures[flag]. For the given constants, the points gained from SLA will therefore always outweigh an attacker's relative gain.
- The cost of downtime for n rounds is at least n * (BASE_SLA + WEIGHT_DEF + WEIGHT_RANK + BASE_DEF * flagstores) and at most (n + (retention_rounds - 1) / 2) * (BASE_SLA + WEIGHT_DEF + WEIGHT_RANK + (BASE_DEF + WEIGHT_DEF) * flagstores). The cost of not patching, on the other hand, is at most WEIGHT_DEF * flagstores.
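The downtime and patching bounds from the last two review points can be evaluated numerically. The following is a sketch under stated assumptions: the function names are hypothetical, the sample values (2 rounds of downtime, 2 flagstores, 5 retention rounds) are illustrative, and the cost of not patching is read as a per-round figure.

```python
BASE_SLA = 1
BASE_DEF = 1
WEIGHT_DEF = 1
WEIGHT_RANK = 1


def downtime_cost_bounds(n: int, flagstores: int,
                         retention_rounds: int) -> tuple[float, float]:
    """Lower and upper bound on the points forfeited by n rounds of downtime."""
    sla_full = BASE_SLA + WEIGHT_DEF + WEIGHT_RANK
    lower = n * (sla_full + BASE_DEF * flagstores)
    # Upper bound: recovering rounds after the downtime count as half rounds.
    upper = (n + (retention_rounds - 1) / 2) * (
        sla_full + (BASE_DEF + WEIGHT_DEF) * flagstores)
    return lower, upper


def max_cost_of_not_patching(flagstores: int) -> float:
    """Upper bound on points lost by leaving a service unpatched (assumed per round)."""
    return WEIGHT_DEF * flagstores


lower, upper = downtime_cost_bounds(n=2, flagstores=2, retention_rounds=5)
per_round = max_cost_of_not_patching(flagstores=2)
print(lower, upper)       # 10 28.0
print(lower / per_round)  # 5.0
```

Under these sample values, even the lower bound of two rounds of downtime equals five rounds of the maximum loss from staying unpatched.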
Tenets
Total score MUST increase with more flags captured
Attack points scale linearly with the number of flags captured.
Total score MUST decrease with more flags lost
Defense points scale non-linearly with the number of attackers.
Flag value MUST diminish with more successful attacks
Flag value scales inversely with the number of captures.
Perfect SLA MUST be worth more than any attacker's relative gain
For the given constants, the attacker's relative gain will always be less than the points awarded from SLA and BASE_DEF.
The cost of downtime MUST NOT outweigh the benefits of patching
For the given constants, recovering the SLA points lost during downtime would take significantly more rounds than the downtime itself lasted, which disincentivizes patching.
SLA SHOULD decrease fairly with every missing flag in the retention period
SLA does not decrease fairly with the number of missing flags in the retention period.
Flag value SHOULD be calculated independent of its flagstore
Flag value is not scaled by the number of flagstores and is thus independent of the flagstore.
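The SLA tenet can be checked against the SLA branch of the pseudo-code. The sketch below isolates that branch; the helper name `sla_points` is an assumption for illustration. Because the only input is the checker result, a recovering service earns the same half award whether one or several flags in the retention period are missing.

```python
BASE_SLA = 1
WEIGHT_DEF = 1
WEIGHT_RANK = 1


def sla_points(checker_result: str) -> float:
    """Hypothetical helper: SLA points one service earns in one round."""
    full = BASE_SLA + WEIGHT_DEF + WEIGHT_RANK
    if checker_result == "up":
        return full
    if checker_result == "recovering":
        # Flat half award, independent of how many flags are missing.
        return full / 2
    return 0  # "down" earns nothing


for result in ("up", "recovering", "down"):
    print(result, sla_points(result))
```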
Scoring formula was derived from the paper and its implementation in ECSC 2023.