ECSC 2025 Scoring Formula

Scoring formula for the European Cybersecurity Challenge 2025 A/D CTF.

This formula is based on ATKLAB v2.

Summary

In Jeopardy CTFs, dynamic scoring is used to infer the difficulty of a challenge based on the number of teams that can solve it. This scoring formula applies the same concept to A/D.

In effect, each round is treated as a Jeopardy CTF with the following challenges:

For each flag you capture, you receive ATK points based on the number of teams that capture that flag.
For each service and each flag store, you receive DEF points for each actively exploiting team that did not capture your flag, proportional to the number of teams whose flags that team did capture.

Additionally, you gain a fixed amount of SLA points for each deployed flag that is still valid (submittable for points) and retrievable from the service, as long as the checker status is SUCCESS or RECOVERING.

Checker Status

The checker returns one of the following results for each service:

SUCCESS if all flags could be successfully deployed and retrieved, and functionality checks were successful.
RECOVERING if all checks for the current round succeed, but flags from the past 4 rounds are missing.
MUMBLE if any functionality checks for the current round failed.
OFFLINE if the checker failed to establish a connection to the service.
INTERNAL_ERROR if an internal error occurred.
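
As a rough illustration of how these statuses relate to one another, the sketch below maps a few boolean check outcomes to a status. The names `CheckerStatus`, `classify` and `earns_sla`, the parameters, and the precedence order (internal error first, then connection failure, then current-round failures, then missing past flags) are assumptions made for this example; the actual checker may decide differently.

```python
from enum import Enum


class CheckerStatus(Enum):
    SUCCESS = "SUCCESS"
    RECOVERING = "RECOVERING"
    MUMBLE = "MUMBLE"
    OFFLINE = "OFFLINE"
    INTERNAL_ERROR = "INTERNAL_ERROR"


def classify(connected: bool, current_checks_ok: bool,
             past_flags_ok: bool, checker_crashed: bool = False) -> CheckerStatus:
    """Hypothetical mapping from check outcomes to a status (assumed precedence)."""
    if checker_crashed:
        return CheckerStatus.INTERNAL_ERROR
    if not connected:
        return CheckerStatus.OFFLINE
    if not current_checks_ok:
        return CheckerStatus.MUMBLE
    if not past_flags_ok:
        # Current round is fine, but flags from earlier rounds are missing.
        return CheckerStatus.RECOVERING
    return CheckerStatus.SUCCESS


def earns_sla(status: CheckerStatus) -> bool:
    # Per the summary, SLA points are only awarded for these two statuses.
    return status in {CheckerStatus.SUCCESS, CheckerStatus.RECOVERING}


print(classify(True, True, False), earns_sla(classify(True, True, False)))
```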

Implementation

The implementation may be evaluated against real CTF data using our simulator.

This section is slightly adapted from the original wiki for clarity.

Dynamic Scoring

Dynamic scaling is applied to each challenge by using the ECSC 2025 Jeopardy formula. We use a lower base value to account for the fact that the A/D CTF will have more dynamically weighted challenges, as we do not want scores to become too large.

def jeopardy(teams: int, base: int = 10):
    return int(base * (30 / (29 + max(teams, 1))) ** 3)

Implementation Details

teams: The number of teams who have solved this challenge.
base: The maximum number of points that can be earned through a challenge.

The value of a challenge is close to base when the number of solving teams is low and approaches zero when the number of solving teams is high. With these constants, the value drops to roughly 8% of base at 40 solving teams (before integer truncation).
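
To make the shape of the curve concrete, the following sketch evaluates the formula for a few team counts. The helper `jeopardy_fraction` is a hypothetical name introduced here to show the share of base before integer truncation; it is not part of the original formula.

```python
def jeopardy(teams: int, base: int = 10) -> int:
    # Same formula as given above.
    return int(base * (30 / (29 + max(teams, 1))) ** 3)


def jeopardy_fraction(teams: int) -> float:
    # Hypothetical helper: share of base before int() truncates the result.
    return (30 / (29 + max(teams, 1))) ** 3


for teams in (1, 2, 5, 10, 20, 40):
    print(f"teams={teams:2d} value={jeopardy(teams):2d} "
          f"fraction={jeopardy_fraction(teams):.3f}")
```

With the default base of 10, a flag solved by one team is worth 10 points and one solved by ten teams is worth 4. At 40 teams the untruncated fraction is about 0.082, which the integer conversion rounds down to 0 for base 10.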

Attack Points

To determine ATK points, the gameserver updates the number of submissions of all valid flags, based on the number of flags submitted each round, and recalculates the value of each flag based on the number of teams who were able to capture it.

def attack_flag(num_submissions: int):
    return jeopardy(num_submissions)

Implementation Details

Context: This function is called per active attacker and for every victim, for each service and flagstore, to calculate the value of the stolen flag.

num_submissions: The number of submissions of the flag.

Additional ATK points are awarded to each attacking team based on the DEF points that other teams earn from defending against their attacks. This prevents scenarios where defenders gain more DEF points from an attack than the attacker—in other words, failed exploit attempts do not negatively affect the attacker ranking.

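from collections import defaultdict

def attack(live_round: int, flag_round: int,
           max_victims: int, num_victims: int, num_attackers: int):
    # The DEF-based share is computed as if the checker reported SUCCESS
    # and every flag was available in every round.
    checker_status = defaultdict(lambda: "SUCCESS")
    flag_avail_in = defaultdict(lambda: defaultdict(lambda: True))
    pts = defense(live_round, flag_round, checker_status, flag_avail_in,
                  max_victims, num_victims, num_attackers, True)
    # flags_stolen comes from the calling context (see details below).
    for flag in flags_stolen:
        pts += attack_flag(flag.num_submissions)
    return pts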

Implementation Details

Context: Each round, this function is called per attack, for the service and flagstore being attacked by an attacker, to calculate the value of the entire attack over all victims.

flag.num_submissions: The number of submissions for the flag of the current victim.
flags_stolen: The flags stolen for this service and flagstore by the attacker that were deployed in flag_round.
max_victims: The number of teams, excluding the attacker and NOP, that have at least one service not in the OFFLINE state in the current round.
num_victims: The number of teams exploited by the attack for which points are currently being calculated.
num_attackers: The number of teams attacking this service and flagstore, and obtaining flags stored in flag_round.

The scores of teams that captured a flag previously are updated to reflect the value that flag loses through new submissions.
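
The retroactive update can be illustrated with a small sketch. The `captures` mapping and the `attack_flag_totals` helper are hypothetical names used only for this example. The sketch ignores the additional DEF-based ATK component and simply re-values every captured flag from its current submission count.

```python
def jeopardy(teams: int, base: int = 10) -> int:
    return int(base * (30 / (29 + max(teams, 1))) ** 3)


def attack_flag(num_submissions: int) -> int:
    return jeopardy(num_submissions)


def attack_flag_totals(captures: dict[str, set[str]]) -> dict[str, int]:
    """captures maps a flag id to the set of teams that submitted it."""
    totals: dict[str, int] = {}
    for teams in captures.values():
        # Every holder of the flag is credited its *current* value.
        value = attack_flag(len(teams))
        for team in teams:
            totals[team] = totals.get(team, 0) + value
    return totals


captures = {"flag-1": {"alpha"}}
before = attack_flag_totals(captures)          # only alpha holds the flag
captures["flag-1"].update({"beta", "gamma"})   # two more teams submit it
after = attack_flag_totals(captures)
print(before, after)
```

In this example alpha's credit for the flag drops from 10 to 8 once two further teams submit the same flag, and all three teams are credited 8 points.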

The NOP team does not gain attack points.

Defense Points

To determine DEF points, the gameserver updates, each round, the number of captures of every flag that is still valid. For every team, the points gained from defending against a specific attacker are calculated based on the number of teams that were not exploited by that attacker in that flag store in that round. This is meant to reflect that some exploits are much more difficult to defend against than others, and to reward teams that can construct solid defenses.

These points are then scaled by the number of active teams (excluding the attacking team and NOP), and divided by the number of attackers for that flag store. Defense points are only awarded for active attackers, i.e., those teams that submit at least one flag from that flag store and round. If no teams are exploited, no teams receive DEF points.

def defense_raw(max_victims: int, num_victims: int,
                num_attackers: int, exploited: bool):
    if exploited or num_victims == 0:
        return 0
    return jeopardy(max_victims - num_victims) * max_victims / num_attackers

Implementation Details

Context: Each round, this function is called per service and flag store, for each active attacker and for every team, to update the value of teams defending / not defending the attack.

max_victims: The number of teams, excluding the attacker and NOP, that have at least one service not in the OFFLINE state in the current round.
num_victims: The number of teams exploited by the attack for which points are currently being calculated.
num_attackers: The number of teams submitting flags from this service and flag store deployed in a specific round.
exploited: Whether this team is currently being exploited.

The defense points are scaled this way so that defenders can still earn roughly as many points as a small number of attacking teams, and so that being the first to attack a flagstore does not yield so many points that other teams cannot catch up.
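
A short worked example may help to see the effect of the scaling. The team counts below are made-up inputs chosen for illustration; the functions repeat the definitions given above.

```python
def jeopardy(teams: int, base: int = 10) -> int:
    return int(base * (30 / (29 + max(teams, 1))) ** 3)


def defense_raw(max_victims: int, num_victims: int,
                num_attackers: int, exploited: bool):
    if exploited or num_victims == 0:
        return 0
    return jeopardy(max_victims - num_victims) * max_victims / num_attackers


# 10 possible victims, 2 active attackers for this flag store and round.
easy_to_defend = defense_raw(10, 3, 2, exploited=False)  # 7 teams defended
hard_to_defend = defense_raw(10, 9, 2, exploited=False)  # only 1 team defended
was_exploited = defense_raw(10, 3, 2, exploited=True)    # victims earn nothing
nobody_hit = defense_raw(10, 0, 2, exploited=False)      # no victims, no DEF

print(easy_to_defend, hard_to_defend, was_exploited, nobody_hit)
# prints: 25.0 50.0 0 0
```

The defender who withstands an exploit that hits nine of ten teams earns twice as much as one who withstands an exploit that hits only three, which matches the intent that harder defenses are rewarded more.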

To ensure that deleting flags in your own service is not a viable strategy for earning DEF points, we award the defense points for a flag spread across all rounds for which this flag must be retrievable. If a flag is unavailable in a specific round, no defense points are awarded for that flag in that round. Intuitively, this reflects the idea that defense points should be gained for successful defending; if no flags are at risk, no reward is earned.

def defense(live_round: int, flag_round: int, checker_status: dict[int, str],
            flag_avail_in: dict[int, dict[tuple[int, int, int, int], bool]],
            max_victims: int, num_victims: int, num_attackers: int, exploited: bool,
            flag_rounds_valid: int = 5):
    pts = 0
    # Count rounds up to the live round, capped at the flag's validity window.
    max_round = min(live_round + 1, flag_round + flag_rounds_valid)
    for rnd in range(flag_round, max_round):
        if checker_status[rnd] not in {"SUCCESS", "RECOVERING"}:
            continue
        # team, service and flagstore come from the calling context.
        if flag_avail_in[rnd][flag_round, team, service, flagstore]:
            pts += defense_raw(max_victims, num_victims, num_attackers, exploited) \
                   / flag_rounds_valid
    return pts

Implementation Details

Context: Each round, this function is called per service and flag store, for each active attacker and for every team, to update the value of teams defending / not defending the attack.

live_round: The round of the game in which the defense points are being updated.
flag_round: The round of the game in which the flag being stolen was deployed.
checker_status: The status of the checker for this service for each round of the game.
flag_avail_in: A mapping of which flags were retrievable in a specific round (first key), depending on the round they were deployed in, the team, the service and the flagstore. Remember: each round, the checker checks that valid flags can be retrieved.
max_victims: The number of teams, excluding the attacker and NOP, that have at least one service not in the OFFLINE state in the current round.
num_victims: The number of teams exploited by the attack for which points are currently being calculated.
num_attackers: The number of teams attacking this service and flagstore, and obtaining flags stored in flag_round.
exploited: Whether this team is being exploited by the attack for which points are currently being calculated.
flag_rounds_valid: The number of rounds each flag is valid for.

Here, live_round is eventually large enough that the flag's final defense value is calculated, taking into account the availability in all rounds the flag is submittable for points.
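
The spreading over rounds can be sketched in a simplified, self-contained form. The function `spread_defense` and its flat `flag_available` mapping are hypothetical simplifications for this example: they track a single flag of a single team, take the raw defense value as an input, and assume the loop is capped at the flag's validity window as described in the text.

```python
def spread_defense(raw_points: float, live_round: int, flag_round: int,
                   checker_status: dict[int, str],
                   flag_available: dict[int, bool],
                   flag_rounds_valid: int = 5) -> float:
    pts = 0.0
    # Consider rounds from deployment up to the live round, but never
    # beyond the flag's validity window.
    last = min(live_round + 1, flag_round + flag_rounds_valid)
    for rnd in range(flag_round, last):
        if checker_status.get(rnd) not in {"SUCCESS", "RECOVERING"}:
            continue
        if flag_available.get(rnd, False):
            pts += raw_points / flag_rounds_valid
    return pts


# Flag deployed in round 10 and valid for rounds 10 to 14.
status = {r: "SUCCESS" for r in range(10, 15)}
status[12] = "MUMBLE"          # service malfunctioning in round 12
available = {r: True for r in range(10, 15)}
available[13] = False          # flag not retrievable in round 13

partial = spread_defense(25.0, 11, 10, status, available)  # rounds 10 and 11
final = spread_defense(25.0, 20, 10, status, available)    # rounds 10, 11, 14
print(partial, final)
# prints: 10.0 15.0
```

Out of a raw value of 25 points, the defender ends up with 15 because two of the five validity rounds do not count.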

At the end of the game, some flags need to be retained for fewer rounds. This means that protecting these flags earns proportionally fewer points over time, as there was also less time for other teams to capture them. However, the total number of flags you need to protect (and thus the defense points that can be earned in each round) does not change at the end of the game.

The NOP team does not gain defense points.

SLA Points

To determine SLA points, the gameserver calculates the ratio between the number of valid flags retrievable from a service and the number of rounds a flag is valid for.

def sla(checker_status: str, flags_avail: int, flagstores: int,
        base: int = 10, flag_rounds_valid: int = 5):
    if checker_status == "SUCCESS":
        return base * flagstores
    elif checker_status == "RECOVERING":
        return base * flags_avail / flag_rounds_valid
    else:
        return 0

Implementation Details

Context: Each round, this function is called per team and per service.

checker_status: The status returned by the checker for team and service.
flags_avail: The number of flags available in the last 5 rounds from all flagstores of service for team.
flagstores: The number of flagstores of service.
base: The maximum value of each challenge, see jeopardy(..) definition.
flag_rounds_valid: The number of rounds each flag is valid for.

This means that at the start of the CTF, SLA points ramp up from zero to base over the first five rounds, as the validity period is five rounds long.
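
The following sketch evaluates the SLA rule for a service with two flagstores. Passing the number of flagstores as an explicit `flagstores` parameter is an assumption of this example, and the input values are made up.

```python
def sla(checker_status: str, flags_avail: int, flagstores: int,
        base: int = 10, flag_rounds_valid: int = 5):
    if checker_status == "SUCCESS":
        return base * flagstores
    elif checker_status == "RECOVERING":
        # Pro-rated by how many of the still-valid flags are retrievable.
        return base * flags_avail / flag_rounds_valid
    else:
        return 0


# Two flagstores, five-round validity: at most 10 valid flags at once.
full = sla("SUCCESS", flags_avail=10, flagstores=2)
recovering = sla("RECOVERING", flags_avail=6, flagstores=2)
broken = sla("MUMBLE", flags_avail=10, flagstores=2)
print(full, recovering, broken)
# prints: 20 12.0 0
```

A recovering service with 6 of its 10 valid flags retrievable earns 12 of the 20 possible points, so every missing flag costs the same share of SLA.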

Total Points

The total score is the sum of the ATK, DEF and SLA components.

Review

Since the capture count of each stored flag determines its worth, attackers are rewarded based on how difficult it is to exploit each specific team.
The same goes for defense; a patch is rewarded based on the number of other teams that could not defend against the exploiting team. If a vulnerability is harder to patch or a specific exploit is harder to defend against, successfully doing so earns more defense points.
In any round in which a service is offline or malfunctioning, the corresponding team loses points relative to the number of flags it did not make available to other players. If the team is not defending, the loss is flagstores * base; if it is defending, the loss is larger.
Teams only lose SLA and DEF points relative to the number of rounds for which each flag was unavailable. Since the points from defending are spread across the rounds in which the flag is being checked, and recovering points are awarded according to the number of flags available, patching does not cost the full defense / SLA points for the rounds in which a team is recovering after patching.
A service should be unavailable for at most a few (let's say 3) rounds due to (unsuccessful) patching. We find that the points lost are flagstores * base + (flagstores - 1) * (teams - 1) * base per round, assuming all other flagstores are patched, and that the minimum gain from the additional patched flagstore is (teams - 2) * jeopardy(teams - 2) per round, assuming all other teams apart from NOP have patched and are attacking. The unrealistic scenario aside (all teams only successfully attacking you in a single flagstore while you are the only team to have all other flagstores patched), the number of subsequent rounds a team would have to be available for following 3 rounds of downtime is at most 60 rounds. This is a lot, but would be far lower in practice, e.g., roughly one round if the other flagstores are not earning defense points.

Tenets

Total score MUST increase with more flags captured
Attack points scale linearly with the number of flags submitted.
Total score MUST decrease with more flags lost
Defense points are awarded for rounds in which a team is not exploited even though another team successfully exploits that flagstore, proportional to the number of flags available. Thus, relative to a patched team, teams that have not patched lose more the more flagstores, and flags from those flagstores, are captured.
Flag value MUST diminish with more successful attacks
Each flag's value decreases with the number of captures, following the jeopardy(..) curve.
Perfect SLA MUST be worth more than any attacker's relative gain
An attacker's marginal gain per service per team is flagstores * base, the same as the minimum gain for the defender from keeping it alive. Therefore, no points are gained by turning off a service, especially not relative to the other non-attacking teams.
The cost of downtime MUST NOT outweigh the benefits of patching
The cost of downtime due to patching can be recovered in few subsequent rounds through defense points gained (see calculation in review section).
SLA SHOULD decrease fairly with every missing flag in the retention period
Teams only lose SLA and DEF points relative to the number of rounds for which each flag was unavailable.
Flag value SHOULD be calculated independent of its flagstore
Flag value does not depend on the number of flagstores in the service.