Consistency of CVSSv3.1
The Common Vulnerability Scoring System (CVSS) is a globally known scoring system used by many companies.
With the help of CVSS, security vulnerabilities are evaluated against a set of metrics – for example, whether or not a user needs to be involved in an attack. From these metrics, CVSS derives a score between 0.0 and 10.0 that indicates the severity of the vulnerability.
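To make this concrete, the base score can be reproduced directly from the metric weights and rounding rule published in the CVSS v3.1 specification. The following sketch (a minimal illustration, not part of our study materials) computes the base score for a vector string:

```python
# Metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    # Privileges Required depends on Scope: (unchanged, changed)
    "PR": {"N": (0.85, 0.85), "L": (0.62, 0.68), "H": (0.27, 0.5)},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}

def roundup(x: float) -> float:
    """Round up to one decimal place, using the integer arithmetic
    from Appendix A of the specification to avoid float artifacts."""
    i = round(x * 100000)
    return i / 100000.0 if i % 10000 == 0 else (i // 10000 + 1) / 10.0

def base_score(vector: str) -> float:
    """Base score for a vector string such as
    'CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H'."""
    m = dict(part.split(":") for part in vector.split("/")[1:])
    changed = m["S"] == "C"
    iss = 1 - ((1 - WEIGHTS["C"][m["C"]])
               * (1 - WEIGHTS["I"][m["I"]])
               * (1 - WEIGHTS["A"][m["A"]]))
    impact = (7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15) if changed \
        else 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * WEIGHTS["PR"][m["PR"]][changed] * WEIGHTS["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    total = 1.08 * (impact + exploitability) if changed else impact + exploitability
    return roundup(min(total, 10))

print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

The integer arithmetic in `roundup` mirrors Appendix A of the specification, which was added in v3.1 precisely so that independently implemented calculators produce identical scores.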
However, previous research has shown that scores assigned by different evaluators for the same vulnerability are likely to differ, yet little is known about the factors that influence scoring. We therefore conducted an empirical study to investigate the consistency of CVSSv3.1.
Online Study
We first conducted a literature review, analyzed the CVSS documentation, and carried out preliminary research with CVSS experts to get an overview of potentially problematic aspects of CVSS, which then became the focus of our paper. Based on this, we selected eight vulnerability types and three metrics for the study. Participants in the online survey were then asked to rate the selected vulnerabilities using the CVSSv3.1 online calculator.
In addition to the scoring tasks, the questionnaire included further questions, for example about the use of CVSS at work and personal attitudes toward CVSS. Thus, beyond consistency, we also explored whether personal factors can influence an evaluation. A total of 196 people participated in this study.
To find out whether an individual's scores remain consistent over time, we conducted a follow-up study 9 months later in which 59 people participated. Participants were asked to evaluate the same vulnerabilities again, allowing us to measure whether CVSS scores also differ within a person over time.
Shedding Light on CVSS Scoring Inconsistencies: A User-Centric Study on Evaluating Widespread Security Vulnerabilities
In the following, we describe some of the findings reported in our paper “Shedding Light on CVSS Scoring Inconsistencies: A User-Centric Study on Evaluating Widespread Security Vulnerabilities”, which was accepted at the IEEE Symposium on Security and Privacy (S&P) 2024.
Consistency of Metrics
None of the metrics we considered (Attack Vector, User Interaction, Scope) was assessed consistently across participants.
The results showed that for man-in-the-middle (MITM) vulnerabilities, it is often unclear whether AV:A or AV:N should be chosen. A reason for AV:A could be the assumption that MITM attacks can only be carried out within the local broadcast domain, close to the target system. The CVSS documentation instructs evaluators to score AV:N, as the attacker only needs network-level access to the communication channel. Nevertheless, 30% of the participants chose AV:A.
Other notable inconsistencies arose between AV:N and AV:L for drive-by download vulnerabilities. The malicious code is usually saved locally on the victim's machine, so one can argue for AV:L; on the other hand, the code must first be fetched over the network, so one can argue for AV:N. In our results, 60%–70% of the participants chose AV:N for drive-by download vulnerabilities.
This indicates that scoring of AV needs further clarification.
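To illustrate how much this single metric choice can matter, the hypothetical example below applies the exploitability weights and base-score equation from the CVSS v3.1 specification to an otherwise identical vulnerability, once with AV:N and once with AV:A; the remaining metric values (AC:L, PR:N, UI:N, S:U, C:H, I:H, A:H) are our own illustrative assumption:

```python
# Attack Vector weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62}

def score(av: str) -> float:
    """Base score for a hypothetical vulnerability with
    AC:L / PR:N / UI:N / S:U / C:H / I:H / A:H, varying only Attack Vector."""
    iss = 1 - (1 - 0.56) ** 3                            # C:H, I:H, A:H each weigh 0.56
    impact = 6.42 * iss                                  # Scope unchanged
    exploitability = 8.22 * AV[av] * 0.77 * 0.85 * 0.85  # AC:L, PR:N, UI:N
    raw = min(impact + exploitability, 10)
    i = round(raw * 100000)                              # spec's round-up to one decimal
    return i / 100000.0 if i % 10000 == 0 else (i // 10000 + 1) / 10.0

print(score("N"), score("A"))  # AV:N -> 9.8 (Critical), AV:A -> 8.8 (High)
```

Under these assumptions, the 30% of participants who chose AV:A would report the same flaw as High rather than Critical, which can change how it is prioritized.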
In reflected XSS vulnerabilities, the attacker needs to convince the victim to click on a specific malicious link, which leads to a website where malicious code is executed, so User Interaction should be rated UI:R. In stored XSS vulnerabilities, the malicious code embedded in a website is executed every time a user visits the site; the attacker does not need to convince a victim directly, as the code runs for every user who visits the website. For the reflected XSS vulnerability, 76% of the participants chose UI:R, whereas only 58% did so for stored XSS. Evaluating reflected XSS thus seems much clearer than evaluating stored XSS, perhaps because it is common knowledge that the victim must click a link to trigger the malicious code. For stored XSS, however, the CVSS documentation also recommends UI:R, as the victim still needs to navigate to a web page. A guideline therefore exists, but seems to be unknown.
The Scope metric presents a significant challenge. We found that irrespective of the actual vulnerability type, about one third of participants applied a Changed Scope (S:C), while the remaining two thirds consistently opted for an Unchanged Scope (S:U). The recurring selection of S:U by a considerable number of participants appears to stem from difficulties in interpreting the Scope metric, as evidenced by their comments during our study.
Security Deficiencies
Another problem was security deficiencies: weaknesses that can facilitate an attack in combination with a vulnerability but do not enable a successful attack on their own, such as a missing HttpOnly flag. While enabling this flag can be beneficial, its absence does not mean that an application is vulnerable to XSS attacks, so it should receive a severity of None. Yet ratings for the missing HttpOnly flag varied more than for any other vulnerability, and only 7.1% of participants assigned None. Another example is Banner Disclosure, where an HTTP response reveals which web server product and version is in use. This makes it easier for an attacker to look up known vulnerabilities in the product, although the disclosure alone causes no damage. Banner Disclosure was mostly rated Medium, but had the highest percentage of None ratings (9.2%). This shows that CVSS needs to specify how security deficiencies should be handled.
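For context, the HttpOnly flag tells browsers to withhold a cookie from script access (e.g., `document.cookie`), limiting what a successful XSS attack can steal; its absence is a hardening gap rather than an exploitable flaw by itself. A minimal sketch of setting the flag with Python's standard library (the cookie name and value are illustrative):

```python
from http.cookies import SimpleCookie

# Build a Set-Cookie header for a session cookie.
cookie = SimpleCookie()
cookie["session"] = "abc123"
cookie["session"]["httponly"] = True   # not readable via document.cookie
cookie["session"]["secure"] = True     # only sent over HTTPS

header = cookie.output()
print(header)
```

The emitted header carries the `HttpOnly` and `Secure` attributes; without them, any script injected via XSS could read the session cookie directly.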
Consistency over Time and Personal Factors
The follow-up study also showed that ratings did not remain consistent over time: between 35% and 55% of the participants assigned a different severity to the same vulnerabilities they had rated in the main study.
Considering personal factors, we found little evidence that they influence the consistency of ratings. Another interesting finding was that the CVSS documentation is hardly read by its users. Yet participants who reported consulting the CVSS documentation during assessments, and participants who answered the seven test cases from the documentation correctly, performed better in the study.
Attitudes towards CVSS
Three quarters of participants agreed that CVSS is useful, while almost two thirds felt confident using it. Furthermore, almost half of the respondents considered scores to be inconsistent, and an overwhelming majority agreed that scores differ across raters. These ratios suggest that users are aware of the problems associated with CVSS, but that these do not outweigh its advantages. To capture this sentiment, we cite P124: “CVSS is like democracy: the worst system available, except for all the other systems ever tried.”
Discussion
We found that evaluations are not consistent across different evaluators, and are also not consistent over time for the same evaluator. We found a low number of personal factors related to individual (in)consistency, and the corresponding regression model had a low explanatory power. It seems that inconsistency is more closely related to the properties of CVSS, such as problematic metrics and documentation, than to the personal factors that we investigated.
Steps towards Effective Vulnerability Management. We highlighted that a significant number of participants incorporate CVSS in their risk management processes and vulnerability prioritization methods. Consequently, these inconsistencies in CVSS scores can lead to inaccurate resource allocation. This could result in critical vulnerabilities being sidelined while less severe ones receive undue attention. This discrepancy poses a serious challenge to effective vulnerability management, potentially introducing risks that could escalate to tangible damages.
Enhancing Accessibility of CVSS Documentation. Our study also reveals that a significant proportion of evaluators rarely consult the CVSS documentation, with around 30% of our sample reporting that they have never read it. One possible barrier could be the dispersion of information across three documents: the specification, the user guide and the examples. This may discourage users from exploring the documentation, as they may perceive extraction and synthesis of the essential content as too laborious. Moreover, the widespread use of an online CVSS calculator offers another plausible explanation. The calculator provides instant access to tooltips that show brief explanations for each metric, which may give users a sense of being adequately informed. This might lead users to believe that delving into the documentation would not provide any further valuable insights. In light of these findings, we recommend making the CVSS documentation more accessible and intuitive to use. The online calculator could also be further optimized to provide evaluators with more comprehensive and contextually relevant guidance.
Refining Metrics. Our study results also indicate that specific metrics such as User Interaction, Attack Vector and Scope require more precise definition and clearer guidance for some widespread vulnerabilities. We suspect that similar ambiguities may exist for other types of vulnerabilities and other metrics, highlighting the need for further research.
Balancing the Value and Shortcomings of CVSS. Despite the challenges identified with CVSS, our findings confirm its value as an indispensable tool within the security community. Participants largely valued CVSS for its role as a standardized mechanism for communicating the severity of vulnerabilities to different stakeholders. They recognized its limitations, but continued to use it, highlighting the lack of better alternatives. As we move forward, it is clear that while CVSS has its problems, the focus should be on refining and improving this system rather than discarding it altogether. This highlights the critical need for continued research and iterative development to improve the accuracy, consistency, and usability of CVSS, ultimately supporting more robust vulnerability management practices.
Further Information
- Paper: Shedding Light on CVSS Scoring Inconsistencies: A User-Centric Study on Evaluating Widespread Security Vulnerabilities (S&P 2024)
- Public Data Set