The Artemis security scanner

Artemis is an open-source security vulnerability scanner developed by CERT PL. It is built to look for website misconfigurations and vulnerabilities on a large number of sites. It automatically prepares reports that can be sent to the affected institutions. Thanks to its modular architecture, it can be used to combine the results of various other tools in a single dashboard.

Figure: an example of Artemis scanning results

Tl;dr: how to start scanning?

Clone https://github.com/CERT-Polska/Artemis/ and follow the quick start instructions at https://artemis-scanner.readthedocs.io/en/latest/quick-start.html.

Don't forget about the laws regarding security testing in your jurisdiction.

If you are a national CERT, we will be glad to share our experience and help you with setting up your scanning pipeline - contact us at [email protected] with any questions or problems.

How is Artemis different from other security scanners?

The key difference is that Artemis can generate actionable reports that can be sent directly to the affected institutions. When generating the reports, Artemis uses built-in heuristics to distinguish false positives from real vulnerabilities.

Here is an example of such a report:

The following addresses contain version control system data:

https://example.com:443/.git/config

Making a code repository public may allow an attacker to learn the inner workings of a system, and if it contains passwords or API keys - also gain unauthorized access. Such data shouldn't be publicly available.

Even if directory listing in a repository folder is not enabled, a repository may be copied by an attacker. We recommend making the whole version control folders (not only the example files listed above) inaccessible for external users.

The following addresses contain old Joomla versions:

https://example.com:443 - Joomla 2.5.4

If a site is no longer used, we recommend shutting it down to eliminate the risk of exploitation of known vulnerabilities in older Joomla versions. Otherwise, we recommend regular Joomla core and plugin updates.

The following domains don't have properly configured e-mail sender verification mechanisms:

example.com: Valid SPF record not found. We recommend using all three mechanisms: SPF, DKIM and DMARC to decrease the possibility of successful e-mail message spoofing.

example.com: Valid DMARC record not found. We recommend using all three mechanisms: SPF, DKIM and DMARC to decrease the possibility of successful e-mail message spoofing.

These mechanisms greatly increase the chance that the recipient server will reject a spoofed message. Even if a domain is not used to send e-mails, SPF and DMARC records are needed to reduce the possibility to spoof e-mails.

Similar reports (in Polish, as Artemis supports report translations) are sent by CERT PL to the scanned institutions in our constituency.

What does Artemis look for?

Artemis contains a large number of modules to identify the attack surface and detect various types of security vulnerabilities or misconfigurations. It is able to:

find subdomains using public sources (crt.sh, Common Crawl, Wayback Machine, …) so that when encountering example.com, it is able to find e.g. mail.example.com or old.example.com,
perform port scanning and service identification (so that it knows whether a service on a given port is a website or e.g. a database), even when a service is running on a non-standard port (e.g. an HTTP server on port 8002),
detect DNS misconfigurations:
- zone transfer,
- subdomain takeover,
find backups, archives, configuration files (e.g. /wp-config.php.bak) and other files that can contain sensitive information,
brute-force weak passwords (FTP, PostgreSQL, MySQL, SSH and WordPress; support for arbitrary login panels is planned),
detect enabled directory listing,
detect known vulnerabilities using Nuclei, which is an open-source security scanner that is able to detect a large number of known problems - Artemis uses this tool so that we don't need to reinvent the wheel,
find e-mail misconfigurations (for example whether SPF and DMARC are set up correctly or whether the SMTP server is an open relay),
detect SQL Injection vulnerabilities (Artemis uses sqlmap under the hood, but due to custom URL analysis, it is also able to find SQL injections in pretty URLs such as https://example.com/pages/1.html),
detect accidentally published Git/SVN repositories,
perform version checks for WordPress, Joomla or Drupal,
verify SSL/TLS configuration,
check whether a domain's expiry date is approaching.
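To illustrate the pretty-URL case from the list above, here is a minimal sketch of how numeric path segments could be turned into injection points. It is not taken from the Artemis code base: the helper name `mark_injection_points` and the "numeric segment" rule are assumptions made for this example. The only external fact it relies on is that sqlmap treats an asterisk in the URL as a custom injection point.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rule: a path segment made of digits, optionally followed by
# an extension such as ".html", is treated as a candidate parameter.
_NUMERIC_SEGMENT = re.compile(r"^(\d+)(\.[A-Za-z0-9]+)?$")


def mark_injection_points(url: str) -> list[str]:
    """Return one candidate URL per numeric path segment, with '*' appended
    to the number so that a tool such as sqlmap can inject at that position."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    candidates = []
    for index, segment in enumerate(segments):
        match = _NUMERIC_SEGMENT.match(segment)
        if not match:
            continue
        marked = match.group(1) + "*" + (match.group(2) or "")
        new_path = "/".join(segments[:index] + [marked] + segments[index + 1:])
        candidates.append(urlunsplit(parts._replace(path=new_path)))
    return candidates


print(mark_injection_points("https://example.com/pages/1.html"))
# ['https://example.com/pages/1*.html']
```

Each returned URL could then be handed to the SQL injection scanner as a separate test case. A real implementation would likely need more rules (slugs with embedded IDs, for instance) than this sketch covers.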

You don't need to configure anything - just follow the quick start instructions and scan!

Besides, with Artemis you can:

rate-limit the scanning so that only one module scans a host at a given time - to configure that, use the following configuration options: LOCK_SCANNED_TARGETS, REQUESTS_PER_SECOND and SCANNING_PACKETS_PER_SECOND,
easily integrate a new tool (pull requests are also welcome),
build a module that works with various types of objects, be it domains, HTTP services or e.g. WordPress instances.
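The idea behind the rate-limiting options can be sketched in a few lines. The code below is an in-process illustration only and is not how Artemis implements it: the class `TargetLimiter`, its methods and the value of `REQUESTS_PER_SECOND` are assumptions for this example, and a distributed deployment would have to keep the lock in shared storage rather than in process memory.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager

REQUESTS_PER_SECOND = 2.0  # illustrative value, not an Artemis default


class TargetLimiter:
    """Let one module scan a target at a time and space out its requests."""

    def __init__(self, requests_per_second: float) -> None:
        self._min_interval = 1.0 / requests_per_second
        self._locks = defaultdict(threading.Lock)
        self._last_request = {}
        self._guard = threading.Lock()  # protects creation of per-target locks

    @contextmanager
    def scanning(self, target: str):
        """Hold the per-target lock for the duration of a module's scan."""
        with self._guard:
            lock = self._locks[target]
        with lock:
            yield

    def wait_for_slot(self, target: str) -> None:
        """Sleep until the next request to the target is allowed.
        Call this while holding the target's lock via scanning()."""
        last = self._last_request.get(target)
        if last is not None:
            delay = last + self._min_interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
        self._last_request[target] = time.monotonic()


limiter = TargetLimiter(REQUESTS_PER_SECOND)
with limiter.scanning("192.0.2.10"):
    for path in ("/", "/robots.txt"):
        limiter.wait_for_slot("192.0.2.10")
        # a real module would send its request for `path` here
```

Any other module that tries to enter `scanning("192.0.2.10")` in the meantime blocks until the first one finishes, which is the behaviour the locking option describes.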

What do we scan?

We started scanning at the beginning of 2023. Since then, we have scanned ~50.6k domains and IP addresses and ~251.7k subdomains, including e.g.:

~36.9k domains and ~25.7k subdomains of schools, kindergartens and other education-related institutions (excluding universities),
~5.4k domains and IP addresses and ~95.5k subdomains of the local government, including websites of public utility companies (responsible for e.g. road maintenance),
~1.9k domains and ~7k subdomains of companies and other institutions that voluntarily requested scanning,
~1.1k domains and ~2.9k subdomains of health-related websites,
890 domains and IP addresses and ~84.4k subdomains of universities - these were websites of faculties, but also faculty members, conferences or science clubs,
521 domains and ~2.3k subdomains of banks,
397 domains and ~1.1k subdomains of politicians' websites - the scanning was related to the 2023 parliamentary election in Poland,
343 domains and ~1.8k subdomains of local newspapers and news portals.

Domains with the www prefix have been excluded from this list. This means that if a website is available under both example.com and www.example.com, it has been counted once.
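The counting rule above can be expressed as a short sketch. The function names `normalize` and `count_unique` are made up for this illustration and do not come from Artemis.

```python
def normalize(domain: str) -> str:
    """Lower-case the name, drop a trailing dot and a leading 'www.' label."""
    domain = domain.strip().lower().rstrip(".")
    return domain.removeprefix("www.")


def count_unique(domains) -> int:
    """Count domains, treating example.com and www.example.com as one."""
    return len({normalize(domain) for domain in domains})


print(count_unique(["example.com", "www.example.com", "mail.example.com"]))  # 2
```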

We repeat the scanning multiple times a year so that we can detect whether the vulnerabilities were fixed. At CERT PL, when we see a subsequent notification about a critical vulnerability sent to the same institution, we call them directly to make sure they received the vulnerability report and understand the situation. We also ask them about the status of the fix.

What did we find?

In 2023 we reported ~184.8k vulnerabilities or misconfigurations, including ~11.6k with high severity. ~65.8k of the scanned domains/subdomains had at least one vulnerability or misconfiguration.

We found:

~78.7k obsolete Joomla, WordPress or WordPress plugin versions,
~44.2k SSL/TLS misconfigurations,
~27k SPF/DMARC misconfigurations,
~16k exposed administrative login panels, RDPs etc.,
~11.2k information leaks: possible DNS zone transfer, directory listing, phpinfo(), etc.,
~4.5k high/critical vulnerabilities from Nuclei or sqlmap,
~3.4k exposed backups, source code, database dumps or logs,
20 cases where a domain was close to its expiry date.

Administrators receive the reports on an ongoing basis so that they can quickly react and fix the vulnerabilities.

Since the scanning is automatic, the above numbers may contain duplicates or cases where the website is not actually vulnerable (for example because an SSL/TLS misconfiguration was found on an unused website).

Lessons learned during larger-scale scanning

Our current workflow looks as follows:

a package of reports is prepared using the export script,
the first-line support team manages the contact database and sends e-mails with vulnerability reports to the best-known contacts,
the first-line support team manages the follow-up communication (if needed).

During scanning we ran into multiple challenges. The most notable ones include:

Distinguishing true from false positives - for example, when we detect that /wp-config.php.bak is present, we need to check whether it is indeed an exposed configuration file or a false positive such as a redirect to the homepage. Therefore, Artemis includes many heuristics to keep the number of false positives low.
Rate limiting in a distributed environment - we need to make sure no server is overloaded with requests. Implementing this was tricky, as we have multiple scanning modules working in parallel. To address this, Artemis can now be configured so that only one module scans a given IP address at a time. To do that, use the LOCK_SCANNED_TARGETS, REQUESTS_PER_SECOND and SCANNING_PACKETS_PER_SECOND options. Rate limiting makes our scanning slower than we would like, but this was a trade-off we were willing to make.
Deduplication - we needed to implement heuristics to detect whether e.g. two similar vulnerabilities on institution.com and www.institution.com are in fact the same one. This involved solving some unexpected challenges, e.g. some servers in our constituency serving the same content on port 80 and ports 81..84.
Contact database - maintaining an up-to-date contact database requires a significant effort, but this is more of an administrative problem than a technical one.
Running a non-trivial production service - an Artemis scanner instance is a medium-scale production service, so we need to monitor it and troubleshoot unexpected maintenance problems, such as running out of disk space.
Prioritizing the scans - even though scanning is easy, we needed to decide whether all the targets we wanted to scan were worth the effort (each scan takes time, machine resources and, if someone responds to our report, human time). Therefore we don't scan everything we could.
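As an illustration of the first challenge in the list above, a false-positive check for a file such as /wp-config.php.bak might look roughly like the sketch below. It is not the heuristic Artemis actually uses: the `Response` class, the function name and the choice of markers are assumptions for this example (the markers are constant names found in a standard WordPress configuration file).

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical, simplified view of an HTTP response."""
    status_code: int
    body: str
    redirected: bool = False


# Constant names defined in a standard wp-config.php file.
WP_CONFIG_MARKERS = ("DB_NAME", "DB_USER", "DB_PASSWORD")


def looks_like_exposed_wp_config(response: Response) -> bool:
    """Report a finding only if the body really resembles a WordPress config."""
    if response.status_code != 200 or response.redirected:
        return False  # error page or a redirect, e.g. to the homepage
    return any(marker in response.body for marker in WP_CONFIG_MARKERS)


leak = Response(200, "<?php define('DB_PASSWORD', 'secret');")
homepage = Response(200, "<html><body>Welcome</body></html>", redirected=True)
print(looks_like_exposed_wp_config(leak), looks_like_exposed_wp_config(homepage))
# True False
```

Checking the content rather than only the status code matters because many servers answer any path with a 200 response or a redirect.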

Scanned institutions' responses

Responses are mostly positive, but sometimes:

we receive bug reports (which are frequently correct!),
institutions report false positives,
we need to fix the contacts (as maintaining a contact database is a hard task),
the reports are ignored,
institutions fix the vulnerabilities without responding.

Lessons learned

During scanning, we learned that:

Unfortunately, there are still multiple low-hanging vulnerabilities that can be easily exploited.
Many good offensive tools are available - even plain Nuclei or WordPress/Joomla version checks would find multiple vulnerabilities when run at a large scale.
Iterative development contributed to the project's success: instead of building the best scanner possible, we built an MVP with a subset of modules and ran initial scans. During scans, we observed bugs and fixed them, but also added new modules. Building the project in such a way doesn't require a large upfront time investment and quickly yields results that can be used to convince the stakeholders to further develop the project and broaden the scanning.
There are multiple archived or forgotten websites that are vulnerable because of obsolete software versions or software engineering best practices not being observed, e.g. performing SQL queries using:
query('SELECT * FROM posts WHERE TITLE = "' . $_GET['title'] . '"');.
It is a good idea for institutions such as universities to discourage faculties or employees from hosting their own WordPress instances and instead provide hosting infrastructure where updates, security hardening etc. are centralized - this decreases the risk of a forgotten conference website being attacked.
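For contrast with the vulnerable query pattern quoted in the list above, here is a minimal sketch of the same lookup written with a parameterized query. It uses Python's standard sqlite3 module and an in-memory database purely for illustration; the table layout and the function name `find_posts` are assumptions made for this example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
connection.execute("INSERT INTO posts (title) VALUES (?)", ("Hello",))


def find_posts(conn: sqlite3.Connection, title: str) -> list:
    """Look up posts by title. The '?' placeholder makes the driver pass
    the title as data, so it is never interpreted as SQL."""
    cursor = conn.execute("SELECT id, title FROM posts WHERE title = ?", (title,))
    return cursor.fetchall()


print(find_posts(connection, "Hello"))        # [(1, 'Hello')]
print(find_posts(connection, '" OR "1"="1'))  # []
```

The second call shows that a typical injection payload is treated as an ordinary (non-matching) title instead of changing the query.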

How to use Artemis?

Clone https://github.com/CERT-Polska/Artemis/ and follow the quick start instructions at https://artemis-scanner.readthedocs.io/en/latest/quick-start.html.

Don't forget about the laws regarding security testing in your jurisdiction.

If you are a national CERT, we will be glad to share our experience and help you with setting up your scanning pipeline - contact us at [email protected] with any questions or problems.

We encourage you to start a similar project and improve the security of your constituency!