Rapid7 Inc.

10/25/2021 | Press release | Distributed by Public on 10/25/2021 07:51

Recog: Data Rules Everything Around Me

The recog project - a recognition framework that identifies products, operating systems, and hardware by matching network probe data against its extensive fingerprint collection - has been around for many years. In the beginning, Rapid7 used it internally as part of the Nexpose vulnerability scanner. Then, in 2014, the fingerprints and the Ruby implementation of the framework were released as open-source software, in keeping with Rapid7's continued commitment to open-source initiatives. In 2018, we released a Java implementation of the framework, recog-java, as open source, and later that year, Rumble released a Go implementation, recog-go.

Still, there remained one problem to solve with the framework: balancing the roles of content and code. Recog has three language implementations with varying levels of feature parity. All of them support the most basic requirements: processing the XML fingerprint data, matching input data against the fingerprint collection, and returning a collection of enrichment parameters, both static and dynamic. The value of these implementations (the code) isn't fully realized until they are combined with the fingerprint data (the content).
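To make the split between content and code concrete, here is a minimal Python sketch of those basic requirements. It is not taken from any of the recog implementations. The XML layout is modeled on recog's fingerprint files but simplified, and the names `SAMPLE_XML`, `load_fingerprints`, and `match` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, illustrative fingerprint content (assumption: modeled on
# recog's XML layout, not copied from the real fingerprint collection).
SAMPLE_XML = r"""
<fingerprints matches="ssh.banner">
  <fingerprint pattern="^OpenSSH_([\d.]+)$">
    <description>OpenSSH</description>
    <param pos="0" name="service.vendor" value="OpenBSD"/>
    <param pos="0" name="service.product" value="OpenSSH"/>
    <param pos="1" name="service.version"/>
  </fingerprint>
</fingerprints>
"""

def load_fingerprints(xml_text):
    """Parse the content: one (compiled regex, params) pair per fingerprint."""
    fingerprints = []
    for fp in ET.fromstring(xml_text).iter("fingerprint"):
        params = [
            (int(p.get("pos")), p.get("name"), p.get("value"))
            for p in fp.iter("param")
        ]
        fingerprints.append((re.compile(fp.get("pattern")), params))
    return fingerprints

def match(fingerprints, data):
    """Return enrichment parameters for the first matching fingerprint."""
    for regex, params in fingerprints:
        m = regex.search(data)
        if m is None:
            continue
        result = {}
        for pos, name, value in params:
            # pos 0 is a static value; pos N is taken from capture group N.
            result[name] = value if pos == 0 else m.group(pos)
        return result
    return None

if __name__ == "__main__":
    print(match(load_fingerprints(SAMPLE_XML), "OpenSSH_7.4"))
```

The code knows nothing about SSH. Everything product-specific lives in the XML, which is why the same content can in principle be shared by several language implementations.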

However, the Ruby implementation is clearly an outlier, since it stores the framework code alongside the fingerprint data. The problem of content versus code would not be as great a concern if there were only one language implementation - but instead, we have three, and there have been recent conversations about the possibility of a fourth!

Solving the content vs. code conundrum

Carving the Ruby implementation out of the existing repository would leave only the content behind and create a consistent structure across all language implementations. Since this would also remove the fingerprint testing performed by the Ruby implementation, it provides an opportunity to assess fingerprint verification across all recog implementations.

In the past, issues caused by differences between the regular expression engines used by the other language implementations were sometimes reported only after fingerprint pull requests had been merged. Preventing them required either the contributor or the maintainer to verify fingerprint changes against the Java and Go implementations, and while the Go implementation has a verify tool, the Java implementation did not.
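One well-known example of such a difference: Go's `regexp` package uses RE2 syntax, which supports neither lookaround assertions nor backreferences, while the Ruby and Java engines accept both. The sketch below is a rough heuristic for spotting those two constructs in a pattern. It is an illustration only, not part of recog, and the function name `unsupported_in_re2` is hypothetical.

```python
def unsupported_in_re2(pattern):
    """Flag constructs that RE2-style engines reject (heuristic only).

    The scan skips escaped characters but does not understand character
    classes, so a literal "(?=" inside [...] would be a false positive.
    """
    issues = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            # An escape followed by a non-zero digit is a backreference.
            if pattern[i + 1] in "123456789":
                issues.append("backreference")
            i += 2  # skip the escaped character
            continue
        if pattern.startswith(("(?=", "(?!", "(?<=", "(?<!"), i):
            issues.append("lookaround")
        i += 1
    return issues

if __name__ == "__main__":
    for p in (r"^OpenSSH_([\d.]+)$", r"^Server: (?!nginx)(\S+)", r"^(\w+)-\1$"):
        print(p, unsupported_in_re2(p))
```

A check like this only catches syntax that one engine rejects outright. Patterns that compile everywhere but match differently still need to be run through each implementation's own verification tool.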

To facilitate future content separation, the Java implementation would need a fingerprint verification tool. This was not straightforward, since the Java library neither retained the data parsed from the fingerprint examples nor interpolated all parameters. But after some modifications to the `parse` and `match` methods, I was able to remove these impediments. I then created, as a new module within the Java implementation, a recog fingerprint verification tool that matches both the features and the behaviors of the Ruby tool.
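The core idea of fingerprint verification can be sketched in a few lines of Python: every example stored with a fingerprint must match that fingerprint's pattern, and any expected values attached to the example must equal what the match extracts. This is an assumption-laden illustration, not the Ruby or Java tool. The XML is simplified, the second example is deliberately wrong, and `verify` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative content; the second example intentionally disagrees
# with its expected service.version so the check has something to report.
SAMPLE_XML = r"""
<fingerprints matches="ssh.banner">
  <fingerprint pattern="^OpenSSH_([\d.]+)$">
    <description>OpenSSH</description>
    <example service.version="7.4">OpenSSH_7.4</example>
    <example service.version="8.0">OpenSSH_8.1</example>
    <param pos="0" name="service.product" value="OpenSSH"/>
    <param pos="1" name="service.version"/>
  </fingerprint>
</fingerprints>
"""

def verify(xml_text):
    """Return a list of failure messages; an empty list means all examples pass."""
    failures = []
    for fp in ET.fromstring(xml_text).iter("fingerprint"):
        regex = re.compile(fp.get("pattern"))
        for ex in fp.iter("example"):
            text = ex.text or ""
            m = regex.search(text)
            if m is None:
                failures.append(f"{text!r}: pattern did not match")
                continue
            # Build the enrichment parameters exactly as a match would.
            extracted = {}
            for p in fp.iter("param"):
                pos = int(p.get("pos"))
                extracted[p.get("name")] = p.get("value") if pos == 0 else m.group(pos)
            # Attributes on the example are the expected parameter values.
            for name, expected in ex.attrib.items():
                actual = extracted.get(name)
                if actual != expected:
                    failures.append(
                        f"{text!r}: {name} expected {expected!r}, got {actual!r}"
                    )
    return failures

if __name__ == "__main__":
    for failure in verify(SAMPLE_XML):
        print(failure)
```

The sketch also shows why the example data has to be retained after parsing: without the example text and its expected attributes, there is nothing to verify against.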

The final step is automation, which will allow contributors and maintainers to efficiently process fingerprint content changes and focus on the correctness of the regular expressions and enrichment parameters. This helps alleviate concerns around any issues with one or more of the language implementations.

I created a new GitHub Actions verify workflow for this purpose. The initial workflow simply ran the `recog_standardize` tool to ensure each fingerprint asserts known identifiers. The latest update to the workflow adds jobs in which each language implementation's fingerprint verification tool runs against any updated fingerprint XML files. The verify workflow gives contributors and maintainers the feedback they need, improving the content modification process.
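The selection step, running each verifier only against updated fingerprint files, might look roughly like the Python sketch below. It is not the actual workflow. The `xml` directory name is an assumption about the repository layout, and the verifier commands are placeholders rather than real tool invocations.

```python
from pathlib import PurePosixPath

def changed_fingerprint_files(changed_paths, fingerprint_dir="xml"):
    """Keep only XML files that sit directly in the fingerprint directory."""
    selected = []
    for path in changed_paths:
        p = PurePosixPath(path)
        if p.suffix == ".xml" and p.parent == PurePosixPath(fingerprint_dir):
            selected.append(path)
    return selected

def plan_jobs(changed_paths, verifiers):
    """Pair every verifier command with the changed fingerprint files.

    `verifiers` maps an implementation name to its base command (a list of
    arguments). Returns no jobs when no fingerprint file changed.
    """
    files = changed_fingerprint_files(changed_paths)
    if not files:
        return []
    return [(name, command + files) for name, command in verifiers.items()]

if __name__ == "__main__":
    # Placeholder commands: substitute each implementation's real verify tool.
    verifiers = {
        "ruby": ["ruby-verify-placeholder"],
        "java": ["java-verify-placeholder"],
        "go": ["go-verify-placeholder"],
    }
    changed = ["xml/ssh_banners.xml", "README.md", "docs/notes.xml"]
    for name, command in plan_jobs(changed, verifiers):
        print(name, command)
```

Limiting each job to the changed files keeps feedback on a pull request fast, while still exercising every language implementation against the new content.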

[Image: view of a successful verify workflow]

These steps are the first of more to come that will aid users, contributors, and maintainers of the recog project.

Recog's content and language implementations are often used as a component within larger projects in the information security domain, and we have plans for additional tooling to make the framework more directly usable for end users. As recog develops and grows, the Rapid7 team looks forward to watching the projects built on top of it do the same.
