What is it?
Mquery is a YARA scanning accelerator. It allows you to run standard YARA rules on a huge number of samples very quickly.
Figure: Running a simple YARA rule on 4.5 GB of malware samples
How does it work?
Mquery gets its speed from a pre-filtering step that quickly rejects many of the files that cannot match the YARA rule. The pre-filtering is done by UrsaDB, our own implementation of an n-gram database. You can learn more about sample indexing in the mquery docs.
Figure: mquery scanning process
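To make the pre-filtering idea concrete, here is a minimal sketch of an n-gram index. It is not UrsaDB's implementation: it assumes 3-byte n-grams, keeps everything in memory, and the names `TrigramIndex`, `add` and `candidates` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Set


def trigrams(data: bytes) -> Set[bytes]:
    """Return the set of all 3-byte substrings of data."""
    return {data[i:i + 3] for i in range(len(data) - 2)}


class TrigramIndex:
    """Toy inverted index (hypothetical): trigram -> names of files containing it."""

    def __init__(self) -> None:
        self.postings: Dict[bytes, Set[str]] = defaultdict(set)
        self.files: Set[str] = set()

    def add(self, name: str, content: bytes) -> None:
        self.files.add(name)
        for gram in trigrams(content):
            self.postings[gram].add(name)

    def candidates(self, needle: bytes) -> Set[str]:
        """Files containing every trigram of needle: a superset of the true matches."""
        grams = trigrams(needle)
        if not grams:
            # Needle shorter than 3 bytes: the index cannot narrow anything down.
            return set(self.files)
        result = set(self.files)
        for gram in grams:
            result &= self.postings.get(gram, set())
        return result


index = TrigramIndex()
index.add("a.bin", b"hello malware world")
index.add("b.bin", b"benign content")
index.add("c.bin", b"malwa ware")  # has every trigram of "malware", but not the string

print(sorted(index.candidates(b"malware")))  # ['a.bin', 'c.bin']
```

Note that `c.bin` is returned even though it does not contain `malware`. The index can only rule files out, so the surviving candidates still need to be checked against the real rule.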
Rule parsing
To run a YARA rule using UrsaDB, mquery first has to convert it to the appropriate query syntax. A special module iterates over the condition sections of all rules and checks whether they can be described as UrsaDB expressions. This step is tricky because each condition's logic has to be unwrapped correctly to avoid producing any false negatives.
Figure: Converting a YARA rule into a UrsaDB query
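The sketch below shows why the unwrapping needs care. It is a simplified illustration, not mquery's parser: the node types `Str`, `And`, `Or` and `Unknown` are hypothetical, and the output format (hex literals joined with `&` and `|`) is only meant to resemble an UrsaDB-style query. The key point is that a sub-condition the index cannot express must be treated as "matches everything", never as "matches nothing".

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Str:
    value: bytes  # a literal string from the rule


@dataclass
class Unknown:
    text: str  # anything the index cannot express, e.g. a filesize check


@dataclass
class And:
    left: "Node"
    right: "Node"


@dataclass
class Or:
    left: "Node"
    right: "Node"


Node = Union[Str, Unknown, And, Or]


def to_query(node: Node) -> Optional[str]:
    """Return a query string, or None if the node cannot constrain the search."""
    if isinstance(node, Unknown):
        return None
    if isinstance(node, Str):
        # Assumption: literals shorter than one 3-byte n-gram are not indexable.
        return "{" + node.value.hex() + "}" if len(node.value) >= 3 else None
    left, right = to_query(node.left), to_query(node.right)
    if isinstance(node, And):
        # Dropping one side of an AND only widens the result: still no false negatives.
        if left is None:
            return right
        if right is None:
            return left
        return f"({left} & {right})"
    # OR: if either side is unconstrained, the whole expression is unconstrained.
    if left is None or right is None:
        return None
    return f"({left} | {right})"


print(to_query(And(Str(b"malware"), Unknown("filesize < 1MB"))))  # {6d616c77617265}
print(to_query(Or(Str(b"malware"), Unknown("filesize < 1MB"))))   # None
```

A `None` result means the rule cannot be narrowed down by the index at all, so every file would remain a candidate.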
Filtering
The resulting UrsaDB query is then passed to the indexing engine, which produces a list of candidate files. These files don't always match the YARA rule, but if your rule doesn't rely on complex logic or module extensions, the number of false positives should be relatively small.
Final verification
All that's left now is a final check of each candidate against the original YARA rule, which filters out any remaining false positives.
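As a rough sketch of this last step, the function below keeps only the candidates that a real matcher accepts. The helper name `verify` and the sample data are hypothetical, and a plain substring check stands in for YARA so that the example runs without extra dependencies.

```python
from typing import Callable, Dict, List


def verify(candidates: Dict[str, bytes], matches: Callable[[bytes], bool]) -> List[str]:
    """Keep only the candidate files that the real matcher accepts."""
    return sorted(name for name, data in candidates.items() if matches(data))


# Candidates as they might come back from the pre-filter (made-up data).
candidates = {
    "a.bin": b"hello malware world",
    "c.bin": b"malwa ware",  # passed an n-gram pre-filter, but is a false positive
}

# Stand-in for the YARA rule. With yara-python installed, one could instead use:
#   rules = yara.compile(source=rule_text)
#   matches = lambda data: bool(rules.match(data=data))
matches = lambda data: b"malware" in data

print(verify(candidates, matches))  # ['a.bin']
```

Because the expensive matcher only sees the candidates, its cost scales with the size of the candidate list rather than with the size of the whole dataset.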
Limitations
Due to its design, UrsaDB doesn't work well for all YARA rules. Rules that use long strings without wildcards will be the fastest, while rules that rely on short or wildcard-filled strings will see the smallest speedup.
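One way to see why is to count how many complete n-grams a pattern contributes to the query. The sketch below assumes 3-byte n-grams and a simplified hex-string format where `??` is a one-byte wildcard; `parse_hex` and `usable_trigrams` are hypothetical helpers, not part of mquery or UrsaDB.

```python
from typing import List, Optional, Set


def parse_hex(pattern: str) -> List[Optional[int]]:
    """Parse a simplified hex string such as '4d 5a ?? 00'; '??' becomes None."""
    return [None if tok == "??" else int(tok, 16) for tok in pattern.split()]


def usable_trigrams(pattern: List[Optional[int]]) -> Set[bytes]:
    """Collect every window of 3 consecutive bytes that contains no wildcard."""
    grams = set()
    for i in range(len(pattern) - 2):
        window = pattern[i:i + 3]
        if None not in window:
            grams.add(bytes(window))
    return grams


solid = parse_hex("4d 5a 90 00 03")
sparse = parse_hex("4d 5a ?? 00 ?? 03")

print(len(usable_trigrams(solid)))   # 3
print(len(usable_trigrams(sparse)))  # 0
```

The second pattern is longer, yet it yields no complete n-gram, so under these assumptions the index has nothing to filter on.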
In some cases you can come across rules that can't be converted to a working UrsaDB query. This means that mquery will have to just run the YARA rule on all samples. This, of course, is very slow and depending on your dataset size may take hours if not days.
Mquery tries to protect you against such queries by refusing to run them by default. If you're sure that you want to execute them, you can modify the configuration to allow it.
Figure: Trying to run a very slow query with the safety mechanism in place
Contributing
Both UrsaDB and mquery are open source and available on our GitHub page:
We're looking for new contributors and new feature ideas.