Threat Hunt |

2025

Introducing SchemaWiseAI: The AI-Powered Solution for Seamless Database Query Mapping

In today’s data-driven world, businesses and organizations generate vast amounts of data every day. Cybersecurity analysts, data engineers, and database administrators are increasingly turning to Large Language Models (LLMs) to help generate complex database queries. However, these LLM-generated queries often don’t align with an organization’s specific database schema, creating a major headache for data professionals.

This is where SchemaWiseAI comes in — a middleware tool designed to bridge the gap between generic AI outputs and the specific needs of your data infrastructure; currently in proof-of-concept stage. With SchemaWiseAI, you no longer need to manually adjust LLM-generated queries. The tool automatically transforms queries to match your exact data schema, saving time, reducing errors, and making data management easier.

What is SchemaWiseAI?

SchemaWiseAI is a middleware solution that adapts LLM-generated queries to match the unique database schemas of your organization. By ingesting your custom data structures, SchemaWiseAI ensures that every query is perfectly formatted and tailored to your needs, removing the need for manual adjustments. This powerful tool makes your data queries accurate, efficient, and easy to use, so you can focus on what matters most—getting insights from your data.

Why SchemaWiseAI?

LLMs can produce useful queries, but they often come with generic field names and structures that don’t fit your system. This mismatch requires tedious manual work to adapt each query to your specific data schema, causing unnecessary delays and increasing the chances of errors.

SchemaWiseAI solves this problem by automatically mapping field names and data structures to your custom schema. It makes sure that the queries generated by LLMs are accurate, efficient, and ready for execution in your environment, without the need for manual intervention.

Key Features of SchemaWise AI

Field Name Mapping: Automatically converts generic field names from LLM-generated queries into your custom names.
Query Transformation: Transforms AI-generated queries to fit your exact data schema.
Template-Based Query Generation: Quickly generates queries using predefined templates that match your system.

Example

The current proof-of-concept (POC) version of SchemaWiseAI includes a network proxy mapping feature. Below is a snippet of this mapping, which shows how internal field names used within the organization (on the left) are automatically mapped to new field names. For example, proxy log data with specific field names like “srcip“, “dstip“, “status“, etc., is automatically transformed and mapped to standardized names such as “src“, “dst“, “http_status“, and so on.

"proxy_logs": {
            "fields": {
                "srcip": {"map_to": "src", "type": "string"},
                "dstip": {"map_to": "dst", "type": "string"},
                "bytes": {"map_to": "bytes_total", "type": "string"},
                "status": {"map_to": "http_status", "type": "string"},
                "dhost": {"map_to": "dest_host", "type": "string"},
                "proto": {"map_to": "protocol", "type": "string"},
                "mtd": {"map_to": "method", "type": "string"},
                "url": {"map_to": "uri", "type": "string"}
            }

Output

The final outcome of this schema transformation appears as follows:

User Prompt Request: List all HTTP GET requests with status 404 from the last hour

For more transformation examples, check out Github.

Why Choose Ollama for SchemaWiseAI?

At the core of the current SchemaWiseAI is Ollama (https://ollama.com/), a powerful, local AI platform that runs models directly on your machine, ensuring security, privacy, and speed. Here’s why Ollama is the ideal platform for SchemaWiseAI:

Privacy and Security: Run AI models locally, ensuring that your sensitive data remains secure.
Customizable AI: Tailor the LLM to your specific database needs with ease.
Real-Time Performance: No cloud latency, providing fast, on-demand query generation.
Cost-Effective: Avoid high cloud processing costs by running everything on your own infrastructure.

To get started with Ollama, review my last post where I shared steps on how to install and configure Ollama on Kali.

Who Can Benefit from SchemaWiseAI?

SchemaWiseAI is designed for professionals who work with data and rely on accurate, fast, and customized queries. Key users include:

Cybersecurity Analysts: Quickly generate and refine queries for security logs and threat detection.
Data Engineers: Automate the process of adapting AI queries to fit specific database structures.
Database Administrators: Ensure that all queries are properly aligned with custom schemas, reducing errors and failures.
Business Intelligence Analysts: Easily generate optimized queries for reporting, dashboards, and insights.

Current Limitations

Support for More LLMs: Expanding beyond Ollama to include platforms like OpenAI and other popular models.
Integration with More Data Schemas: Supporting a wider range of schemas, such as Palo Alto logs, DNS logs, and Windows logs.
Improved UX/UI: Enhancements to the user interface for a more intuitive experience.
Expanded Query Optimization: More features to optimize queries for different platforms and use cases.
To manage scalability limitations: take machine learning, pattern-based learning approach, or a hybrid approach.

Getting Started with SchemaWiseAI

Ready to give SchemaWiseAI a try? Follow these easy steps to get started listed on Github: https://github.com/azeemnow/Artificial-intelligence/tree/main/SchemaWiseAI

Conclusion: Transform Your Data Queries with SchemaWiseAI

SchemaWiseAI is the perfect solution for organizations looking to streamline their query generation process, improve query accuracy, and save time. Whether you’re a cybersecurity analyst, data engineer, or business intelligence analyst, SchemaWiseAI is designed to make working with data more efficient.

By automating the transformation of LLM-generated queries into organization-specific formats, SchemaWiseAI saves you the time and effort needed for manual adjustments. And with future features like broader LLM support, expanded schema integration, and improved user experience, SchemaWiseAI is positioned to become a game-changer in the world of data querying.

Disclosure:

Please note that some of the SchemaWiseAI code and content in this post were generated with the help of AI/Large Language Models (LLMs). The generated code and content has been carefully reviewed and adapted to ensure accuracy and relevance.

Tagged AI query adaptation, AI-powered query generation, Cyber Security, LLM, ollama, SchemaWiseAI, Splunk query optimization

2020

How to Quickly Analyze a PCAP File

I am so excited to introduce NFPA – a Network Forensic Processing & Analysis tool!

network_security_ediscovery_network_analysis_network_monitor_forensics_tools_pcap_forensics_packet_forensics_capture_wireshark — **NFPA** – Network Forensic Processing & Analysis

My purpose behind NFPA tool is to provide Cybersecurity analysts a more efficient and automated (“click & forget”) means of executing commonly-used, open-source network forensics utilities and analysis queries against a piece of network evidence (PCAP).

NFPA tool helps optimize investigations by reducing errors that are typically involved in manually processing and analyzing network-based evidence through various popular tools and command-line options.

Using NFPA, an analyst can:

quickly process case evidence through various popular tools and utilities all by a simple script execution
review results from 60+ individual, multi-purpose queries pre-ran again the evidence
view the native output from all of the evidence process utilities – providing the opportunity for any validation or further analysis

All of the above is organized in an easy-to-understand structure which allows the analyst to quickly find answers as well as the authoritative source of those answers.

Here is a quick demo of NFPA in action:

A key requirement when designing NFPA was to keep dependencies as minimum as possible. I wanted to make sure I leverage a platform that is already commonly used by analysts which is pre-configured with all of the necessary tools and capabilities. This would allow analysts to instantly begin their work on investigations and not have to deal with the underlying system engineering.

To that end, here is the only dependency:

SANS SIFT Ubuntu Workstation (https://digital-forensics.sans.org/community/downloads)

Additionally, the NFPA is built-in Bash. Which means you do not have to import any specific libraries or run a certain version. Another advantage of using Bash is that you will most likely be able to run NFPA on other Linux distributions (may need to install some purpose-built network forensic tools separately).

The first version of the tool is now available on Github. Please check it out and let me know what you think!

Tagged for572, gnfa, netflow, NFDump, PassiveDNS, pcap, sift workstation, tcpdump, tshark, wireshark, zeek

2020

What your CMD command line security is missing

Here is what your should do to increase your cmd command line security — Gap in Your Command-line Security

I want to write a follow-up on my last post about chain-of-commands not properly being captured by many defensive tools. During further research and testing, I observed that built-in Windows Command line actions are also not captured.

For instance, a simple act of deleting a file from the CMD Command-line is neither captured in SYSMON or in Windows Event logs:

 CMD.EXE > del /f test_file.txt

The only event observed in SYSMON for the above action was the following:

Additionally, nothing notable was observed in Windows Event logs.

This simple act of deleting a file is a common technique used by the adversaries. This action could be done both manually or through malware. One example where this technique is used is in the case of the Robbinhood Ransomware. In this sandbox report, you can see various quite-delete operations that Robbinhood malware executes.

I understand that there are other means of extracting CMD Command-line execution content. However, many of those require digital forensics analysis.

For instance, you can review Command-line history by analyzing the memory capture using a tool such as Volatility with plugins: cmdscan, consoles or just running strings against the memory image. However, this type of analysis requires either a memory image capture or a specialized commercial solution that can scan live memory content (example). Unfortunately, most organizations do not have access to these enterprise-solutions thus their ability to hunt for such Command-line techniques becomes limited.

MITRE ATT&CK Evaluations also has an entry for this technique 9.C.4 File Deletion where you can select various technologies from the drop-down list and see how they detect this technique.

If you are collecting and hunting full CMD Commandline, I would love to hear about your feedback; especially, if the technology/method that you are using is not one of the ones tested in ATT&CK Evaluations above.

https://attackevals.mitre.org/technique_comparison.html?round=APT29&step_tid=9.C.4_T1107&vendors=

Tagged Adversary Technique, ATT&CK, cmd, Digital Forensics, Memory, MITRE, Windows 10

Cybersecurity | Digital Privacy | AI | Technology | Business| Leadership

Category Archives: Threat Hunt