Since Generative AI is currently the most popular topic, I wanted to get my hands dirty and learn something new. I was learning about the Model Context Protocol at the time and wanted to apply it to build something simple.
Idea
On Debian systems, we use debsecan to find vulnerabilities. However, the tool currently provides a simple list of vulnerabilities and packages with no indication of the system's security posture—meaning no criticality information is exposed and no executive summary is provided regarding what needs to be fixed. Of course, one can simply run the following to install existing fixes and be done with it:
apt-get install $(debsecan --suite sid --format packages --only-fixed)
But this is not how things work in corporate environments; you need to provide a report showing the system's previous state and the actions taken to bring it to a safe state. It is all about metrics and reports.
My goal was to use debsecan to generate a list of vulnerabilities, find more detailed information on them, and prioritize them as critical, high, medium, or low. By providing this information to an AI, I could ask it to generate an executive summary report detailing what needs to be addressed immediately and the overall security posture of the system.
Initial Take
My initial thought was to use an existing LLM, either self-hosted or cloud-based like Gemini (which provides an API with generous limits via AI Studio). I designed functions to output the list of vulnerabilities on the system and to provide detailed information on each, intending to expose them as "tools" to the LLM.
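To make that concrete, here is roughly the shape of those functions: a minimal sketch that wraps the debsecan CLI, assuming its --suite and --format options; the function names and the default suite are my own, not the final project's code.

import subprocess

def list_vulnerabilities(suite: str = "bookworm") -> str:
    # Summary output: one line per vulnerability/package pair.
    result = subprocess.run(
        ["debsecan", "--suite", suite, "--format", "summary"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def vulnerability_details(suite: str = "bookworm") -> str:
    # Detailed output: a multi-line block per vulnerability.
    result = subprocess.run(
        ["debsecan", "--suite", suite, "--format", "detail"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout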
Learnings
- I learned about open-source LLMs using Ollama, which allows you to download and use models on your laptop.
- I used Llama 3.1, Llama 3.2, and Granite 4 on my laptop without a GPU. I managed to run my experiments, even though they were time-consuming and occasionally caused my laptop to crash.
- I learned about Pydantic and how to use it to define custom schemas and parse JSON into typed models with minimal effort.
- I learned about osv.dev, an open-source initiative by Google that aggregates vulnerability information from various sources and provides data in a well-documented JSON schema format.
- I learned about EPSS (the Exploit Prediction Scoring System) and how it is used alongside static CVSS scoring to identify the vulnerabilities that are truly critical. The EPSS score estimates the probability of a vulnerability being exploited in the wild, based on observed real-world attack data; a small sketch of querying osv.dev and the EPSS API follows this list.
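The sketch below shows how the last two lookups can be wired together. It assumes the public osv.dev and FIRST EPSS endpoints and uses my own hypothetical function names; if a bare CVE ID does not resolve on osv.dev, its /v1/query endpoint is the fallback.

import requests

def fetch_osv_record(cve_id: str) -> dict:
    # osv.dev serves individual vulnerability records by ID.
    resp = requests.get(f"https://api.osv.dev/v1/vulns/{cve_id}", timeout=10)
    resp.raise_for_status()
    return resp.json()

def fetch_epss_score(cve_id: str) -> float:
    # FIRST publishes EPSS scores (probability of exploitation in the wild).
    resp = requests.get(
        "https://api.first.org/data/v1/epss",
        params={"cve": cve_id}, timeout=10,
    )
    resp.raise_for_status()
    data = resp.json().get("data", [])
    return float(data[0]["epss"]) if data else 0.0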
These experiments led to a collection of notebooks. One key takeaway was that a tool cannot simply return massive amounts of text, because it consumes tokens and increases costs on paid models (though it is fine for local models running on your own hardware and energy). Self-hosted models require significant prompting to produce proper output, which helped me understand the real-world application of prompt engineering.
Change of Plans
Despite extensive experimentation, I felt I was nowhere close to a full implementation. While using a Gemini learning tool to study MCP, it suddenly occurred to me: why not write the entire thing as an MCP server? This would save me from implementing the agent side and allow me to hook it into any IDE-based LLM.
Design
This MCP server is primarily a mix of a "tool" (which executes on the server machine to identify installed packages and their vulnerabilities) and a "resource" (which exposes read-only information for a specific CVE ID).
The MCP exposes two tools:
- List Vulnerabilities: This tool identifies vulnerabilities in the packages installed on the system, categorizes them using CVE and EPSS scores, and provides a dictionary of critical, high, medium, and low vulnerabilities.
- Research Vulnerabilities: Based on the user prompt, the LLM can identify relevant vulnerabilities and pass them to this function to retrieve details such as whether a fix is available, the fixed version, and criticality.
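A skeleton of this design, written against the MCP Python SDK's FastMCP helper, looks roughly like this. The tool bodies are placeholders; the actual debsecan-mcp code is organized differently.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("debsecan-mcp")

@mcp.tool()
def list_vulnerabilities() -> dict[str, list[str]]:
    # Scan installed packages and bucket the affecting CVE IDs by severity.
    # Placeholder body: the real tool parses the Debian security data and
    # scores each CVE with CVSS and EPSS before bucketing.
    return {"critical": [], "high": [], "medium": [], "low": []}

@mcp.tool()
def research_cves(cve_ids: list[str]) -> list[dict]:
    # For each CVE ID, report whether a fix exists, the fixed version and
    # the criticality. Placeholder body.
    return [
        {"cve": cve, "fix_available": False, "fixed_version": None,
         "severity": "unknown"}
        for cve in cve_ids
    ]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default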
Vibe Coding
"Vibe coding" is the latest trend, with many claiming that software engineering jobs are a thing of the past. Without going into too much detail, I decided to give it a try. While this is not my first "vibe coded" project (I have done this previously at work using corporate tools), it is my first attempt to vibe code a hobby/learning project.
I chose Antigravity because it seemed to be the only editor providing a sufficient amount of free tokens. For every vibe coding project, I spend time thinking about the barebones skeleton: the modules, function return values, and data structures. This allows me to maintain control over the LLM-generated code so it doesn't become overly complicated or incomprehensible.
As a first step, I wrote down my initial design in a requirements document. In that document, I explicitly called for using debsecan as the model for various components. Additionally, I asked the AI to reference my specific code for the EPSS logic. The reasons were:
- debsecan already solves the core problem; I am simply rebuilding it. debsecan uses a single file generated by the Debian Security team containing all necessary information, which prevents us from needing multiple external sources.
- This provides the flexibility to categorize vulnerabilities within the listing tool itself since all required information is readily available, unlike my original notebook-based design.
I initially used Gemini 3 Flash as the model because I was concerned about exceeding my free limits.
Hiccups
Although it initially seemed successful, I soon noticed discrepancies between the local debsecan outputs and the outputs generated by the tools. I asked the AI to fix this, but after two attempts, it still could not match the outputs. I realized it was writing its own version-comparison logic and failing significantly.
Finally, I instructed it to depend entirely on the python-apt module for version comparison; since it is not on PyPI, I asked it to pull directly from the Git source. This solved some issues, but the problem persisted. By then, my weekly quota was exhausted, and I stopped debugging.
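For reference, deferring to python-apt for version comparison looks roughly like this; a sketch rather than the exact code the AI produced.

import apt_pkg

apt_pkg.init_system()

def is_vulnerable(installed: str, fixed: str) -> bool:
    # version_compare returns <0, 0 or >0, honouring epochs and revisions.
    return apt_pkg.version_compare(installed, fixed) < 0

# Example: 1:1.2.3-1 sorts before 1:1.2.3-2, so the package needs the update.
assert is_vulnerable("1:1.2.3-1", "1:1.2.3-2")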
A week later, I resumed debugging with the Claude 3.5 Sonnet model. Within 20-25 minutes, it found the fix, which involved four lines of changes in the parsing logic. However, I ran out of limits again before I could proceed further.
In the requirements, I specified that the list_vulnerabilities tool should only return a dictionary of CVE IDs grouped by severity. However, the AI instead included the full text of every vulnerability's details, so excessive data, including negligible vulnerabilities, was sent to the LLM. Consequently, it never called the research_cves tool. Since I had run out of limits, I manually fixed this in a follow-up commit.
How to Use
I have published the current work in the debsecan-mcp repository. I have used the same license as the original debsecan. I am not entirely sure how to interpret licenses for vibe-coded projects, but here we are.
To use this, you need to install the tool in a virtual environment and configure your IDE to use the MCP. Here is how I set it up for Visual Studio Code:
- Follow the guide from the VS Code documentation regarding adding an MCP server.
- My global mcp.json looks like this:
{
  "servers": {
    "debsecan-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/home/vasudev/Documents/personal/FOSS/debsecan-mcp/debsecan-mcp",
        "run",
        "debsecan-mcp"
      ]
    }
  },
  "inputs": []
}
- I am running it directly from my local codebase using a virtualenv created with uv. You may need to tweak the path based on your installation.
- To use the MCP server in the Copilot chat window, reference it using #debsecan-mcp. The LLM will then use the server for the query.
- Use a prompt like: "Give an executive summary of the system security status and immediate actions to be taken."
- You can observe the LLM using list_vulnerabilities followed by research_cves. Because the first tool only provides CVE IDs based on severity, the LLM is smart enough to research only high and critical vulnerabilities, thereby saving tokens.
What's Next?
This MCP is not yet perfect and has the following issues:
- The list_vulnerabilities dictionary contains duplicate CVE IDs because the code used a list instead of a set. While the LLM is smart enough to deduplicate these, it still costs extra tokens.
- Because I initially modeled this on debsecan, it parses /var/lib/dpkg/status by hand instead of using python-apt. I am considering switching to python-apt to reduce maintenance overhead; a possible shape for this is sketched after this list.
- Interestingly, the AI did not add a single unit test, which is disappointing. I will add these once my limits are restored.
- I need to create a cleaner README with usage instructions.
- I need to determine if the MCP can be used via HTTP as well as stdio.
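For the python-apt item above, the replacement could be as simple as the following sketch (the function name is mine):

import apt

def installed_packages() -> dict[str, str]:
    # Let python-apt enumerate installed packages instead of parsing
    # /var/lib/dpkg/status by hand.
    cache = apt.Cache()
    return {
        pkg.name: pkg.installed.version
        for pkg in cache
        if pkg.is_installed
    }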
Conclusion
Vibe coding is interesting, but things can get out of hand if not managed properly. Even with a good process, code must be reviewed and tested; you cannot blindly trust an AI to handle everything. Even if it adds tests, you must validate them, or you are doomed!