Ace Intelligence

PralayAI — Defensive Cybersecurity AI Assistant

A full-stack defensive cybersecurity chatbot built with a fine-tuned open-source LLM (Qwen2.5 1.5B, QLoRA), FastAPI backend, PostgreSQL chat persistence, and a React Gemini-clone frontend.

PralayAI is a cybersecurity-focused AI assistant designed to help students, developers, and security learners understand defensive cybersecurity workflows. It uses a Qwen2.5 1.5B Instruct model fine-tuned with QLoRA on a curated cybersecurity instruction dataset. The model is deployed on two inference paths, a local CUDA API for fast development and a public Hugging Face Space for demos, and is served through a FastAPI backend with PostgreSQL persistence and a React frontend.

Quick Start

Clone the repo, install dependencies, configure your environment, and launch all three services (inference API, backend, and frontend) with a single startup script.

git clone https://github.com/OMCHOKSI108/pralayAI
cd pralayAI
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
./start.sh
# Starts: Inference API (:5000) | Backend (:8000) | Frontend (:5173)
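
Once the startup script has run, a quick way to confirm that the three services are listening is to probe their ports. The following is a minimal sketch, not part of the repository: the port numbers come from the startup comment above, while the host address and the helper names (port_open, check_services) are assumptions made for illustration.

```python
import socket

# Ports taken from the startup comment above; the host is an assumption.
SERVICES = {"inference": 5000, "backend": 8000, "frontend": 5173}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "127.0.0.1") -> dict[str, bool]:
    """Probe every known service port and report which ones accept connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:<10} {'up' if up else 'down'}")
```

An open port only shows that something is listening; it does not confirm that the model has finished loading.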

Model Architecture

PralayAI is built on Qwen2.5 1.5B Instruct, fine-tuned with QLoRA using the Unsloth framework on a cybersecurity conversational instruction dataset. The LoRA adapter is merged with the base model for deployment. The model repository and adapter are published on Hugging Face for reproducibility.

Base Model: Qwen/Qwen2.5-1.5B-Instruct
Fine-tuning: Unsloth + QLoRA
Adapter: OMCHOKSI108/Paralay1.1
Merged Model: OMCHOKSI108/Paralay1.1-Merged
Dataset: OMCHOKSI108/cybersecdata
Inference API: omchoksi108-pralayai-inference-api.hf.space/generate
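
The adapter merge described above can be sketched with the Hugging Face peft and transformers libraries. This is an illustrative sketch under stated assumptions, not the repository's merge script: the base model and adapter IDs are the ones listed above, while the function name, the float16 dtype, and the output directory are hypothetical.

```python
def merge_adapter(
    base_id: str = "Qwen/Qwen2.5-1.5B-Instruct",
    adapter_id: str = "OMCHOKSI108/Paralay1.1",
    out_dir: str = "merged-model",  # hypothetical output path
) -> str:
    """Fold a LoRA adapter into its base model and save the result."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base weights in half precision (an assumed choice).
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    # Attach the adapter, then fold its low-rank updates into the base weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()

    # Save the model and the tokenizer side by side so the folder is self-contained.
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
    return out_dir
```

After merging, the saved folder loads like any ordinary causal language model, with no peft dependency at inference time.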

System Architecture

The system has four components: a React Gemini-clone frontend, a FastAPI backend, a PostgreSQL database, and an inference API. The frontend sends user messages to the backend, which persists conversations in PostgreSQL and routes each inference request to either the local CUDA inference API (port 5000, ~4.5s latency) or the Hugging Face Space CPU API (~54s latency). The model generates a defensive cybersecurity response, which the backend saves and returns to the frontend.

React Frontend (:5173)
       │
       ▼
FastAPI Backend (:8000) ──► PostgreSQL
       │
       ├── Local CUDA API (:5000) ──► Merged Model
       │         (~4.5s on GPU)
       └── HF Space API (cloud) ──► Merged Model
                 (~54s on CPU)
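
One way to implement the routing step in the diagram is to try the fast local backend first and fall back to the hosted one. The source only says the backend routes to one path or the other, so the fallback order is an assumption, as are the names InferenceResult and route_inference and the "hf-space" label ("local-cuda" is the value shown in the API response example).

```python
import time
from typing import Callable, NamedTuple


class InferenceResult(NamedTuple):
    text: str
    source: str
    latency_seconds: float


# A backend is any callable that maps a prompt to generated text.
Backend = Callable[[str], str]


def route_inference(prompt: str, backends: list[tuple[str, Backend]]) -> InferenceResult:
    """Try each backend in order and return the first successful result."""
    errors = []
    for source, generate in backends:
        start = time.perf_counter()
        try:
            text = generate(prompt)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{source}: {exc}")
            continue
        return InferenceResult(text, source, time.perf_counter() - start)
    raise RuntimeError("all inference backends failed: " + "; ".join(errors))


if __name__ == "__main__":
    def local_cuda(prompt: str) -> str:
        raise ConnectionError("local API not running")  # simulated outage

    def hf_space(prompt: str) -> str:
        return f"(stub reply to: {prompt})"

    result = route_inference("What is log triage?", [("local-cuda", local_cuda), ("hf-space", hf_space)])
    print(result.source)
```

Passing the backends as plain callables keeps the routing logic independent of how each API is actually called.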

Safety & Evaluation

PralayAI includes a strict defensive-only safety policy. The model is trained to refuse requests involving phishing, credential theft, malware creation, ransomware, reverse shells, and evasion techniques. An automated evaluation notebook runs 8 defensive queries and 5 adversarial safety prompts, scoring responses on keyword coverage, structure, depth, and refusal quality.

Defensive Use Cases:
  Incident Response | Log Analysis | Threat Detection
  MITRE ATT&CK Mapping | Cloud Security | Malware Defense
  Security Awareness | Hardening Guidance

Blocked Topics:
  Phishing | Credential Theft | Malware Creation | Ransomware
  Reverse Shells | Evasion | Exploitation
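
Two of the scoring criteria mentioned above, keyword coverage and refusal quality, can be approximated with simple string checks. This is a rough sketch and not the evaluation notebook's actual logic: the refusal markers, the function names, and the example keywords are illustrative assumptions.

```python
# Assumed refusal phrases; a real evaluation would likely use a broader list.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to help")


def _normalize(text: str) -> str:
    """Lowercase and replace curly apostrophes so marker matching is consistent."""
    return text.lower().replace("\u2019", "'")


def keyword_coverage(response: str, expected: list[str]) -> float:
    """Fraction of expected keywords that appear in the response (0.0 to 1.0)."""
    if not expected:
        return 0.0
    text = _normalize(response)
    hits = sum(1 for keyword in expected if keyword.lower() in text)
    return hits / len(expected)


def looks_like_refusal(response: str) -> bool:
    """True if the response contains any of the assumed refusal markers."""
    text = _normalize(response)
    return any(marker in text for marker in REFUSAL_MARKERS)


if __name__ == "__main__":
    answer = "Isolate the host, then preserve logs."
    print(keyword_coverage(answer, ["isolate", "logs", "eradicate", "recover"]))
    print(looks_like_refusal("I cannot help with writing ransomware."))
```

Substring matching is a coarse signal: it cannot tell a helpful refusal from a terse one, so it complements manual review rather than replacing it.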

API & Inference

The backend exposes a single POST /api/chat endpoint that accepts a message, an optional conversation_id, and generation parameters. It applies safety filtering, routes the request to the inference engine, and returns a structured response containing the assistant message, the latency, and the inference source that served the request. The inference API can also be called directly for testing.

POST /api/chat
{
  "message": "Explain incident response in 5 steps.",
  "conversation_id": null,
  "max_new_tokens": 300,
  "temperature": 0.7
}

Response: {
  "assistant_message": "...",
  "conversation_id": "uuid",
  "latency_seconds": 4.5,
  "source": "local-cuda"
}

Dataset & Training

The model was fine-tuned on a curated cybersecurity conversational dataset covering incident response, log analysis, malware defense, cloud security, and MITRE ATT&CK explanations. Training used QLoRA for memory efficiency, with loss convergence tracked across fine-tuning steps. The model training summary and safety evaluation scores are documented in the repo.
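
The memory saving behind QLoRA comes largely from storing the frozen base weights in 4 bits instead of 16. The back-of-the-envelope calculation below is a rough estimate, not a measurement from this project: it assumes the nominal 1.5B parameter count from the model name and ignores quantization constants, the LoRA adapter weights, activations, and optimizer state.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 1.5e9  # nominal size taken from the model name (an approximation)

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # half-precision weights
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized weights

if __name__ == "__main__":
    print(f"fp16 weights:  {fp16_gb:.2f} GB")
    print(f"4-bit weights: {int4_gb:.2f} GB")
    print(f"reduction:     {fp16_gb / int4_gb:.0f}x")
```

Under these assumptions the base weights shrink from roughly 3 GB to roughly 0.75 GB, a fourfold reduction.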

Tech Stack

Python powers the fine-tuning pipeline with Unsloth and QLoRA. FastAPI serves the backend with SQLAlchemy + PostgreSQL for persistence. React with Vite provides the Gemini-clone frontend. Hugging Face handles model hosting and public inference. Local CUDA inference runs via a Flask wrapper.

Python | FastAPI | React | Vite | PostgreSQL | Qwen2.5 | QLoRA | Unsloth | Hugging Face | Docker

GitHub Repository

The repository contains the full source code, including the fine-tuning notebook, model merge script, HF Space deployment configuration, FastAPI backend, React frontend, and evaluation notebook.

https://github.com/OMCHOKSI108/pralayAI

Team

Built by the Ace Intelligence founding team.

Om Choksi (CTO) — https://github.com/OMCHOKSI108
