# XMPP LLM Bot

A Python bot that bridges XMPP group chats and direct messages with Large Language Models (LLMs).

## OMEMO support and some other features are based on [Arne/xmpp-ai-bot](https://codeberg.org/Arne), with the addition of experimental new features including multi-modal support, file hosting, and media generation.

## Inspired by [jjj333-p/gemini-xmpp](https://github.com/jjj333-p/gemini-xmpp) for group chat features.

## Features

### LLM Capabilities

* **Multi-Provider Support:** Native integration for **Google Gemini** (via `google-genai`) and **OpenAI** (via the `openai` library), plus generic support for OpenAI-compatible endpoints (LocalAI, Groq, Ollama). Also supports Groq Compound built-in tools.
* **System Prompts:** Configurable global system instructions with **per-room override** capabilities.
* **Persistent Memory:** Remembers conversations with configurable context windows. Memory can be saved to disk (`json`) to survive restarts.
* **Thinking Skipping:** Option to skip `<think>` blocks from reasoning models (like DeepSeek), which helps keep rooms cleaner. Only necessary when you are using a custom endpoint (99% of the time you should not be).

### Security & Privacy

* **OMEMO Encryption (XEP-0384):** Support for E2EE in Direct Messages. (I do NOT guarantee it will work 100% of the time.)
* **AES-GCM Support:** Can encrypt and decrypt `aesgcm://` links (commonly used by clients like *Conversations*, *Cheogram* and *monocles chat*) to read or send encrypted media.
* **Access Control:** Whitelist/Blacklist modes for Direct Messages and a "Privileged Users" list.

### Multi-Modal (Vision & Audio) (only tested with Gemini)

* **Image Recognition:** Users can send images (http/https or encrypted aesgcm) for the bot to analyze.
* **Audio Transcription:** Users can send voice notes; the bot processes them using the LLM's audio capabilities.
* **URL Context:** The bot can fetch URLs mentioned in chat, strip the HTML, and read the content to provide context-aware answers. (Turn it off if you don't want people to post IP-grabber links to the bot.)
* **Video Processing:** Users can send videos and YouTube links; the bot processes them using the LLM's video capabilities.

### Generative Tools

* **Image Generation (`!imagen`):** Generates images using **Cloudflare Workers AI Flux** (Black Forest Labs) and automatically uploads them to a file host.
* **Text-to-Speech (`!tts`):** Converts text to high-quality speech using **Gemini TTS**.
  * Supports automatic audio format conversion (PCM -> OGG/WAV) using `pydub`.
  * **Auto-TTS Mode:** Can optionally reply to *every* message with a voice note.

### File Handling

* **Multi-Host Uploading:** Automatically uploads generated media (Images/TTS) to public hosts or to your XMPP server itself so XMPP clients can render them inline.
* **Supported Hosts:** Catbox, Litterbox, 0x0.st, Imgur, ImgBB, Envs.sh, and Uguu.se. You can use your XMPP server too.

### Core XMPP Features

* **MUC Support:** Specialized handling for Group Chats (nicknames, mentions, quoting).
* **Connection Configuration:** Join retry logic and rate limiting to prevent spam bans.
* **Native Formatting:** Supports XMPP replies, quoting (`>`) and Out-of-Band (`OOB`) data for media.

## Prerequisites

* **Python 3.9+** (only tested with Python 3.13)
* **FFmpeg:** Required for audio conversion (TTS/Voice notes).
  * Ubuntu/Debian: `sudo apt install ffmpeg`
  * Arch Linux: `sudo pacman -S ffmpeg`
  * Fedora: `sudo dnf install ffmpeg` (I am not sure; check the package name yourself)
  * Windows: Download and add to system PATH.
  * macOS: `brew install ffmpeg` (not sure; figure it out yourself)
  * Other OS: Figure it out yourself or just disable text-to-speech.
* **System Libraries:** `libolm` or `libgcrypt` may be required for OMEMO support depending on your OS.

## Installation

1. **Clone the repository:**
   ```bash
   git clone https://github.com/yourusername/xmpp-llm-bot.git
   cd xmpp-llm-bot
   ```
2. **Set up a Virtual Environment:**
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. **Install Dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

## Configuration

1. **Copy the example configuration file:**
   ```bash
   cp config.example.ini config.ini
   ```
2. **Edit config.ini with your credentials:**
   * XMPP Section: JID, Password, Rooms.
   * Bot Section: API Keys, Models, Triggers.

## Usage

**Start the bot:**
```bash
python3 bot.py
```

## Interaction Guide

- Mentions: Mention the bot's nickname (`@BotName` / `BotName,` / `BotName:`) or reply to its message to trigger a response.
- Triggers: Start a message with the configured trigger (default `!aibot`) to force a response without mentioning.
- Commands:
  - `!imagen` - Generate an image using Flux.
  - `!tts` - Speak the provided text.
- Direct Messages: Send a DM to the bot for a private conversation (OMEMO supported).
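
To make the rules in the Interaction Guide concrete, here is a minimal sketch of how an incoming message could be routed. It is not the bot's actual code: `classify_message` and `COMMANDS` are hypothetical names, matching mentions only at the start of a message is an assumption, and reply-based triggering is left out.

```python
import re
from typing import Optional, Tuple

# Commands listed in the Interaction Guide (hypothetical constant name).
COMMANDS = ("!imagen", "!tts")


def classify_message(body: str, nick: str,
                     trigger: str = "!aibot") -> Optional[Tuple[str, str]]:
    """Return (kind, text) if the bot should react to `body`, else None.

    kind is "imagen", "tts", "trigger", or "mention"; text is the
    remainder of the message after the command, trigger, or nickname.
    """
    text = body.strip()

    # Commands: "!imagen <prompt>" or "!tts <text>".
    for command in COMMANDS:
        if text == command or text.startswith(command + " "):
            return command[1:], text[len(command):].strip()

    # The trigger prefix forces a response without a mention.
    if text == trigger or text.startswith(trigger + " "):
        return "trigger", text[len(trigger):].strip()

    # Mentions at the start of the message: "@Nick ...", "Nick, ..." or
    # "Nick: ..." (case-insensitive). This is an assumption; the real bot
    # may also accept mentions elsewhere in the message.
    n = re.escape(nick)
    match = re.match(rf"(?:@{n}(?!\w)|{n}[,:])\s*(.*)", text,
                     re.IGNORECASE | re.DOTALL)
    if match:
        return "mention", match.group(1).strip()

    return None


if __name__ == "__main__":
    print(classify_message("!imagen a red fox", "BotName"))   # ('imagen', 'a red fox')
    print(classify_message("botname: hello there", "BotName"))  # ('mention', 'hello there')
    print(classify_message("hello everyone", "BotName"))      # None
```

Checking commands before the trigger and the trigger before mentions keeps the most specific rule first, so a message such as `!tts BotName: hi` is treated as a TTS request and not as a mention.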