XMPP LLM Bot
A Python bot that bridges XMPP group chats and direct messages with Large Language Models (LLMs).
OMEMO support and some other features are based on Arne/xmpp-ai-bot with addition of experimental new features including multi-modal support, file hosting, and media generation.
Inspired by jjj333-p/gemini-xmpp for group chat features.
TODO
- Context filling using past chats: This feature will allow bot to read the past messages (whatever related to bot or not) to allow better context. It will be optional to preserve privacy.
Features
LLM Capabilities
- Multi-Provider Support: Native integration for Google Gemini (via
google-genai) and OpenAI (viaopenailib), plus generic support for OpenAI-compatible endpoints (LocalAI, Groq, Ollama). Also supports Groq Compound built-in tools. - System Prompts: Configurable global system instructions with per-room override capabilities.
- Persistent Memory: Remembers conversations with configurable context windows. Supports saving memory to disk (
json) to survive restarts. - Thinking Skipping: Option to skip
<thinking>bots from reasoning models (like DeepSeek). It helps to keep rooms cleaner. Only necessary when you are using the custom endpoint (%99 of the time you should not).
Security & Privacy
- OMEMO Encryption (XEP-0384): Support for E2EE in Direct Messages. (I do NOT guarantee it will work 100% of times)
- AES-GCM Support: Can encrypt and decrypt
aesgcm://links (commonly used by clients like Conversations, Cheogram and monocles chat) to read or send encrypted media. - Access Control: Whitelist/Blacklist modes for Direct Messages and a "Privileged Users" list.
Multi-Modal (Vision & Audio) (only tested with Gemini)
- Image Recognition: Users can send images (http/https or encrypted aesgcm) for the bot to analyze.
- Audio Transcription: Users can send voice notes; the bot processes them using the LLM's audio capabilities.
- URL Context: The bot can fetch URLs mentioned in chat, strip the HTML, and read the content to provide context-aware answers. (turn it off if you don't want people to post IP grabber links to bot)
- Video Processing: Users can send videos and YouTube links; the bot processes them using the LLM's video capabilities.
Generative Tools
- Image Generation (
!imagen): Generates images using Cloudflare Workers AI Flux (Black Forest Labs) and automatically uploads them to a file host. - Text-to-Speech (
!tts): Converts text to high-quality speech using Gemini TTS.- Supports automatic audio format conversion (PCM -> OGG/WAV) using
pydub. - Auto-TTS Mode: Can optionally reply to every message with a voice note.
- Supports automatic audio format conversion (PCM -> OGG/WAV) using
File Handling
- Multi-Host Uploading: Automatically uploads generated media (Images/TTS) to public hosts or your XMPP server itself so XMPP clients can render them inline.
- Supported Hosts: Catbox, Litterbox, 0x0.st, Imgur, ImgBB, Envs.sh, and Uguu.se. You can use your XMPP server too.
Core XMPP Features
- MUC Support: specialized handling for Group Chats (nicknames, mentions, quoting).
- Connection Configuration: Join retry logic and rate limiting to prevent spam bans.
- Native Formatting: Supports XMPP replies, quoting (
>) and Out-of-Band (OOB) data for media.
Prerequisites
- Python 3.9+ (only tested with Python 3.13)
- FFmpeg: Required for audio conversion (TTS/Voice notes).
- Ubuntu/Debian:
sudo apt install ffmpeg - Arch Linux:
sudo pacman -S ffmpeg - Fedora:
sudo dnf install ffmpeg(I am not sure check for the package name yourself) - Windows: Download and add to system PATH.
- macOS:
brew install ffmpeg(not sure figure it out yourself) - Other OS: Figure it out yourself or just disable text to speech.
- Ubuntu/Debian:
- System Libraries:
libolmorlibgcryptmay be required for OMEMO support depending on your OS.
Installation
-
Clone the repository:
git clone https://github.com/yourusername/xmpp-llm-bot.git cd xmpp-llm-bot -
Set up a Virtual Environment:
python3 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate -
Install Dependencies:
pip install -r requirements.txt
Configuration
-
Copy the example configuration file:
cp config.example.ini config.ini -
Edit config.ini with your credentials:
XMPP Section: JID, Password, Rooms.
Bot Section: API Keys, Models, Triggers.
Usage
Start the bot:
python3 bot.py
Interaction Guide
-
Mentions: Mention the bot's nickname (@BotName / BotName, / Botname:) or reply to its message to trigger a response.
-
Triggers: Start a message with the configured trigger (default !aibot) to force a response without mentioning.
-
Commands:
!imagen - Generate an image using Flux.
!tts - Speak the provided text.
-
Direct Messages: Send a DM to the bot for a private conversation (OMEMO supported).
Languages
Python
100%