Files
xmpp-jarvis/README.md
2025-12-29 19:08:13 +00:00

4.8 KiB

XMPP LLM Bot

A Python bot that bridges XMPP group chats and direct messages with Large Language Models (LLMs).

OMEMO support and some other features are based on Arne/xmpp-ai-bot with addition of experimental new features including multi-modal support, file hosting, and media generation.

Inspired by jjj333-p/gemini-xmpp for group chat features.

Features

LLM Capabilities

  • Multi-Provider Support: Native integration for Google Gemini (via google-genai) and OpenAI (via openai lib), plus generic support for OpenAI-compatible endpoints (LocalAI, Groq, Ollama). Also supports Groq Compound built-in tools.
  • System Prompts: Configurable global system instructions with per-room override capabilities.
  • Persistent Memory: Remembers conversations with configurable context windows. Supports saving memory to disk (json) to survive restarts.
  • Thinking Skipping: Option to skip <thinking> bots from reasoning models (like DeepSeek). It helps to keep rooms cleaner. Only necessary when you are using the custom endpoint (%99 of the time you should not).

Security & Privacy

  • OMEMO Encryption (XEP-0384): Support for E2EE in Direct Messages. (I do NOT guarantee it will work 100% of times)
  • AES-GCM Support: Can encrypt and decrypt aesgcm:// links (commonly used by clients like Conversations, Cheogram and monocles chat) to read or send encrypted media.
  • Access Control: Whitelist/Blacklist modes for Direct Messages and a "Privileged Users" list.

Multi-Modal (Vision & Audio) (only tested with Gemini)

  • Image Recognition: Users can send images (http/https or encrypted aesgcm) for the bot to analyze.
  • Audio Transcription: Users can send voice notes; the bot processes them using the LLM's audio capabilities.
  • URL Context: The bot can fetch URLs mentioned in chat, strip the HTML, and read the content to provide context-aware answers. (turn it off if you don't want people to post IP grabber links to bot)

Generative Tools

  • Image Generation (!imagen): Generates images using Cloudflare Workers AI Flux (Black Forest Labs) and automatically uploads them to a file host.
  • Text-to-Speech (!tts): Converts text to high-quality speech using Gemini TTS.
    • Supports automatic audio format conversion (PCM -> OGG/WAV) using pydub.
    • Auto-TTS Mode: Can optionally reply to every message with a voice note.

File Handling

  • Multi-Host Uploading: Automatically uploads generated media (Images/TTS) to public hosts or your XMPP server itself so XMPP clients can render them inline.
  • Supported Hosts: Catbox, Litterbox, 0x0.st, Imgur, ImgBB, Envs.sh, and Uguu.se. You can use your XMPP server too.

Core XMPP Features

  • MUC Support: specialized handling for Group Chats (nicknames, mentions, quoting).
  • Connection Configuration: Join retry logic and rate limiting to prevent spam bans.
  • Native Formatting: Supports XMPP quoting (>) and Out-of-Band (OOB) data for media.

Prerequisites

  • Python 3.9+ (only tested with Python 3.13)
  • FFmpeg: Required for audio conversion (TTS/Voice notes).
    • Ubuntu/Debian: sudo apt install ffmpeg
    • Arch Linux: sudo pacman -S ffmpeg
    • Fedora: sudo dnf install ffmpeg (I am not sure check for the package name yourself)
    • Windows: Download and add to system PATH.
    • macOS: brew install ffmpeg (not sure figure it out yourself)
    • Other OS: Figure it out yourself or just disable text to speech.
  • System Libraries: libolm or libgcrypt may be required for OMEMO support depending on your OS.

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/xmpp-llm-bot.git
    cd xmpp-llm-bot
    
  2. Set up a Virtual Environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install Dependencies:

    pip install -r requirements.txt
    

Configuration

  1. Copy the example configuration file:

    cp config.example.ini config.ini
    
  2. Edit config.ini with your credentials:

    XMPP Section: JID, Password, Rooms.

    Bot Section: API Keys, Models, Triggers.

Usage

Start the bot:

python3 bot.py

Interaction Guide

  • Mentions: Mention the bot's nickname (@BotName / BotName, / Botname:) or reply to its message to trigger a response.

  • Triggers: Start a message with the configured trigger (default !aibot) to force a response without mentioning.

  • Commands:

    !imagen - Generate an image using Flux.

    !tts - Speak the provided text.

  • Direct Messages: Send a DM to the bot for a private conversation (OMEMO supported).