xmpp-jarvis/README.md

# XMPP LLM Bot

A Python bot that bridges XMPP group chats and direct messages with Large Language Models (LLMs).

## OMEMO support and some other features are based on [Arne/xmpp-ai-bot](https://codeberg.org/Arne) with addition of experimental new features including  multi-modal support, file hosting, and media generation.
## Inspired by [jjj333-p/gemini-xmpp](https://github.com/jjj333-p/gemini-xmpp) for group chat features.

## Features

### LLM Capabilities
*   **Multi-Provider Support:** Native integration for **Google Gemini** (via `google-genai`) and **OpenAI** (via `openai` lib), plus generic support for OpenAI-compatible endpoints (LocalAI, Groq, Ollama). Also supports Groq Compound built-in tools.
*   **System Prompts:** Configurable global system instructions with **per-room override** capabilities.
*   **Persistent Memory:** Remembers conversations with configurable context windows. Supports saving memory to disk (`json`) to survive restarts.
*   **Thinking Skipping:** Option to skip `<thinking>` bots from reasoning models (like DeepSeek). It helps to keep rooms cleaner. Only necessary when you are using the custom endpoint (%99 of the time you should not).

### Security & Privacy
*   **OMEMO Encryption (XEP-0384):** Support for E2EE in Direct Messages. (I do NOT guarantee it will work 100% of times)
*   **AES-GCM Support:** Can encrypt and decrypt `aesgcm://` links (commonly used by clients like *Conversations*, *Cheogram* and *monocles chat*) to read or send encrypted media.
*   **Access Control:** Whitelist/Blacklist modes for Direct Messages and a "Privileged Users" list.

### Multi-Modal (Vision & Audio) (only tested with Gemini)
*   **Image Recognition:** Users can send images (http/https or encrypted aesgcm) for the bot to analyze.
*   **Audio Transcription:** Users can send voice notes; the bot processes them using the LLM's audio capabilities.
*   **URL Context:** The bot can fetch URLs mentioned in chat, strip the HTML, and read the content to provide context-aware answers. (turn it off if you don't want people to post IP grabber links to bot)
*   **Video Processing:** Users can send videos and YouTube links; the bot processes them using the LLM's video capabilities.

### Generative Tools
*   **Image Generation (`!imagen`):** Generates images using **Cloudflare Workers AI Flux** (Black Forest Labs) and automatically uploads them to a file host.
*   **Text-to-Speech (`!tts`):** Converts text to high-quality speech using **Gemini TTS**.
    *   Supports automatic audio format conversion (PCM -> OGG/WAV) using `pydub`.
    *   **Auto-TTS Mode:** Can optionally reply to *every* message with a voice note.

### File Handling
*   **Multi-Host Uploading:** Automatically uploads generated media (Images/TTS) to public hosts or your XMPP server itself so XMPP clients can render them inline.
*   **Supported Hosts:** Catbox, Litterbox, 0x0.st, Imgur, ImgBB, Envs.sh, and Uguu.se. You can use your XMPP server too.

### Core XMPP Features
*   **MUC Support:** specialized handling for Group Chats (nicknames, mentions, quoting).
*   **Connection Configuration:** Join retry logic and rate limiting to prevent spam bans.
*   **Native Formatting:** Supports XMPP replies, quoting (`>`) and Out-of-Band (`OOB`) data for media.

## Prerequisites

*   **Python 3.9+** (only tested with Python 3.13)
*   **FFmpeg:** Required for audio conversion (TTS/Voice notes).
    *   Ubuntu/Debian: `sudo apt install ffmpeg`
    *   Arch Linux: `sudo pacman -S ffmpeg`
    *   Fedora: `sudo dnf install ffmpeg` (I am not sure check for the package name yourself)
    *   Windows: Download and add to system PATH.
    *   macOS: `brew install ffmpeg` (not sure figure it out yourself)
    *   Other OS: Figure it out yourself or just disable text to speech.
*   **System Libraries:** `libolm` or `libgcrypt` may be required for OMEMO support depending on your OS.

## Installation

1. **Clone the repository:**
   ```bash
   git clone https://github.com/yourusername/xmpp-llm-bot.git
   cd xmpp-llm-bot
   ```

2. **Set up a Virtual Environment:**
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install Dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

## Configuration

1. **Copy the example configuration file:**
   ```bash
   cp config.example.ini config.ini
   ```

2. **Edit config.ini with your credentials:**

    XMPP Section: JID, Password, Rooms.

    Bot Section: API Keys, Models, Triggers.

## Usage

**Start the bot:**

   ```bash
   python3 bot.py
   ```

## Interaction Guide

- Mentions: Mention the bot's nickname (@BotName / BotName, / Botname:) or reply to its message to trigger a response.

- Triggers: Start a message with the configured trigger (default !aibot) to force a response without mentioning.

- Commands:

     !imagen <prompt> - Generate an image using Flux.

     !tts <text> - Speak the provided text.

- Direct Messages: Send a DM to the bot for a private conversation (OMEMO supported).