[RAG] Implement chat memory for RAG chatbot

DESCRIPTION

We want to add a "chat memory" feature to the RAG process.

The feature can be enabled or disabled from the rag.properties configuration.
When enabled, the last N messages exchanged between the user and the assistant (LLM) are provided to the LLM so that it can generate the next response. N must be configurable in rag.properties.
The chat history (list of messages) should not be stored in the backend yet, only client-side. The UI adds the chat history to the POST request sent to the /ai/rag endpoint.
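
The "last N messages" rule above can be sketched as follows. This is only an illustration in Python, not the Datafari implementation: the `ChatMemoryConfig` class, its field names and the default value of N are assumptions, and the real keys in rag.properties may be named differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatMemoryConfig:
    # Hypothetical settings standing in for the rag.properties values.
    enabled: bool = False
    history_size: int = 6  # N: maximum number of past messages to keep


def select_history(history, config):
    """Return the past messages that should be passed to the LLM."""
    if not config.enabled or config.history_size <= 0:
        # Feature disabled: the LLM only sees the current query.
        return []
    # The history is sorted oldest first, so the last N items are the most recent.
    return list(history[-config.history_size:])
```

With the feature disabled the function returns an empty list, so the rest of the RAG process can stay unchanged.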

Example of request:

POST https://127.0.0.1/Datafari/rest/v2.0/ai/rag

{
    "query": "What is my dog's name ?",
    "lang": "fr",
    "history": [
        {
            "role":"user",
            "content": "I just adopted a black labrador. I called her Jumpy."
        },
        {
            "role":"assistant",
            "content": "How nice ! I am sure she will be happy with you."
        },
        {
            "role":"user",
            "content": "What is the capital of France?"
        },
        {
            "role":"assistant",
            "content": "La capitale de la France est Paris, d'après le document `Capitale de la France`."
        }
    ]
}

Parameters:

query: The user query.
lang: The current Datafari language.
history: A list of user/assistant messages, sorted chronologically (ascending, oldest messages first).
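
A possible check of these parameters on the receiving side is sketched below in Python. It is an assumption-laden illustration: the function name `validate_rag_request` is hypothetical, and treating `history` as optional is a guess that this ticket does not confirm.

```python
# Roles that appear in the example request of this ticket.
ALLOWED_ROLES = ("user", "assistant")


def validate_rag_request(body):
    """Return a list of problems found in a decoded /ai/rag request body."""
    errors = []
    query = body.get("query")
    if not isinstance(query, str) or not query.strip():
        errors.append("query must be a non-empty string")
    # Assumption: a missing history is treated as an empty conversation.
    history = body.get("history", [])
    if not isinstance(history, list):
        errors.append("history must be a list")
        return errors
    for index, message in enumerate(history):
        if not isinstance(message, dict):
            errors.append(f"history[{index}] must be an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            errors.append(f"history[{index}].role must be 'user' or 'assistant'")
        if not isinstance(message.get("content"), str):
            errors.append(f"history[{index}].content must be a string")
    return errors
```

An empty list means the body matches the shape shown in the example request.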

Response (based on chat history and not retrieved sources):

{
    "content": {
        "documents": [],
        "message": "Le nom de votre chien est Jumpy."
    },
    "status": "OK"
}

This example simulates the following fake conversation:

User (fake previous RAG query): I just adopted a black labrador. I called her Jumpy.

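Assistant (fake previous generated response): How nice ! I am sure she will be happy with you.

User (fake previous RAG query): What is the capital of France?

Assistant (fake previous generated response): La capitale de la France est Paris, d'après le document `Capitale de la France`.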

User (actual RAG query): What is my dog's name ?

Assistant (actual generated response): Le nom de votre chien est Jumpy.

The history contains all of the user's previous requests and all the associated generated responses.
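
Since the history lives client-side only, the client has to append each exchange itself before the next request. A minimal sketch of that bookkeeping, written in Python for illustration (the actual UI code and the helper names `record_exchange` and `build_request` are assumptions):

```python
def record_exchange(history, query, answer):
    """Return a new history with the latest query and generated answer appended."""
    # A new list is built so that the caller's previous history is left untouched.
    return history + [
        {"role": "user", "content": query},
        {"role": "assistant", "content": answer},
    ]


def build_request(query, lang, history):
    """Build the JSON body for the next POST to the /ai/rag endpoint."""
    return {"query": query, "lang": lang, "history": history}
```

Appending in this order keeps the list sorted oldest first, as the `history` parameter requires.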

Documentation here:

https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/3136552962/Datafari+RagAPI+-+RAG+-+ALPHA+VERSION

Enhance chat memory compatibility

The first version of the feature relies on multi-message prompts (one message per chat turn) and therefore seems incompatible with some models, such as Mistral, when they are run using the Datafari AI Agent. We want to provide a chat memory solution that works with Mistral models, either by using a single-message (mono-message) approach or by finding a way to use multi-message prompts with Mistral.
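
One way the single-message option could look is to fold the history and the new query into one prompt string. The Python sketch below is only an assumption about the approach: the function name `flatten_history` and the prompt wording are hypothetical and are not the Datafari prompt template.

```python
# Display labels for the two roles used in the history.
ROLE_LABELS = {"user": "User", "assistant": "Assistant"}


def flatten_history(history, query):
    """Fold past messages and the new query into a single prompt string."""
    if not history:
        return f"Current question: {query}"
    lines = ["Previous conversation:"]
    for message in history:
        # Unknown roles fall back to the raw role name.
        label = ROLE_LABELS.get(message["role"], message["role"])
        lines.append(f"{label}: {message['content']}")
    lines.append("")
    lines.append(f"Current question: {query}")
    return "\n".join(lines)
```

The resulting string can be sent as a single user message, which avoids depending on multi-message support in the model or the serving layer.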

VERSION CONCERNED

6.2

CHECKLIST BEFORE CLOSING TICKET

Documentation
- I have created the functional documentation in the wiki
- I have created the technical documentation in the wiki
- I have added javadoc comments on key functions in my code
Security
- I have sanitized any input coming from users
- I have not put any API tokens, passwords or the like in my code
- I am not using 3rd party libraries that are deprecated or not maintained

Edited Apr 16, 2025 by Emeric Bernet-Rollande