Experimenting with llama-cpp-python and 7B GGUF models like Llama 2 and Mistral. Smooth sailing so far, but I hit a snag with history-dependent queries. The generator doesn't track conversation history, so the input grows fast with each turn, hurting both processing time and response quality. I tried summarizing past turns with a second generator, but got messy results. Are there any standard techniques for managing chatbot history?
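For context, the workaround I've been playing with is a sliding-window history: keep only the most recent turns that fit a token budget and drop the oldest. This is just a sketch, not what I actually ship; the token count here is a rough word-based estimate (with llama-cpp-python you could swap in `llm.tokenize()` for exact counts, but I haven't wired that in here):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~1.3 tokens per English word. Replace with the
    # model's real tokenizer (e.g. llm.tokenize) for accurate budgeting.
    return int(len(text.split()) * 1.3) + 1


def trim_history(history, budget=512):
    """Drop the oldest turns until the remaining history fits the budget.

    history: list of (role, text) tuples, oldest first.
    """
    kept = []
    total = 0
    for role, text in reversed(history):  # walk newest -> oldest
        cost = estimate_tokens(text)
        if total + cost > budget:
            break
        kept.append((role, text))
        total += cost
    return list(reversed(kept))  # restore chronological order


def build_prompt(history, user_msg, budget=512):
    # Flatten the trimmed turns into a plain chat-style prompt string.
    turns = trim_history(history + [("user", user_msg)], budget)
    return "\n".join(f"{role}: {text}" for role, text in turns) + "\nassistant:"
```

It keeps latency bounded, but of course it forgets anything outside the window, which is exactly why I was experimenting with summarization instead.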