Guide to Self Hosting LLMs Faster/Better than Ollama

brucethemoose@lemmy.world · edit-2 15 days ago

Guide to Self Hosting LLMs Faster/Better than Ollama

brucethemoose@lemmy.world · 16 days ago

What context length? Neither of them likes to go over 32K.

And what kind of jibberish? If they are repeating, you need to change sampling settings. Incoherence… Also probably sampling settings, lol.

projectmoon@lemm.ee · 16 days ago

Context was set to anywhere between 8k and 16k. It was responding in English properly, and then about halfway to 3/4s of the way through a response, it would start outputting tokens in either a foreign language (Russian/Chinese in the case of Qwen 2.5) or things that don’t make sense (random code snippets, improperly formatted text). Sometimes the text was repeating as well. But I thought that might have been a template problem, because it seemed to be answering the question twice.

Otherwise, all settings are the defaults.

brucethemoose@lemmy.world · 15 days ago

Hmm, what’s the frontend?

And the defaults can sometimes be really bad lol. Qwen absolutely outputs chinese for me with a high temperature.

projectmoon@lemm.ee · 15 days ago

OpenWebUI connected tabbyUI’s OpenAI endpoint. I will try reducing temperature and seeing if that makes it more accurate.