`nvim` and Obsidian. One of the big hurdles I had to get over was actually not messing around with `docker-compose`. Normally this would be the way I'd go because it's the self-hosting way, but it introduced complexity that really wasn't needed: dealing with ports, uptime, availability, etc. Instead, I finally got over myself and just installed it with the incredibly well packaged command given by the developers:
```sh
curl -fsSL https://ollama.com/install.sh | sh
```
...this worked instantly on Manjaro, even creating a user group and installing it as a systemd service, which means I don't have to deal with ports: `ollama` is always running as a background service, and every "frontend" I use can just map to the same server and model set. Clearly this was the design intention, but I like to keep my system/services relatively clean, and I tend to overthink things.

If you were ever worried about taking up GPU memory like I was, `ollama` actually clears unused models from memory after five minutes. It'll also handle loading and unloading weights when switching between models.
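Before pulling anything, it's worth a quick sanity check that the service actually came up. A minimal sketch, assuming the default port of 11434:

```sh
# Check the systemd unit the install script set up
systemctl status ollama

# The server listens on localhost:11434 by default;
# /api/tags returns the models pulled so far (an empty list on a fresh install)
curl http://localhost:11434/api/tags
```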
I was then able to just run `ollama pull <model_name>`.
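For the models I mention below, that looks something like this (tags as they appear in the Ollama library):

```sh
ollama pull qwen2.5-coder:7b-instruct-q5_K_S   # coding assistant
ollama pull llama3.2:3b                        # chat generation for notes
ollama pull nomic-embed-text                   # embeddings for RAG
ollama list                                    # confirm everything downloaded
```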
I've been using `qwen2.5-coder:7b-instruct-q5_K_S` as a coding assistant. I've found it to be adequate, albeit a little verbose. It is actually significantly better than the Llama 3.1 instruct models I've used previously for coding, especially in the quality of the docstrings it produces.
In `nvim`, codecompanion.nvim has been a straightforward install and configure against the background `ollama` service, needing just a little bit of tinkering to use `ollama` instead of the default ChatGPT remote service. Essentially, in addition to adding it to lazy.nvim as a dependency, I added a `~/.config/nvim/lua/config/companion.lua` file that then configures the "strategies" and "adapters":
```lua
require("codecompanion").setup({
  strategies = {
    -- use the local ollama adapter for both chat and inline completions
    chat = {
      adapter = "ollama",
    },
    inline = {
      adapter = "ollama",
    },
  },
  adapters = {
    -- extend the built-in ollama adapter to pin the model and context size
    ollama = function()
      return require("codecompanion.adapters").extend("ollama", {
        name = "qwen2.5-coder",
        schema = {
          model = {
            default = "qwen2.5-coder:7b-instruct-q5_K_S",
          },
          num_ctx = {
            default = 64000,
          },
        },
      })
    end,
  },
})
```
This uses the same `ollama` model for both inline and chat completion tasks. As mentioned already, I've used this for docstring generation (the `<C-a>` keymap opens a chat window that I can paste a function's worth of code into at a time), and because the model is so lightweight, generation is more or less instantaneous. Sometimes I do need to remind it to generate NumPy-style docstrings, as it generates something close but not exactly correct in the argument type descriptions.
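The same model is also available straight from the terminal, which is handy for one-off questions outside of `nvim`; the prompt here is just a made-up example:

```sh
ollama run qwen2.5-coder:7b-instruct-q5_K_S \
  "Write a NumPy-style docstring for a function that normalises a 2D array"
```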
For notetaking, doing retrieval augmented generation (RAG) has been significantly more straightforward for information retrieval in my (limited) experience. I use `nomic-embed-text`, again provided by `ollama`, to index notes, and `llama3.2:3b` for chat generation. The small model is incredibly quick on my dGPU, and generates responses much faster than I can read them. The relevance and usefulness of the responses can vary, but I've been pretty impressed so far when I ask questions about specific things I know exist in my notes. This functionality is provided by obsidian-Smart2Brain, which lets you configure `ollama` as the LLM backend for both embeddings and generation.
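Under the hood these plugins are all talking to the same local HTTP API, so you can poke at the embedding and chat models directly if you're curious. This is just a sketch assuming the default port and throwaway prompts; the plugin's actual requests will differ:

```sh
# Embed a snippet of text with the same model the plugin uses for indexing
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "a sentence from one of my notes"}'

# Generate an answer with the small chat model
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "Summarise that note in one line", "stream": false}'
```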
I'll follow up a bit more as I get the hang of the workflows, but I'm pretty keen to add more slash commands and utilities to `nvim` for my coding productivity. As far as notetaking goes, the plugin I'm using now doesn't seem to provide an avenue for workflows (i.e. agents) just yet, but I'm thinking of a few areas beyond autotagging and RAG: how do I improve my learning with LLM assistants?