Running Deepseek locally in vscode as a private copilot

Introduction

I have been experimenting with LLMs within vscode over the last year. I have had GitHub Copilot installed, which handles autocomplete and has more recently offered chat capabilities. Lately I have been researching how I could connect ChatGPT with vscode, as I have access to a Pro subscription. I've also been questioning whether ChatGPT is really the best product for me, to be honest!

Whilst researching options, I discovered CodeGPT, a vscode extension that lets you select from a large number of models across many AI providers and, interestingly, opens up the ability to use local LLMs via Ollama. It is so easy to get up and running! Here's how to do it.

Install Ollama

First, I need to install Ollama, which allows me to run various models locally. Installing with Homebrew is really easy:

brew install ollama

Ollama either needs to be started, or set as a background service that starts on login:

# set as background service
brew services start ollama
# run once
ollama serve
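
If you change your mind later, the background service can be stopped just as easily:

# stop the background service
brew services stop ollama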

I’m going to run it as a background service. If you’re not ready to commit, I suggest running it once. I can check that it is installed by running the following:

ollama --version
# if running, should get the message `ollama version is x.x.x`
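
To confirm the server itself is responding (not just installed), you can also hit Ollama's local API, which by default listens on port 11434:

# the server replies with a short status message
curl http://localhost:11434
# expected output: `Ollama is running`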

Install CodeGPT

Next, install the CodeGPT extension in vscode. Open the Extensions tab in vscode and search for CodeGPT.

A shot of vscode extension marketplace with CodeGPT listed
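
If you prefer the terminal, the extension can also be installed with the vscode CLI. The extension ID below is the one the marketplace listed for CodeGPT when I checked, so confirm it on the extension's page before running this:

# install CodeGPT via the vscode CLI (verify the ID on the marketplace page)
code --install-extension DanielSanMedium.dscodegpt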

With CodeGPT installed, you will need to register. I signed in with GitHub for simplicity. Once signed in, I can set up CodeGPT to use a deepseek model. I do this by clicking on the model dropdown and selecting View More.

CodeGPT console is open in vscode, the chat function is open and shows options for selecting a model

Selecting Local LLMs in the header and choosing Ollama as my local provider, I can pick one of the available deepseek models. After selecting deepseek-r1:15b, I can then press download to install the model.
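
If you'd rather watch the download in a terminal, the same model can be pulled through Ollama directly and CodeGPT should pick it up - just use the exact tag shown in the model list:

# pull the chat model directly (use the tag shown in CodeGPT)
ollama pull deepseek-r1:15b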

Once downloaded, I can interact with the chat function and ask questions about my codebase. For example, if I highlight a snippet of code and run the /explain command, I get a detailed explanation of what that function is doing.
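
Any small function works as a test case - something like the throwaway shell function below (purely illustrative, not from my codebase) is enough to see /explain walk through the logic line by line:

# example snippet to highlight and /explain
backup_dir() {
  local src="$1"
  local dest="${2:-$HOME/backups}"
  mkdir -p "$dest"
  tar -czf "$dest/$(basename "$src")-$(date +%F).tar.gz" "$src"
}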

Setting up autocomplete

With chat working, I now want to set up the autocomplete feature, which offers code completions as I type. There are two deepseek models listed:

  • deepseek-coder:base
  • deepseek-coder-v2

I will start with the base model, which I can install through Ollama:

ollama run deepseek-coder:base
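
Note that ollama run drops you into an interactive prompt once the download finishes; typing /bye exits it, and the model stays installed for CodeGPT to use. To see what is installed locally (and, on recent Ollama versions, what is currently loaded in memory):

# models installed locally
ollama list
# models currently loaded in memory
ollama ps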

I can then select it in the auto-completion section. Once enabled, autocomplete will kick in when I start to code.
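
If the base model's suggestions feel too limited, the second listed model can be pulled the same way - it is a much larger download, so expect it to take longer and use more memory:

# the larger autocomplete model, pulled the same way
ollama pull deepseek-coder-v2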

Regional settings

The only issues I have had so far are with regional settings, in both the chat function and the autocomplete function.

The first issue was with the chat function - the first response came back in (I assume) Chinese. This was solved by simply typing can you reply in English in the chat. The second issue is with the autocomplete function, which responds in US English. I haven't been able to find a way to solve that - let me know if you work it out!

Performance

The other factor here is that I am running an LLM on my local machine! It isn't all that responsive on my M1 MacBook Air, but on a beefier machine I'm sure it will perform much better.

There is a subtle delay in autocompletion - I need to pause for it to catch up, which in some respects isn't any different to the latency of handing off to GitHub Copilot. What is more noticeable is the lag it causes in the other things I'm running (such as when I'm flipping between vscode and Edge to build this page and check it). The laptop is noticeably running warmer than it normally would, too!

This laptop isn't intended for this sort of workload, though, and in my opinion it performs at an acceptable level.

Wrapping Up

This was super simple to set up! And now I have a private LLM running locally on my machine. It isn't the most responsive on my M1 MacBook Air, but I will test it out on my main machine and update this post with my findings.

#mtfbwy


