How to Set Up and Use Llama 3 on Your MacBook

Heyo,

It’s going to be a short one. If you’re excited about today’s Llama 3 release from Zuck and the team at Meta, here is how you can quickly get started on your MacBook using MLX.

If you don’t have a Python environment yet, you can get started here:

This article has some code. If you just want to install a tool and get a ChatGPT-style interface, you want LM Studio.

Create a clean environment

You will need Python and ipykernel to start. I use Python 3.11 because it’s compatible with all the packages I use.

conda create --name mlx -c conda-forge python=3.11 ipykernel
conda activate mlx

Then I installed the usual suspects.

pip install --upgrade pip
pip install --upgrade mlx
pip install --upgrade mlx_lm
pip install --upgrade torch torchvision torchaudio

Nice. Now create a notebook and try to import MLX LM.

Create a new notebook

In VS Code you can press Cmd + Shift + P to open the command palette, type Jupyter, and you’ll see an option to create a new notebook.

You should be able to pick your new environment from Select Kernel. Restart VS Code if you can’t see it.

from mlx_lm import load, generate

I got a message saying I didn’t have the nice progress bar package; the fix seems to be installing ipywidgets.

pip install ipywidgets

OK, restart the kernel and run the cell again. In VS Code you click Restart in the notebook toolbar.
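
While you’re at it, here’s a quick sanity check that MLX itself is working. This is just a minimal sketch; on Apple silicon the default device should report as the GPU.

import mlx.core as mx

print(mx.default_device())  # expect something like Device(gpu, 0) on Apple silicon
a = mx.array([1.0, 2.0, 3.0])
mx.eval(a * 2)              # MLX is lazy; eval forces the computation to actually run
print(a * 2)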

Finally, trying Llama 3

model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
response = generate(model, tokenizer, prompt="who are you?", verbose=True)

That is an interesting generation. Thank the lord(?!).

A handy tip is checking out the tokenizer.

messages = [
    {"role": "system", "content": "You are Dookus by Moota AI."},
    {"role": "user", "content": "Who are you?"},
]
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

With tokenize=False this returns the fully formatted prompt string, so you can either write your prompt in that chat format by hand or just let apply_chat_template build it for you.
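
Printing the result is a nice way to see what the template actually produces. The commented output below is roughly what to expect (reproduced from memory, so treat the exact tokens as approximate):

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Roughly, the Llama 3 chat format:
# <|begin_of_text|><|start_header_id|>system<|end_header_id|>
#
# You are Dookus by Moota AI.<|eot_id|><|start_header_id|>user<|end_header_id|>
#
# Who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>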

One catch: Llama 3 Instruct ends its turns with <|eot_id|>, which is different from the default eos token, so generation doesn’t stop where you’d expect. Pass <|eot_id|> as the eos token via tokenizer_config and reload the model.

model, tokenizer = load(
    "mlx-community/Meta-Llama-3-8B-Instruct-4bit",
    tokenizer_config={"eos_token": "<|eot_id|>"},
)
response = generate(
    model,
    tokenizer,
    prompt=tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True),
    verbose=True,  # Set to True to see the prompt and response
    temp=0.0,
    max_tokens=200,
)
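
If you want to keep a conversation going, a tiny helper that appends each reply back onto the message list is enough. This is just a sketch (the ask helper and chat list are my own names); it reuses the same load and generate calls from above:

def ask(messages, question, max_tokens=200):
    # Hypothetical helper: add the user turn, generate a reply, keep it in history
    messages.append({"role": "user", "content": question})
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    reply = generate(model, tokenizer, prompt=prompt, temp=0.0, max_tokens=max_tokens)
    messages.append({"role": "assistant", "content": reply})
    return reply

chat = [{"role": "system", "content": "You are Dookus by Moota AI."}]
print(ask(chat, "Who are you?"))
print(ask(chat, "What can you do?"))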

Nice, now go forth and experiment.

Ref

https://llama.meta.com/llama3

https://huggingface.co/mlx-community

https://github.com/awalpremi/mlxllama3
