
How to run your own LLM (GPT)

👉 Originally published in Czech here: Jak na vlastní LLM (GPT).
Imagine you are a machine.

Yes, I know. But imagine you're a different kind of machine, one built from metal and plastic and designed not by blind, haphazard natural selection but by engineers and astrophysicists with their eyes fixed firmly on specific goals. Imagine that your purpose is not to replicate, or even to survive, but to gather information.

Blindsight, Peter Watts


Celebrities of the technical world are protesting, calling for a halt to the development of large language models for at least 6 months.

As true chaotic neutrals, we will therefore learn how to run AI at home. But with a preface and some context, so we actually know what we're doing.

Understanding GPT / LLM

I often run into a fundamental misunderstanding among people who have a rough idea of how it works:

It's just a predictor of the next word (token).

While this is true to some extent, it's important to note that when you type on a keyboard, you are also essentially predicting and adding the next word.

(Source: What Is ChatGPT Doing … and Why Does It Work?)

As Ilya Sutskever (one of the creators) explains:

In order for a neural network to predict the next token over the gigantic dataset it was trained on, it must create rich internal representations. These include not only all possible human languages but also knowledge, relationships, and patterns.

This is sometimes referred to as "compression": because the network is trained on a huge amount of data, it is forced to squeeze these representations into a limited set of weights and neurons. In theory, it could store billions of examples in the style of a "Chinese room," simply saving pairs of "question" and "answer" (the beginning of the text being completed and the following token). However, since it has to work with a limited amount of internal memory, it has no choice but to pick up increasingly abstract patterns. Internally, these representations amount to something like "knowledge" and "understanding."

(PlantUML was generated by GPT4, the image in the right corner by DALL·E 2)

This pattern recognition occurs at the language level (syntax and grammar), the word embeddings level (meaning of words and how they relate), and at a much more abstract level (how things being discussed function and how they relate).

Thus, it does not work by simply completing conversations based on what it has seen before. That's why these models are capable of translating between languages better than anything else that has existed so far – because they genuinely understand the context of what is being discussed.

If the model were to complete based solely on what it has already seen, it would not be able to translate unless it had previously seen the exact sentence or ideally the entire paragraph.

As Ilya Sutskever says, imagine that during training it is completing the text of a detective novel in which the murderer is revealed only on the last page, after a whole book of investigation. For the network to correctly complete the murderer's name at the moment the detective finally says it, it must understand all the connections: the entire investigation, the individual pieces of evidence, and so on.

Simulators

Before the chat mode was introduced, the results, for example with GPT-3, were somewhat unsatisfactory. Sure, it did something, but after the initial surprise, it was not absolutely mind-blowing, and it often went in an entirely different direction than desired. Overall, it appeared quite limited.

As many people have observed, the performance dramatically depended on the prompt given to the AI. Sometimes it couldn't explain or do anything. Other times, it succeeded when told to pretend to be Sherlock Holmes.

This is essentially a consequence of how it was trained and of the fact that it is a text completer. It simply tends to complete. To make it complete intelligently and usefully, you have to set the context so that it is completing the story of a smart and useful character – like Sherlock Holmes. Just as it would be in the training data: smart behavior appears in stories about smart people. If you let it complete something without proper context, the results were quite random.

Janus' Simulators is a beautiful blog post on this topic, with many examples.

Chat

This is where the surprisingly high utility of chat comes from. Chat is not a radically new feature, but rather a particular way of using auto-completion. For example, in llama.cpp we can find a file chat-with-bob.txt in the prompts/ subfolder. It contains the following content:

Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:

The entire chat mode works in a very trivial way – first, it feeds the network the transcript of the conversation with the AI assistant for completion, starting with a prompt and followed by a sample of the format (question, answer). The program then watches the output for the User: pattern using simple pattern matching, controlled by the reverse-prompt parameter:

-r PROMPT, --reverse-prompt PROMPT
     run in interactive mode and poll user input upon seeing PROMPT (can be
     specified more than once for multiple prompts).

If it finds a string specified in this parameter, it loads a bit of data from the user, adds it to the original document, and continues completing. This creates the entire illusion of chat, even though the model is still just completing the "transcript" of a conversation between a human and artificial intelligence.

When you look at the prompt, it provides the context of the simulator – a transcript of a conversation with a helpful assistant, which has a format of alternating questions and answers. The model, therefore, behaves usefully and helpfully because it completes the story of what it would look like if it were being useful and helpful.
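
In pseudocode, the whole illusion looks roughly like this. This is only a minimal sketch: complete() stands in for a hypothetical function wrapping the model, which completes the text and stops as soon as the stop string appears; the real llama.cpp code streams tokens, but the idea is the same.

```python
def chat(complete, prompt_file="prompts/chat-with-bob.txt", reverse_prompt="User:"):
    # the "document" being completed is just a growing conversation transcript;
    # chat-with-bob.txt already ends with "User:", so we start by asking the user
    transcript = open(prompt_file).read()
    while True:
        transcript += " " + input("> ") + "\n"      # splice in the user's text
        answer = complete(transcript, stop=reverse_prompt)
        print(answer)
        transcript += answer + reverse_prompt       # and let the model continue
```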

This also means that when you chat with the AI through ChatGPT, you don't have access to this context. However, if you go to the playground, you can set the prompt to some extent (I write "to some extent" because, in my opinion, OpenAI still appends its own prompt):

Here is an example of a response with a different prompt:

Prompts can also provide a much more complex format, for example, simulating deeper thinking about questions, as seen in llama.cpp/prompts/reason-act.txt:

You run in a loop of Thought, Action, Observation.
At the end of the loop either Answer or restate your Thought and Action.
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of these actions available to you:
- calculate[python math expression]
Observation will be the result of running those actions


Question: What is 4 * 7 / 3?
Thought: Do I need to use an action? Yes, I use calculate to do math
Action: calculate[4 * 7 / 3]
Observation: 9.3333333333
Thought: Do I need to use an action? No, have the result
Answer: The calculate tool says it is 9.3333333333
Question: What is capital of france?
Thought: Do I need to use an action? No, I know the answer
Answer: Paris is the capital of France
Question:

So, the model not only tries to give an answer but first Thinks, then plans an Action, followed by an Observation, after which it Thinks again and finally provides an Answer. This way, it is possible to overcome some of the model's shortcomings, such as short-term memory or issues with long-term planning.
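
A rough sketch of how such a loop can be driven from the outside, again with a hypothetical complete() function; the calculate action is executed by the surrounding program, not by the model:

```python
import re

def reason_act(question, complete):
    transcript = open("prompts/reason-act.txt").read() + " " + question + "\n"
    while "Answer:" not in transcript.rsplit("Question:", 1)[-1]:
        # let the model produce the next Thought/Action, stopping before it
        # invents its own Observation or a new Question
        step = complete(transcript, stop=["Observation:", "Question:"])
        transcript += step
        action = re.search(r"Action: calculate\[(.+?)\]", step)
        if action:
            # run the requested calculation and feed the result back
            result = eval(action.group(1))  # demo only, never eval untrusted input
            transcript += f"Observation: {result}\n"
    return transcript.rsplit("Answer:", 1)[-1].strip()
```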

Human alignment & shoggoth

Chat mode showed that existing LLMs can work as useful AI, but by default they are quite alien to our expectations and often don't do what we want them to.

An interesting development over the past half-year or so is that they can be relatively quickly adapted using RLHF (Reinforcement Learning from Human Feedback). This involves extending the original functioning model with an additional layer that indicates the suitability of the response, and then it learns acceptable behavior from various conversation examples. The model does not learn new facts or new ways of thinking about the world, but essentially, what we want from it, what is relevant to humans, and what is not.

This led to the creation of the Shoggoth meme, a monster wearing a human mask, because underneath it is still something entirely alien: a simulator with a friendly mask applied:

Those images are cute, but as someone noted on Twitter, they are fundamentally incorrect analogies. The correct representation would be to have many masks instead of many eyes because the Shoggoth itself is essentially not a defined entity. It adapts its behavior according to the context and mask it is given, simulating different roles and characters based on the input it receives.

It's like a voice of all human texts wearing masks of all characters from all stories. On top of that, someone sprayed a smiley face in the shape of a chatbot, which makes you think you're having a conversation with it.

I don't mean this as in "take the information with a grain of salt" or "don't rely on it," but rather that the nature of Shoggoth is Shoggoth. Shoggoth must simulate in order to complete text. When it doesn't understand, it mostly falls out of character and doesn't know what role it is supposed to play. You have to provide the context for it – not the context of the conversation, but of what it is supposed to be and how it should lead the conversation. Meh.

Small models, why and how they work

The article chinchilla's wild implications points out that the number of weights (parameters) is not a particularly meaningful metric to aim for, and that more data beats a larger number of model parameters.

Somewhat surprisingly, it turns out that when you take this human-alignment data and put it into significantly smaller models, you can achieve performance slightly below GPT-3 in various benchmarks. GPT-3 is 175B; GPT-4 is reportedly 6x larger, but I didn't find the numbers, and someone claims it has 10x more parameters. The "B" stands for billions of parameters.

The story behind it:

Facebook (Meta AI, ahem) created a set of relatively small LLaMA models – 7B, 13B, 33B, and 65B – which they trained in a standard way. Then they more or less made it all public. Originally, they probably intended to release it entirely, but amid the public backlash about possible abuse of GPT-4, they decided to give it only to other researchers (an .edu mail is a big plus) after filling out a form. Of course, the obvious happened, and the models are being distributed everywhere possible (torrent, wink wink).

People from Stanford took the smallest LLaMA 7B model and used self-instruct fine-tuning on it, creating Alpaca.

What's interesting here is that, unlike the original RLHF – which took the form of thousands of human-rated conversations in the style of "the user chats with the model, then marks whether the answer is good or bad, and the model trains to do more of the good and less of the bad" – GPT-3.5 was used for this purpose. So it's a case of "one model training another model on what is good and bad". This was the opening toast of the Singularity party (a mythical state when our technology starts improving itself).

Surprisingly, it turned out that only about 52 thousand of these examples were needed – basically nothing compared to the amount of the other training data – and the model moves closer to GPT-3's level in benchmarks and automated tests by tens of percent.
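
For an idea of what such an example looks like, here is a made-up sample in the instruction/input/output format the Alpaca dataset uses (the content is invented purely for illustration):

{
    "instruction": "Rewrite the following sentence in passive voice.",
    "input": "The cat chased the mouse.",
    "output": "The mouse was chased by the cat."
}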

Alpaca was then released in a very strange way; since the original model belonged to Facebook and they had stopped distributing it, they essentially released something like a diff against LLaMA. So, to get it up and running, one needed to find LLaMA somewhere. Everyone patted themselves on the back for being responsible and preventing the spread of spam and disinformation. This lasted until about the second day, when someone combined the two and posted the result on the internet.
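
Conceptually, such a "diff" is just the difference between the fine-tuned weights and the original ones, so applying it is nothing complicated. A rough sketch under that assumption (the file names are hypothetical; the real release came with its own scripts):

```python
import torch

# Hypothetical file names; the real release shipped its own conversion scripts.
base = torch.load("llama-7b.pth", map_location="cpu")      # original LLaMA weights
delta = torch.load("alpaca-diff.pth", map_location="cpu")  # the published "diff"

# Adding the diff back to the base weights reconstructs the fine-tuned model.
merged = {name: base[name] + delta[name] for name in base}
torch.save(merged, "alpaca-7b.pth")
```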

In my current opinion, this all means that the sought-after and interesting capabilities are also present in smaller models. The heavy part, which was previously addressed by an ever-increasing number of parameters, lies to some extent purely in getting the model to abstract what we actually want from it. This can be hacked in much more cheaply.

However, what caught the attention of nearly half the internet was not so much the model itself, but rather the information that:

Alpaca behaves qualitatively similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to reproduce (<600$).

Just for context, this level of functionality for larger models usually costs millions of dollars. The party began. Over the past few weeks, a fermentation process took place, resulting in, among other things:

  1. r/LocalLLaMA subreddit dedicated to running, using, and training small models
  1. llama.cpp rewriting the Python code to C++, so it runs significantly faster even on CPUs
  1. r/Oobabooga and Text generation web UI (user-friendly web UI in the style of stable-diffusion-webui)
  1. Vicuna (at the time of the article, perhaps the most capable small model)
  1. GPT4All llama model trained on ~800k GPT-3 conversations, with binaries, scripts, models, and everything
  1. Alpaca-LoRA an alternative to Alpaca using LoRA (a specific way to train existing models)
  1. GPT-4-LLM training data for fine-tuning models
  1. ShareGPT_Vicuna_unfiltered training data from which filtering (sex, racism, and so on) has been removed

and something new every day.

By the way, this is one of those amazing moments at the interface where something is changing. It's like those wild times you look back on with nostalgia, such as the 1990s or the beginning of Bitcoin. A time brimming with potential, when things are not clearly defined and everything seems possible. It's worth pausing to appreciate it, because typically you only realize it in hindsight. New dimensions are opening up (Creating spaces by opening dimensions), collapsing, and so on. It's really cool.

How to set up a local model

I originally started writing this article because I was trying it at home, and it was quite complicated. While writing it, everything has become so simplified that the original text completely lost its purpose.

Now, let's set up Vicuna (or choose something else here).

First, clone text-generation-webui to an NVMe drive where you have at least 100 GB of space:

git clone https://github.com/oobabooga/text-generation-webui.git

Go to the models/ folder and clone Vicuna:

git clone https://huggingface.co/eachadea/vicuna-13b

It is a PyTorch model. While it is cloning, open another terminal and install the dependencies. First, install Miniconda (or Anaconda):

curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > "Miniconda3.sh"
bash Miniconda3.sh

Then, install the necessary build tools:

sudo apt install build-essential

Create a new conda environment:

conda create -n textgen python=3.10.9
conda activate textgen

Then, install the dependencies in the new environment:

pip3 install torch torchvision torchaudio
pip install -r requirements.txt

And that's it. Wait for the model to download and then run the whole thing:

python server.py --cpu

You can omit the --cpu parameter if you have a graphics card with 24GB of VRAM.

$ python server.py --cpu --chat

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/bystrousak/anaconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
Loading vicuna-13b...
Loading checkpoint shards: 100%|██████████████████| 3/3 [00:14<00:00,  4.95s/it]
Loaded the model in 15.10 seconds.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

The web interface is now running on port 7860:

The words appear very slowly for me, about one every five seconds. The output shows up character by character as the model generates it. It would be better to have a graphics card with more memory, as the model doesn't fit into my 3070 Ti.

llama.cpp

Fortunately, there is also llama.cpp, a C++ rewrite of the Python code:

git clone https://github.com/ggerganov/llama.cpp.git

Then build it using the make command.

A minor issue is that llama.cpp uses its own optimized data storage format, so we need to convert the original PyTorch tensors (.pth) to GGML.

Activate the textgen conda environment from the previous section and run the convert script:

$ conda activate textgen
(textgen) $ python3 convert.py ../models/vicuna-13b --outtype f32

Upon running, it reports that the added_tokens.json file is missing:

Exception: Vocab size mismatch (model has 32001, but ../models/vicuna-13b/tokenizer.model has 32000).  Most likely you are missing added_tokens.json (should be in ../models/vicuna-13b).

So, we'll add it there:

{
    "<unk>": 32000
}

Where did I get it from? The last value in tokenizer_config.json in the Vicuna folder. It's completely random, and I'm not sure if it's correct (probably not, but it works more or less).

(textgen) $ python3 convert.py ../models/vicuna-13b --outtype f32

This will output the file ../models/vicuna-13b/ggml-model-f32.bin, which can then be run using llama.cpp. Alternatively, you can use the --outtype f16 parameter to get a smaller file that fits into memory more easily (roughly what other projects achieve with quantization).

And then all that's left is to run it and play with it:

./main -m ../models/vicuna-13b/ggml-model-f32.bin --color --repeat_penalty 1.0 -i -t 15 -r "User:" -f prompts/chat-with-bob.txt

On my computer, the text appears at a rate of about one word per second.

It's worth noting

Smaller models are, of course, less capable, but they are not entirely incapable. If you observe strange behavior, it could be:

  1. You have non-ideal settings for the model's various sampling parameters. There are many of them, and if you haven't set them yourself, someone else has, often poorly. Typically, the temperature is something you want to have around 0.7 (see the example command after this list).
  1. Prompt. As I explained, LLMs are simulators. Always keep in mind that you are not talking to a human, but to a Shoggoth simulating an AI assistant, and you should construct the context/prompt accordingly. If you don't set any, then it's just the raw Shoggoth – it sometimes speaks in non-existent languages, sometimes repeats itself, and sometimes spews binary rainbows.
  1. Quantization. You may encounter models with names like 4bit and similar. I haven't studied this in depth, but if I understand it correctly, these are essentially models where the weights of the neural network have been recalculated to 4-bit numbers to fit into memory. This means that they have lost precision. From what I've read, they still work, but sometimes much worse.
  1. You may have a model that isn't worth much. Some have had training data in languages other than English removed, and others have been tweaked to work great for specific tasks but poorly for the rest.
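
For example, in llama.cpp the sampling parameters from the first point can be set directly on the command line. The values below are just a reasonable starting point, and the flags and defaults may differ slightly between versions:

./main -m ../models/vicuna-13b/ggml-model-f32.bin --temp 0.7 --top_k 40 --top_p 0.9 --repeat_penalty 1.1 -i -r "User:" -f prompts/chat-with-bob.txt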

Here are some specific examples of problems:

Question: Convert to python: find . -name "*.md" -exec "cat {} > blogs.md" \;
Thought: I think this is asking to find all files with the .md extension and then cat them into a single file called blogs.md
Action: I will use the find tool
Observation: The find tool found 5 files with the .md extension and the command cat'ed them into a single file called blogs.md

Vicuna, instead of doing it, just pretended to be doing something. The context of the prompt was not good. With the chat-with-bob prompt:

User:Convert to python: find . -name "*.md" -exec "cat {} > blogs.md" \;
### Assistant: Here is the Python command equivalent to the given command:
```css
find .

Obviously, it went completely wrong. Now I could get angry and say that this model is as bad as GPT-2 and actually useless.

Instead, I realized that I was talking to a Shoggoth, so I tried again and gave it more context. Instead of “Convert to python”, I asked it to “Convert this shell command to python script, which will do the same”. And suddenly, it works:

User:Convert this shell command to python script, which will do the same: find . -name "*.md" -exec "cat {} > blogs.md" \;
Bob: Here's the Python equivalent of the given shell command:
```python
import os

for file in os.scandir('.'):
    if file.is_file() and file.name.endswith('.md'):
        with open(file.name, 'rb') as f:
            content = f.read()
        with open('blogs.md', 'a') as f:
            f.write(content)
```
User:This doesn't work recursively.
Bob: You're right, the above code doesn't work recursively. Here's the updated Python code that recursively searches for `.md` files in all subdirectories of the current directory:
```python
import os

def find_and_cat_md_files(directory):
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.md'):
                file_path = os.path.join(root, file)
                with open(file_path, 'rb') as f:
                    content = f.read()
                with open('blogs.md', 'a') as f:
                    f.write(content)

find_and_cat_md_files('./')
```

The functionality was there all along; the problem was the context. If I had chosen a better prompt from the beginning – instead of chat-with-bob, one mentioning that it's an AI assistant that helps with programming in Python – it would probably have worked on the first try. Something like:

Transcript of a conversation between a person and a smart AI that can program and never makes mistakes. It willingly fulfills all user requests and tries to be as helpful as possible.

User: I want something
AI: This is how to program it:
```python
#! /usr/bin/env python3

code

```
User:

Two small models

Interesting recent use-cases:

In general, it can be said that a pretty good use case for these small models is to train them on a specific dataset and then use them as a search engine capable of answering questions about the training data to some extent. They can also theoretically search by context. You describe what a function does and have the model find it.

I still haven't gotten around to training, so maybe in a future blog post.

GPT4

Small models are cool and have their uses, but it’s meaningless to compare them to GPT4.

Since GPT4 was released, I keep telling everyone that it makes absolutely no sense to waste time with GPT3 (GPT3.5). The difference is brutal.

My personal feeling about it is somewhere between amazement, awe, nostalgia for the present from the perspective of the future (it was nice to be a programmer, too bad we're at the end), and being intoxicated by possibilities.

How to get access to GPT4

One option, of course, is to pay for ChatGPT Plus, but it costs $20/month and currently has quite strict rate limiting:

GPT-4 currently has a cap of 25 messages every 3 hours.

Therefore, I recommend registering in the playground, a kind of testing website for various models, where you can play with them before you start using them through the API.

In the waitlist request, just write something like that you are a developer and want to try it out. You need to link it to a credit card and be aware that usage is paid, but generally costs very little (I spent about two dollars a week, for example).

Sparks of Artificial General Intelligence

Sparks of Artificial General Intelligence: Early experiments with GPT-4 was released. It showed that everything has changed, except that most of us haven't noticed yet.

Here are some snippets extracted from it as a video:

Or here as a longer talk:

But in general, I recommend reading that PDF; it reads like sci-fi. I could try to praise it more and push it, but meh. It's worth it.

Use cases for inspiration (people asked)

Pretty much everything. I see it as an “intellectual engine” that can handle whatever you throw at it, usually better than a random contractor on micro-marketplaces. But if I had to choose something:

  1. Assistance and fulfilling specific tasks with PyQt. I usually know what I want, but can't remember the name of a specific method or object or what to import. So, I write it to GPT4, and it returns a widget or some operation. It's an ideal use case because I can easily check it. The advantage is that it often adjusts the code according to context when I paste in the names of existing variables.
  1. The same with boto3. I often work with AWS in Python, and occasionally, I can't remember the parameters of a DynamoDB query or whether to use a client or resource, and so on. It's great for more complex tasks like "give me a client that connects somewhere and filters resources based on the properties timestamp of this object and ..". Within a minute, it solves something I typically spent half an hour searching and trying. No complicated code, just boring boilerplate specifics of various libraries.
  1. A browsing gallery in pure JS for my blog. I was too lazy to program it manually (and I'm not a fan of JS), so I took the existing JavaScript that didn't handle a gallery at all and told GPT4 what I wanted. After a brief conversation about what I didn't like about its solution, like wanting bigger forward and back buttons, it produced functional code. I also plan to have it rewrite some CSS specifics on my blog that make me want to pull my hair out – typically scaling on different DPIs and support for mobile devices.
  1. Writing ORM in various frameworks. I usually don't bother too much; I just say what I want in which framework, and it comes up with various queries, inserts, and similar things 98% of the time. I still write models myself.
  1. Writing HTML parsers. I've always found it tedious, so I just took the skeleton of an object with the data I wanted (a bare data class in Python), copied the README from my parser, selected a piece of HTML, and told GPT to implement the queries. And it did. This saves a lot of time.
  1. Creating JQ queries based on JSON examples. Just copy-paste a piece of JSON and say, "give me a jq command that extracts all X from it". It returns a long JQ command that works.
  1. Generating plantuml and other similar things, where it provides a rough structure based on a verbal description, and I just fill in the remaining 10%. It could probably generate a README based on a short snippet and a few textual mentions of details like how to run tests ("it's in pytest") and so on.
  1. Generating Dockerfiles and Docker compose stuff. It's a pain to write manually.
  1. Debugging. Typically, Helm/Kubernetes. I get stuck somewhere, can't find anything on Google. When I copy-paste an error message to GPT4, it solves something I was stuck on for hours within a few messages (recently, details about Kubernetes annotations, Route53, SSL certificates, values, overrides, and other details). GPT is absolutely brilliant for debugging, and instead of wasting time on something, you move forward incredibly fast.
  1. In the future, I want to script the translation of my blog into various languages (a rough sketch follows after this list). Languages have become less important, so why not automatically translate it into most of the major ones for a few dollars? The quality is already better than cheap human translators (150 CZK per standard page) from various marketplaces, and the price will be on the order of dollars per article. Fun fact: the English version of this blog post was translated by GPT, and it cost me $7. That's less than what I would pay for the translation of a single standard page (1,800 characters), about 1/24 of the total cost, it was done within 30 minutes, and I didn't have to go through the pain of hiring a contractor on a marketplace website.
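
A rough sketch of how I imagine such a translation script, using the OpenAI Python client as it looked at the time of writing; chunking of long articles and retries are left out:

```python
import openai  # expects the OPENAI_API_KEY environment variable to be set

def translate(text, target_language="English"):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": f"Translate the user's text to {target_language}. "
                        "Keep the formatting and code blocks intact."},
            {"role": "user", "content": text},
        ],
    )
    return response["choices"][0]["message"]["content"]
```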

Of course, two things need to be said:

  1. Currently, I don't send personal or corporate data there. I always just describe something and have something generated. In general, you probably don't want to send anything sensitive over the internet, and that applies here too. Who knows what some future AI version will do with it in 10 years.
  1. I never inherently trust anything from it. I'm talking to a Shoggoth, not a colleague. However, that doesn't mean it's not useful; just don't expect miracles and always verify everything.

Some plans for the future

Try training smaller models on my own datasets. Maybe using AWS EC2 instead of buying a new GPU for 50k CZK, but we'll see. Most consumer GPUs still have too little VRAM.

Try creating my own embeddings and vector databases and see how well searching through OpenAI embeddings works. People are using it, and I want to find out how useful it is and maybe apply it to everything I have on my PC.
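
A minimal sketch of what I have in mind, assuming the OpenAI embeddings endpoint as it looked at the time of writing; brute-force cosine similarity stands in for a real vector database:

```python
import numpy as np
import openai

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def search(query, documents):
    q = embed(query)
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # rank documents by similarity of their embedding to the query embedding
    return sorted(documents, key=lambda doc: cosine(q, embed(doc)), reverse=True)
```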

Try to put GPT4 into various tools, integrate it into various APIs, and improve its usefulness. I'm not concerned about the usefulness of the model – it's fantastic already, but I'm tired of constantly writing the same prompts or copying pieces of code back and forth awkwardly. Ideally, integrate it better into the system (right-click on file, select "conversation with GPT", open my custom GUI where I handle the data with the chatbot).

Trends and the future

The obvious trends are larger models and models with larger context windows. Personally, I haven't yet read up on what actually limits this, but the 8k context (how many tokens the model can perceive) in GPT4 makes a brutal difference compared to 1/2k in GPT3. And there is even a 32k version, just with a slower rollout.

In 2020, I wrote in an article (GPT-3):

I think with the opening of the API, a new position of "helmsman" of the output will emerge. A kind of specialization of people who will offer the generation of "pre-prepared" settings and parameters to solve specific problems.

Well, this already exists as a job position, and it's called "prompt engineering". Cool. From what I've seen, there is a growing business in implementing AI into all sorts of things. Often it's quite vaporware, but some of it, for example search over embeddings, looked quite impressive.

The current biggest problem with all models is their isolation, limited context, and inability to learn. So in the future:

  1. They will not be isolated – they will have access to tools, and it will be easy to feed your own data into them. I see this as a big pain point; ideally, I would like to have it running locally and across everything. “Find the link my friend sent me on IRC yesterday.” “A few years ago, I wrote a script that outputs the structure of HTML websites; find it among the hundreds of others.” “Write a script for the action I just did and assign it a keyboard shortcut.” “Take the data I see on the screen now and do this and that with it.” A Siri and Alexa that are not completely stupid and useless. Integration with various APIs is already in beta, and it reportedly works great.
  1. They will have an increasingly larger context. One day, we will laugh at the 32k context and not understand how we managed, just as we can't imagine getting by today on an 8-bit machine with 32 kB of RAM.
  1. Of course, the models will learn and will have different forms of long-term memory, whether it's retraining or things like pinecone and other embedding storages. They will also have integrated reflection, the ability to see into their own reasoning and explain tokenization, how data is stored, and so on. This is possible to find out today, but it's brutally complex and cumbersome.

Gradually, they will be capable of doing absolutely everything. They mostly are now, just in their infancy, alignment sometimes doesn't work, and occasionally, it takes more effort to explain to the model than to do it manually. But it's improving in leaps and bounds at a level that still surprises me, even though I'm quite informed.

AGI

And the end is near. Not of humanity, but of the blog. So, about AGI:

In my opinion, this will directly lead to AGI, or Artificial General Intelligence, an AI that is capable of everything intellectual, on average, as well as or better than a human (which does not mean it will always do everything better than everyone else).

It's not that it's completely here, but now it's just a matter of trying different approaches, improving scalability, and so on. After a hundred years when no one had any idea how to do it, and scientists couldn't even agree on the definition of "intelligence," it's finally coming. It's just still somewhat dumb.

This is like the first airplane. It flies, hooray. But mainly, it shows the way, that it's possible, and probably even how. Now it's just a matter of working on it for a while, with good old iterative development.

What caught my interest were the strange loops people have been building with GPT lately. A beautiful example is Auto-GPT: An Autonomous GPT-4 Experiment. It takes your request, the model first thinks about how to accomplish it, and then it interacts with various tools on the web and on disk, and so on. When something is too big (exceeding the context window), it launches additional models that it instructs. During this process, it stores various pieces of information on disk, which it then reads back to instruct itself. In theory, it is capable of fulfilling quite high-level goals, but in practice it is still in its infancy and often gets lost.
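
This is not Auto-GPT's actual code, just the shape of the loop it implements as described above; complete() and parse_tool_call() are hypothetical placeholders:

```python
def agent(goal, complete, tools):
    notes = []  # stands in for the information Auto-GPT keeps on disk
    while True:
        context = f"Goal: {goal}\nNotes so far:\n" + "\n".join(notes)
        # ask the model for the next step: which tool to run and with what argument
        plan = complete(context + "\nWhat is the next step? Reply as: tool(argument)")
        tool_name, argument = parse_tool_call(plan)
        if tool_name == "finish":
            return argument
        result = tools[tool_name](argument)
        notes.append(f"{tool_name}({argument}) -> {result}")
```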

Currently, it is quite unusable, but overall, it is starting to simulate more complex thought processes quite well. It reminded me a lot of my old blog post Entity, where I described a system that is not intelligent but rents intelligence from people to improve itself. And this can basically delegate intelligence in a similar way, just on itself.

Links

Random notes

In the Twitter infosphere, you want to follow Joscha Bach. And JCorvinus is also quite active.

This is worth watching:

Endless discussions about morality, copyright, intelligence, exploitability, and so on, in my opinion, are sterile bikeshedding. People keep discussing it because everyone can, but the outcome of the debate doesn't matter at all. Don't waste time on it.

No need to get depressed about AI being able to do everything better than you. Even now, there is probably someone in the world who can do anything you can do better than you. If that is your only source of motivation, simply find another one. Maybe think about how it will let you improve your potential to achieve the goals you really want.

Edit

More interesting links:

Also, Nofil on reddit posts great summaries of new stuff like this: GPT-4 Week 4. The rise of Agents and the beginning of the Simulation era.
