Robert’s AI Glossary

Large Language Models

Model Types

Language Models

General language models such as “llama-2-70b” are only capable of continuing a pre-written text. They are called with a token budget and will generate as much output as fits, without regard for whether a given task is followed, whether the answer is complete, or whether it stays on topic.

If you provide the following string as an input to “llama-2-70b”:
"The solar system has "

It might complete the text as follows:
"eight planets, namely Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune, each possessing its unique characteristics and orbiting the Sun, the star at the center of the system."

Depending on how many tokens you allow for this operation, the response might be shorter, and therefore the sentence incomplete, or longer, in which case the language model might start talking about something completely irrelevant.

Using just such a text-completion model, it is hard to make it follow a “prompt” or given instruction if you expect a specific task to be completed or a response to be formed similar to ChatGPT.

A specific issue in using these models is that it is hard to tell the model to stop generating output after a certain point. It does not simply stop after a sentence has completed and a newline starts, or when it observes a stop marker such as </s> or ###.
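Because a raw completion model keeps generating past the point you care about, the calling code usually has to enforce stop markers itself. Below is a minimal, purely illustrative sketch of such client-side truncation; the stop markers mirror the ones mentioned above, and real inference libraries typically offer a built-in `stop` parameter instead:

```python
def truncate_at_stop(text, stops=("</s>", "###")):
    """Cut model output at the first stop marker, since a raw
    text-completion model does not reliably stop on its own."""
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

raw = "eight planets, namely Mercury, Venus, Earth...</s>### unrelated rambling"
clean = truncate_at_stop(raw)  # everything from "</s>" onwards is dropped
```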

Instruction tuned / Chat Models

Instruction-tuned, or often so-called “chat”, models have been extensively trained on human chat interactions. As a result, such a model will take the prompt as a conversation starter and respond in a question, answer, and follow-up fashion, making it useful to debate concepts, follow logic, and even retrieve specific information.

The following is an example from the Puffin training dataset. It shows a theoretical conversation between a human and an assistant; here the assistant is referred to as “gpt”. Multi-turn conversations, such as those with follow-up questions, are usually considered to be of higher quality than one-shot answers, though this depends heavily on the use case. At a minimum, such training requires 3,000 to 5,000 chats between an AI assistant and a human, including the responses. Normally, however, this involves very large training datasets of 50,000 chats or more.

[ { "from": "human", "value": "Define \"in-built payments\"" }, 
{ "from": "gpt", "value": "\"In-built payments\" refers to a payment processing system that is integrated or embedded within a software, platform, or application. This allows users to make transactions directly within the platform without needing to navigate to a separate payment processing site or use third-party applications.\n\nIn-built payment systems streamline the payment process and typically provide a seamless user experience. These systems are commonly found in e-commerce websites, mobile apps, and various online services, where they handle transactions, process payments, and manage secure payment data. In-built payment systems may also support multiple payment methods, such as credit cards, digital wallets, and direct bank transfers, to accommodate user preferences." },

{ "from": "human", "value": "Is it the same as pre-paid?" }, 
{ "from": "gpt", "value": "No, \"in-built payments\" and \"pre-paid\" are not the same concept. While they both relate to payment processing, they refer to different aspects of payments.\n\nIn-built payments, as mentioned earlier, refers to a payment processing system that is integrated or embedded within a software, platform, or application, allowing users to make transactions directly within the platform.\n\nOn the other hand, \"pre-paid\" refers to a payment method where the user pays for a product or service in advance, before using or consuming it. Pre-paid payment methods are commonly found in various industries, including telecommunications (pre-paid mobile phone plans), transportation (pre-paid transit cards), and financial services (pre-paid debit or gift cards).\n\nIn some cases, an in-built payment system might allow users to make pre-paid transactions, but the two terms describe different aspects of payment processing." } ]
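Datasets in this “from”/“value” layout are typically flattened into a single training string per conversation before training. A small sketch, assuming the field names shown above; the USER/ASSISTANT labels are an illustrative convention, as real chat templates vary by model:

```python
def flatten_conversation(turns):
    """Turn a list of {'from', 'value'} dicts into one training string."""
    roles = {"human": "USER", "gpt": "ASSISTANT"}
    return "\n".join(f"{roles[t['from']]}: {t['value']}" for t in turns)

chat = [
    {"from": "human", "value": "Define \"in-built payments\""},
    {"from": "gpt", "value": "\"In-built payments\" refers to ..."},
]
print(flatten_conversation(chat))
```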
Issues with baseline (non-community) chat models

Companies such as Meta often make their chat models technically useless through processes such as “supervised instruction tuning”, tailoring the responses to match certain safety requirements. While it is good to mind safety, these companies often overshoot by a large margin and force their model to almost never provide a useful answer, and if at all, only wrapped in boilerplate alongside an upfront apology that it might be wrong and/or harmful. This makes the model close to impossible to use within a technical workflow, such as generating inputs for other models or processing data.

Training

Training a model has the goal of enhancing it in a certain way. There are many ways to make the output of a model better, such as ‘pre-training’ and ‘fine-tuning’ as detailed below. There are also many definitions of better; for example, maybe you want to enhance the capability of your model to write ObjectPascal code, or tailor its output to be a more reliable input for an internal ERP API endpoint.

Pre-training

Pre-training is the process of adding new capabilities and partially new information to an existing base model. This process is very similar to, and almost the same as, fine-tuning. It requires, however, much more compute (tokens, GPU/CPU resources, and time).


It works by training the model on predicting the expected next word. For example, if the base language model would normally translate “Germany” to just “Deutschland” where the desired output should be “Bundesrepublik Deutschland”, then pre-training works in a way that the word “Bundesrepublik” receives a significant enough weight adjustment to be favored over just “Deutschland”.

This is done by calling the model multiple times and confronting it with the desired completion. As a result of this weight adjustment, the “width” of the model will decay: depending on how intensive the pre-training is, whole capabilities such as speaking certain languages or understanding certain logic can deteriorate or disappear completely.

Prior to pre-training, translation of “Germany”:
Germany -> Deutschland


During pre-training the weight of “Bundesrepublik” in conjunction with the word “Deutschland” gets boosted due to showing the model many completions which use the words together.


Following pre-training, translation of “Germany”:
Germany -> Bundesrepublik Deutschland

Example of pre-training
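The weight-boosting idea above can be illustrated with a toy stand-in. Real pre-training adjusts millions of neural network weights through next-word prediction; here a simple count table plays the role of those weights, purely as an illustration:

```python
from collections import Counter

class ToyNextWordModel:
    """A counting model standing in for learned next-word weights."""
    def __init__(self):
        self.counts = {}  # context word -> Counter of next words

    def train(self, pairs):
        for context, nxt in pairs:
            self.counts.setdefault(context, Counter())[nxt] += 1

    def predict(self, context):
        # The most frequently seen completion "wins", like a boosted weight
        return self.counts[context].most_common(1)[0][0]

model = ToyNextWordModel()
model.train([("Germany", "Deutschland")])          # base behavior
model.train([("Germany", "Bundesrepublik")] * 3)   # repeated desired completions
print(model.predict("Germany"))                    # "Bundesrepublik" now wins
```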

Fine-tuning

Very similar to pre-training, fine-tuning is used to add new capabilities and partially new information to an existing base model. It works quite differently, however, and thus requires much less GPU time and fewer resources; as a result it is a quite practical method to adjust a model to better fit your needs.

In contrast to pre-training, where the focus is solely on adjusting the weights of the ‘next word’ predictions, fine-tuning aims to provide examples of logic and information to a model. This can be in the form of presenting inputs alongside expected outputs.

Example from the Dharma fine-tuning dataset; both strings are provided to the model:

"Passage: The return address is not required on postal mail. However, lack of a return address prevents the postal service from being able to return the item if it proves undeliverable; such as from damage, postage due, or invalid destination. Such mail may otherwise become dead letter mail. Question: do you need to put return address on mail Choices: A: True B: False"

"B"
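Such input/output pairs are commonly stored as JSON Lines, one record per line, before being fed to a fine-tuning job. A brief sketch; the "prompt"/"completion" keys are one common convention, not something mandated by the Dharma dataset itself:

```python
import json

def to_jsonl(pairs):
    """Serialize (input, expected output) pairs as JSON Lines."""
    return "\n".join(
        json.dumps({"prompt": p, "completion": c}) for p, c in pairs
    )

record = to_jsonl([
    ("Passage: The return address is not required on postal mail. ... "
     "Question: do you need to put return address on mail "
     "Choices: A: True B: False", "B"),
])
```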

Operational terms

System prompt

The system prompt only applies to chat models and refers to a message that is placed at the very beginning of the human-assistant chat dialog to initialize the dialogue and set expectations, boundaries, or rules for output formats.

A simple way to imagine the system prompt is as an invisible chat message that is sent before the actual chat starts. The system prompt can often not be changed and is – even though this is debatable – sometimes kept a secret.

Example of an often used ChatGPT system prompt:

You are ChatGPT, a large language model trained by OpenAI. Follow the user's instructions carefully. Respond using markdown.

If you are working via Python or C++ directly against a local model FastAPI server or the OpenAI API, you can modify the above prompt to guide the LLM to better fit your desired output format. For example, if you want to transform information into a specific XML schema, you could modify the system prompt as follows:

You are ChatGPT, a large language model trained by OpenAI. Follow the user's instructions carefully. Provide your response solely as XML in the format provided.

Another example of a system prompt, often used by open-source models such as Vicuna, is as follows:

A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
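In the OpenAI-style chat API, the system prompt is simply the first message in the list sent with every request. A minimal sketch of that structure; no request is actually sent here, only the message list is built:

```python
def build_messages(system_prompt, user_message):
    """Prepend the invisible system message to the visible user message."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

messages = build_messages(
    "A chat between a curious human and an artificial intelligence "
    "assistant. The assistant gives helpful, detailed, and polite "
    "answers to the human's questions.",
    "What are the planets of the solar system?",
)
```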

Prompt / Message

The initial, and each subsequent, request towards a large language model is often called a prompt. The word by now has a bit of a magical flair; however, it is most of the time just the first, and maybe only, chat message sent towards the model API. This is why within standard Python code using FastAPI and common tools you will often find this called just ‘message’, though OpenAI calls it a prompt.

It is important to understand that this prompt or first message is preceded by the system prompt. You can simply think of this as two strings being merged before being passed to the model.

Example of a one-shot prompt:

You are a CISCO CCIE+S certified professional tasked with configuring a CISCO Nexus 3000 core switch. Your objective is to enable SSH remote management access on interface Ethernet1/32 on VLAN42 instead of mgmt0. Provide step-by-step instructions on how to adjust the configuration.

Rules for better prompts

Prompt engineering is a new field of study, and a lot of tips and tricks have already been discovered to improve the quality of both the prompts and – most importantly – the resulting output.

The RTO rule

A very successful and simple method to boost the quality of your AI interactions is to follow the RTO rule. RTO stands for Role, Task, and Output. It means that you, in very brief terms, ideally one sentence each, break the task down into the role the model should assume, the actual task, and the precise desired output.

Example of an RTO prompt:

You are an experienced Python developer well versed in working with Microsoft Office 365 and the Graph API. Your objective is to develop a small program that retrieves the last 5 e-mails (subject, body, header) from the complaints@contoso.com mailbox and stores the results in a MySQL table. Provide the needed Python sample code, use placeholders for API keys and unknown variables such as database paths and access credentials.

In the example above, the role is being an experienced Python developer. The task is outlined briefly but with enough detail to retrieve the mails. The output is defined so that we don’t want to receive general guidance but actual dummy code.

A more simplistic example of an RTO prompt:

You are an experienced chef with a good understanding of crafting easy-to-reproduce recipes. Your task is to make use of the 2 eggs and 50 ml of cream available in the fridge to bake a healthy but tasty cake suitable for a five-year-old kid, limiting sugar and salt to a minimum. You provide simple cooking instructions, starting with preparations and then each step; you highlight upfront if steps can be parallelised; your steps contain settings for devices as well as estimated durations.
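A prompt following the RTO rule can be assembled mechanically from its three parts. A tiny, purely illustrative helper; the connecting phrases are just one possible wording:

```python
def rto_prompt(role, task, output):
    """Combine Role, Task, and Output into one RTO-style prompt."""
    return f"You are {role}. Your objective is to {task}. {output}"

prompt = rto_prompt(
    "an experienced chef with a good understanding of easy recipes",
    "bake a healthy but tasty cake from 2 eggs and 50 ml of cream",
    "Provide simple step-by-step cooking instructions with durations.",
)
```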

Context size / context windows / CNTX

The context, in simple terms, refers to the total amount of information a large language model can process, including all inputs as well as the formulated output. It often comes as a major surprise to people that while providing more context improves the logic and results, it also limits the space left for the response, and once the limit is reached it blocks the model from generating a response at all.
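This shared-budget behavior can be sketched with a toy calculation. Real models count subword tokens using a tokenizer; the naive word count below is only a stand-in, meant to show why a prompt that fills the window leaves no room for an answer:

```python
def remaining_output_tokens(prompt, context_size):
    """Very rough sketch: tokens left for the answer after the prompt.
    Uses whitespace-separated words instead of a real tokenizer."""
    used = len(prompt.split())
    return max(context_size - used, 0)

remaining_output_tokens("The solar system has", 4096)  # plenty of room left
remaining_output_tokens("word " * 4096, 4096)          # 0: no room for a response
```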