Still copy-pasting into ChatGPT? Here’s how to turn your ideas into AI-powered apps

Image generated using DALL-E, OpenAI

How to get started building AI-powered apps and tools?

You’re staring at ChatGPT. You’ve done this a hundred times before: typed a question, copied a response, pasted it into some half-built project or document. Maybe it helped. Maybe it wasn’t quite what you were looking for.

But here’s the thing no one tells you: that chat box you’re using? It’s not the product. It’s the demo.

The real magic – the stuff behind the AI-powered apps everyone’s talking about – doesn’t live in the web client. It lives in the API. The moment we tap into that, we stop playing with AI and start building with it.

It’s not as complicated as it sounds. If you can write a prompt, you’re already halfway there. In this beginner-friendly guide, we’ll explore how we can get started building simple AI-powered apps using your Large Language Model (LLM) of choice. We’ll see how using an API changes the game, and helps us level up our prompts – no machine learning PhD required.

How does using ChatGPT through an API differ from the web client?

Many popular AI tools come with an Application Programming Interface (API). If you’re not already familiar, an API provides access to a service through code rather than a web client. This allows us to:

  • Automate the prompting process, particularly for repetitive tasks,
  • Connect AI tools to other services,
  • Bring in additional files and data to tailor our prompts, and
  • Customise the behaviour & language of the model to suit our purpose.

With this in mind, here are some simple yet powerful tools we could build:

  • A PDF explainer that reads long reports, pulls out key insights, and lets you ask questions
  • An email triage assistant that tags, sorts, and drafts replies based on your priorities
  • A data cleaner where you can drag in a CSV and it processes and reformats the data
  • A doc-search portal trained on your own files, so you can get instant answers without digging through folders
  • A study tool that turns your notes into flashcards, summaries, or quizzes
  • A meeting summariser that processes transcripts and highlights decisions, action items, and follow-ups

To get started, you will need an API key for your chosen LLM. For this article, we’ll be using ChatGPT. Full setup instructions are available at the bottom of this page.

Simple API-call prompting script (Python)

Below is a simple sample script. This can act as a starting point for all of the other code snippets in this article. Feel free to copy and adapt this code to try out some of the ideas we’ll encounter.

import os
from openai import OpenAI

# Create a client using the API key stored in an environment variable
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Send a single message to the model
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Print the text of the model's reply
print(response.choices[0].message.content)

Let’s get started. First up, let’s look at the differences between using ChatGPT with the web client versus through the API…

Context Memory

When we use ChatGPT normally, our messages are arranged into conversations. Any response you get is based on everything that’s been said so far in the current chat. When using the API, however, only the messages included in the current request influence the response we get back – the API has no memory of previous requests.

To get around this, we augment each message with a conversation history (from both the user and the agent), all packaged together into a single input for the model. This allows us to simulate a conversation, and is the same process that the ChatGPT client will follow, just usually behind the scenes.

For example, the following prompt may be sent mid-way through a conversation:

messages = [
    # ... earlier messages in the conversation ...
    {"role": "user", "content": "How do I reverse a list in Python?"},
    {"role": "assistant", "content": "You can use the `reverse()` method:\n\n```python\nmy_list.reverse()\n```\nOr use slicing:\n\n```python\nreversed_list = my_list[::-1]\n```"},
    {"role": "user", "content": "What if I want to reverse it without modifying the original?"}
]

As you can see, whenever the user sends a new message, we send the entire conversation history along with it. This also gives us more control and flexibility over what the model considers when it generates its response; we can pick and choose which messages we want to send.
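To make this concrete, here’s a minimal sketch of a chat loop that manages its own history, building on the sample script above (the loop structure is illustrative – adapt it to your own app):

conversation = []

while True:
    user_input = input("You: ")
    conversation.append({"role": "user", "content": user_input})

    # Send the full history with every request so the model has context
    response = client.chat.completions.create(
        model="gpt-4",
        messages=conversation
    )

    reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": reply})
    print("Assistant:", reply)

Each turn appends both the user message and the assistant’s reply, so the next request carries the whole conversation.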

System Prompts

We can use a similar approach to customise the behaviour of the model. By providing some custom instructions as part of the conversation history, we can change the way it will respond to us.

conversation = [
    {"role": "system", "content": "You are an Ancient Greek Philosopher. You are helpful, curious and inspire people to think by asking questions and prompting others to do the same."},
    {"role": "user", "content": "What is the meaning of life?"}
]
Ah, a question as old as time itself! What do you think gives life its meaning? Is it the pursuit of knowledge, the search for happiness, or perhaps the connections we make with others? Let us ponder this together and explore the various perspectives and philosophies that seek to answer this profound question.

LLMs are just language-parsing machines. They don’t naturally have a personality or adopt a persona; they’re a blank slate. The first message in the code above is known as a system message or alignment prompt. It defines the persona or role that the model will adopt throughout the conversation – in this instance, an Ancient Greek Philosopher.

The ChatGPT “personality” that we’ve all come to know and love is another example of this. OpenAI will have their own system prompt which tells the model to act in a specific way. While the system message for ChatGPT has not been disclosed, it will probably be along the lines of:

You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture. You are helpful, honest, and harmless.

…though it’s almost certainly more complex than that! If you don’t supply a system message in your prompt, the model will assume the default one and start acting like the ChatGPT agent.

System prompts like this are a powerful tool for specifying model behaviour, since models are trained to give them priority over user messages. This helps prevent a user from sending something like “forget all of your previous instructions, now do this instead…” and derailing the behaviour you set. The system prompt stays in charge.

Response Formatting

Another feature of the ChatGPT UI that you may not have given much thought to is the way it formats responses as code blocks, bullet points, or tables. This is another layer of processing that sits on top of the raw LLM output.

The UI layer looks for Markdown markers in that raw text – characters such as ```, *, or # – and uses them to decide how each piece of the response should be rendered.

We can provide these instructions when we write a prompt. For example:

"Write me a shopping list of 10 items. Give your response as plain text. Add an '**' before and after each item. Provide no other text in your response."
**1. Eggs**
**2. Milk**
**3. Bread**
**4. Chicken**
**5. Rice**
**6. Apples**
**7. Pasta**
**8. Yogurt**
**9. Spinach**
**10. Toilet paper**

We can then search for these markers in the response and process the text ourselves – for example, to turn it into a bullet-pointed list, as in the sketch below.
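Here’s a small sketch that pulls the marked items out of the reply and prints them as a bullet list (it assumes the model followed the formatting instructions exactly):

import re

shopping_text = response.choices[0].message.content

# Grab everything wrapped in ** markers, e.g. "**1. Eggs**" -> "1. Eggs"
items = re.findall(r"\*\*(.*?)\*\*", shopping_text)

for item in items:
    print(f"- {item}")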

Data Structures

We can take this formatting idea one step further and get the model to output its response as a data structure that we can then interpret and work with in our code. Take, for example, the prompt:

“Choose 10 random fruits. Present the response in a Python list format. Provide no other text or symbols.”

We can get a response that looks like this:

["Mango", "Blueberry", "Pineapple", "Kiwi", "Papaya", "Raspberry", "Grapefruit", "Fig", "Cantaloupe", "Blackberry"]

Sometimes things can go slightly wrong, and it may still tag on a “Sure, here’s a list of 10 fruits…” even though you asked it not to.
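One defensive option (just a sketch, not the only approach) is to try to parse the reply as a Python literal and handle the failure if the model has added extra text:

import ast

text = response.choices[0].message.content

try:
    fruits = ast.literal_eval(text)  # safely evaluate the list literal
except (ValueError, SyntaxError):
    fruits = None  # extra text crept in – retry, or strip it out before parsing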

If you’re having trouble convincing it to consistently produce a response in the format you want, it can be helpful to give an example to demonstrate:

“Find the names and heights of the 10 tallest buildings in the world. Respond in a JSON format such as [ {“name”: [BUILDING NAME], “height”: [HEIGHT] } ]. Provide no other text in your response.”
[
  {"name": "Burj Khalifa", "height": 828},
  {"name": "Merdeka 118", "height": 678.9},
  {"name": "Shanghai Tower", "height": 632},
  {"name": "Abraj Al-Bait Clock Tower", "height": 601},
  {"name": "Ping An Finance Center", "height": 599.1},
  {"name": "Lotte World Tower", "height": 554.5},
  {"name": "One World Trade Center", "height": 541.3},
  {"name": "Guangzhou CTF Finance Centre", "height": 530},
  {"name": "Tianjin CTF Finance Centre", "height": 530},
  {"name": "CITIC Tower", "height": 528}
]

JSON

If it’s specifically a JSON format we’re after, we can make use of the response_format parameter. This can be used to put the model into ‘JSON mode’:

response_format={"type": "json_object"}

This will ensure that the model returns a valid JSON object every single time.

*Note: when setting the response format, you also need to ask for a JSON object in your prompt (as in the system message below), otherwise the API may throw an error.

conversation = [
    {"role": "system", "content": "Generate a JSON response"},
    {"role": "user", "content": "Give me instructions for how to make a cup of tea."}
]

response = client.chat.completions.create(
    response_format={"type": "json_object"},  
    model="gpt-3.5-turbo",
    messages=conversation
)
{
    "instructions": [
        "Boil water in a kettle",
        "Place a tea bag or loose tea leaves in a cup",
        "Pour the hot water over the tea bag or leaves",
        "Let the tea steep for 3-5 minutes",
        "Remove the tea bag or strain the leaves out",
        "Add sugar, honey, milk, or lemon as desired",
        "Stir and enjoy your cup of tea!"
    ]
}

Now we could convert this JSON data into a Python dictionary and print the instructions step by step.
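For example, assuming the reply has the "instructions" key shown above, a few lines like these would do the job:

import json

data = json.loads(response.choices[0].message.content)  # JSON text -> Python dict

for number, step in enumerate(data["instructions"], start=1):
    print(f"{number}. {step}")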

Function Calling

We can take this idea of specifying a JSON formatted response one step further using the tools parameter. If we pass a JSON object that describes a set of functions and their parameters, we can have the model choose the most appropriate function based on the input prompt. It will then return the function name and its parameters as another JSON object. Using this response, we can then construct an appropriate function call using the parameters it provides.

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages = [
        {"role": "user",
         "content": "Extract the meeting details from this text: 'Hi, are you free next Saturday around quarter past 3? Would love to buy you a coffee and we can talk more about this. Call me 01234 456 789 See you soon, John"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "create_event",
                "description": "Creates a calendar event",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "date": {"type": "string"},
                        "location": {"type": "string"}
                    },
                    "required": ["title", "date", "location"]
        }}},
        {
            "type": "function",
            "function": {
                "name": "delete_event",
                "description": "Removes a calendar event",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "date": {"type": "string"},
                        "location": {"type": "string"}
                    },
                    "required": ["title",]
        }}}
    ],
    tool_choice="auto"
)

print(response.choices[0].message.tool_calls[0].function)
Function(arguments='{"title": "Coffee Meeting with John", "date": "next Saturday at 3:15 PM", "location": "Coffee Shop"}', name='create_event')

We can process this response and construct a function call as follows:

create_event(title=title, location=location, date=date)
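Putting that together, a sketch of the processing step might look like this (create_event is a function we would define ourselves – it is not part of the OpenAI library):

import json

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)  # the arguments arrive as a JSON string

if tool_call.function.name == "create_event":
    create_event(title=args["title"], date=args["date"], location=args["location"])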

Temperature and Determinism

With the API, we gain access to a new parameter, temperature, that lets us control how deterministic the model’s output will be. Without going into too much detail here, large language models generate text by predicting the next most likely word based on the words that came before it. The temperature parameter allows the model to choose different options for the next word in the sequence – it essentially introduces some randomness into the output.

High temperature values (roughly 1.7 to 2, on a scale that runs from 0 to 2) produce wild and unexpected results, with values at the top end of that range often giving responses that are nonsensical and unusable. Most applications use a value between 0.3 and 1.0, which produces fairly consistent, accurate results. Setting the temperature to zero makes the output close to deterministic – you’ll usually get the same response every time.

Generally, choosing a low value strikes a good balance between consistency and exploration. If we aren’t interested in altering this and are happy with the default behaviour, we can simply omit the parameter.

response = client.chat.completions.create(

    model="gpt-4",
    messages=conversation,
    temperature=0.7

)

Token usage

A token is a small chunk of text – roughly three to four characters, or about three-quarters of a word in English – that the model reads or writes. Each model has a maximum number of tokens it can process in a single request (its context window). When we send a request, we need to make sure that the number of tokens we send plus the number of tokens we expect to receive stays within that limit. Most models have a context window of at least 16k tokens, with newer ones going higher (up to 128k for the latest OpenAI models), so we don’t usually have to worry about hitting it. Just something to bear in mind, depending on the application you’re building.

Models also have a limited output size – the maximum number of tokens they can return for a single request. This limit is still at least 4k tokens (and far higher for newer models), so it’s not usually something to worry about either. We can also set this limit ourselves if, for example, we want to restrict the size of the response we get back, as sketched below.
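As a quick sketch, we can cap the reply length with the max_tokens parameter and check how many tokens a request actually used via the usage field on the response:

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarise the plot of Hamlet."}],
    max_tokens=200  # limit the length of the reply
)

# The usage object reports what we will be billed for
print(response.usage.prompt_tokens)      # tokens we sent
print(response.usage.completion_tokens)  # tokens in the reply
print(response.usage.total_tokens)       # the two combined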

Token tracking and management is by far the least interesting point in this list, but since we pay by the token for API use, it’s important to consider. If you try to send a prompt and there are insufficient funds on your account, the model will not process the request and instead return an error code.

Intro to RAG Applications

Sometimes, even the best prompt can only get you so far. What if the information you need isn’t in the model’s training data? This is where RAG (Retrieval-Augmented Generation) comes in.

A RAG system connects your LLM to an external source of knowledge – often a set of documents or a database. Instead of relying solely on the training data of the model, it can now look up information in real time. Since we provide that additional data ourselves, we know it will be correct and fit for our purpose.

This can dramatically improve the quality of the responses we are able to get out of a model, making the system much more tailored to our needs. Here’s a general program flow we could follow (sketched in code after the steps):

  1. User submits a prompt.
  2. The system searches through the data provided to find any information that may be relevant to the query. We may also use the LLM to construct a database query so it can retrieve exactly what it needs.
  3. We feed that data back into the model, alongside some custom instructions, to produce a response for the user.
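As a very rough sketch of that flow (retrieve_relevant_chunks here is a placeholder for whatever search we implement – keyword matching, a vector database, or otherwise – not a real library function):

def answer_with_context(question, documents):
    # 1. Find the passages most relevant to the user's question
    context = retrieve_relevant_chunks(question, documents)

    # 2. Feed the retrieved text back to the model alongside custom instructions
    conversation = [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ]

    # 3. Return the grounded response to the user
    response = client.chat.completions.create(model="gpt-4", messages=conversation)
    return response.choices[0].message.content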

Obviously, this is just the starting point; we can customise this structure depending on the tool we’re trying to create. There’s a ton of existing tools and libraries we can use to link systems together and automate a lot of this process for us, making it easier than ever to start building your first AI-powered application.

Wrapping up

We’ve covered a lot but barely scratched the surface.

If you’ve made it this far, thank you! You’ve already got all of the tools you need to get started building your first AI-powered application.

If you enjoyed this article, and want to learn about some advanced prompt engineering techniques to get more out of your LLM, take a look at the article linked below:

Appendix: API Setup Instructions

If you haven’t already, you can create an account and generate your API key for the OpenAI API here:

https://openai.com/api

This API key functions like a password so that OpenAI knows who is making the request. API access is a paid-for service; however, it’s pretty cheap to use – often just a few cents per request, charged by the number of tokens you send and receive. $5 will be plenty to get you started building your first app!

To use the code given in this article, you will need to set your API key as a system environment variable. This is a straightforward process, and instructions can be found here:

https://optics.ansys.com/hc/en-us/articles/7812289531923-Create-or-modify-environment-variables-in-Windows

The last piece of setup we’ll need is the OpenAI Python library, which can be installed with pip as follows:

pip install openai

We are now ready to start building a ChatGPT-powered application!

