Develop and publish a large language model (LLM)-based chatbot on Domino with the Streamlit framework. Learn how to quickly create a Domino-hosted app that takes user input and calls generative artificial intelligence (AI) APIs from Hugging Face and your own Domino endpoint.
For more generative AI LLM sample projects, see Domino LLM Reference Projects.
To complete this how-to, you must fulfill the following prerequisites.
Hugging Face API key
Get a Hugging Face API key to access models hosted on Hugging Face.
- Create your account on Hugging Face.
- Create a new Domino user-level environment variable called HUGGING_FACE_API_TOKEN and add your API key as its value (see the quick check after this list).
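To confirm the variable is visible inside a workspace, a quick check like the following works. This is a sketch and not part of the tutorial code; it only verifies that the value is set, not that the token is valid.

import os

# Confirm the user-level environment variable is set in this workspace.
# This checks presence only, not whether the token itself is valid.
token = os.environ.get("HUGGING_FACE_API_TOKEN")
print("Token found" if token else "HUGGING_FACE_API_TOKEN is not set")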
Jupyter server proxy
Create a compute environment in Domino with Jupyter or JupyterLab and the jupyter-server-proxy package. If you use a Domino Standard Environment, Jupyter, JupyterLab, and jupyter-server-proxy are already installed for you.
Domino endpoint
The app uses both a Hugging Face model and a Domino endpoint to demonstrate how to access multiple models from a single app.
To host the model used in this how-to, follow the instructions in the Domino Streamlit app repository.
You will create an app frontend using Streamlit chat elements to interact with two different models and get answers to movie-based questions.
The app uses a model from the Hugging Face LLM Models repo to generate lists of movies, and another model deployed as a Domino endpoint to classify movie summaries by genre.
Create a simple chat app that has the following elements:
- A title.
- A sidebar with the following:
  - A password field to add a Hugging Face API key.
  - A radio button to select between the two models.
- A body with the chat interface.
You can interactively develop your Streamlit app in a Domino Workspace and see your changes live via an app URL.
Create the app title and sidebar
- Launch a new Jupyter or JupyterLab workspace in Domino. Make sure the jupyter-server-proxy package is installed.
- Create a new file in your /mnt directory called chatbot.py.
- Add the following code to add a title and sidebar:

import streamlit as st
import os
import json
import requests

# Read the token from the environment; default to an empty string so the
# sidebar can prompt for it instead of raising a KeyError when it is unset.
API_TOKEN = os.environ.get('HUGGING_FACE_API_TOKEN', '')
API_URL = "https://api-inference.huggingface.co/models/microsoft/DialoGPT-large"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

# App title
st.set_page_config(page_title="🤖💬 Your Personal Chat Assistant")

# App sidebar
with st.sidebar:
    st.title('🤖💬 Your Personal Chat Assistant')
    if API_TOKEN == "":
        API_TOKEN = st.text_input('Enter Hugging Face API Token:', type='password')
        # Refresh the auth header if the token was entered in the sidebar.
        headers["Authorization"] = f"Bearer {API_TOKEN}"
        if not API_TOKEN:
            st.warning('Please enter your Hugging Face API Token', icon='⚠️')
        else:
            st.success('Proceed to entering your prompt message!', icon='👉')
    else:
        st.success('Hugging Face API Token provided!', icon='✅')
    model = st.radio(
        "Select a model :point_down:",
        ('Microsoft DialoGPT-large', 'Your Domino Hosted Prediction API'))
- Open a new terminal window in your IDE and run the following command (a pure-Python version appears after this list):

echo -e "import os\nprint('https://your-domino-url/{}/{}/notebookSession/{}/proxy/8501/'.format(os.environ['DOMINO_PROJECT_OWNER'], os.environ['DOMINO_PROJECT_NAME'], os.environ['DOMINO_RUN_ID']))" | python3
- Replace your-domino-url with your Domino domain. This is the app URL you’ll use to access the app in a new browser tab once it is deployed from within the workspace. Note the use of 8501 in the URL: this is the default port at which Streamlit deploys an app.
- Type streamlit run chatbot.py in the terminal. You should see your app being published by Streamlit and running on the default port, 8501.
- View your app in a new browser tab using the app URL.
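If the shell one-liner above is awkward to edit, the same app URL can be assembled directly in Python. This is a sketch: your-domino-url remains a placeholder for your own Domino domain, and the environment variables are the standard ones Domino sets in a workspace.

import os

# Build the proxied app URL from the workspace's environment variables.
# Replace "your-domino-url" with your Domino domain; 8501 is Streamlit's
# default port.
url = "https://your-domino-url/{}/{}/notebookSession/{}/proxy/8501/".format(
    os.environ["DOMINO_PROJECT_OWNER"],
    os.environ["DOMINO_PROJECT_NAME"],
    os.environ["DOMINO_RUN_ID"],
)
print(url)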
Capture user inputs
Next, let’s use Streamlit session state to capture new user inputs.
- Add the following code to chatbot.py. Note that it appends to st.session_state.messages, which the next step initializes; in the file itself, the initialization goes above this block, as shown in the full listing later.

# Seek new input prompts from user
if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
- Store the bot and user’s conversation, and view the updated app. (See the sketch after these steps for how this pattern behaves across Streamlit reruns.)

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I help you today?"}]

# And display all stored chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])
- View the updated app in the app URL to see the changes.
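Streamlit reruns the entire script on every user interaction; st.session_state is what lets the chat history survive those reruns. Here is a minimal, self-contained sketch of the same pattern, with the assistant simply echoing the user in place of a real model call:

import streamlit as st

# st.session_state survives Streamlit's script reruns, so the chat
# history accumulates even though this file re-executes on every input.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the stored history on every rerun.
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# The walrus operator assigns the new input and tests it in one step.
if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
    # Echo the input back as a stand-in for a real model call.
    reply = f"You said: {prompt}"
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)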
Call the Hugging Face API and Domino endpoint
- Write a function to call the microsoft/DialoGPT-large model hosted on Hugging Face via an API call:

# Query the Microsoft DialoGPT model
def query_DialoGPT_model(prompt_input, past_user_inputs=None, generated_responses=None):
    payload = {
        "inputs": {
            "past_user_inputs": past_user_inputs,
            "generated_responses": generated_responses,
            "text": prompt_input,
        },
    }
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    response = json.loads(response.content.decode("utf-8"))
    # Drop any warnings in the response; the default avoids a KeyError
    # when the API returns none.
    response.pop("warnings", None)
    return response.get('generated_text')
- Create a new function that invokes the second model, hosted as a Domino endpoint in your Domino deployment:

# Query your own internal Domino endpoint
def query_domino_endpoint(prompt_input):
    response = requests.post(
        "https://{your-domino-url}:443/models/{model-id}/latest/model",
        auth=(
            "{domino-endpoint-access-token-here}",
            "{domino-endpoint-access-token-here}"
        ),
        json={
            "data": {
                "{model-parameter-1}": prompt_input
            }
        }
    )
    return response.json().get('result')[0]
- Create a function that invokes either the Hugging Face LLM model or the Domino endpoint, based on the radio button selection in the sidebar. (A smoke test for both query functions follows this list.)

# Function for generating LLM response
def generate_response(prompt):
    if model == "Microsoft DialoGPT-large":
        response_generated = query_DialoGPT_model(prompt)
    elif model == "Your Domino Hosted Prediction API":
        response_generated = query_domino_endpoint(prompt)
    return response_generated

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_response(prompt)
            st.write(response)
    message = {"role": "assistant", "content": response}
    st.session_state.messages.append(message)
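Before wiring these functions into the chat flow, you can smoke-test the calls from a plain Python session in your workspace. This is a sketch: the test prompt is arbitrary, and the commented-out Domino call keeps the same placeholders as the tutorial code, which you must replace with your own values.

import json
import os
import requests

# Standalone check of the Hugging Face call, outside Streamlit.
API_TOKEN = os.environ["HUGGING_FACE_API_TOKEN"]
API_URL = "https://api-inference.huggingface.co/models/microsoft/DialoGPT-large"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

payload = {"inputs": {"past_user_inputs": None,
                      "generated_responses": None,
                      "text": "Recommend a heist movie."}}
response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
print(response.status_code)
print(response.json())

# The Domino endpoint can be checked the same way; the URL, tokens, and
# parameter name below are placeholders you must fill in yourself.
# response = requests.post(
#     "https://{your-domino-url}:443/models/{model-id}/latest/model",
#     auth=("{domino-endpoint-access-token-here}", "{domino-endpoint-access-token-here}"),
#     json={"data": {"{model-parameter-1}": "A crew plans one last job."}},
# )
# print(response.json())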
Putting it all together, your app code should look like the following:
import streamlit as st
import os
import json
import requests

# Read the token from the environment; default to an empty string so the
# sidebar can prompt for it instead of raising a KeyError when it is unset.
API_TOKEN = os.environ.get('HUGGING_FACE_API_TOKEN', '')
API_URL = "https://api-inference.huggingface.co/models/microsoft/DialoGPT-large"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

# App title
st.set_page_config(page_title="🤖💬 Your Personal Chat Assistant")

# App sidebar
with st.sidebar:
    st.title('🤖💬 Your Personal Chat Assistant')
    if API_TOKEN == "":
        API_TOKEN = st.text_input('Enter Hugging Face API Token:', type='password')
        # Refresh the auth header if the token was entered in the sidebar.
        headers["Authorization"] = f"Bearer {API_TOKEN}"
        if not API_TOKEN:
            st.warning('Please enter your Hugging Face API Token', icon='⚠️')
        else:
            st.success('Proceed to entering your prompt message!', icon='👉')
    else:
        st.success('Hugging Face API Token provided!', icon='✅')
    model = st.radio(
        "Select a model :point_down:",
        ('Microsoft DialoGPT-large', 'Your Domino Hosted Prediction API'))

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I help you today?"}]

# And display all stored chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# Seek new input prompts from user
if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

# Query the Microsoft DialoGPT model
def query_DialoGPT_model(prompt_input, past_user_inputs=None, generated_responses=None):
    payload = {
        "inputs": {
            "past_user_inputs": past_user_inputs,
            "generated_responses": generated_responses,
            "text": prompt_input,
        },
    }
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    response = json.loads(response.content.decode("utf-8"))
    # Drop any warnings in the response; the default avoids a KeyError.
    response.pop("warnings", None)
    return response.get('generated_text')

# Query your own internal Domino endpoint
def query_domino_endpoint(prompt_input):
    response = requests.post(
        "https://{your-domino-url}:443/models/{model-id}/latest/model",
        auth=(
            "{domino-endpoint-access-token-here}",
            "{domino-endpoint-access-token-here}"
        ),
        json={
            "data": {
                "{model-parameter-1}": prompt_input
            }
        }
    )
    return response.json().get('result')[0]

# Function for generating LLM response
def generate_response(prompt):
    if model == "Microsoft DialoGPT-large":
        response_generated = query_DialoGPT_model(prompt)
    elif model == "Your Domino Hosted Prediction API":
        response_generated = query_domino_endpoint(prompt)
    return response_generated

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = generate_response(prompt)
            st.write(response)
    message = {"role": "assistant", "content": response}
    st.session_state.messages.append(message)
Finally, let’s view the app again and make sure all the parts are working as expected. Converse with both models and observe the output. If all looks good, sync all changes to your project and stop your workspace.
Use Domino’s publish app feature to publish the chatbot app. Before we do so, we need to create an app.sh file. Recall that Domino requires the app to be hosted with hostname 0.0.0.0 and port 8888. Streamlit, by default, publishes an app at port 8501, but provides the option to set the host and port in its config.toml file.
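For reference, the app.sh script created in the next step just writes a small Streamlit configuration file before launching the app. The following sketch is an optional Python equivalent that produces the same config.toml, shown here so you can see the file’s final contents:

from pathlib import Path

# The same configuration that app.sh writes: pin Streamlit to the
# hostname and port that Domino expects for published apps.
CONFIG = """\
[browser]
gatherUsageStats = true
serverAddress = "0.0.0.0"
serverPort = 8888

[server]
port = 8888
enableCORS = false
enableXsrfProtection = false
"""

config_dir = Path.home() / ".streamlit"
config_dir.mkdir(exist_ok=True)
(config_dir / "config.toml").write_text(CONFIG)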
- Create an app.sh file and add the content below to it:

mkdir ~/.streamlit
echo "[browser]" > ~/.streamlit/config.toml
echo "gatherUsageStats = true" >> ~/.streamlit/config.toml
echo "serverAddress = \"0.0.0.0\"" >> ~/.streamlit/config.toml
echo "serverPort = 8888" >> ~/.streamlit/config.toml
echo "[server]" >> ~/.streamlit/config.toml
echo "port = 8888" >> ~/.streamlit/config.toml
echo "enableCORS = false" >> ~/.streamlit/config.toml
echo "enableXsrfProtection = false" >> ~/.streamlit/config.toml
streamlit run chatbot.py
- Click Deployments > App from the project sidebar.
- Give your app an informative title and description, and set permissions to Anyone can access. This allows anyone with a network connection to your Domino deployment to access the app if they have the URL.
  Tip: Leave Show in Launchpad selected to make your app easily discoverable. You can always change this later.
- Click Publish.
- After the app status says "Running", click View App to load your app. (An optional check that the app is serving traffic follows this list.)
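As an optional final check, you can confirm the published app responds from a Python session. The URL below is a hypothetical placeholder; paste in the link from Copy App Link. If your deployment requires login, you may see a redirect to the login page instead of the app itself.

import requests

# Placeholder: paste the app link from Deployments > App > Copy App Link.
APP_URL = "https://your-domino-url/apps/your-app-id/"

response = requests.get(APP_URL, timeout=30)
print(response.status_code)  # 200 means the app is serving traffic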
Now, you can set the permissions on your app to Anyone can access to share it with colleagues who have access to your Domino instance. To try this out yourself, open a private or incognito browser window (or log out of Domino) and navigate to the link from Deployments > App > Copy App Link.
To browse more apps, click Deploy > Apps in the top navigation menu.