Drug Classification App in Code Assist

Learn how to build an app with Code Assist to interactively explore drug classification data. In this example you will:

Load CSV data
Create a visualization app
Train a classification model

Start by initializing Code Assist.

Load data

Download the drug classification dataset from Kaggle, then use Code Assist to load the CSV file. The data should have the following columns:

Age
Sex
BP (blood pressure)
Cholesterol
Na_to_K (ratio of sodium to potassium)
Drug (drug label)
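
Code Assist writes the loading code for you. The minimal pandas sketch below shows roughly what that step amounts to. It is not the code Code Assist generates. The helper `load_drug_data` is hypothetical, and the file name `drug200.csv` is an assumed name for the Kaggle download.

```python
import pandas as pd

# Columns this tutorial expects in the drug classification CSV
EXPECTED_COLUMNS = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K", "Drug"]


def load_drug_data(path: str) -> pd.DataFrame:
    """Read the CSV and check it has the expected columns (hypothetical helper)."""
    df = pd.read_csv(path)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    # Return the columns in the order the later modelling steps assume
    return df[EXPECTED_COLUMNS]


# Example usage (the file name is an assumption):
# df = load_drug_data("drug200.csv")
```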

Create a visualization

Use Code Assist to create a visualization. Turn on the Enable crossfilter toggle so that the visualization responds to crossfilters, then click the INSERT CODE button.
The code is inserted into the notebook and executed immediately to create the visualization.

Add a crossfilter

Use Code Assist to create a crossfilter for selecting drug classes. Turn on the Multiple toggle so that more than one drug class can be selected, then click the INSERT CODE button.
Create a second crossfilter. This time choose a Slider filter, select the Na_to_K column and change the Mode to >=. Click the INSERT CODE button.
The code for both crossfilters is now in the notebook. Use the crossfilters to change what the visualization displays.
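
Conceptually, the two crossfilters restrict which rows feed the visualization. The pandas sketch below shows the equivalent row selection. `apply_crossfilters` is a hypothetical helper written for illustration, not the code Code Assist inserts. It assumes that active filters combine with AND, so a row must pass both.

```python
from typing import Sequence

import pandas as pd


def apply_crossfilters(df: pd.DataFrame, drugs: Sequence[str], min_na_to_k: float) -> pd.DataFrame:
    """Keep rows that pass both filters (hypothetical illustration of crossfilter logic)."""
    # Multiple-select filter: keep only the chosen drug classes
    in_selected_drugs = df["Drug"].isin(drugs)
    # Slider filter in ">=" mode: keep rows whose ratio is at or above the threshold
    above_threshold = df["Na_to_K"] >= min_na_to_k
    # Assumption: filters combine with AND, so a row must satisfy both
    return df[in_selected_drugs & above_threshold]
```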

Create an App

Use Code Assist to create an app. Select, move and resize the widgets until you have the layout you want, then click the INSERT CODE button.
The code is inserted into the notebook. Click the PREVIEW button to launch a preview version of the app.


Create a classification model

Use the same data to build a simple machine learning model that predicts the drug class. First, take another look at the data in df.
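
One quick way to take that look in pandas is sketched below. The helper name `summarize_drug_data` is hypothetical. It reports the number of rows, the column types and how many rows belong to each drug class. The class counts are useful context when judging the model's accuracy later on.

```python
import pandas as pd


def summarize_drug_data(df: pd.DataFrame) -> dict:
    """Collect a few basic facts about the loaded data (hypothetical helper)."""
    return {
        "n_rows": len(df),
        # Column name -> pandas dtype, e.g. numeric vs. text columns
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Number of rows per drug class, most common first
        "class_counts": df["Drug"].value_counts().to_dict(),
    }


# In the notebook: df.head() and summarize_drug_data(df) give a first overview.
```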

Create a model to predict Drug from the remaining columns. First import the additional packages, then split the data into predictors, X, and target, y. Insert the following code into a new cell in the notebook.

```
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Predictors: every column except the label; target: the Drug label
X = df.drop(columns='Drug')
y = df['Drug']
```

The categorical columns (Sex, BP and Cholesterol) need to be dummy (one-hot) encoded. Insert the following code into a new cell in the notebook.

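```
# One-hot encode each categorical column; the prefix keeps dummy names distinct
# when BP and Cholesterol share level names such as HIGH
sex = pd.get_dummies(X['Sex'], prefix='Sex')
bp = pd.get_dummies(X['BP'], prefix='BP')
cholesterol = pd.get_dummies(X['Cholesterol'], prefix='Cholesterol')

# Predictors: Age plus the encoded columns (Na_to_K is not included)
X = pd.concat([X[['Age']], sex, bp, cholesterol], axis='columns')
```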
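
To see what dummy encoding produces, here is a small standalone example. It uses a handful of made-up BP and Cholesterol values, not rows from the dataset. It also shows why a prefix helps. Without one, the BP and Cholesterol encodings in this example both produce columns named HIGH and NORMAL.

```python
import pandas as pd

# A few illustrative values (invented for this example, not taken from the dataset)
bp = pd.Series(["HIGH", "LOW", "NORMAL", "HIGH"])
cholesterol = pd.Series(["HIGH", "NORMAL", "NORMAL", "HIGH"])

# Without a prefix, both encodings yield columns called HIGH and NORMAL
plain = pd.concat([pd.get_dummies(bp), pd.get_dummies(cholesterol)], axis="columns")
print(list(plain.columns))  # ['HIGH', 'LOW', 'NORMAL', 'HIGH', 'NORMAL']

# With a prefix, each dummy column name records its source column
prefixed = pd.concat(
    [pd.get_dummies(bp, prefix="BP"), pd.get_dummies(cholesterol, prefix="Cholesterol")],
    axis="columns",
)
print(list(prefixed.columns))
# ['BP_HIGH', 'BP_LOW', 'BP_NORMAL', 'Cholesterol_HIGH', 'Cholesterol_NORMAL']
```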

The target column, y, also needs to be encoded. Insert the following code into a new cell in the notebook.
```
encoder = LabelEncoder()

y = encoder.fit_transform(y)
```
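
LabelEncoder assigns each class an integer according to its sorted position. The sketch below uses a short made-up list of labels to show the mapping. It also shows how `inverse_transform` turns encoded predictions back into drug names, which is handy when reading the model's output. Uppercase letters sort before lowercase ones, so DrugY gets code 0 here.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels (invented for this example)
labels = ["DrugY", "drugA", "drugX", "drugA"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# Classes are sorted; each code is the class's position in encoder.classes_
print(encoder.classes_.tolist())  # ['DrugY', 'drugA', 'drugX']
print(codes.tolist())             # [0, 1, 2, 1]

# Map integer codes (e.g. model predictions) back to the original names
print(encoder.inverse_transform([2, 0]).tolist())  # ['drugX', 'DrugY']
```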

Now split the data into training and testing sets, then fit a model on the training set. Insert the following code into a new cell in the notebook.

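```
# Hold out part of the data for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a shallow decision tree with at most two levels of splits
tree = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
```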
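
By default, `train_test_split` holds out 25% of the rows for testing. The tree is only two levels deep, so its rules are short enough to read in full. The sketch below prints them with scikit-learn's `export_text`. `describe_tree` is a hypothetical helper. The tiny training set is invented purely so the example runs on its own.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text


def describe_tree(tree: DecisionTreeClassifier, feature_names: list) -> str:
    """Return a fitted tree's split rules as indented text (hypothetical helper)."""
    return export_text(tree, feature_names=feature_names)


# Tiny invented training set so the example is self-contained
X_demo = pd.DataFrame({"Age": [20, 30, 60, 70], "BP_HIGH": [1, 0, 1, 0]})
y_demo = [0, 0, 1, 1]
demo_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)
print(describe_tree(demo_tree, list(X_demo.columns)))

# In the notebook: print(describe_tree(tree, list(X.columns)))
```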

Make predictions

Finally, use the model to make predictions for the testing set and calculate its accuracy on that set. Insert the following code into a new cell in the notebook.

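```
predictions = tree.predict(X_test)  # encoded drug class for each test row

round(tree.score(X_test, y_test), 4)  # accuracy (fraction correct) on the testing set
```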
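
The notebook imports `confusion_matrix`, which shows where the model goes wrong rather than just how often. In the minimal sketch below, rows are the true classes and columns are the predicted classes, both labelled with drug names. The helper name `labelled_confusion_matrix` is hypothetical. The sketch assumes the targets are the integer codes produced by LabelEncoder, so code i corresponds to `encoder.classes_[i]`.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix


def labelled_confusion_matrix(y_true, y_pred, class_names) -> pd.DataFrame:
    """Confusion matrix with rows = true class, columns = predicted class (hypothetical helper)."""
    # Fix the label order so that row/column i corresponds to class_names[i]
    labels = list(range(len(class_names)))
    matrix = confusion_matrix(y_true, y_pred, labels=labels)
    return pd.DataFrame(matrix, index=class_names, columns=class_names)


# In the notebook, with the encoded labels from LabelEncoder:
# labelled_confusion_matrix(y_test, tree.predict(X_test), list(encoder.classes_))
```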

The model is not very accurate, but given how small the dataset is, that is not too bad: simply guessing would yield an accuracy of only 25%, while the model achieves 50% on the testing set.
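
One way to make the comparison with guessing concrete is scikit-learn's `DummyClassifier`, which ignores the features entirely. The sketch below scores two such baselines on the same split. The first is uniform random guessing, whose expected accuracy is 1 divided by the number of classes. The second always predicts the most common class. The helper name `baseline_scores` is hypothetical. The scores it returns depend on your data and split, so they may not match the figures quoted above.

```python
from sklearn.dummy import DummyClassifier


def baseline_scores(X_train, y_train, X_test, y_test, seed: int = 0) -> dict:
    """Test-set accuracy of feature-blind baselines (hypothetical helper)."""
    scores = {}
    for strategy in ("uniform", "most_frequent"):
        # "uniform" guesses a class at random; "most_frequent" always predicts the majority class
        dummy = DummyClassifier(strategy=strategy, random_state=seed).fit(X_train, y_train)
        scores[strategy] = round(dummy.score(X_test, y_test), 4)
    return scores


# In the notebook: baseline_scores(X_train, y_train, X_test, y_test)
```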

See the Stocks App for an example of how a model can be incorporated into an app.