Drug Classification App in Code Assist

Learn how to build an app with Code Assist to interactively explore drug classification data. In this example you will:

  • Load CSV data

  • Create a visualization app

  • Train a classification model

Start by initializing Code Assist.

Load data

Download the drug classification data from Kaggle, then use Code Assist to load the CSV file. The data should have the following columns:

  • Age

  • Sex

  • BP (blood pressure)

  • Cholesterol

  • Na_to_K (ratio of sodium to potassium)

  • Drug (drug label)
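Outside the Code Assist dialog, the same load-and-check step can be sketched with plain pandas. The helper name load_drug_data and the two sample rows below are illustrative assumptions, not part of Code Assist or the Kaggle file; in practice you would pass the path of the downloaded CSV instead of the in-memory sample.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K", "Drug"]


def load_drug_data(source):
    """Read the CSV and confirm that every expected column is present."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df


# Two made-up rows stand in for the downloaded file (illustration only)
sample_csv = io.StringIO(
    "Age,Sex,BP,Cholesterol,Na_to_K,Drug\n"
    "23,F,HIGH,HIGH,25.4,DrugY\n"
    "47,M,LOW,HIGH,13.1,drugC\n"
)
df_sample = load_drug_data(sample_csv)
print(df_sample.dtypes)
```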

Create a visualization

  1. Use Code Assist to create a visualization. Turn on the Enable crossfilter toggle so that the visualization responds to crossfilters, then click the INSERT CODE button.

    Create a visualization
  2. The code is inserted into the notebook and executed immediately to create the visualization.

    Code automatically inserted and visualization created

Add a crossfilter

  1. Use Code Assist to create a crossfilter to select one or more drug classes. Enable the Multiple toggle to select more than one drug class. Click the INSERT CODE button.

    Create a crossfilter
  2. Create a second crossfilter. This time choose a Slider filter and select the Na_to_K column. Change the Mode to >=. Click the INSERT CODE button.

  3. The code for both crossfilters is now in the notebook. Use the crossfilters to change what the visualization shows.
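To make the effect of the two crossfilters concrete, here is a minimal pandas sketch of the equivalent row selection. This is not the code that Code Assist inserts; the function name apply_crossfilters and the toy rows are assumptions made for illustration.

```python
import pandas as pd


def apply_crossfilters(df, drug_classes, min_na_to_k):
    """Keep rows in the selected drug classes with Na_to_K >= the threshold."""
    in_selected_classes = df["Drug"].isin(drug_classes)  # multi-select filter
    above_threshold = df["Na_to_K"] >= min_na_to_k       # slider in >= mode
    return df[in_selected_classes & above_threshold]


# Made-up rows for illustration only
toy = pd.DataFrame({
    "Na_to_K": [25.4, 13.1, 10.1, 18.0],
    "Drug": ["DrugY", "drugC", "drugX", "DrugY"],
})

filtered = apply_crossfilters(toy, ["DrugY", "drugX"], 15.0)
print(filtered)
```

Both conditions are combined with a logical AND, so a row must pass every active filter to remain visible.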

Create an App

  1. Use Code Assist to create an app. Select, move and resize widgets until you have the required layout. Click the INSERT CODE button.

  2. The code will be inserted into the notebook. Click the PREVIEW button to launch a preview version of the app.

Create an app

Create a classification model

  1. Use the same data to build a simple machine learning model to predict the drug class. First, take another look at the data in df.

    View the data
  2. Create a model to predict Drug using the remaining columns. You need to import some more packages and then split the data into predictors, X, and target, y. Insert the following code into a new cell in the notebook.

    import pandas as pd  # harmless if pandas was already imported when loading the data
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    
    # The first five columns are the predictors; the sixth (Drug) is the target
    X = df.iloc[:,:5]
    y = df.iloc[:,5]
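  3. The categorical columns, Sex, BP and Cholesterol, need to be dummy (one-hot) encoded. Note that the code below keeps only Age and the encoded columns, so Na_to_K is not passed to the model. Insert the following code into a new cell in the notebook.

    # A prefix keeps the new column names unique even if two columns share a category value
    sex = pd.get_dummies(X['Sex'], prefix='Sex')
    bp = pd.get_dummies(X['BP'], prefix='BP')
    cholesterol = pd.get_dummies(X['Cholesterol'], prefix='Cholesterol')
    
    # Age plus the encoded columns; Na_to_K is left out here
    X = pd.concat([X[['Age']], sex, bp, cholesterol], axis='columns')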
  4. The target column, y, also needs to be encoded. Insert the following code into a new cell in the notebook.

    encoder = LabelEncoder()
    
    y = encoder.fit_transform(y)
  5. Now split the data into training and testing sets, then build a model using the training set. Insert the following code into a new cell in the notebook.

    # random_state fixes the shuffle so that the split is reproducible
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    
    # A shallow tree: at most two levels of splits
    tree = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
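To see what the encoding in steps 3 and 4 produces, the following sketch runs both encoders on a tiny made-up frame. The rows are invented for illustration, and the sketch uses the columns= argument of pd.get_dummies as a compact alternative to encoding each column separately; it is not the tutorial's own code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Age": [23, 47, 61],
    "Sex": ["F", "M", "M"],
    "BP": ["HIGH", "LOW", "NORMAL"],
    "Cholesterol": ["HIGH", "HIGH", "NORMAL"],
    "Drug": ["DrugY", "drugC", "drugX"],
})

# One-hot encode the categorical predictors; each new column is named
# <original column>_<category>, and Age passes through unchanged
X_toy = pd.get_dummies(
    toy[["Age", "Sex", "BP", "Cholesterol"]],
    columns=["Sex", "BP", "Cholesterol"],
)

# Encode the target as integer codes (one code per distinct label)
toy_encoder = LabelEncoder()
y_toy = toy_encoder.fit_transform(toy["Drug"])

print(list(X_toy.columns))
print(list(toy_encoder.classes_), list(y_toy))
```

One-hot encoding suits the predictors because it avoids implying an order among categories, while a single integer code per class is all the classifier needs for the target.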

Make predictions

Finally, use the model to make predictions for the testing set and then calculate the model accuracy on the testing set. Insert the following code into a new cell in the notebook.

tree.predict(X_test)

round(tree.score(X_test, y_test), 4)
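Because y was label-encoded, tree.predict returns integer codes rather than drug names. The sketch below shows how such codes can be mapped back to names with LabelEncoder.inverse_transform. The labels and the "predicted" codes are made up for illustration, and the encoder is fitted here only so that the example is self-contained.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up labels standing in for the Drug column
labels = ["DrugY", "drugX", "drugC", "DrugY"]

demo_encoder = LabelEncoder()
codes = demo_encoder.fit_transform(labels)

# Hypothetical output of a call such as tree.predict(X_test)
predicted_codes = np.array([0, 2, 2])

# Map the integer codes back to the original drug names
predicted_labels = demo_encoder.inverse_transform(predicted_codes)
print(predicted_labels)
```

In the notebook, the corresponding call would be encoder.inverse_transform(tree.predict(X_test)), using the encoder fitted on the real target column.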

The model is not very accurate, but for a shallow tree (max_depth = 2) trained on so little data it is not bad: simply guessing would yield an accuracy of only 25%, whereas the model achieves 50%.
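A single accuracy figure does not show which classes are being confused, and it is easier to judge against an explicit baseline. The sketch below uses confusion_matrix (imported earlier in the tutorial but not used there) together with two simple baselines. The label arrays are made up for illustration, and the helper names are assumptions; in the notebook, y_true and y_pred would be y_test and tree.predict(X_test).

```python
from collections import Counter

from sklearn.metrics import confusion_matrix

# Made-up encoded labels for illustration only
y_true = [0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2, 0, 2]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)


def accuracy(true, pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def uniform_guess_accuracy(true):
    """Expected accuracy when guessing each class with equal probability."""
    return 1 / len(set(true))


def majority_class_accuracy(true):
    """Accuracy of always predicting the most frequent class."""
    return Counter(true).most_common(1)[0][1] / len(true)


print(cm)
print(accuracy(y_true, y_pred))
print(uniform_guess_accuracy(y_true))
print(majority_class_accuracy(y_true))
```

The majority-class baseline is often the stricter comparison when the classes are unbalanced, since always predicting the most common class can beat uniform guessing.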

See the Stocks App for an example of how a model can be incorporated into an app.