Join data with Code Assist

The Transformations widget can be used to join two data frames.

For this tutorial we’ll use the Palmer Penguins data.

We are going to join a summarized version of the penguins data to a data frame containing the locations of the penguin colonies.

  1. Launch the Transformations widget. Use the Groupby and aggregate transformation to calculate the average penguin mass broken down by island and species.

    mass = df.groupby(["island", "species"]).agg(body_mass_g_mean=("body_mass_g", "mean")).reset_index() mass
    mass = df.groupby(["island", "species"]).agg(body_mass_g_mean=("body_mass_g", "mean")).reset_index()
    
    mass
    Data output
  2. Note that not all penguins are present at each of the colonies. Next manually create a data frame with the locations of each of the colonies.

    import pandas as pd islands = pd.DataFrame.from_dict({ "island": ["Biscoe", "Dream"] "latitude": [-65.4333316, -64.7333333], "longitude": [-65.499998. -64.2333333] }) islands
    import pandas as pd
    
    islands = pd.DataFrame.from_dict({
        "island": ["Biscoe", "Dream"]
        "latitude": [-65.4333316, -64.7333333],
        "longitude": [-65.499998. -64.2333333]
    })
    islands
  3. Launch the Transformations widget again. Select the mass data frame and choose the Join/merge transformation. Choose the islands data frame to join with. By default this is an inner join.

  4. Press the Add Transformation button then the RUN button.

    Inner join the data
  5. The code is inserted into the notebook and run.

    Code is inserted and executed
  6. Because the islands data frame does not contain location information for Torgerson this colony is not included in the results.

  7. You can, however, select different join types. Select a left outer join.

    Select different join types
  8. Now Torgerson is included in the results, but the location fields are empty.

There is a lot of flexibility in the way that you can join data frames using the Transformation widget. You can choose between four types of join:

  • inner join

  • left outer join

  • right outer join and

  • full outer join.

You can also specify which columns are used as join keys or simply join on all common columns.