The Transformations widget can be used to join two data frames.
For this tutorial we’ll use the Palmer Penguins data.
We are going to join a summarized version of the penguins data to a data frame containing the locations of the penguin colonies.
-
Launch the Transformations widget. Use the Groupby and aggregate transformation to calculate the average penguin mass broken down by island and species.
mass = df.groupby(["island", "species"]).agg(body_mass_g_mean=("body_mass_g", "mean")).reset_index() mass
-
Note that not all penguins are present at each of the colonies. Next manually create a data frame with the locations of each of the colonies.
import pandas as pd islands = pd.DataFrame.from_dict({ "island": ["Biscoe", "Dream"] "latitude": [-65.4333316, -64.7333333], "longitude": [-65.499998. -64.2333333] }) islands
-
Launch the Transformations widget again. Select the
mass
data frame and choose the Join/merge transformation. Choose theislands
data frame to join with. By default this is an inner join. -
Press the
button then the button. -
The code is inserted into the notebook and run.
-
Because the
islands
data frame does not contain location information for Torgerson this colony is not included in the results. -
You can, however, select different join types. Select a left outer join.
-
Now Torgerson is included in the results, but the location fields are empty.
There is a lot of flexibility in the way that you can join data frames using the Transformation widget. You can choose between four types of join:
-
inner join
-
left outer join
-
right outer join and
-
full outer join.
You can also specify which columns are used as join keys or simply join on all common columns.
-
Group and aggregate records to generate summary data.