When you are developing your model, you can use Workspaces to quickly execute code, see outputs, and make iterative improvements.
You previously learned how to start a Workspace and explored basic options.
In this topic, you will use your workspace to load, explore, and transform data. After the data has been prepared, you will train a model.
-
Click the browser tab to return to the Workspace.
-
Click New > Live Script to create a MATLAB Live Script.
-
Go to Save > Save As… to save it as ber_hot_weather.mlx.
-
Copy and paste the following command:
opts = detectImportOptions('tegel.csv');
-
Click Run.
-
Copy and paste the following commands to load tegel.csv into MATLAB. Then, click Run. This loads the data using the readtable() function, giving it the import options object as a second argument.
opts.SelectedVariableNames = {'DATE', 'PRCP', 'TMIN', 'TMAX'};
opts = setvartype(opts, {'DATE', 'PRCP', 'TMIN', 'TMAX'}, {'datetime', 'double', 'double', 'double'});
berWeatherTbl = readtable("tegel.csv", opts);
head(berWeatherTbl)
These commands loaded the following columns and assigned data types to them.
-
DATE – date the measurements were taken
-
PRCP – total precipitation for the day
-
TMIN – lowest temperature measured that day
-
TMAX – highest temperature measured that day
The result will look similar to the following:
-
-
Click Section Break to create a new section in your script.
Note: Your cursor must be at the end of the previous section. You might have to go to the Insert tab to find Section Break.
-
Copy and paste the following command to split each date into its year, month, and day components and store each component as a table variable. This helps you examine the data.
[berWeatherTbl.year, berWeatherTbl.month, berWeatherTbl.day] = ymd(berWeatherTbl.DATE);
-
Copy and paste the following command to limit the dataset to the years from 2000 onward, excluding the last year in the file. If the data ends in 2020, this keeps January 2000 through December 2019, inclusive. Removing the rows outside this range speeds up data processing.
berWeatherTbl = berWeatherTbl(berWeatherTbl.year > 1999 & berWeatherTbl.year < max(berWeatherTbl.year), :);
-
Copy and paste the following commands to divide all temperature data by 10, which converts the values to degrees Celsius. You might have noticed that temperatures in the TMAX or TMIN columns look a bit odd. This is because NOAA records temperatures in tenths of a degree Celsius.
berWeatherTbl.TMAX = berWeatherTbl.TMAX/10;
berWeatherTbl.TMIN = berWeatherTbl.TMIN/10;
-
Copy and paste the following command to fill in missing values using linear interpolation between neighboring rows.
berWeatherTbl = fillmissing(berWeatherTbl, 'linear');
-
Copy and paste the following head() function to preview the start of the table.
head(berWeatherTbl)
-
Click Run. The result looks like the following:
Important: Click Save occasionally to save your script.
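The unit conversion and gap filling from this section can be tried on a small vector first. The following is a minimal sketch using made-up values (not taken from tegel.csv); the variable names are hypothetical.

```matlab
% Toy data: maximum temperatures in tenths of a degree Celsius, with one gap.
% These values are illustrative only and do not come from tegel.csv.
tmaxTenths = [215; NaN; 235; 240];

% Convert to degrees Celsius, as done for TMAX and TMIN above.
tmaxC = tmaxTenths / 10;              % 21.5, NaN, 23.5, 24.0

% Linear interpolation fills the gap from its two neighbors.
filled = fillmissing(tmaxC, 'linear') % the NaN becomes 22.5
```

Because the missing value sits midway between 21.5 and 23.5, linear interpolation fills it with 22.5. In the tutorial the same call is applied to the whole table.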
-
-
Click Section Break to create another section in your Live Script. In this section, you’ll calculate how many hot days have occurred in Berlin since the year 2000. To calculate this, copy and paste the following command to set the baseline threshold: a hot day is one whose maximum temperature is at least 29 degrees Celsius.
hotDayThreshold = 29;
-
Copy and paste the following command to identify the hot days that have occurred since (and including) the year 2000. This command adds a logical table column that flags the days with maximum temperatures (TMAX) that meet or exceed the hot day threshold.
berWeatherTbl.HotDayFlag = berWeatherTbl.TMAX >= hotDayThreshold;
-
Copy and paste the following command to use groupsummary() to count how many hot days were flagged:
numHotDaysPerYear = groupsummary(berWeatherTbl, 'year', 'sum', 'HotDayFlag');
-
Copy and paste the following command to repeat the same approach to find the highest temperature of each year:
maxTempOfYear = groupsummary(berWeatherTbl, 'year', 'max', 'TMAX');
-
Copy and paste the following command to combine the variables to create a table named annualMaxTbl:
annualMaxTbl = join(numHotDaysPerYear, maxTempOfYear);
annualMaxTbl.Properties.VariableNames = {'Year', 'daysInYear', 'hotDayCount', 'maxTemp'};
annualMaxTbl
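The renaming step may be easier to follow once you see which columns groupsummary() produces. The following is a small sketch on a toy table with made-up values; the table and variable names are hypothetical.

```matlab
% Toy table with hypothetical values (not from tegel.csv):
% two days in 2000 (one hot) and one day in 2001 (hot).
toy = table([2000; 2000; 2001], [true; false; true], ...
    'VariableNames', {'year', 'HotDayFlag'});

% Same call pattern as numHotDaysPerYear above.
counts = groupsummary(toy, 'year', 'sum', 'HotDayFlag');

% groupsummary returns the grouping variable, a GroupCount column,
% and one column per requested statistic (here sum_HotDayFlag).
disp(counts.Properties.VariableNames)
disp(counts)
```

GroupCount is the number of rows in each year, which is why the tutorial renames it to daysInYear. Both summary tables share the year and GroupCount columns, so join() matches rows on them and the joined table ends up with four columns.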
-
Click Run Section. The table looks like the following:
-
-
Click Section Break to create another section in your Live Script. In this section, you’ll visualize the weather data using a chart that combines a bar graph and a line graph. The chart will use two y-axes.
The bar graph will represent the hot day count (for a given year), and the line graph will represent the highest annual temperature (in Celsius, for a given year). The y-axis on the left side of the chart will correspond to the hot day count, and the y-axis on the right side of the chart will correspond to the highest annual temperature.
-
Copy and paste the following to create a hot day count bar graph.
figure
hold on
yyaxis left
bar(annualMaxTbl.Year, annualMaxTbl.hotDayCount, 'FaceColor', 'b');
-
Copy and paste the following to add a title and labels to the x-axis and left side y-axis.
titleText = sprintf("%s%d%s%d%s%d", "Number of hot days (over ", hotDayThreshold, "\circC) - ", min(annualMaxTbl.Year), "-", max(annualMaxTbl.Year));
title(titleText)
ylabel("Hot days per year")
xlabel("Year")
-
Copy and paste the following to draw the line plot for the highest temperature each year.
yyaxis right
ylabel("Highest Annual Temperature in \circC")
plot(annualMaxTbl.Year, annualMaxTbl.maxTemp, 'Color', 'r', "Marker", "*")
hold off
-
Click Run Section. Your chart should look something like the following:
-
In this section, you will use Regression Learner, an interactive MATLAB machine-learning app, to develop a model that can predict daily maximum temperatures for upcoming dates.
1: Partition the data
You must partition the data that will be used with Regression Learner into the following sets:
-
Data to train the model
-
Data to test the model
-
Click Section Break in the Live Script and copy and paste the following code to remove the HotDayFlag column.
berWeatherTbl.HotDayFlag = [];
-
Copy and paste the following to partition the data.
cv = cvpartition(berWeatherTbl.year, 'Holdout', 0.3);
dataTrain = berWeatherTbl(cv.training, :);
dataTest = berWeatherTbl(cv.test, :);
-
Click Run Section.
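To get a feel for what the holdout partition does, the following sketch applies the same cvpartition() call to a toy grouping vector and counts the rows in each set. The fixed random seed and the toy vector are assumptions added for illustration; they are not part of the tutorial.

```matlab
% Assumption: fix the random seed so the split is repeatable between runs.
rng(0);

% Hypothetical grouping vector: 100 rows spread evenly over five years.
groups = repmat((2000:2004)', 20, 1);

% Hold out 30% of the rows for testing, as in the tutorial.
cvToy = cvpartition(groups, 'Holdout', 0.3);

% training() and test() return logical index vectors over the rows.
nTrain = sum(training(cvToy))
nTest = sum(test(cvToy))
```

The two counts add up to the number of rows, with roughly 30% of them in the test set. Passing the year values as the first argument asks cvpartition() to keep each year's share similar in both sets.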
-
2: Train the model
-
Click the APPS tab and click Regression Learner app. If you do not see the Regression Learner app, click the arrow to expand the full app list.
Important: If Regression Learner is not in the apps list, contact your IT team or your MathWorks account manager for assistance.
The application opens in a new window.
-
Click New Session and select From Workspace.
-
Use the New Session window to specify the input variables for predictions in your model, as well as the outputs (or responses) you want to predict. In this tutorial, the output is the maximum temperature.
-
For the input variable, from the Workspace Variable list, select dataTrain.
-
For the output, in the Response section, select TMAX (maximum temperature).
-
Select PRCP, TMIN, year, month, and day in the Predictors section.
-
Click Start Session. The Regression Learner window refreshes and shows the original data set and the values of TMAX.
-
-
Select the type of model to be used for model training. For this tutorial, select Coarse Tree.
-
Regression Learner runs best on a container with multiple cores because it can run in parallel and produce models rapidly. If you are using a single-core container, click Use Parallel in Regression Learner to turn off parallel processing.
-
Click Train to start the model training process. The Domino container starts a parallel pool, which lets Regression Learner train models in parallel.
-
-
Select Fine Gaussian SVM to compare the results to Coarse Tree. You can select additional models or even select all models and compare the results to identify the best fit for your data.
-
Click the arrow to access the model types.
-
Click Train. The model list automatically selects the model that best fits the data. Several visualizations are shown to demonstrate this.
-
Click Predicted vs. Actual Plot to open a chart that plots each predicted value against the actual value in the data. The closer the points are to the diagonal, the better the predictions.
-
-
Click Generate Function to have Regression Learner create a function that will be used to deploy the model with Domino. MATLAB generates the function in an M-file. Click Save to save the file as trainRegressionModel.m.
3: Export the model
-
Click Export Model to export the model to your Domino workspace so you can use it for predictions.
Note: If you cannot find the Export button, you might have to switch back to the Regression Learner. See the following images for guidance.
-
Type a name for the model, such as weatherModel, and click OK.
-
Close the Regression Learner app. The trained model now appears in your workspace.
Notice that the Command Window shows information about how to use the model to make predictions with the following line of code:
yFit = weatherModel.predictFcn(T);
If you pass in a table of data, this line of code returns the predictions as a column vector, one value per row. The input table must contain the predictor variables used for training, with the same names as in berWeatherTbl: PRCP, TMIN, year, month, and day. It does not need to include TMAX, as that value will be predicted. The model predicts the TMAX values and returns them in yFit.
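As an illustration of the expected input, the following sketch builds a one-row table, assuming the predictors chosen in the New Session step (PRCP, TMIN, year, month, and day). The values are made up, and the predict call is left as a comment because it needs the exported weatherModel in the workspace.

```matlab
% Hypothetical single-day input; the values are illustrative only.
% Column names must match the predictor names used during training.
T = table(0, 12.3, 2022, 7, 15, ...
    'VariableNames', {'PRCP', 'TMIN', 'year', 'month', 'day'});
disp(T)

% With the exported model in the workspace, the call would be:
% yFit = weatherModel.predictFcn(T);
```

TMIN is given in degrees Celsius here because the training data was divided by 10 before the model was trained.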
4: Test the model
-
To test the model with the data you partitioned earlier, create a Section Break in your Live Script (ber_hot_weather.mlx). Copy and paste the following to use the model with the test data and the function call that was listed in the Command Window.
yFit = weatherModel.predictFcn(dataTest);
-
Copy and paste the following to compute the difference between each prediction and the actual value in the test data set.
err = yFit - dataTest.TMAX;
-
Copy and paste the following to draw a histogram to visualize the results.
figure;
histogram(err)
xlim([-15 15])
ylabel('Number of predictions');
xlabel('Gap with actual test data')
-
Click Run Section. The result looks like the following:
-
To save the working model to be used later, copy and paste the following in the Command Window and then press Enter.
save weatherModel weatherModel
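Besides the histogram, the error vector from the test step can be summarized in one or two numbers. The following is a minimal sketch using made-up errors; in the Live Script you would use the err variable computed earlier instead of the toy vector.

```matlab
% Hypothetical prediction errors in degrees Celsius (illustrative only).
err = [1; -2; 0.5; -0.5];

% Mean absolute error: the average size of a miss, ignoring direction.
mae = mean(abs(err))        % 1

% Root-mean-square error: penalizes large misses more heavily.
rmse = sqrt(mean(err.^2))   % about 1.17
```

Both values are in degrees Celsius, so they can be read directly as a typical prediction error.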
5: Make predictions
You can use the model to predict the weather for next year. You’ll generate a table with next year’s dates and add randomly selected, historical precipitation and minimum temperature data to the table for those dates. This information helps the model make proper predictions.
-
Create a new Section Break in your Live Script.
-
Copy and paste the following to create a table with date and temperature input data.
todayDate = datetime('today');
daysIntoFuture = 365;
endDate = todayDate + days(daysIntoFuture);
predictedMaxTemps = table('Size', [daysIntoFuture+1 7], ...
    'VariableTypes', {'datetime', 'double', 'double', 'double', 'double', 'double', 'double'}, ...
    'VariableNames', berWeatherTbl.Properties.VariableNames);
x=1;
-
Copy and paste the following to loop through each day from today to the end date and populate the table with randomly selected historical values for the same calendar day.
for i=todayDate:endDate
    [y, m, d] = ymd(i);
    % Historical values recorded on the same month and day
    minTemps = berWeatherTbl.TMIN(berWeatherTbl.month == m & berWeatherTbl.day == d);
    prcps = berWeatherTbl.PRCP(berWeatherTbl.month == m & berWeatherTbl.day == d);
    curMinTemp = NaN;
    [historicalRowCount z] = size(minTemps);
    % Pick a random historical minimum temperature
    randomRow = randi([1 historicalRowCount]);
    curMinTemp = minTemps(randomRow);
    predictedMaxTemps.TMIN(x) = curMinTemp;
    % Pick a random historical precipitation value
    randomRow = randi([1 historicalRowCount]);
    predictedMaxTemps.PRCP(x) = prcps(randomRow);
    predictedMaxTemps.DATE(x) = i;
    predictedMaxTemps.year(x) = y;
    predictedMaxTemps.month(x) = m;
    predictedMaxTemps.day(x) = d;
    predictedMaxTemps.TMAX(x) = 0; % placeholder
    x = x+1;
end
head(predictedMaxTemps)
-
Click Run Section.
The result is a preview of the table with historical weather data that you can use for weather predictions. The TMAX column holds a placeholder value of 0. The predictions themselves are returned separately when the table is run through the model.
-
To run the model, copy and paste the following into a new Section Break and run the Section.
yFit = weatherModel.predictFcn(predictedMaxTemps);
result = table(predictedMaxTemps.DATE, yFit, 'VariableNames', {'Date', 'Predicted TMAX'})
The following is an AI-driven weather prediction.
-
Copy and paste the following code, and then Run Section to draw the forecast in another plot:
figure
plot(result.Date, result.("Predicted TMAX"))
titleText = sprintf("%s%d%s", "Weather forecast for the next ", daysIntoFuture, " days in Berlin, Germany (\circC)");
title(titleText)
ylabel('Forecasted Daily High Temperature')
-
Copy and paste the following code, and then Run Section to predict how many hot days will happen during the next year. As before, a hot day is one that meets or exceeds the threshold.
hotWeatherDaysIdx = result(result.("Predicted TMAX") >= hotDayThreshold, :);
height(hotWeatherDaysIdx)
The result on January 24, 2022 was a prediction of 0 hot days between January 2022 and February 2022. The results will vary based on the dates, data, and model used.
-
To export your model, in the Command Window, type the following to save it into a MAT file:
save weatherModel weatherModel
Anyone in your Domino project can load it later with the following command:
load weatherModel.mat
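If you prefer not to overwrite variables that already exist in the workspace, load() can also return the file's contents as a struct. The following sketch demonstrates the round trip with a placeholder variable and a temporary file; demoModel and the file location are hypothetical stand-ins for weatherModel and your project folder.

```matlab
% Placeholder standing in for the exported model (hypothetical).
demoModel = struct('note', 'stand-in for weatherModel');

% Save to a temporary location using the function form of save.
matFile = fullfile(tempdir, 'demoModel.mat');
save(matFile, 'demoModel');

% Load into a struct instead of directly into the workspace.
loaded = load(matFile);
disp(isfield(loaded, 'demoModel'))   % displays 1

% The saved variable is then available as a field of the struct.
restored = loaded.demoModel;
```

Note that file names are case-sensitive on Linux, so the name passed to load() should match the saved file name exactly.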