04/16/2024 | Press release
Predicting bike-sharing demand is difficult. Factors such as weather, holidays, events, and even day-to-day fluctuations in commuter behavior can significantly affect rental patterns. Traditional forecasting methods often fall short, leaving operators grappling with oversupply or undersupply and allocating resources inefficiently. This blog post shows how to use Oracle Cloud Infrastructure (OCI) Data Science and the Accelerated Data Science (ADS) library to turn historical bike-sharing data into automated forecasts of future demand, without requiring deep data science or machine learning expertise.
Our example uses the Seoul bike sharing demand dataset from the UCI Machine Learning Repository together with the Forecast Operator. The Forecast Operator is a low-code tool for integrating enterprise-grade AI forecasting into your application. Here, we explore a simple use case, but this low-code tool is built for extensibility and was developed in partnership with Oracle's own applications. Learn more about forecasting from our documentation.
The following table shows the first five rows of the dataset:
| Date | Rented bike count | Hour | Temperature (°C) | Humidity (%) | Wind speed (m/s) | Visibility (10 m) | Dew point temperature (°C) | Solar radiation (MJ/m2) | Rainfall (mm) | Snowfall (cm) | Seasons | Holiday | Functioning day |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01/12/17 | 254 | 0 | -5.2 | 37 | 2.2 | 2000 | -17.6 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 204 | 1 | -5.5 | 38 | 0.8 | 2000 | -17.6 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 173 | 2 | -6 | 39 | 1 | 2000 | -17.7 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 107 | 3 | -6.2 | 40 | 0.9 | 2000 | -17.6 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 78 | 4 | -6 | 36 | 2.3 | 2000 | -18.6 | 0 | 0 | 0 | Winter | None | Yes |
| Column Name | Description |
| --- | --- |
| Date | Date of the observation (DD/MM/YYYY format) |
| Rented bike count | Target variable: Number of bikes rented |
| Hour | Hour of the day (0-23) |
| Temperature (°C) | Temperature in degrees Celsius |
| Humidity (%) | Humidity in percent |
| Wind speed (m/s) | Wind speed in m/s |
| Visibility (10 m) | Visibility |
| Dew point temperature (°C) | Dew point temperature in degrees Celsius |
| Solar radiation (MJ/m2) | Solar radiation in MJ/m2 |
| Rainfall (mm) | Amount of rainfall (mm) |
| Snowfall (cm) | Amount of snowfall (cm) |
| Seasons | Contains values: {'Autumn', 'Summer', 'Winter', 'Spring'} |
| Holiday | Whether the day is a holiday |
| Functioning day | Whether the day is a functioning day |
Before we can start with the Forecast Operator, we need to get the data ready. We split the raw data into two parts: historical and additional. The historical data stores the target variable, timestamps, and categories for past observations. The additional data contains timestamps, categories, and any other columns whose values are known over the horizon, providing context for the forecast. We can also extract a test dataset to evaluate the accuracy of our predictions.
The following code block declares variables:
# Declaring variables for the Forecast Operator
import pandas as pd

data = pd.read_csv("SeoulBikeData.csv", encoding='utf-8')
timestamp_col = "Timestamp"
series_col = "City"
data[series_col] = "seoul"
target_col = "Rented Bike Count"
horizon = 24
Use the following code block to create the training, testing, and additional datasets:
# Creating historical, additional, and test datasets from the raw data
# The Date column is day-first (e.g., 01/12/17), so parse it accordingly
data[timestamp_col] = pd.to_datetime(
    data['Date'] + ' ' + data['Hour'].astype(str).str.zfill(2),
    dayfirst=True,
)
data.drop(['Date', 'Hour'], axis=1, inplace=True)
data.sort_values(by=timestamp_col, inplace=True)
primary_data = data[[timestamp_col, target_col, series_col]]
test_data = primary_data.iloc[-horizon:]
train_data = primary_data.iloc[:-horizon]
# Note: drop() with inplace=True returns None, so assign the result of a
# non-inplace drop instead
additional_data = data.drop([target_col], axis=1)
Writing out the data
# Writing back the datasets
train_data_path = "bike_data_train.csv"
test_data_path = "bike_data_test.csv"
additional_data_path = "bike_data_additional.csv"
train_data.to_csv(train_data_path, index=False)
test_data.to_csv(test_data_path, index=False)
additional_data.to_csv(additional_data_path, index=False)
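The split logic above is easy to get subtly wrong, so it can help to sanity-check it on a toy frame before running it on the real dataset. The following sketch (with made-up values, not the real Seoul data) confirms that the last horizon rows become the test set, the rest become the training set, and the additional data keeps every row while dropping only the target column:

```python
import pandas as pd

# Sanity check (sketch) of the split logic on a toy frame with made-up values
horizon = 3
data = pd.DataFrame({
    "Timestamp": pd.date_range("2017-12-01", periods=10, freq="h"),
    "Rented Bike Count": range(10),
    "Temperature(°C)": [-5.0] * 10,
})
primary = data[["Timestamp", "Rented Bike Count"]]
test_data = primary.iloc[-horizon:]
train_data = primary.iloc[:-horizon]
additional_data = data.drop(columns=["Rented Bike Count"])

# The test set covers exactly the horizon; the additional data spans
# history plus horizon and no longer contains the target
assert len(test_data) == horizon
assert len(train_data) == len(data) - horizon
assert len(additional_data) == len(data)
assert "Rented Bike Count" not in additional_data.columns
```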
While this dataset only has data for Seoul, the Forecast Operator can leverage the City column (target_category_columns) to generate forecasts for multiple cities simultaneously.
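To illustrate, if you had rental data for a second city, you could stack the per-city frames into one long table and let target_category_columns identify each series. The second city and its values below are entirely hypothetical:

```python
import pandas as pd

# Hypothetical sketch: stack per-city frames so the Forecast Operator can
# treat each City value as a separate series via target_category_columns
seoul = pd.DataFrame({
    "Timestamp": pd.date_range("2017-12-01", periods=3, freq="h"),
    "Rented Bike Count": [254, 204, 173],
    "City": "seoul",
})
busan = pd.DataFrame({
    "Timestamp": pd.date_range("2017-12-01", periods=3, freq="h"),
    "Rented Bike Count": [120, 95, 80],  # made-up values for illustration
    "City": "busan",
})
multi_city = pd.concat([seoul, busan], ignore_index=True)
print(multi_city["City"].unique())  # ['seoul' 'busan']
```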
Access OCI Data Science from the Oracle Cloud Console and start a notebook session. Within the session, open the environment explorer and install the AI Forecast Operator.
Figure 1: A screenshot of the Data Science environment explorer

The command ads operator init -t forecast generates baseline YAML configuration files for the Forecast Operator. It produces several configuration files that enable different ways to run your forecasting task, including running it in containerized Data Science jobs or as a Data Science job within a conda runtime environment.
Figure 2: An example of setting up the Forecast Operator in bash

The generated forecast.yaml file serves as a starting point for your forecasting configuration. However, you still need to fill in key details such as the locations of your datasets, the datetime and target columns, and the forecast horizon.
Building upon the initial configuration, we created the following forecast.yaml file for our current bike sharing demand forecasting use case:
# Updated config
kind: operator
spec:
  datetime_column:
    name: {timestamp_col}
  historical_data:
    url: {train_data_path}
  additional_data:
    url: {additional_data_path}
  test_data:
    url: {test_data_path}
  output_directory:
    url: seoul_bike/
  model: neuralprophet
  target_category_columns: [{series_col}]
  target_column: {target_col}
  horizon: {horizon}
type: forecast
version: v1
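Because the configuration references the same values we declared in Python, one convenient approach is to render forecast.yaml from those variables with an f-string. This is just a sketch (the variable values are repeated here so the snippet is self-contained), not part of the operator's own API:

```python
# Sketch: render forecast.yaml from the variables declared earlier
timestamp_col = "Timestamp"
series_col = "City"
target_col = "Rented Bike Count"
horizon = 24
train_data_path = "bike_data_train.csv"
test_data_path = "bike_data_test.csv"
additional_data_path = "bike_data_additional.csv"

yaml_text = f"""\
kind: operator
spec:
  datetime_column:
    name: {timestamp_col}
  historical_data:
    url: {train_data_path}
  additional_data:
    url: {additional_data_path}
  test_data:
    url: {test_data_path}
  output_directory:
    url: seoul_bike/
  model: neuralprophet
  target_category_columns: [{series_col}]
  target_column: {target_col}
  horizon: {horizon}
type: forecast
version: v1
"""
with open("forecast.yaml", "w") as f:
    f.write(yaml_text)
```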
The model parameter within the Forecast Operator offers configurable options for model selection. Specify the desired modeling framework directly by assigning the parameter a supported value: prophet, arima, neuralprophet, automlx, autots, or auto. Setting this parameter to auto triggers automatic selection of the most suitable framework based on the provided dataset.
For local processing, run ads operator run -f forecast.yaml -b local. Alternatively, use the --backend-config parameter to launch the forecasting job in other environments, for example: ads operator run -f forecast.yaml --backend-config forecast_job_container_backend.yaml.
When the job completes, you can find the results in the output directory. Check files such as forecast.csv, metrics.csv, report.csv, and report.html for the details. Using ADS's Forecast Operator, organizations can implement demand forecasting scenarios with minimal effort. The following visualization, extracted from report.html, shows the predicted bike rental count (blue line) compared to the actual rentals (green dots) from December 11th onward.
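Since we held out a test set, we can also score the forecast ourselves, for example with mean absolute percentage error (MAPE). The sketch below uses small in-memory stand-ins for the held-out actuals and the operator's predictions; in practice you would load them from bike_data_test.csv and the forecast file in the output directory (the column layout of that file is an assumption to check against your own output):

```python
import numpy as np
import pandas as pd

# Sketch: score the forecast against the held-out test set with MAPE.
# These values are stand-ins; load the real ones from bike_data_test.csv
# and the forecast file in the seoul_bike/ output directory.
actual = pd.Series([254, 204, 173, 107])
predicted = pd.Series([240, 210, 180, 100])

mape = np.abs((actual - predicted) / actual).mean() * 100
print(f"MAPE: {mape:.2f}%")  # MAPE: 4.76%
```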
Figure 3: Forecasted data overlaying historical data, excerpted from the autogenerated report

Try a 30-day trial: US$300 in free credits gives you access to the OCI Data Science service.
Ready to learn more about the Oracle Cloud Infrastructure Data Science service?