Oracle Corporation

04/16/2024 | Press release | Distributed by Public on 04/16/2024 12:12

Bike sharing demand forecasting using OCI Accelerated Data Science

The challenge

Predicting bike-sharing demand is difficult. Factors such as weather, holidays, events, and even day-to-day fluctuations in commuter behavior can significantly impact rental patterns. Traditional forecasting methods often fall short, leaving operators grappling with oversupply or undersupply and inefficient resource allocation. This blog post demonstrates how to use Oracle Cloud Infrastructure (OCI) Data Science and the Accelerated Data Science (ADS) library to turn historical bike-sharing data into automated forecasts of future demand, without requiring extensive data science or machine learning expertise.

Our example use case uses a dataset on Seoul bike sharing demand from the UCI Machine Learning Repository and the Forecast Operator. The Forecast Operator is a low-code tool for integrating enterprise-grade AI forecasting into your application. Here, we explore a simple use case, but this low-code tool is built for extensibility and developed in partnership with Oracle's own applications. Learn more about forecasting from our documentation.

Exploratory data analysis

The following table shows the first five rows of the dataset:

| Date | Rented bike count | Hour | Temperature (°C) | Humidity (%) | Wind speed (m/s) | Visibility (10 m) | Dew point temperature (°C) | Solar radiation (MJ/m²) | Rainfall (mm) | Snowfall (cm) | Seasons | Holiday | Functioning day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01/12/17 | 254 | 0 | -5.2 | 37 | 2.2 | 2000 | -17.6 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 204 | 1 | -5.5 | 38 | 0.8 | 2000 | -17.6 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 173 | 2 | -6 | 39 | 1 | 2000 | -17.7 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 107 | 3 | -6.2 | 40 | 0.9 | 2000 | -17.6 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 78 | 4 | -6 | 36 | 2.3 | 2000 | -18.6 | 0 | 0 | 0 | Winter | None | Yes |
| Column name | Description |
|---|---|
| Date | Date of the observation (DD/MM/YYYY format) |
| Rented bike count | Target variable: number of bikes rented |
| Hour | Hour of the day (0-23) |
| Temperature (°C) | Temperature in degrees Celsius |
| Humidity (%) | Relative humidity in percent |
| Wind speed (m/s) | Wind speed in meters per second |
| Visibility (10 m) | Visibility, in units of 10 m |
| Dew point temperature (°C) | Dew point temperature in degrees Celsius |
| Solar radiation (MJ/m²) | Solar radiation in MJ/m² |
| Rainfall (mm) | Amount of rainfall in millimeters |
| Snowfall (cm) | Amount of snowfall in centimeters |
| Seasons | One of {'Autumn', 'Summer', 'Winter', 'Spring'} |
| Holiday | Whether the day is a holiday |
| Functioning day | Whether the day is a functioning day |

Data preparation

Before we can start with the forecasting operator, we need to get the data ready. We split the raw data into two parts: historical and additional. The historical data stores the target variable, timestamps, and categories for past observations. The additional data contains timestamps, categories, and any other columns with values over the horizon, providing context for the forecast. We can also extract a test dataset to evaluate the accuracy of our predictions.

The following code block declares variables:

# Declaring variables for the forecasting operator
import pandas as pd

data = pd.read_csv("SeoulBikeData.csv", encoding='utf-8')
timestamp_col = "Timestamp"
series_col = "City"
data[series_col] = "seoul"   # single series; add more cities here if available
target_col = "Rented Bike Count"
horizon = 24                 # forecast the next 24 hours

Use the following code block to create the training, testing, and additional datasets:

# Creating the historical, additional, and test datasets from the raw data
data[timestamp_col] = pd.to_datetime(
    data['Date'] + ' ' + data['Hour'].astype(str).str.zfill(2),
    dayfirst=True  # the Date column is day-first (DD/MM/YYYY)
)
data.drop(['Date', 'Hour'], axis=1, inplace=True)
data.sort_values(by=timestamp_col, inplace=True)
primary_data = data[[timestamp_col, target_col, series_col]]
test_data = primary_data.iloc[-horizon:]   # hold out the last `horizon` rows
train_data = primary_data.iloc[:-horizon]
# drop() with inplace=True would return None, so assign the returned copy instead
additional_data = data.drop([target_col], axis=1)
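As a quick sanity check, the split should leave exactly `horizon` rows in the test set, with no overlap in time between training and test. The following sketch uses a small synthetic frame in place of the real CSV:

```python
import pandas as pd

# Synthetic stand-in for the bike data: 100 hourly observations for one city
horizon = 24
ts = pd.date_range("2017-12-01", periods=100, freq="h")
df = pd.DataFrame({"Timestamp": ts, "Rented Bike Count": range(100), "City": "seoul"})

test_data = df.iloc[-horizon:]    # last `horizon` rows held out for evaluation
train_data = df.iloc[:-horizon]   # everything before the forecast horizon

assert len(test_data) == horizon
assert len(train_data) == len(df) - horizon
# Train and test must not overlap in time
assert train_data["Timestamp"].max() < test_data["Timestamp"].min()
```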

Writing out the data

# Writing back the datasets
train_data_path = "bike_data_train.csv"
test_data_path = "bike_data_test.csv"
additional_data_path = "bike_data_additional.csv"
train_data.to_csv(train_data_path, index=False)
test_data.to_csv(test_data_path, index=False)
additional_data.to_csv(additional_data_path, index=False)

While this dataset only contains data for Seoul, the forecasting operator can leverage the city column (via target_category_columns) to generate forecasts for multiple cities simultaneously.
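To illustrate, forecasting several cities at once just means stacking their series in the same long-format table, distinguished by the value of the category column. The numbers for the second city below are made up for the sketch:

```python
import pandas as pd

ts = pd.date_range("2017-12-01", periods=3, freq="h")
seoul = pd.DataFrame({"Timestamp": ts, "Rented Bike Count": [254, 204, 173], "City": "seoul"})
busan = pd.DataFrame({"Timestamp": ts, "Rented Bike Count": [120, 98, 87], "City": "busan"})

# One long-format table; the operator treats each City value as a separate series
multi_city = pd.concat([seoul, busan], ignore_index=True)
assert sorted(multi_city["City"].unique()) == ["busan", "seoul"]
assert len(multi_city) == 6
```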

Setting up environment

Access the OCI Data Science service from the Oracle Cloud Console and start a notebook session. Within the session, navigate to the environment explorer and install the AI Forecasting Operator.

Figure 1: A screenshot of the Data Science environment explorer

The forecast YAML for ADS operator

The command, ads operator init -t forecast, generates baseline YAML configuration files for the forecasting operator. It creates several configuration files that enable different ways to run your forecasting task, including running it within a containerized Data Science job or within a Data Science job that uses a conda runtime environment.

Figure 2: An example of setting up the forecasting operator in bash.

This forecast.yaml file serves as a starting point for your forecasting configuration. However, you need to fill in some key details like:

  • The name of your datetime column
  • File paths for your historical, additional, and test datasets
  • Names of your target category columns and the target column
  • The forecast horizon you want

Building upon the initial configuration, we created the following forecast.yaml file for our bike-sharing demand forecasting use case:

# Updated config
kind: operator
spec:
  datetime_column:
    name: {timestamp_col}
  historical_data:
    url: {train_data_path}
  additional_data: 
    url: {additional_data_path}
  test_data: 
    url: {test_data_path}
  output_directory:
    url: seoul_bike/
  model: neuralprophet
  target_category_columns: [{series_col}]
  target_column: {target_col}
  horizon: {horizon}
type: forecast
version: v1

The model parameter within the forecasting operator offers configurable options for model selection. Specify the desired modeling framework directly by assigning the parameter a supported value: prophet, arima, neuralprophet, automlx, autots, or auto. Setting this parameter to auto triggers automatic selection of the most suitable framework for the provided dataset.
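For instance, keeping every other key from the forecast.yaml above unchanged, switching frameworks is a one-line change:

```yaml
# Delegate framework selection to the operator based on the dataset
model: auto
# ...or pin a specific framework instead, e.g.:
# model: prophet
```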

Running the job

For local processing, run the command, ads operator run -f forecast.yaml -b local. Alternatively, use the --backend-config parameter to launch the forecasting job on other environments, for example, ads operator run -f forecast.yaml --backend-config forecast_job_container_backend.yaml.

The report

When the job completes, you can find the results in the output directory. Check files such as forecast.csv, metrics.csv, report.csv, and report.html for the details. Using the ADS Forecast Operator, organizations can implement demand forecasting scenarios with minimal effort. The following visualization, extracted from report.html, shows the predicted rented bike count (blue line) compared to the actual rentals (green dots) from December 11th onward.
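Beyond the autogenerated report, you could score forecast.csv against your held-out test set yourself. The sketch below implements sMAPE, a common forecast-accuracy metric reported by the operator; the actual and predicted values here are made-up numbers standing in for the real output files:

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(actual - forecast) / denom)

# Hypothetical actual rentals vs. model predictions for four hours
actual = [254, 204, 173, 107]
predicted = [240, 210, 180, 100]
error = smape(actual, predicted)  # roughly 4.8 percent
```

In practice you would load the two CSVs with pandas, align them on the timestamp column, and pass the two rented-bike-count columns to this function.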

Figure 3: Forecasted data overlaying historical. Excerpt from the autogenerated report.

Explore OCI Data Science

Try a 30-day trial with US$300 in free credits to get access to the OCI Data Science service.

Ready to learn more about the Oracle Cloud Infrastructure Data Science service?