Oracle Corporation

04/16/2024 | Press release | Distributed by Public on 04/16/2024 12:12

Bike sharing demand forecasting using OCI Accelerated Data Science

The challenge

Predicting bike-sharing demand is difficult. Factors such as weather, holidays, events, and even day-to-day fluctuations in commuter behavior can significantly impact rental patterns. Traditional forecasting methods often fall short, leaving operators grappling with oversupply or undersupply and inefficient resource allocation. This blog post demonstrates how to use Oracle Cloud Infrastructure (OCI) Data Science and the Accelerated Data Science (ADS) library to turn historical bike-sharing data into automated forecasts of future demand, without requiring extensive data science or machine learning expertise.

Our example use case uses a dataset on Seoul bike sharing demand from the UCI Machine Learning Repository and the Forecast Operator. The Forecast Operator is a low-code tool for integrating enterprise-grade AI forecasting into your application. Here, we explore a simple use case, but this low-code tool is built for extensibility and developed in partnership with Oracle's own applications. Learn more about forecasting from our documentation.

Exploratory data analysis

The following table shows the first five rows of the dataset:

| Date | Rented bike count | Hour | Temperature (°C) | Humidity (%) | Wind speed (m/s) | Visibility (10 m) | Dew point temperature (°C) | Solar radiation (MJ/m²) | Rainfall (mm) | Snowfall (cm) | Seasons | Holiday | Functioning day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 01/12/17 | 254 | 0 | -5.2 | 37 | 2.2 | 2000 | -17.6 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 204 | 1 | -5.5 | 38 | 0.8 | 2000 | -17.6 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 173 | 2 | -6 | 39 | 1 | 2000 | -17.7 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 107 | 3 | -6.2 | 40 | 0.9 | 2000 | -17.6 | 0 | 0 | 0 | Winter | None | Yes |
| 01/12/17 | 78 | 4 | -6 | 36 | 2.3 | 2000 | -18.6 | 0 | 0 | 0 | Winter | None | Yes |
| Column name | Description |
|---|---|
| Date | Date of the observation (DD/MM/YYYY format) |
| Rented bike count | Target variable: number of bikes rented |
| Hour | Hour of the day (0-23) |
| Temperature (°C) | Temperature in degrees Celsius |
| Humidity (%) | Relative humidity in percent |
| Wind speed (m/s) | Wind speed in meters per second |
| Visibility (10 m) | Visibility, in units of 10 m |
| Dew point temperature (°C) | Dew point temperature in degrees Celsius |
| Solar radiation (MJ/m²) | Solar radiation in MJ/m² |
| Rainfall (mm) | Amount of rainfall in millimeters |
| Snowfall (cm) | Amount of snowfall in centimeters |
| Seasons | One of {'Autumn', 'Summer', 'Winter', 'Spring'} |
| Holiday | Whether the day is a holiday |
| Functioning day | Whether the day is a functioning day |

Data preparation

Before we can start with the forecasting operator, we need to get the data ready. We split the raw data into two parts: historical and additional. The historical data stores the target variable, timestamps, and categories for past observations. The additional data contains timestamps, categories, and any other columns with values over the horizon, providing context for the forecast. We can also extract a test dataset to evaluate the accuracy of our predictions.

The following code block declares variables:

# Declaring variables for the forecasting operator
import pandas as pd

data = pd.read_csv("SeoulBikeData.csv", encoding='utf-8')
timestamp_col = "Timestamp"
series_col = "City"
data[series_col] = "seoul"   # single series; add more cities here if available
target_col = "Rented Bike Count"
horizon = 24                 # forecast the next 24 hours

Use the following code block to create the training, testing, and additional datasets:

# Creating the historical, additional, and test datasets from the raw data
data[timestamp_col] = pd.to_datetime(
    data['Date'] + ' ' + data['Hour'].astype(str).str.zfill(2),
    dayfirst=True  # the Date column is day-first (DD/MM/YYYY)
)
data.drop(['Date', 'Hour'], axis=1, inplace=True)
data.sort_values(by=timestamp_col, inplace=True)
primary_data = data[[timestamp_col, target_col, series_col]]
test_data = primary_data.iloc[-horizon:]   # hold out the last `horizon` rows
train_data = primary_data.iloc[:-horizon]
# drop() with inplace=True would return None, so assign the returned copy instead
additional_data = data.drop([target_col], axis=1)
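As a quick sanity check, the split should leave exactly `horizon` rows in the test set, with no overlap in time between training and test. The following sketch uses a small synthetic frame in place of the real CSV:

```python
import pandas as pd

# Synthetic stand-in for the bike data: 100 hourly observations for one city
horizon = 24
ts = pd.date_range("2017-12-01", periods=100, freq="h")
df = pd.DataFrame({"Timestamp": ts, "Rented Bike Count": range(100), "City": "seoul"})

test_data = df.iloc[-horizon:]    # last `horizon` rows held out for evaluation
train_data = df.iloc[:-horizon]   # everything before the forecast horizon

assert len(test_data) == horizon
assert len(train_data) == len(df) - horizon
# Train and test must not overlap in time
assert train_data["Timestamp"].max() < test_data["Timestamp"].min()
```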

Writing out the data

# Writing back the datasets
train_data_path = "bike_data_train.csv"
test_data_path = "bike_data_test.csv"
additional_data_path = "bike_data_additional.csv"
train_data.to_csv(train_data_path, index=False)
test_data.to_csv(test_data_path, index=False)
additional_data.to_csv(additional_data_path, index=False)

While this dataset only contains data for Seoul, the forecasting operator can leverage the city column (via target_category_columns) to generate forecasts for multiple cities simultaneously.
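To illustrate, forecasting several cities at once just means stacking their series in the same long-format table, distinguished by the value of the category column. The numbers for the second city below are made up for the sketch:

```python
import pandas as pd

ts = pd.date_range("2017-12-01", periods=3, freq="h")
seoul = pd.DataFrame({"Timestamp": ts, "Rented Bike Count": [254, 204, 173], "City": "seoul"})
busan = pd.DataFrame({"Timestamp": ts, "Rented Bike Count": [120, 98, 87], "City": "busan"})

# One long-format table; the operator treats each City value as a separate series
multi_city = pd.concat([seoul, busan], ignore_index=True)
assert sorted(multi_city["City"].unique()) == ["busan", "seoul"]
assert len(multi_city) == 6
```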

Setting up environment

Access the OCI Data Science service from the Oracle Cloud Console and start a notebook session. Within the session, navigate to the environment explorer and install the AI Forecasting Operator.

Figure 1: A screenshot of the Data Science environment explorer

The forecast YAML for ADS operator

The command, ads operator init -t forecast, generates baseline YAML configuration files for the forecasting operator. It creates several configuration files that enable different ways to run your forecasting task, including running it within a containerized Data Science job or within a Data Science job that uses a conda runtime environment.

Figure 2: An example of setting up the forecasting operator in bash.

This forecast.yaml file serves as a starting point for your forecasting configuration. However, you need to fill in some key details like:

  • The name of your datetime column
  • File paths for your historical, additional, and test datasets
  • Names of your target category columns and the target column
  • The forecast horizon you want

Building upon the initial configuration, we created the following forecast.yaml file for our bike-sharing demand forecasting use case:

# Updated config
kind: operator
spec:
  datetime_column:
    name: {timestamp_col}
  historical_data:
    url: {train_data_path}
  additional_data: 
    url: {additional_data_path}
  test_data: 
    url: {test_data_path}
  output_directory:
    url: seoul_bike/
  model: neuralprophet
  target_category_columns: [{series_col}]
  target_column: {target_col}
  horizon: {horizon}
type: forecast
version: v1

The model parameter within the forecasting operator offers configurable options for model selection. Specify the desired modeling framework directly by assigning the parameter a supported value: prophet, arima, neuralprophet, automlx, autots, or auto. Setting this parameter to auto triggers automatic selection of the most suitable framework for the provided dataset.
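For instance, keeping every other key from the forecast.yaml above unchanged, switching frameworks is a one-line change:

```yaml
# Delegate framework selection to the operator based on the dataset
model: auto
# ...or pin a specific framework instead, e.g.:
# model: prophet
```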

Running the job

For local processing, run the command, ads operator run -f forecast.yaml -b local. Alternatively, use the --backend-config parameter to launch the forecasting job on other environments, for example, ads operator run -f forecast.yaml --backend-config forecast_job_container_backend.yaml.

The report

When the job completes, you can find the results in the output directory. Check files such as forecast.csv, metrics.csv, report.csv, and report.html for the details. Using the ADS Forecast Operator, organizations can implement demand forecasting scenarios with minimal effort. The following visualization, extracted from report.html, shows the predicted rented bike count (blue line) compared to the actual rentals (green dots) from December 11th onward.
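Beyond the autogenerated report, you could score forecast.csv against your held-out test set yourself. The sketch below implements sMAPE, a common forecast-accuracy metric reported by the operator; the actual and predicted values here are made-up numbers standing in for the real output files:

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(actual - forecast) / denom)

# Hypothetical actual rentals vs. model predictions for four hours
actual = [254, 204, 173, 107]
predicted = [240, 210, 180, 100]
error = smape(actual, predicted)  # roughly 4.8 percent
```

In practice you would load the two CSVs with pandas, align them on the timestamp column, and pass the two rented-bike-count columns to this function.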

Figure 3: Forecasted data overlaying historical. Excerpt from the autogenerated report.

Explore OCI Data Science

Try a 30-day trial with US$300 in free credits to get access to the OCI Data Science service.

Ready to learn more about the Oracle Cloud Infrastructure Data Science service?