
Introduction

In this article I simulate how to find correlations between air passenger information and hotel booking preferences. I’ll show which type of AI model I used, what inputs I used, and the overall structure of the code, and I’ll share the results.

Applicability

This approach and its results have many potential applications: predicting when, how, and to whom to promote certain accommodation facilities; evaluating whether a destination has all the facilities it needs or requires more; or adapting prices to demand.

Problem statement

Assuming that I have some information about passengers, I would like to find relationships between those inputs and the passengers’ booking preferences. In particular, I would like to predict what type of hotel they will book and what type of room they will require.

Approach

To address the problem described above, I asked ChatGPT to generate 100,000 records containing:

The airline transporting the passenger (e.g. Delta Airlines, Lufthansa, etc…)
The country of origin
The hotel budget (cost per night)
The age of the passenger
The gender of the passenger
The hotel type (Budget, Mid-Range, Luxury)
The hotel stars
The arrival day
The purpose of the travel (Leisure, Business, Conference)
The passenger’s company type (Public, Private, Self-Employed)
The number of nights reserved
The room type (Single, Double, Suite)
The number of people in the room

ChatGPT generated the data with some clear patterns and produced a CSV file that I could download. The patterns are sometimes too obvious, but for the purpose of demonstration they are good enough. From there, I only had to analyze the file, looking for correlations and trying to predict them with the help of AI. The algorithm selected for this purpose is KMeans.

What Is KMeans Clustering?

KMeans is one of the most popular and widely used unsupervised machine learning algorithms. It’s used when you want to discover hidden groupings or segments in your data without having labeled outcomes.

The Intuition Behind KMeans

Imagine you have a dataset of thousands of travelers, and you want to group them into distinct types (e.g., business, leisure, family). You don’t know the labels in advance — but you believe there are natural patterns.

That’s exactly what KMeans does: it tries to split your data into k distinct clusters where each point belongs to the cluster with the closest mean (center).

🔄 How It Works (Step by Step)

1. Choose k (the number of clusters you want).
2. Initialize k centroids randomly in your data space.
3. Assign each point to the nearest centroid → these are your temporary clusters.
4. Recalculate the centroids as the mean of the points in each cluster.
5. Repeat steps 3–4 until the clusters stop changing significantly.

This is why it’s called K-Means — it groups by minimizing the distance to the mean of each cluster.
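To make the steps concrete, here is a minimal sketch of the loop in plain Python. It is only an illustration under my own assumptions (the function name `kmeans`, the toy points, and the choice of picking starting centroids from the data are mine), not the code used for the results in this article.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Step 5: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Two well-separated toy groups of (age, people_in_room).
points = [(25, 3), (27, 3), (29, 3), (50, 1), (52, 1), (54, 1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this toy input the first three points end up in one cluster and the last three in the other, with each centroid sitting at the mean of its group.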

Code set up

Once the algorithm was selected, I set up the code following the structure in Figure 1. The code performs three functions:

Data preparation
- The code will read the data from the csv file
KMeans training
- The AI will be trained based on the data selected from the csv file
- Only the most influential values will be considered
Validation
- 100 samples will be selected to verify if the AI model can make accurate predictions
- The success rate of hotel type and room type will be calculated
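KMeans itself only returns cluster numbers, so some step has to turn a cluster into a hotel type or room type before a success rate can be calculated. One common way, which I am assuming here purely for illustration, is to give each cluster the most frequent label among its training rows. The helper names and the tiny hand-made lists below are hypothetical.

```python
from collections import Counter, defaultdict

def majority_label_per_cluster(cluster_ids, labels):
    """Map each cluster id to the most common label among its training rows."""
    votes = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, labels):
        votes[cluster_id][label] += 1
    return {cluster_id: counts.most_common(1)[0][0] for cluster_id, counts in votes.items()}

def accuracy(predicted, actual):
    """Share of predictions that match the actual labels, as a percentage."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

# Hypothetical training output: the cluster assigned to each row and its known hotel type.
train_clusters = [0, 0, 0, 1, 1, 1, 1]
train_hotels = ["Luxury", "Luxury", "Mid-Range", "Budget", "Budget", "Budget", "Luxury"]
cluster_to_hotel = majority_label_per_cluster(train_clusters, train_hotels)

# Hypothetical validation rows: the cluster each one falls into and its real hotel type.
valid_clusters = [0, 1, 1, 0]
valid_hotels = ["Luxury", "Budget", "Budget", "Mid-Range"]
predicted = [cluster_to_hotel[c] for c in valid_clusters]

print(cluster_to_hotel)
print(f"Hotel Type Accuracy: {accuracy(predicted, valid_hotels):.1f}%")
```

The same mapping can be built a second time with room types as labels, which gives two success rates from a single clustering.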

Figure 1 Code structure

Results

Before talking about results, I would like to share some of the details about the KMeans model. The parameters that have been used for the KMeans model are:

n_clusters=9
init='k-means++'
n_init=10
max_iter=1000
random_state=10
algorithm='lloyd'

Given the number of inputs, these parameters provide a solid basis for training and a good foundation for predictions. With the KMeans model configured, I then selected the input values. The most influential ones turned out to be:

The country of origin
The age of the passenger
The purpose of the travel (Leisure, Business, Conference)
The number of people in the room

After experimenting with the other inputs as well, I found that they add no value and in fact lower the accuracy of the predictions. Using too many inputs is not helpful, especially when they are not relevant; this is known as the curse of dimensionality. With the four inputs and the KMeans setup above, the overall accuracy is:
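KMeans works on distances between numeric vectors, so the four inputs have to be turned into numbers first. The sketch below shows one possible encoding, with one-hot columns for the categorical inputs and simple scaling for the numeric ones. The country list (taken only from the extract in Table 1), the age range, and the maximum room occupancy are my assumptions for illustration.

```python
COUNTRIES = ["Canada", "Germany", "India", "USA"]   # assumed: only the countries seen in the extract
PURPOSES = ["Leisure", "Business", "Conference"]

def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1.0 if value == category else 0.0 for category in categories]

def encode(row, age_range=(18, 80), max_people=4):
    """Turn one passenger record into a numeric feature vector."""
    age_min, age_max = age_range
    age_scaled = (row["Age"] - age_min) / (age_max - age_min)  # roughly 0..1
    people_scaled = row["People_in_Room"] / max_people
    return (
        [age_scaled, people_scaled]
        + one_hot(row["Purpose_of_Travel"], PURPOSES)
        + one_hot(row["Country_of_Origin"], COUNTRIES)
    )

row = {"Age": 32, "Purpose_of_Travel": "Conference", "Country_of_Origin": "Germany", "People_in_Room": 3}
vector = encode(row)
print(len(vector), vector)
```

Four inputs already become nine columns here. Each additional categorical input, such as the airline, would add one column per category, which is how irrelevant inputs can quickly dilute the distances the clustering relies on.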

✅ Hotel Type Accuracy on 100 cases: 90.0%

✅ Room Type Accuracy on 100 cases: 98.0%

This means near-perfect prediction of the room type and very good prediction of the hotel type. Table 1 shows an extract of the results. The only error in the extract is the hotel type for a person from Germany who is sharing a room with two other people and is travelling for a conference: the prediction is a luxury hotel, but the actual selection is mid-range.

Age	Purpose_of_Travel	Country_of_Origin	People_in_Room	Actual_Hotel_Type	Predicted_Hotel_Type	Actual_Room_Type	Predicted_Room_Type	Hotel_Correct	Room_Correct
25	Leisure	India	3	Luxury	Luxury	Suite	Suite	TRUE	TRUE
32	Conference	Germany	3	Mid-range	Luxury	Suite	Suite	FALSE	TRUE
36	Business	Germany	1	Budget	Budget	Single	Single	TRUE	TRUE
38	Leisure	India	3	Luxury	Luxury	Suite	Suite	TRUE	TRUE
34	Business	India	1	Budget	Budget	Single	Single	TRUE	TRUE
25	Business	Canada	1	Budget	Budget	Single	Single	TRUE	TRUE
51	Business	Germany	1	Budget	Budget	Single	Single	TRUE	TRUE
35	Leisure	USA	3	Luxury	Luxury	Suite	Suite	TRUE	TRUE
32	Leisure	Germany	3	Luxury	Luxury	Suite	Suite	TRUE	TRUE
29	Leisure	Canada	3	Luxury	Luxury	Suite	Suite	TRUE	TRUE

Table 1 Extract of KMeans predictions
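As a small worked example, the success rates can be recomputed from the two correctness columns of Table 1. The snippet below only covers the ten rows of the extract, so its figures are not the 100-case figures reported above; the helper name `percent_true` is mine.

```python
import csv
import io

# The ten rows of the extract, reduced to the two correctness columns.
extract = """Hotel_Correct\tRoom_Correct
TRUE\tTRUE
FALSE\tTRUE
TRUE\tTRUE
TRUE\tTRUE
TRUE\tTRUE
TRUE\tTRUE
TRUE\tTRUE
TRUE\tTRUE
TRUE\tTRUE
TRUE\tTRUE
"""

rows = list(csv.DictReader(io.StringIO(extract), delimiter="\t"))

def percent_true(rows, column):
    """Percentage of rows whose `column` holds the string TRUE."""
    return 100.0 * sum(r[column] == "TRUE" for r in rows) / len(rows)

hotel_accuracy = percent_true(rows, "Hotel_Correct")
room_accuracy = percent_true(rows, "Room_Correct")
print(f"Hotel: {hotel_accuracy:.1f}%  Room: {room_accuracy:.1f}%")
```

For the extract this gives 90.0% for the hotel type (9 of 10 correct) and 100.0% for the room type.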

Conclusions

Using KMeans to detect patterns was a good choice, as it achieved very good overall results. Reaching those results required some tuning, and some of the inputs were excluded to avoid the curse of dimensionality. Once the model was set up, the code ran very fast, producing the results in less than one minute.

Given the ease of implementation and the strong results, this approach could bring a lot of value in several areas. The thing to consider, though, is the quality of the data, as not all data will yield the same results. Tuning and expertise will be required to achieve results like those in this example.

Copyright

Author: Simone Togni

Platform: aisciencetalk.blog

AI correlation between air passengers and hotel reservations
