AI correlation between air passengers and hotel reservations

Introduction

In this article I simulate how to correlate air passenger information with hotel booking preferences. I’ll show you which type of AI I used for this model, which inputs I used, and the overall structure of the code, and I’ll share the results.


Applicability

This approach and its results could be applied in many ways: predicting when, how, and to whom to promote certain accommodation facilities; evaluating whether a destination has all the facilities it requires or needs more; or adapting prices to demand.


Problem statement

Assuming that I have some passenger information, I would like to find relations between those inputs and each passenger’s booking preference. In particular, I would like to assess what type of hotel they will book and what type of room they will require.


Approach

To address the scope described above, I asked ChatGPT to generate 100,000 records, each containing:

  • The airline transporting the passenger (e.g. Delta Airlines, Lufthansa, etc…)
  • The country of origin
  • The hotel budget (cost per night)
  • The age of the passenger
  • The gender of the passenger
  • The hotel type (Budget, Mid-Range, Luxury)
  • The hotel stars
  • The arrival day
  • The purpose of the travel (Leisure, Business, Conference)
  • The passenger’s company type (Public, Private, Self-Employed)
  • The number of nights reserved
  • The room type (Single, Double, Suite)
  • The number of people in the room

ChatGPT generated the data for me, with some clear patterns, and produced a CSV file that I could download. The patterns are sometimes too obvious, but for the purpose of demonstration they are good enough. From there, I just had to analyze the file, trying to find correlations and predict them with the help of AI. The AI technique selected for this purpose is KMeans.

What Is KMeans Clustering?

KMeans is one of the most popular and widely used unsupervised machine learning algorithms. It’s used when you want to discover hidden groupings or segments in your data without having labeled outcomes.

The Intuition Behind KMeans

Imagine you have a dataset of thousands of travelers, and you want to group them into distinct types (e.g., business, leisure, family). You don’t know the labels in advance — but you believe there are natural patterns.

That’s exactly what KMeans does: it tries to split your data into k distinct clusters where each point belongs to the cluster with the closest mean (center).
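Written as a formula, the "closest mean" idea corresponds to the standard KMeans objective. The symbols below are notation introduced here for illustration (they do not appear in the rest of the article): x_i is a data point, C_j is the j-th cluster, and \mu_j is its mean.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```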

🔄 How It Works (Step by Step)

  1. Choose k (the number of clusters you want).
  2. Initialize k centroids randomly in your data space.
  3. Assign each point to the nearest centroid → these are your temporary clusters.
  4. Recalculate the centroids as the mean of the points in each cluster.
  5. Repeat steps 3–4 until the clusters stop changing significantly.

This is why it’s called K-Means — it groups by minimizing the distance to the mean of each cluster.
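The steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the plain algorithm, not the code used for the results in this article; the function name `kmeans`, the toy data, and the choice to initialize centroids from randomly picked data points are assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 5: stop once the assignments no longer change.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy data (hypothetical): two well-separated 2D blobs of 50 points each.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
centroids, labels = kmeans(points, k=2)  # Step 1: choose k = 2
```

Library implementations such as scikit-learn's add refinements on top of this loop, for example smarter initialization and several restarts.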

Code set up

Once the AI has been selected, I set up the code following the structure in Figure 1. The code performs these three functions:

  1. Data preparation
    • The code will read the data from the csv file
  2. KMeans training
    • The AI will be trained based on the data selected from the csv file
    • Only the most influential values will be considered
  3. Validation
    • 100 samples will be selected to verify if the AI model can make accurate predictions
    • The success rate of hotel type and room type will be calculated

Figure 1: Code structure
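The three steps could be sketched as follows. This is not the article's actual code. It assumes scikit-learn and pandas, replaces the CSV with a small synthetic table whose pattern is invented for the example, and assumes one possible way to turn clusters into predictions: each cluster predicts the most frequent hotel type and room type among its training rows. The helper `majority_label` and the encoding choices are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# 1. Data preparation. The real code would read the CSV (e.g. with pd.read_csv);
#    a synthetic table with an invented pattern stands in for it here.
rng = np.random.default_rng(0)
n = 1000
purpose = rng.choice(["Leisure", "Business", "Conference"], size=n)
df = pd.DataFrame({
    "Age": rng.integers(20, 65, size=n),
    "Purpose_of_Travel": purpose,
    "Country_of_Origin": rng.choice(["India", "Germany", "Canada", "USA"], size=n),
    "People_in_Room": np.where(purpose == "Business", 1, 3),
    "Hotel_Type": np.where(purpose == "Business", "Budget", "Luxury"),
    "Room_Type": np.where(purpose == "Business", "Single", "Suite"),
})
train, test = df.iloc[:-100], df.iloc[-100:]  # hold out 100 samples for validation

# 2. KMeans training on the selected inputs only (the target columns are dropped).
encoder = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["Purpose_of_Travel", "Country_of_Origin"]),
     ("num", StandardScaler(), ["Age", "People_in_Room"])],
    sparse_threshold=0,  # always return a dense array
)
X_train = encoder.fit_transform(train)
X_test = encoder.transform(test)
model = KMeans(n_clusters=9, n_init=10, random_state=10).fit(X_train)

def majority_label(clusters, labels):
    """Map each cluster id to the most frequent label among its training rows."""
    counts = pd.crosstab(clusters, labels.to_numpy())
    return counts.idxmax(axis=1).to_dict()

hotel_map = majority_label(model.labels_, train["Hotel_Type"])
room_map = majority_label(model.labels_, train["Room_Type"])

# 3. Validation: predict the held-out samples and compute the success rates.
test_clusters = model.predict(X_test)
hotel_pred = np.array([hotel_map[int(c)] for c in test_clusters])
room_pred = np.array([room_map[int(c)] for c in test_clusters])
hotel_acc = (hotel_pred == test["Hotel_Type"].to_numpy()).mean()
room_acc = (room_pred == test["Room_Type"].to_numpy()).mean()
print(f"Hotel Type Accuracy on {len(test)} cases: {hotel_acc:.1%}")
print(f"Room Type Accuracy on {len(test)} cases: {room_acc:.1%}")
```

Because KMeans never sees the hotel or room type during training, some mapping from clusters to labels is needed before accuracy can be measured. The majority vote used here is only one option.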


Results

Before discussing the results, I would like to share some details about the KMeans model. The parameters used for the model are:

n_clusters=9,
init='k-means++',
n_init=10,
max_iter=1000,
random_state=10,
algorithm='lloyd'
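These parameter names match the `KMeans` class in scikit-learn, so, assuming that library is the one in use, the configuration could be written as below. The variable name `model` is chosen for the example.

```python
from sklearn.cluster import KMeans

# Assumption: scikit-learn's KMeans; the values are the ones listed above.
model = KMeans(
    n_clusters=9,        # number of clusters (k)
    init="k-means++",    # spread-out initial centroids instead of purely random ones
    n_init=10,           # run 10 initializations and keep the best result
    max_iter=1000,       # upper bound on iterations per run
    random_state=10,     # fixed seed for reproducible results
    algorithm="lloyd",   # the classic assign-and-update loop
)
```

Calling `model.fit(X)` on a numeric feature matrix `X` then trains the clusters, and `model.predict(X_new)` assigns new rows to the nearest centroid.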

Given the number of inputs, these parameters provide a good training base and a good foundation for predictions. With the KMeans model configured, I selected the input values. The most influential ones turned out to be:

• The country of origin
• The age of the passenger
• The purpose of the travel (Leisure, Business, Conference)
• The number of people in the room

After experimenting with the other inputs too, I found that they add no value and in fact lower the accuracy of the prediction. Using too many inputs is not useful, especially when they are not relevant; this is known as the curse of dimensionality. With the four inputs and the KMeans setup above, the overall success scores are:
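The effect of irrelevant inputs can be illustrated with a small experiment. This is a sketch on invented data, not the article's dataset: two informative features separate two classes cleanly, then twenty pure-noise features are added. The function `cluster_accuracy` is a hypothetical helper that labels each cluster with its majority class.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_accuracy(X, y, k=2, seed=10):
    """Fit KMeans, label each cluster by its majority class, return the accuracy."""
    clusters = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    correct = 0
    for c in np.unique(clusters):
        members = y[clusters == c]
        correct += np.bincount(members).max()  # size of the majority class in cluster c
    return correct / len(y)

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 200)  # two classes, 200 samples each
# Two informative features: class 0 is centered at 0, class 1 at 10.
informative = rng.normal(loc=y[:, None] * 10.0, scale=1.0, size=(400, 2))
# Twenty irrelevant features with a large spread.
noise = rng.normal(scale=10.0, size=(400, 20))

acc_clean = cluster_accuracy(informative, y)
acc_noisy = cluster_accuracy(np.hstack([informative, noise]), y)
print(f"informative features only: {acc_clean:.1%}")
print(f"with 20 noise features:    {acc_noisy:.1%}")
```

KMeans relies on distances, so every extra column contributes to them whether or not it carries signal. With enough irrelevant columns, the distances are dominated by noise and the clusters stop following the real pattern.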

✅ Hotel Type Accuracy on 100 cases: 90.0%

✅ Room Type Accuracy on 100 cases: 98.0%

This means almost perfect prediction of the room type and excellent prediction of the hotel type. Table 1 shows an extract of the results, where the only error is the hotel type prediction for a person coming from Germany with two other people to attend a conference: the prediction is a luxury hotel, but the actual selection is mid-range.

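| Age | Purpose_of_Travel | Country_of_Origin | People_in_Room | Actual_Hotel_Type | Predicted_Hotel_Type | Actual_Room_Type | Predicted_Room_Type | Hotel_Correct | Room_Correct |
|-----|-------------------|-------------------|----------------|-------------------|----------------------|------------------|---------------------|---------------|--------------|
| 25  | Leisure           | India             | 3              | Luxury            | Luxury               | Suite            | Suite               | TRUE          | TRUE         |
| 32  | Conference        | Germany           | 3              | Mid-range         | Luxury               | Suite            | Suite               | FALSE         | TRUE         |
| 36  | Business          | Germany           | 1              | Budget            | Budget               | Single           | Single              | TRUE          | TRUE         |
| 38  | Leisure           | India             | 3              | Luxury            | Luxury               | Suite            | Suite               | TRUE          | TRUE         |
| 34  | Business          | India             | 1              | Budget            | Budget               | Single           | Single              | TRUE          | TRUE         |
| 25  | Business          | Canada            | 1              | Budget            | Budget               | Single           | Single              | TRUE          | TRUE         |
| 51  | Business          | Germany           | 1              | Budget            | Budget               | Single           | Single              | TRUE          | TRUE         |
| 35  | Leisure           | USA               | 3              | Luxury            | Luxury               | Suite            | Suite               | TRUE          | TRUE         |
| 32  | Leisure           | Germany           | 3              | Luxury            | Luxury               | Suite            | Suite               | TRUE          | TRUE         |
| 29  | Leisure           | Canada            | 3              | Luxury            | Luxury               | Suite            | Suite               | TRUE          | TRUE         |

Table 1: Extract of KMeans predictions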
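The Hotel_Correct and Room_Correct columns, and the accuracy figures derived from them, can be computed by comparing actual and predicted values row by row. The sketch below assumes pandas and uses only the first three rows of Table 1, so the percentages it prints refer to those three rows and not to the 100 validation cases.

```python
import pandas as pd

# First three rows of Table 1 (actual vs. predicted values only).
results = pd.DataFrame({
    "Actual_Hotel_Type": ["Luxury", "Mid-range", "Budget"],
    "Predicted_Hotel_Type": ["Luxury", "Luxury", "Budget"],
    "Actual_Room_Type": ["Suite", "Suite", "Single"],
    "Predicted_Room_Type": ["Suite", "Suite", "Single"],
})

# A prediction is correct when it matches the actual value.
results["Hotel_Correct"] = results["Actual_Hotel_Type"] == results["Predicted_Hotel_Type"]
results["Room_Correct"] = results["Actual_Room_Type"] == results["Predicted_Room_Type"]

# Accuracy is the share of correct rows (the mean of a boolean column).
hotel_accuracy = results["Hotel_Correct"].mean()
room_accuracy = results["Room_Correct"].mean()
print(f"Hotel Type Accuracy on {len(results)} cases: {hotel_accuracy:.1%}")
print(f"Room Type Accuracy on {len(results)} cases: {room_accuracy:.1%}")
```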


Conclusions

Implementing KMeans to detect patterns was a good choice, as it achieved very good overall results. Achieving them required some tuning, and some of the inputs were excluded to avoid the curse of dimensionality. Once the AI was set up, the code ran very fast, producing the results in less than one minute.

Given the ease of implementation and the strong results achieved, this approach could bring a lot of value in several areas. The thing to consider, though, is the quality of the data, as not all data will bring the same results. Tuning and expertise will be required to achieve results as good as those in this example.

Copyright

Author: Simone Togni

Platform: aisciencetalk.blog
