### Building an itinerary: enter the Activity
Let's consider a new node type, Activity. An Activity has a schema like the following (generated via an AI prompt):

    Activity {
      id: string
      name: string
      description: string
      location: LocationNode | string // link to location node
      start_coordinates: { lat: number; lng: number } // for map-based queries
      duration_minutes: number // key for time-based filters
      type: "big water tour" | "land based tour" | "bike tour" | "aerial" | ...
      activity_level: "strenuous" | "leisurely" | "casual" | "relaxed"
      cost_min: number
      cost_max: number
      tags: string[] // e.g., ["sunset", "nature", "wildlife", "family-friendly"]
      seasonality: string[] // ["summer", "fall"], or a date range
      group_size_min: number
      group_size_max: number
      languages_offered: string[] // useful for international users
      accessibility_features: string[] // e.g., ["wheelchair_accessible"]
      vendor_rating: number // average rating from reviews
      cancellation_policy: "strict" | "moderate" | "flexible"
      booking_volume: number // could be a rough proxy for popularity
      image_url: string
    }

Using the start_coordinates of an Activity, we can find Activities that are close to our candidate Destination from above.
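
As a rough illustration of that proximity lookup, here is a minimal sketch using the haversine (great-circle) distance. It assumes we also have coordinates for the candidate destination, which the post has not confirmed; `haversine_km`, `activities_near`, the 25 km radius, and the sample data are all hypothetical names and values chosen for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: dict, b: dict) -> float:
    """Great-circle distance in km between two {"lat": ..., "lng": ...} points."""
    lat1, lng1, lat2, lng2 = map(radians, (a["lat"], a["lng"], b["lat"], b["lng"]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def activities_near(destination_coords: dict, activities: list[dict], radius_km: float = 25.0) -> list[dict]:
    """Return activities whose start_coordinates lie within radius_km, nearest first."""
    scored = [(haversine_km(destination_coords, act["start_coordinates"]), act) for act in activities]
    scored.sort(key=lambda pair: pair[0])  # sort on distance only, never on the dicts
    return [act for dist, act in scored if dist <= radius_km]


# Hypothetical sample data, shaped like the Activity schema above.
rio = {"lat": -22.9068, "lng": -43.1729}
sample = [
    {"name": "Rio Carnival Experience", "start_coordinates": {"lat": -22.9068, "lng": -43.1729}},
    {"name": "Colosseum Tour", "start_coordinates": {"lat": 41.8902, "lng": 12.4922}},
]
print([act["name"] for act in activities_near(rio, sample)])
```

A linear scan like this is fine for a few hundred activities; a larger catalog would probably want a spatial index instead.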
#### Making a trip recommendation - approach 1
Proximity search is great if we already have booking data about which tours our travelers have taken. However, we are starting with two disconnected datasets: travelers and their trips on one side, activities on the other. How do we combine both into a functioning workflow that can suggest tours for our travelers?
We will likely need to use location as the connective tissue between destinations and tours.
Then we might want to factor in latent features from our embeddings, e.g. whether someone is young and vagabonding and interested in doing risky shit like skydiving. For now, we lack the insights on our fake users to do something cool, so I will keep it stupid simple: I will fuzzy match activities to destinations and create relationships wherever there are matches, in order to do proximity-based recommendations.
#### Recommending naively
First things first, we need to fuzzy match our destination nodes to our newly generated activity nodes. Fuzzy matching is great because we don't need to rely on inference to do some weird and expensive check to see if an activity takes place at a specific destination, and it lets us count non-exact matches. I run a script that compares every activity node in the db against every destination node:

    def create_activity_destination_relationships():
        """Link each Activity to every Destination whose name fuzzily matches its location."""
        # MemgraphDriver is assumed to be this project's own connection wrapper.
        db = MemgraphDriver()
        query = """
        MATCH (a:Activity), (d:Destination)
        WHERE
            // Exact match
            a.location = d.name
            OR
            // Case-insensitive match
            toLower(a.location) = toLower(d.name)
            OR
            // Match without country suffix
            toLower(a.location) = toLower(split(d.name, ',')[0])
            OR
            // Match country names (trim the space left after the comma)
            toLower(a.location) = toLower(trim(split(d.name, ',')[1]))
            OR
            // Match without spaces and special characters
            replace(replace(toLower(a.location), ' ', ''), '-', '') = replace(replace(toLower(d.name), ' ', ''), '-', '')
        MERGE (a)-[:AT_LOCATION]->(d)
        """
        db.execute_query(query, {})

This creates a directed AT_LOCATION edge from each matching activity to its destination. We will use this to create our first naive recommendations. Purely based off of location matching, no fancy shit yet.
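
To sanity-check the matching rules without touching the database, the same conditions can be sketched in plain Python. This is an illustrative re-implementation, not the script that was actually run; `matches_destination` and `_squash` are hypothetical helper names. Unlike a bare `split`, it trims whitespace around the comma-separated parts, since a name like "Rio de Janeiro, Brazil" leaves a leading space on the country.

```python
def _squash(text: str) -> str:
    """Lowercase and drop spaces and hyphens, like the nested replace() calls."""
    return text.lower().replace(" ", "").replace("-", "")


def matches_destination(activity_location: str, destination_name: str) -> bool:
    """Mirror the OR-ed rules of the Cypher WHERE clause for one pair of strings."""
    loc = activity_location.lower()
    parts = [part.strip() for part in destination_name.lower().split(",")]
    city = parts[0]
    country = parts[1] if len(parts) > 1 else None  # no comma means no country part
    return (
        loc == destination_name.lower()  # exact or case-insensitive match
        or loc == city                   # match without country suffix
        or loc == country                # match on country name
        or _squash(activity_location) == _squash(destination_name)  # ignore spaces and hyphens
    )


print(matches_destination("Rio de Janeiro", "Rio de Janeiro, Brazil"))
print(matches_destination("Rome", "Rio de Janeiro, Brazil"))
```

Note how loose the country rule is: an activity whose location is just "Brazil" would link to every destination in Brazil.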
Once we have a node id, we can find similar travelers, then grab their trips like so:

    from qdrant_client import QdrantClient

    # QDRANT_URL, QDRANT_API_KEY, and MemgraphDriver are defined elsewhere in the project.

    def recommend_itineraries(node_id: str, collection_name: str):
        """Find the travelers most similar to node_id, then return their trips."""
        qdrant_client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)
        # Starting with a node_id, find the most similar nodes using cosine
        # similarity over our earlier embeddings.
        results = qdrant_client.query_points(
            collection_name=collection_name,
            query=node_id,
            limit=3,  # only use the top 3 results in order to avoid too many options
        )
        ids = [point.id for point in results.points]
        db = MemgraphDriver()
        # Each collected map describes one (trip, activity) pair.
        query = """
        MATCH (trav:Traveler)
        WHERE trav.id IN $ids
        OPTIONAL MATCH (trav)-[:TOOK]->(trip:Trip)
        OPTIONAL MATCH (trip)-[:AT_DESTINATION]->(dest:Destination)
        OPTIONAL MATCH (trip)-[:STAYED_IN]->(acc:Accommodation)
        OPTIONAL MATCH (trip)-[:TRAVELED_BY]->(trans:Transportation)
        OPTIONAL MATCH (dest)-[:AT_LOCATION]-(activity:Activity)
        WITH trav, trip, dest, acc, trans, activity
        ORDER BY trip.startDate
        RETURN
            trav.id as traveler_id,
            collect(DISTINCT {
                trip_id: trip.id,
                start_date: trip.startDate,
                end_date: trip.endDate,
                duration: trip.duration,
                destination: dest.name,
                accommodation: {
                    type: acc.type,
                    cost: trip.accommodationCost
                },
                transportation: {
                    type: trans.type,
                    cost: trip.transportationCost
                },
                activities: activity
            }) as itinerary
        """
        result = db.execute_query(query, {"ids": ids})
        return list(result)

Running this script will net us a response like this:

    <Record traveler_id='e77e92cf-feb8-49da-a6ca-802654558bf9' itinerary=[{'accommodation': {'cost': 800.0, 'type': 'Airbnb'}, 'activities': <Node element_id='640' labels=frozenset({'Activity'}) properties={'accessibility_features': ['[]'], 'activity_level': 'Moderate', 'booking_volume': 2000, 'cancellation_policy': 'Strict', 'cost_max': 120, 'cost_min': 60, 'description': 'Experience the vibrant Carnival atmosphere', 'duration_minutes': 240, 'group_size_max': 15, 'group_size_min': 1, 'id': '51', 'image_url': 'https://images.unsplash.com/photo-1483729558449-99ef09a8c325', 'languages_offered': ['["English"', '"Portuguese"]'], 'location': 'Rio de Janeiro', 'name': 'Rio Carnival Experience', 'seasonality': ['["summer"]'], 'start_coordinates': '{"lat": -22.9068, "lng": -43.1729}', 'tags': ['["cultural"', '"music"', '"dance"]'], 'type': 'Cultural', 'vendor_rating': '4.9'}>, 'destination': 'Rio de Janeiro, Brazil', 'duration': 9, 'end_date': neo4j.time.Date(2024, 1, 15), 'start_date': neo4j.time.Date(2024, 1, 15), 'transportation': {'cost': 150.0, 'type': 'Train'}, 'trip_id': 530813}]>

So for our traveler Jessica Chen, who took a trip to Rome for 7 days, a similar traveler took a trip to Rio de Janeiro, Brazil for 9 days, spent basically the same amount of money, and had several of the same travel preferences along the way.
Granted, Rio is vastly different from Rome (both culturally and in terms of trip planning), but the two trips are both international and share many similarities. We also now have things to do!
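
One wrinkle in the record above: several activity properties come back as stringified JSON fragments (for example, tags is `['["cultural"', '"music"', '"dance"]']`), presumably an artifact of how the generated activities were imported. A minimal sketch for repairing them on the way out, assuming the fragments are a JSON array that was split on commas; `repair_list` and `clean_activity` are hypothetical helpers, not part of the original scripts.

```python
import json

LIST_FIELDS = ("tags", "seasonality", "languages_offered", "accessibility_features")


def repair_list(fragments: list[str]) -> list:
    """Re-join a JSON array that was split on commas, then parse it."""
    return json.loads(",".join(fragments))


def clean_activity(props: dict) -> dict:
    """Return a copy of the activity properties with the mangled fields parsed."""
    cleaned = dict(props)
    for key in LIST_FIELDS:
        if key in cleaned:
            cleaned[key] = repair_list(cleaned[key])
    if isinstance(cleaned.get("start_coordinates"), str):
        cleaned["start_coordinates"] = json.loads(cleaned["start_coordinates"])
    if "vendor_rating" in cleaned:
        cleaned["vendor_rating"] = float(cleaned["vendor_rating"])
    return cleaned


# A subset of the properties from the record above.
raw = {
    "name": "Rio Carnival Experience",
    "tags": ['["cultural"', '"music"', '"dance"]'],
    "languages_offered": ['["English"', '"Portuguese"]'],
    "seasonality": ['["summer"]'],
    "accessibility_features": ["[]"],
    "start_coordinates": '{"lat": -22.9068, "lng": -43.1729}',
    "vendor_rating": "4.9",
}
print(clean_activity(raw))
```

The cleaner fix is probably at import time, so the graph stores real lists and numbers, but a read-side repair like this keeps the existing data usable.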
#### Looking forward
Looking forward, we should probably consider ways to learn more about the latent features of our travelers before recommending tours. Much like we generated this knowledge graph, we can continue to simulate user data with LLMs to get richer preferences (activity level, geographic preferences, past tours taken) and see if we can make our recommender less naive.
Handling seasonality, tour costs, and potentially group itineraries should also give us some interesting challenges to tackle!
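
As a starting point for the seasonality and cost ideas, here is a minimal sketch of a post-filter over candidate activities. It assumes the activity properties have already been parsed into the shapes from the Activity schema (lists and numbers); `filter_activities` and the sample activities are hypothetical.

```python
def filter_activities(activities, season=None, budget=None, group_size=None):
    """Keep activities compatible with the given season, budget, and group size.

    Any criterion left as None is ignored.
    """
    kept = []
    for act in activities:
        if season is not None and season not in act["seasonality"]:
            continue  # not offered in this season
        if budget is not None and act["cost_min"] > budget:
            continue  # even the cheapest option is over budget
        if group_size is not None and not (
            act["group_size_min"] <= group_size <= act["group_size_max"]
        ):
            continue  # group is too small or too large
        kept.append(act)
    return kept


# Hypothetical candidates using field names from the Activity schema.
candidates = [
    {"name": "Sunset Kayak", "seasonality": ["summer", "fall"], "cost_min": 60,
     "cost_max": 120, "group_size_min": 1, "group_size_max": 15},
    {"name": "Heli Tour", "seasonality": ["summer"], "cost_min": 400,
     "cost_max": 900, "group_size_min": 2, "group_size_max": 4},
]
print([act["name"] for act in filter_activities(candidates, season="summer", budget=150)])
```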