The Tourist’s Image of the City: A comparative analysis of the visual features and textual themes of interest across three global metropolises

Tourist attractions play a major role in shaping ‘mental images’ of cities. The growing availability of urban big data in recent years has opened up novel lines of inquiry into the nuances of urban imageability and sentiment. Drawing upon crowdsourced hybrid data, in the form of both textual descriptions and photographs, for 750 tourist attractions across Boston, Singapore and Sydney, this work compares the predominant themes of discussion and visual features of interest that shape tourist sentiment towards these cities. The study collects over 3500 user reviews and uses Latent Dirichlet Allocation (LDA) to extract high-level topics of discussion. Object detection is also run on over 6000 photographs, and unsupervised clustering is carried out on the extracted features to identify clusters of visual elements which capture tourist attention. The findings reinforce the popular identity of Boston as a city steeped in history, while strong perceptions of nature and greenery emerge from Singapore. Tourist interest in Sydney is dominated by specific anchors such as the Sydney Harbor Bridge.

This work was published by the Iberoamerican Society of Digital Graphics (SIGraDi). Read the full paper here:

https://cat2.mit.edu/arc/education/resource_thesis/rohit_sigradi22_camera_ready_paper.pdf


DATA COLLECTION

Data for points of interest listed as ‘tourist attraction’ was programmatically collected across the three cities using the Google Places API. The first round of data collection gathered basic place details using the ‘Nearby Search’ API call. Query locations and radii were set at regular intervals across the cities, and place data was queried from each location. The data included attributes such as location, place name, place id, rating, total user ratings and the like. Next, for each of the places now listed in the dataset, 5 top reviews and 10 photograph ids were programmatically collected using the ‘Place Details’ API call. For each review, the query returned attributes such as author name, language, rating, review text and a timestamp. For each photo id, the query also returned author name, timestamp and the photo url. The final step in the data collection pipeline was to programmatically download each photograph using the photo url, and to store the files with meaningful filenames corresponding to the photo ids recorded in the dataset.
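
The first two steps described above can be sketched roughly as follows. This is a minimal illustration rather than the study's actual collection script: the grid spacing, the bounding box, the helper names (query_grid, nearby_search_url, place_details_url) and the "YOUR_KEY" placeholder are all assumptions, and the sketch only builds request URLs without sending them.

```python
import math
from urllib.parse import urlencode

# Endpoints of the Google Places web service (legacy JSON interface).
NEARBY_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
DETAILS_URL = "https://maps.googleapis.com/maps/api/place/details/json"


def query_grid(south, west, north, east, spacing_m):
    """Return (lat, lng) query centres spaced roughly spacing_m metres apart."""
    lat_step = spacing_m / 111_320.0  # approx. metres per degree of latitude
    mid_lat = math.radians((south + north) / 2)
    lng_step = spacing_m / (111_320.0 * math.cos(mid_lat))
    points = []
    lat = south
    while lat <= north:
        lng = west
        while lng <= east:
            points.append((round(lat, 6), round(lng, 6)))
            lng += lng_step
        lat += lat_step
    return points


def nearby_search_url(lat, lng, radius_m, api_key):
    """Build a Nearby Search request for places typed 'tourist_attraction'."""
    params = {
        "location": f"{lat},{lng}",
        "radius": radius_m,
        "type": "tourist_attraction",
        "key": api_key,
    }
    return f"{NEARBY_URL}?{urlencode(params)}"


def place_details_url(place_id, api_key):
    """Build a Place Details request (reviews and photos are in the response)."""
    return f"{DETAILS_URL}?{urlencode({'place_id': place_id, 'key': api_key})}"


if __name__ == "__main__":
    # Hypothetical bounding box and spacing, for illustration only.
    for lat, lng in query_grid(42.35, -71.08, 42.37, -71.05, spacing_m=1000):
        print(nearby_search_url(lat, lng, 750, "YOUR_KEY"))
```

Choosing a search radius somewhat larger than half the grid spacing makes neighbouring queries overlap, so duplicate results would need to be dropped by place id before the second round of calls.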

SENTIMENT ANALYSIS

The first step in the analysis pipeline was the extraction of overall sentiment from each review. This was carried out in Python, using the NLTK Vader pretrained sentiment analyzer (Hutto and Gilbert 2014). This resulted in a single compound sentiment score (between -1 and +1) being associated with each review. While such a method of sentiment analysis does not yield high level understandings of the content of the reviews themselves, it is nevertheless an effective tool for grading the reviews on a linear scale, and thus separating the negative, positive and neutral reviews. The compound scores were then aggregated by location (tourist attraction), and visualized on a map to check for possible spatial dimensions to sentiment distribution.
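
A minimal sketch of this scoring and aggregation step is given below. The NLTK call is the standard VADER interface, but the ±0.05 cut-offs for separating positive, neutral and negative reviews and the use of a simple mean per attraction are assumptions made for illustration; the text above does not state which thresholds or aggregate were used. The function names are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean


def vader_compound(texts):
    """Score texts with NLTK's VADER analyzer.

    Requires nltk and a one-time nltk.download("vader_lexicon").
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return [analyzer.polarity_scores(text)["compound"] for text in texts]


def label(compound, threshold=0.05):
    """Map a compound score in [-1, +1] to a coarse class (assumed cut-offs)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def mean_score_by_place(rows):
    """rows: iterable of (place_id, compound). Returns {place_id: mean score}."""
    grouped = defaultdict(list)
    for place_id, compound in rows:
        grouped[place_id].append(compound)
    return {place_id: mean(scores) for place_id, scores in grouped.items()}
```

The per-place means can then be joined back to the place coordinates for mapping.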

Overall compound sentiment scores for tourist locations across the three cities. Size indicates number of ratings (used here as a proxy for number of visits)

TOPIC MODELING

Extraction of overall sentiment, while useful for discriminating between positive and negative reviews, does not provide us with insights into the themes of discussion associated with each review. For this, an unsupervised Latent Dirichlet Allocation (LDA) (Blei et al. 2003) model was implemented on the city-specific datasets, in order to extract the most frequently occurring high level topics that were emerging from the reviews. An LDA approach was preferred to clustering, since a single review could contain more than one topic. Stop words were first removed from each review text, and the remaining words (tokens) were lemmatized (grouping together of inflected forms so that they can be analyzed as a single item). These words were converted into a dictionary of tokens which defined the feature space of the reviews themselves. The reviews were then encoded as bags of words (frequencies of each word in the dictionary). These formed the input vectors for the LDA model.
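
The encoding described above can be illustrated with a small standard-library sketch. It is a simplification under stated assumptions: the stop-word list is a tiny hand-picked placeholder, lemmatization is omitted for brevity, and the function names are hypothetical. The resulting (token id, count) pairs are the kind of input that an LDA implementation would then consume.

```python
import re
from collections import Counter

# Tiny placeholder stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "was", "in", "for", "with", "it"}


def tokenize(text):
    """Lower-case, keep alphabetic tokens, drop stop words (no lemmatization)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_dictionary(token_lists):
    """Assign an integer id to every distinct token across all reviews."""
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    return {token: idx for idx, token in enumerate(vocabulary)}


def to_bow(tokens, dictionary):
    """Encode one review as sorted (token_id, frequency) pairs."""
    counts = Counter(dictionary[t] for t in tokens if t in dictionary)
    return sorted(counts.items())


if __name__ == "__main__":
    reviews = [
        "The park was beautiful and the park is quiet",
        "A quiet museum with a beautiful garden",
    ]
    token_lists = [tokenize(r) for r in reviews]
    dictionary = build_dictionary(token_lists)
    for tokens in token_lists:
        print(to_bow(tokens, dictionary))
```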

Based on iterative testing and evaluation, the 10 most common topic-clusters were extracted for each city using LDA (Fig. 2). The clusters were in the form of collections of individual tokens, and corresponding coefficients which described the degree to which they contributed to the topic that defined that cluster. The high-level topics that defined each cluster thus needed to be manually inferred from the individual tokens and their coefficients.
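
As a rough illustration of how such a cluster is read, the sketch below ranks the tokens of a single topic by their coefficients. The tokens and coefficients are invented for the example and are not results from the study; the helper name top_tokens is also hypothetical.

```python
def top_tokens(topic, n=3):
    """Return the n (token, coefficient) pairs with the largest coefficients."""
    return sorted(topic.items(), key=lambda item: item[1], reverse=True)[:n]


# Invented coefficients for illustration only; not results from the study.
example_topic = {
    "tour": 0.031,
    "museum": 0.027,
    "guide": 0.019,
    "ticket": 0.008,
    "queue": 0.004,
}

if __name__ == "__main__":
    for token, weight in top_tokens(example_topic):
        print(f"{weight:.3f}  {token}")
```

A reader inspecting the leading tokens of this toy topic might label it 'conducted museum tours'; that labelling step remains a manual judgment.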

Topic modeling on review texts from Boston, along with their spatial distributions.

VISUAL FEATURE EXTRACTION AND UNSUPERVISED CLUSTERING

For extraction of visual features of interest, a Faster R-CNN object detection model trained on the Open Images V4 dataset (Kuznetsova et al. 2020) and provided by TensorFlow (Abadi et al. 2016) was run on the tourist photographs from each city. The model inferred the high-level semantic content contained within the images. This generated, for each photograph, a record of the top 100 objects detected in it, along with the confidence score for each detection. For analysis, only the features associated with confidence scores of 0.10 or higher were retained within the dataset.
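
The confidence filter can be sketched as below. The sketch assumes the detector's output has already been converted to two parallel lists, one of class names and one of scores (in the TensorFlow Hub release of this model these are believed to correspond to the "detection_class_entities" and "detection_scores" outputs, which is an assumption here). The function name keep_confident is hypothetical.

```python
def keep_confident(entities, scores, min_score=0.10):
    """Keep (label, score) pairs whose confidence is at least min_score.

    Labels may arrive as bytes from the detector, so they are decoded to str.
    """
    kept = []
    for entity, score in zip(entities, scores):
        if score >= min_score:
            name = entity.decode("utf-8") if isinstance(entity, bytes) else entity
            kept.append((name, float(score)))
    return kept


if __name__ == "__main__":
    # Made-up detections for one photograph, for illustration only.
    entities = [b"Tree", b"Building", b"Boat"]
    scores = [0.90, 0.10, 0.05]
    print(keep_confident(entities, scores))
```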

Faster RCNN object detection model for extraction of features of interest.

In order to infer the features that were consistently correlated with tourist interest within the cities, an unsupervised k-means clustering algorithm was used. The set of objects detected in each photograph was converted into a bag of features, with the frequency of each feature occurrence recorded within the feature vector. Based on iterative implementation, between 10 and 25 clusters were extracted from the photographs of each city.
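
A compact sketch of this step is shown below, using a plain-Python version of Lloyd's k-means algorithm so that it runs without external libraries. It is not the implementation used in the study; the function names, the random initialisation, the iteration cap and the toy labels are all assumptions for illustration.

```python
import random
from collections import Counter


def feature_vectors(photo_labels):
    """Turn per-photo label lists into count vectors over a shared vocabulary."""
    vocabulary = sorted({l for labels in photo_labels for l in labels})
    vectors = []
    for labels in photo_labels:
        counts = Counter(labels)
        vectors.append([float(counts[l]) for l in vocabulary])
    return vocabulary, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50, seed=0):
    """Lloyd's algorithm: returns (cluster index per vector, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [-1] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        updated = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if updated == assignments:
            break  # converged: no assignment changed
        assignments = updated
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


if __name__ == "__main__":
    # Made-up detections for four photographs, for illustration only.
    photos = [
        ["Tree", "Tree", "Plant"],
        ["Tree", "Plant"],
        ["Building", "Window"],
        ["Building", "Building", "Window"],
    ]
    vocabulary, vectors = feature_vectors(photos)
    print(vocabulary)
    print(kmeans(vectors, k=2)[0])
```

Inspecting the largest components of each centroid indicates which detected objects characterise a cluster, which is how a cluster might be given a label such as 'parks' or 'brick facades'.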

RESULTS: THE DIMENSIONS OF TOURIST EXPERIENCE

The Talk of the Town: Themes of tourist discussion

The results of the topic modeling exercise successfully mapped out several interesting dimensions of the unique themes of discussion that remain associated with tourist sentiment in different cities. They also provided valuable insights into the unique symbolic associations that are forged between the cities and the tourists through these seemingly innocuous everyday discussions.

In Boston, the most common high-level topics revolved around historic buildings, churches and sites; conducted tours of museums, aquariums and statues; enjoyable experiences in parks, such as beautiful and peaceful walks; and enjoyable experiences with children. Most of these topics were associated with strong positive sentiment, and overlapped greatly with the topics emerging from the positive review dataset. Topic modeling on the negative review dataset, however, revealed interesting themes such as complaints about Boston’s weather (often in the context of tours being cancelled due to rain), discussions and debates around the history of slavery and the independence movement, consistent frustration with traffic at specific points in the city (such as its tunnels and bridges), and sporadic complaints about having to climb stairs in some tourist locations.

The discussions emerging from Singapore were radically different. Unlike Boston, there was far less focus on history, and greater focus on nature, landscapes, and serenity. Common topics in this regard included numerous discussions surrounding birds and bird-watching in gardens, small and quiet playgrounds, beautiful and relaxing temples and churches, enjoyable tours in museums, good times with children and family, and very interestingly, numerous discussions surrounding staff at tourist establishments. The positive review dataset also revealed high-points in the tourist experience such as enjoyable rides in parks and jogging activities by the waterfronts. Interestingly, the negative discussions also surrounded very similar themes. There emerged multiple instances of frustration with long entrance queues in front of tourist establishments, bad staff, unpleasant experiences with children in parks, and general complaints about dirt and lack of cleanliness.

Sydney, on the other hand, had a very different identity emerging from tourist discussions. While the Sydney Harbor and specifically the Harbor Bridge recurred in multiple discussions, there was also a lot of talk about picnics by the waterfronts and by the city’s beaches. Food was also a dominant theme in this context. Interestingly, there were multiple discussions surrounding toilets in the context of beach visits by tourists. Other topics were shared with the other cities, such as talk surrounding children’s playgrounds and churches. Unlike in Boston and Singapore, the most strongly positive and negative reviews had topics which were different from the topics emerging from all reviews taken together. High-points associated with positive tourist sentiment revolved around specific sites such as the zoo and art galleries. Others were more generic and pertained to conducted tours and beaches. Low-points emerging from the city surrounded themes such as bad staff, poor equipment in parks, lack of proper public toilets and, interestingly, complaints about dogs.

Overall, the discussions and themes emerging from these three cities tie back strongly to the core experiential identities that they present to their tourists – that of Boston as a city steeped in history, Singapore as a city cradled within nature but with exciting activities, and Sydney as a city dominated by its harbor and its beaches.

Summary of topics associated with highest (+ve) and lowest (-ve) tourist sentiment across the three cities.

The Image of the City: Visual features of tourist interest

The overarching popular identity of Boston as a city steeped in history and historic sites was validated through this exercise. The most common cluster of visual features of interest surrounded those pertaining to historic brick facades that characterize many neighborhoods in Boston. Interestingly enough, a vast majority of these features appeared in photographs that were captured by tourists along Boston’s famous ‘Freedom Trail’ – a 2.5-mile path through the city that weaves through important anchors of the history of the United States. In this context, another important set of features frequently capturing tourist attention were plaques, signage and information boards, which frequently accompany tourist destinations (Fig. 5). Urban parks also formed a major visual attractor, with a significant percentage of tourist photographs capturing such spaces. Selfies, group photographs and photos of other tourists formed another major cluster that captured tourist attention.

Plaques and signages around tourist attractions formed one of the strong features of tourist interest in Boston

The identity of Singapore as a city surrounded by nature emerged through the photographs as well, with a vast majority of features of interest pertaining to parks and semi-urban landscapes. These photographs were spread out finely across the length and breadth of the island city-state. The photos also revealed, however, how local vegetation characteristics can generate lasting visual identities. Elements such as palm trees were a recurring feature in tourist photographs and, in many cases, formed the primary subject matter of the photographs themselves. Other imageable features included the skyscrapers and skylines that characterize downtown Singapore (Fig. 6). In many cases these elements were framed across waterfronts or the harbor. Finally, a distinctive set of features of tourist interest emerging from the city were those surrounding fish – both in the form of living creatures in aquariums as well as in the form of food.

The visual features capturing tourist attention in Sydney tied back strongly to the themes emerging from the topics of discussion as discussed in the earlier section. The Sydney Harbor, and specific anchors such as the Harbor Bridge and the famous Opera House dominated tourist photographs. These appeared both as primary subjects, as well as backdrops in generic harbor front photographs. What was even more interesting was the prominence of boats (including ferries and yachts) as strong visual attractors. It is worth noting here that while both Boston and Singapore are harbor cities as well, boats did not contribute as much to the visual attention-field of the tourist. Moreover, images of fine dining, with features such as wineglasses and food in formal settings frequently recurred as a visual theme underlying the visitor’s experience of Sydney.

Skyscrapers and skylines – a popular subject of tourist interest in Singapore.

The features of visual interest emerging from the three cities opened up unique insights with regard to the specific elements within a tourist’s visual field that actually capture tourist attention. They also revealed the ways in which low-level visual features combine and recombine to generate high-level visual identities of cities.

Summary of the top clusters for visual features of tourist interest across the three cities

CONCLUSIONS: EMPIRICAL APPROACHES TO URBAN IDENTITY

This work relied upon hybrid (textual and photographic) data to analyze the major themes of discussion and features of interest that contribute to overall sentiment towards urban tourist attractions in three cities. More importantly, it compared these patterns across cities, thus exploring the unique ways in which visual elements and textual themes come together to define urban identity.

The analytical methodology demonstrated through this work has the potential to become a valuable tool for future studies. While topic modeling and sentiment analysis have been carried out for multiple urban studies in the past, the analysis of hybrid visual and textual data through unsupervised clustering is a novel approach. Extracted visual features and textual themes complement each other in the context of urban sentiment, and can provide nuanced and comprehensive insights which are often otherwise missed within mainstream sentiment analysis and topic modeling workflows. It is hoped that the methods outlined in this paper are taken forward to address future questions within an undeniably complex domain – that of subjective urban experience.
