Dataset

The data used in this project originate from the NUS Global Streetscapes dataset, which was published by the National University of Singapore. The dataset contains large-scale street-level imagery with a variety of labels. For this project, these labels were combined into a single large table that includes the complete set of images and metadata.

Notebook

For the code and steps used to build the table, see Initial Table Generation.

Steps for preparing the subsets

Step 1

The following table shows how the number of images for each city in the study changes when filtering by mly_quality_score at different thresholds. This is important to note because the count drops sharply as the threshold rises. For Berlin, for example, only 61606 of 198184 images remain at the 50 % threshold.

Note

This approach is useful when computation time needs to be reduced and fewer pairs are expected.

City	Total	50 %	60 %	70 %	80 %	90 %
Berlin	198184	61606	59728	56517	51531	41767
Washington	197080	76859	70041	60128	44313	24971
Sydney	69227	63944	61771	57759	52210	41051
Cape Town	12639	11135	10136	8764	6708	4068
Taipei	198538	171232	161789	146595	122037	84761
Sao Paulo	197964	129330	108546	78852	46080	19913
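
The counts above can be reproduced in principle with a simple threshold filter. The following is a minimal sketch, not the project's actual code: it assumes that mly_quality_score lies between 0 and 1, that an image is kept when its score is greater than or equal to the threshold, and that the table has a `city` column. The sample rows are made up for illustration.

```python
# Hypothetical sample rows; the real table has many more rows and columns.
rows = [
    {"city": "Berlin", "mly_quality_score": 0.95},
    {"city": "Berlin", "mly_quality_score": 0.72},
    {"city": "Berlin", "mly_quality_score": 0.41},
    {"city": "Sydney", "mly_quality_score": 0.88},
    {"city": "Sydney", "mly_quality_score": 0.55},
]

# Thresholds matching the 50 % to 90 % columns (assumed to map to 0.5 to 0.9).
THRESHOLDS = (0.5, 0.6, 0.7, 0.8, 0.9)


def counts_per_threshold(rows, thresholds=THRESHOLDS):
    """Return {city: {"total": n, threshold: n_kept, ...}}."""
    result = {}
    for row in rows:
        city_counts = result.setdefault(
            row["city"], {"total": 0, **{t: 0 for t in thresholds}}
        )
        city_counts["total"] += 1
        for t in thresholds:
            # Assumption: an image is kept when its score is >= the threshold.
            if row["mly_quality_score"] >= t:
                city_counts[t] += 1
    return result


summary = counts_per_threshold(rows)
for city, counts in summary.items():
    print(city, counts)
```

Counting all thresholds in one pass makes it easy to compare how strongly each city is affected before committing to a single cut-off.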

Since testing showed that the heading variable is not 100 % reliable, Mapillary's computed heading was also used to determine whether the offset between the two values is greater than 10 degrees. Applying this additional check reduces the counts as follows:

City	Total	50 %	60 %	70 %	80 %	90 %
Berlin	45650	13035	12476	11629	10287	8018
Washington	132952	51160	47204	41233	30535	16644
Sydney	15304	12849	12164	11228	9924	7573
Cape Town	9511	8276	7416	6303	4618	2594
Taipei	26417	19805	18686	16351	12673	8040
Sao Paulo	124394	79460	66792	48881	27401	11107

As a result, both the difference between the headings and the quality score were used to reduce the number of images.
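
The heading check can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the field names `heading` and `computed_heading` are hypothetical, and the sketch assumes that images whose offset exceeds 10 degrees are discarded. Because headings are compass angles, the difference has to wrap around at 360 degrees, so that 359 and 3 are treated as 4 degrees apart.

```python
def heading_offset(a_deg, b_deg):
    """Smallest absolute angle between two compass headings, in degrees (0 to 180)."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def has_consistent_heading(row, max_offset=10.0):
    # Assumption: an image is kept when the offset is at most max_offset degrees.
    # "heading" and "computed_heading" are hypothetical field names.
    return heading_offset(row["heading"], row["computed_heading"]) <= max_offset


# Made-up sample rows for illustration.
rows = [
    {"id": "a", "heading": 359.0, "computed_heading": 3.0},   # offset 4
    {"id": "b", "heading": 90.0, "computed_heading": 120.0},  # offset 30
    {"id": "c", "heading": 10.0, "computed_heading": 0.0},    # offset 10
]

kept = [r["id"] for r in rows if has_consistent_heading(r)]
print(kept)
```

A plain `abs(a - b)` would report 356 degrees for the first row and wrongly discard it, which is why the modulo form is used here.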

Step 2

Since the number of images was still too large, we decided to use the tool by Danish et al. (2024), which is available on GitHub. The tool was slightly modified for our use. See the Advanced guide for how to use it.

Final dataset

The result of these steps is a cleaned and filtered subset of the Global Streetscapes dataset. In the Berlin example, this produced a smaller but higher-quality collection of images that can be used for tasks such as pair identification and perception studies.