Welcome! The TBD Pedestrian Dataset was collected primarily in pedestrian-rich environments. It highlights three components: human-verified labels grounded in metric space, a combination of top-down and perspective views, and naturalistic human behavior in the presence of a socially appropriate “robot”.

Dataset Highlights
Human-verified metric-space labels:
Grounding labels in metric space removes any dependence of label scale on camera pose. It also makes the dataset useful for robot navigation research, because robots plan in metric space rather than image space (a minimal projection sketch follows these highlights).
Top-down and perspective views:
Providing both top-down and perspective views supports research that relies on perspective inputs: models can consume the perspective views while retaining ground-truth knowledge of the entire scene from the top-down views.
Naturalistic human behavior in the presence of a “robot”:
The “robot” that provides the perspective-view data is a cart pushed by a human. This reduces novelty effects among surrounding pedestrians, ensures pedestrian safety, and gives the cart's motion a natural, human-like quality. As a result, pedestrians react naturally around the robot, treating it as another human agent.
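Because the labels live in metric world coordinates rather than pixels, they can be projected into any calibrated view without the camera affecting the labels themselves. Below is a minimal sketch of that idea; the homography values and function names are made up for illustration and do not reflect the dataset's actual calibration files.

```python
import numpy as np

# Hypothetical 3x3 homography mapping metric ground-plane coordinates
# (meters) to pixel coordinates for one camera. Real values would come
# from the dataset's calibration files; these are illustrative only.
H = np.array([
    [120.0,   -5.0, 640.0],
    [  3.0, -115.0, 360.0],
    [  0.0,    0.0,   1.0],
])

def metric_to_pixel(xy_m, H):
    """Project metric ground-plane points of shape (N, 2) into pixels."""
    pts = np.hstack([xy_m, np.ones((len(xy_m), 1))])  # homogeneous coords
    uvw = pts @ H.T
    return uvw[:, :2] / uvw[:, 2:3]                   # perspective divide

# The same metric trajectory can be projected into any calibrated view;
# changing the camera changes the pixels, never the labels.
trajectory_m = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.3]])
print(metric_to_pixel(trajectory_m, H))
```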
Downloads
The button below will take you to our dataset's Google Drive folder. Descriptions of the data organization and label files are in the readme file.
This effort is IRB-approved under Carnegie Mellon University protocols STUDY2019_00000116 and STUDY2021_00000199. Please contact Aaron Steinfeld for IRB-related issues.
Publication
Wang, A., Biswas, A., Admoni, H., & Steinfeld, A. (2022). Towards Rich, Portable, and Large-Scale Pedestrian Data Collection. arXiv preprint arXiv:2203.01974. [link]
Hardware

We positioned three FLIR Blackfly RGB cameras around the scene on the upper floors, overlooking the ground level and spaced roughly 90 degrees apart. Compared to a single overhead camera, multiple overlapping cameras yield better pedestrian labeling accuracy. The RGB cameras were connected to portable computers powered by lead-acid batteries. We also positioned three more units on the ground floor, but did not use them for pedestrian labeling.
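One way overlapping views improve localization is through triangulation: each camera constrains a pedestrian's position along a ray, and rays from multiple views pin it down. The sketch below shows standard linear (DLT) triangulation with made-up camera matrices; it is a generic textbook method shown for intuition, not a description of this dataset's actual labeling pipeline.

```python
import numpy as np

def triangulate(P_list, uv_list):
    """Linear (DLT) triangulation of one 3D point from N >= 2 views.

    P_list:  list of 3x4 camera projection matrices.
    uv_list: list of (u, v) pixel observations, one per camera.
    """
    rows = []
    for P, (u, v) in zip(P_list, uv_list):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # Homogeneous least-squares solution: the right singular vector of
    # the smallest singular value minimizes the algebraic error.
    _, _, Vt = np.linalg.svd(np.stack(rows))
    X = Vt[-1]
    return X[:3] / X[3]

# Two synthetic cameras (intrinsics and poses are made up).
K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), [[0.0], [0.0], [5.0]]])
R2 = np.array([[0.0, 0.0, -1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])  # 90-degree yaw
P2 = K @ np.hstack([R2, [[0.0], [0.0], [5.0]]])

# Project a known world point into both views, then recover it.
X_true = np.array([1.0, 0.5, 1.0, 1.0])
uvs = [(P @ X_true)[:2] / (P @ X_true)[2] for P in (P1, P2)]
print(triangulate([P1, P2], uvs))  # ~ [1.0, 0.5, 1.0]
```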
In addition to the RGB cameras, we pushed a cart through the scene equipped with a ZED stereo camera that collected both perspective RGB views and depth information. A GoPro Fusion 360 camera, mounted above the ZED, captured high-definition 360-degree video of nearby pedestrians. Data from these onboard cameras are useful for capturing pedestrian poses and facial expressions. The ZED camera was powered by a laptop connected to a power bank.
Statistics
Our dataset currently contains labels for 1412 pedestrians across 8 recording sessions. More data are being collected at other locations. Stay tuned!
Session # | Time Length (mm:ss) | # of Pedestrians |
1 | 01:13 | 18 |
2 | 14:59 | 151 |
3 | 07:13 | 51 |
4 | 40:02 | 369 |
5 | 27:50 | 447 |
6 | 19:25 | 127 |
7 | 11:48 | 147 |
8 | 11:03 | 102 |
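As a quick sanity check, the per-session numbers above sum to the 1412 pedestrians quoted, over roughly 133 minutes of recording. A minimal script, with the table inlined by hand:

```python
# Per-session (duration "mm:ss", pedestrian count) from the table above.
sessions = [
    ("01:13", 18), ("14:59", 151), ("07:13", 51), ("40:02", 369),
    ("27:50", 447), ("19:25", 127), ("11:48", 147), ("11:03", 102),
]

total_peds = sum(count for _, count in sessions)
total_sec = sum(int(m) * 60 + int(s)
                for m, s in (t.split(":") for t, _ in sessions))
print(f"{total_peds} pedestrians over {total_sec // 60}:{total_sec % 60:02d}")
# -> 1412 pedestrians over 133:33
```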
Acknowledgements
This project was supported by grants (IIS-1734361 and IIS-1900821) from the National Science Foundation.