Loch Prospector: Metadata Visualization for Lakes of Open Data
Neha Makhija, Mansi Jain, Nikolaos Tziavelis, Laura Di Rocco, Sara Di Bartolomeo, Cody Dunne
Presentation time: 2020-10-28T17:00:00Z

Fast forward
Direct link to video on YouTube: https://youtu.be/BL0J-ZVQskY
Keywords
Computing: Software, Networks, Security, Performance Engr., Distr. Systems, Databases; Coordinated and Multiple Views; Data Analysis, Reasoning, Problem Solving, and Decision Making; Application Motivated Visualization; High-dimensional Data
Abstract
Data lakes are an emerging storage paradigm that promotes data availability over integration. Prime examples are repositories of Open Data, which show great promise for transparent data science. Because they lack proper integration, data lakes may not have a common, consistent schema, and traditional data management techniques fall short with these repositories. Much recent research has tried to address the new challenges associated with data lakes. Researchers in this area are mainly interested in the structural properties of the data for developing new algorithms, yet typical Open Data portals offer limited functionality in that respect and instead focus on data semantics. We propose Loch Prospector, a visualization to assist data management researchers in exploring and understanding the most crucial structural aspects of Open Data (in particular, metadata attributes), together with the associated task abstraction for their work. Our visualization enables researchers to navigate the contents of data lakes effectively and to accomplish easily what were previously laborious tasks. A copy of this paper with all supplemental material is available at osf.io/zkxv9.