Identifying erroneous data using outlier detection techniques
Zhuang, W.; Zhang, Y.; Grassle, J.F. (2007). Identifying erroneous data using outlier detection techniques, in: Vanden Berghe, E. et al. (Ed.) Proceedings Ocean Biodiversity Informatics: International Conference on Marine Biodiversity Data Management, Hamburg, Germany 29 November to 1 December, 2004. VLIZ Special Publication, 37: pp. 187-192
In: Vanden Berghe, E. et al. (2007). Proceedings Ocean Biodiversity Informatics: International Conference on Marine Biodiversity Data Management, Hamburg, Germany 29 November to 1 December, 2004. VLIZ Special Publication, 37. IOC Workshop Report, 202. VI, 192 pp.
In: VLIZ Special Publication. Vlaams Instituut voor de Zee (VLIZ): Oostende. ISSN 1377-0950
Common data quality problems observed in OBIS are described. DBSCAN, a density-based clustering algorithm for large spatial databases, is employed to identify geographical outliers in federated data from a public Web service on the OBIS Portal. The algorithm is shown to be effective and efficient for this purpose. The relationship between outliers and erroneous data points is discussed, and a plan to develop an operational data quality checking tool based on this algorithm is outlined.
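As an illustration of the density-based approach named in the abstract, the following is a minimal sketch of DBSCAN applied to occurrence coordinates. It is not the authors' implementation: the great-circle (haversine) distance, the brute-force neighbour search, the parameter values `eps_km` and `min_pts`, and the sample `records` are all assumptions chosen for the example. Points labelled -1 (noise) are the geographical outliers; as the abstract notes, an outlier is only a candidate for review and is not necessarily an erroneous record.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def dbscan(points, eps_km, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for noise (outlier)."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Brute-force region query (includes the point itself).
        # A spatial index would replace this for large data sets.
        return [j for j, q in enumerate(points)
                if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reachable from a core point is a border point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return labels

# Hypothetical occurrence records (lat, lon) for one species.
# The last record has a sign-flipped latitude, a made-up example of a data error.
records = [(51.2, 2.9), (51.3, 3.0), (51.25, 2.95), (51.4, 3.1), (-51.3, 3.0)]

labels = dbscan(records, eps_km=50.0, min_pts=3)
outliers = [p for p, lab in zip(records, labels) if lab == -1]
print(outliers)
```

Both parameters control sensitivity: a smaller `eps_km` or a larger `min_pts` flags more records as outliers, so suitable values would depend on the sampling density of the data being checked.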