How we manage trust in the Data.

More details on Verification will be available soon. An overview is provided below. We welcome suggestions, examples, and inputs!

Speed vs. Trust

One of the inherent challenges of crowd-sourced data is veracity (aka truthfulness). To address this challenge, we have to balance two factors: speed and trust.

Speed is a hallmark feature of crowd-sourcing: many people can contribute inputs and updates in a short period of time. Unfortunately, the resulting data may be inaccurate, which brings us to the second factor.

Trust is required before users can apply the data they come across to their daily lives, which is an objective of this project on numerous levels. Because crowd-sourcing provides no inherent guarantee of veracity, we need some method of establishing trust, at least in some situations.

Human-Mediated Verification

One of the primary means we expect to use for Verification, especially at first or with new Data sets, will be human-in-the-loop mediation.

While specific details and instructions will depend on the tool(s) employed (and will be documented in Workflow Overview), the general idea is that at least one individual, and ideally two or more, will evaluate each input or update to determine whether it is satisfactory (again, using criteria listed elsewhere). If it is, it will be marked as Verified as of that time, and its status will be updated later if there are additional changes. This means that either some Entities or Connections will have their Verification status change over time, or updates will not be shown until they are verified. The latter introduces a lag between when an update is submitted and when it is reflected in the broader Data, though a public holding tank for unverified updates could mitigate this in some cases.
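To make the review flow above more concrete, here is a minimal sketch of the "holding tank" variant: updates wait in a pending queue until enough distinct reviewers approve them, at which point they are stamped as Verified as of that time and published. The class names (Update, VerificationQueue), the min_reviewers parameter, and the default of two reviewers are hypothetical illustrations, not part of any tool the project has chosen.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Update:
    """A hypothetical submitted input or update to an Entity or Connection."""
    target_id: str
    payload: dict
    approvals: set = field(default_factory=set)  # IDs of reviewers who approved it
    verified_at: Optional[datetime] = None        # set once enough reviewers approve


class VerificationQueue:
    """Sketch of a public holding tank: updates wait here until verified."""

    def __init__(self, min_reviewers: int = 2):
        # Assumption: the text asks for at least one reviewer, ideally two.
        self.min_reviewers = min_reviewers
        self.pending: list[Update] = []
        self.published: dict[str, Update] = {}

    def submit(self, update: Update) -> None:
        # New submissions are visible only in the holding tank at first.
        self.pending.append(update)

    def approve(self, update: Update, reviewer_id: str) -> bool:
        """Record one reviewer's approval; publish once the threshold is met."""
        update.approvals.add(reviewer_id)  # a set ignores duplicate approvals
        if len(update.approvals) >= self.min_reviewers and update.verified_at is None:
            update.verified_at = datetime.now(timezone.utc)  # "Verified as of that time"
            self.pending.remove(update)
            # A later verified update for the same target replaces this one.
            self.published[update.target_id] = update
            return True
        return False
```

For example, after `queue.submit(u)`, a first call `queue.approve(u, "alice")` leaves the update pending, and a second call from a different reviewer, `queue.approve(u, "bob")`, publishes it. Setting `min_reviewers=1` would model the "at least one individual" minimum instead.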

The downside of this approach is that it will be somewhat resource-intensive and prone to human error. That's where the next option comes in.

Automated Verification

Where possible, we would love to automate verification. Then again, that is easier said than done.

Our current hypothesis is that certain data sources (e.g., those whose domains end in .gov) are more likely to be inherently trustworthy and could substitute for human review in certain cases.
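As a rough illustration of that hypothesis, a source-based check might look at the hostname of the URL an update cites and flag it as a candidate for automated verification when the domain ends in a trusted suffix. This is only a sketch: the TRUSTED_SUFFIXES list and the is_auto_verifiable function are hypothetical, and ".gov" is the only suffix taken from the text.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; ".gov" is the only example given in the text.
TRUSTED_SUFFIXES = (".gov",)


def is_auto_verifiable(source_url: str) -> bool:
    """Return True if the source's hostname ends in a trusted suffix.

    A heuristic only: it flags candidates that might skip human review;
    it does not prove the data itself is accurate.
    """
    host = urlparse(source_url).hostname  # lowercased; None if the URL has no host
    if host is None:
        return False
    return host.endswith(TRUSTED_SUFFIXES)
```

Checking the hostname suffix, rather than searching the whole URL for ".gov", avoids matching look-alike addresses such as `https://example.gov.evil.com/`. A URL given without a scheme (e.g. `data.gov/x`) has no parsed hostname, so the sketch conservatively returns False and leaves it for human review.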

If you have ideas or examples, we're all ears!

Unverified Data

While there are certainly cases where Verification will matter, in many cases simply exploring the Data may be sufficient. This is similar to how social media works, with all the attendant risks borne by those who use the Data (but also with the speed that can still make it useful).

More to Follow

This is an important topic and we're just starting to scratch the surface.
