# Methodology

> How the House Music Intelligence Database collects, verifies, and scores information.

## Principles

1. **API-first, compliance-first.** We use official APIs wherever available (Discogs, MusicBrainz,
   Spotify, YouTube, Wikidata, Last.fm, Songkick, Bandsintown). For sources without APIs we respect
   `robots.txt`, rate limits, and Terms of Service. We never bypass logins, paywalls, CAPTCHAs, or
   anti-bot systems, and we collect only public information.
2. **Provenance on every fact.** Each important field stores a source URL, an extraction method, a
   confidence score (0–100), and a last-verified date.
3. **No silent overwrites.** Higher-confidence data is never overwritten by lower-confidence data.
   Conflicts are flagged and sent to a human review queue.
4. **Open and reproducible.** The full dataset is downloadable as CSV, JSON, NDJSON, and JSON-LD.
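Principles 2 and 3 can be illustrated with a small Python sketch. The `SourcedValue` shape and the `merge` function are hypothetical, not the database's actual schema or code. The sketch assumes that a field stores exactly the four provenance items listed above, that a value with strictly higher confidence may replace a stored one, and that every disagreement is appended to a review queue, whichever value is kept.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SourcedValue:
    """A field value with its provenance (hypothetical shape, for illustration)."""
    value: str
    source_url: str   # where the fact came from
    method: str       # extraction method, e.g. "api" or "html" (assumed labels)
    confidence: int   # 0-100
    verified: date    # last-verified date

    def __post_init__(self) -> None:
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


# Each queued conflict pairs the value that was kept with the value that disagreed.
Conflict = Tuple[SourcedValue, SourcedValue]


def merge(current: Optional[SourcedValue], incoming: SourcedValue,
          review_queue: List[Conflict]) -> SourcedValue:
    """Return the value to keep for one field, never overwriting silently."""
    if current is None:
        return incoming  # nothing stored yet
    if incoming.value == current.value:
        # Agreement: keep the better-supported copy; ties favour the incoming one.
        return incoming if incoming.confidence >= current.confidence else current
    # Disagreement: a lower (or equal) confidence value never replaces the stored one.
    kept = incoming if incoming.confidence > current.confidence else current
    rejected = current if kept is incoming else incoming
    review_queue.append((kept, rejected))  # always surfaced to a human
    return kept
```

In this sketch, an incoming value at confidence 40 that contradicts a stored value at 90 leaves the stored value in place and adds the pair to `review_queue`.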

## Confidence scoring

A field's confidence reflects the authority of its source and the degree to which other sources
corroborate it. A record's overall confidence is the mean of its field confidences. Encyclopedic
topics are scored on the strength and number of their corroborating sources.
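The record-level rule is a plain arithmetic mean. A minimal Python sketch follows; the function name and the dictionary-of-scores input are assumptions, and it does not model how individual field scores are derived from source authority and corroboration, which this page does not specify.

```python
from __future__ import annotations

from statistics import fmean
from typing import Mapping


def record_confidence(field_confidences: Mapping[str, int]) -> float:
    """Overall record confidence: the unweighted mean of its field confidences."""
    if not field_confidences:
        raise ValueError("record has no scored fields")
    for field, score in field_confidences.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{field}: confidence {score} is outside 0-100")
    return fmean(field_confidences.values())


print(record_confidence({"name": 95, "birthplace": 80, "genre": 65}))  # 80.0
```

Because the mean is unweighted, every field counts equally toward the record score, whichever field it is.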

## The knowledge graph

Entities (artists, topics, venues, people, genres, places, labels) are connected by **typed,
sourced relationships** — e.g. *pioneered*, *originated_in*, *resident_at*, *influenced_by*,
*descended_from*. The graph is queryable at [/api/graph](https://database.worldfamoushousecrew.org/api/graph) and exported as
linked data at [/datasets/graph.jsonld](https://database.worldfamoushousecrew.org/datasets/graph.jsonld).
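Typed, sourced relationships can be sketched as an in-memory adjacency list in Python. This is an illustration only: the `Edge` and `Graph` classes, the entity ID format, and the `example.org` source URLs are hypothetical, and the sketch does not reflect the request or response format of `/api/graph` or the structure of `graph.jsonld`. It enforces the two properties named above: every edge has a type from a fixed vocabulary (assumed here to be just the five example types), and every edge cites a source.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional

# Only the example relation types named in the methodology; the real set is
# likely larger (assumption).
RELATION_TYPES = frozenset(
    {"pioneered", "originated_in", "resident_at", "influenced_by", "descended_from"}
)


@dataclass(frozen=True)
class Edge:
    subject: str      # entity ID, e.g. "genre:house" (hypothetical format)
    relation: str     # one of RELATION_TYPES
    obj: str          # target entity ID
    source_url: str   # provenance for this relationship


class Graph:
    """Directed multigraph of typed, sourced edges, indexed by subject."""

    def __init__(self) -> None:
        self._out: DefaultDict[str, List[Edge]] = defaultdict(list)

    def add(self, edge: Edge) -> None:
        if edge.relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {edge.relation!r}")
        if not edge.source_url:
            raise ValueError("every relationship must cite a source")
        self._out[edge.subject].append(edge)

    def outgoing(self, subject: str, relation: Optional[str] = None) -> List[Edge]:
        """Edges leaving `subject`, optionally filtered to one relation type."""
        return [e for e in self._out.get(subject, [])
                if relation is None or e.relation == relation]


g = Graph()
g.add(Edge("genre:house", "originated_in", "place:chicago", "https://example.org/source-1"))
g.add(Edge("artist:frankie-knuckles", "resident_at", "venue:the-warehouse",
           "https://example.org/source-2"))
print([e.obj for e in g.outgoing("genre:house", "originated_in")])  # ['place:chicago']
```

Rejecting unknown types and unsourced edges at insertion time keeps every relationship in the sketch both typed and traceable to a source.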

## Corrections

Found an error or have a better source? Corrections are welcome — every record is versioned in the
[change log](https://database.worldfamoushousecrew.org/changelog.md).
