House Music Intelligence DB

Methodology

How the House Music Intelligence Database collects, verifies, and scores information.

Principles

1. API-first, compliance-first. We use official APIs wherever available (Discogs, MusicBrainz, Spotify, YouTube, Wikidata, Last.fm, Songkick, Bandsintown). For sources without APIs, we respect `robots.txt`, rate limits, and Terms of Service. We never bypass logins, paywalls, CAPTCHAs, or anti-bot systems, and we collect only public information.

2. Provenance on every fact. Each important field stores a source URL, an extraction method, a confidence score (0–100), and a last-verified date.

3. No silent overwrites. Higher-confidence data is never overwritten by lower-confidence data. Conflicts are sent to a human review queue.

4. Open and reproducible. The full dataset is downloadable as CSV, JSON, NDJSON, and JSON-LD.
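Principles 2 and 3 together describe a per-field provenance record plus a merge rule. The following is a minimal Python sketch, not the database's actual code: `SourcedValue`, `merge_field`, and the plain-list `review_queue` are hypothetical names, and the sketch assumes that a differing incoming value replaces the stored one only when its confidence is strictly higher, while any other disagreement keeps the stored value and is queued for review.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class SourcedValue:
    """A field value with its provenance (hypothetical schema)."""
    value: Any
    source_url: str
    extraction_method: str  # e.g. "api" or "scrape" (illustrative labels)
    confidence: int         # 0-100
    last_verified: date

    def __post_init__(self) -> None:
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


def merge_field(
    current: Optional[SourcedValue],
    incoming: SourcedValue,
    review_queue: list,
) -> SourcedValue:
    """Return the value to keep; lower confidence never overwrites higher."""
    if current is None:
        return incoming  # nothing stored yet
    if incoming.value == current.value:
        # Same fact corroborated: keep whichever copy is more confident.
        return incoming if incoming.confidence > current.confidence else current
    if incoming.confidence > current.confidence:
        return incoming  # strictly stronger evidence replaces the old value
    # Disagreement that confidence alone cannot settle: keep current, flag it.
    review_queue.append((current, incoming))
    return current
```

Queuing the losing pair rather than discarding it keeps the weaker source available, so a reviewer can still promote it if it turns out to be correct.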

Confidence scoring

A field's confidence reflects the authority of its source and how well other sources corroborate it. A record's overall confidence is the mean of its field confidences. Encyclopedic topics are scored on the strength and number of their corroborating sources.
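The record-level rule is a plain arithmetic mean. A minimal sketch, assuming field confidences are held as a mapping from field name to a score in 0–100; the function name `record_confidence` and the sample field names are hypothetical, and how each field score is derived from source authority and corroboration is not modeled here.

```python
from statistics import mean
from typing import Mapping


def record_confidence(field_confidences: Mapping[str, float]) -> float:
    """Overall record confidence: the mean of its field confidences (0-100)."""
    if not field_confidences:
        raise ValueError("a record needs at least one scored field")
    for name, score in field_confidences.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name}: confidence {score} is outside 0-100")
    return mean(field_confidences.values())
```

For example, a record whose fields score 90, 70, and 80 (say `birth_year`, `origin_city`, and `label`) has an overall confidence of 80.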

The knowledge graph

Entities (artists, topics, venues, people, genres, places, labels) are connected by **typed, sourced relationships**, such as `pioneered`, `originated_in`, `resident_at`, `influenced_by`, and `descended_from`. The graph is queryable at `/api/graph` and exported as linked data at `/datasets/graph.jsonld`.
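A typed, sourced relationship can be modeled as an edge that carries its own provenance. The sketch below is an assumption about shape, not the published schema: the `Edge` class, its fields, the `neighbors` helper, and any entity identifiers are hypothetical, and it does not reproduce the request format of `/api/graph` or the JSON-LD vocabulary of `/datasets/graph.jsonld`. It only illustrates restricting edges to a fixed set of relationship types and filtering by type.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Relationship types named in the methodology; the real set may be larger.
RELATION_TYPES = {
    "pioneered", "originated_in", "resident_at", "influenced_by", "descended_from",
}


@dataclass(frozen=True)
class Edge:
    """A typed, sourced link between two entities (hypothetical schema)."""
    subject: str     # source entity id (illustrative format)
    relation: str    # must be one of RELATION_TYPES
    obj: str         # target entity id
    source_url: str  # provenance for this specific relationship
    confidence: int  # 0-100

    def __post_init__(self) -> None:
        if self.relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.relation}")


def neighbors(edges: Iterable[Edge], subject: str, relation: str) -> List[str]:
    """Targets reachable from `subject` through edges of type `relation`."""
    return [e.obj for e in edges if e.subject == subject and e.relation == relation]
```

Because every edge stores its own `source_url` and `confidence`, the same provenance and no-silent-overwrite rules that apply to fields can apply to relationships.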

Corrections

Found an error or have a better source? Corrections are welcome; every record is versioned in the change log.