SQL Style

2023-01-19 SQL style guide

This is set of rules I use when composing my SQL and designing my databases. However more often than not I have to use a database designed by some external team that use their own rules.

  • consistent and descriptive names that use domain terminology
  • lowercase for table and column names, put underscore _ where would be space
  • avoid abbreviations unless they are common in the domain
  • prefer numeric primary key named id, autogenerated
  • foreign keys named according to target table, e.g. user_id, use them to ensure data integrity. Also define your ON UPDATE and ON DELETE to simplify your database operations and prevent ghost data
  • keywords and functions in uppercase, always use AS keyword
  • in lists start lines with the comma, it is easier to comment out or add lines
  • use whitespace to make the query easy to read, line up the items
  • sub-queries use similar indentations, wrapped in the parenthesis
  • prefer BETWEEN and IN to combined AND/ORs
  • prefer ENUM with descriptive values to cryptic codes

All above are useful, but probably most important is cleanup of the data before they get into the database and all effort to make sure the data stays consistent. Before loading random Excel produced by management into the database trim all data, define enumeration of values for columns and fix anything that does not fall into defined buckets.

Small formatting example

SELECT 
    team.name AS Teamname
  , team.id   AS TeamId
  , user.name AS Approvername
  , user.id   AS UserId
FROM 
    team_user AS TU 
        JOIN team ON team.id = TU.team_id 
        JOIN user ON user.id = TU.user_id 
WHERE 
    TU.role = 'Lead'
ORDER BY 
    team.name ASC

Somewhat larger example

SELECT
    PA.name                 AS `Moderator`
  , PR.name                 AS `Project`
  , P.review_id             AS `ReviewID`
  , P.filename              AS `Filename`
  , P.completed_date        AS `Approval date`
  , P.total_product_size    AS `Size [pages]`
  , P.sum_review_time       AS `Hours`
  , COUNT(DISTINCT CT.id)   AS `Total Technical`
  , COUNT(DISTINCT CP.id)   AS `Total Process`
  , COUNT(DISTINCT CN.id)   AS `Total Non-technical`
  , COUNT(DISTINCT CS.id)   AS `Total Suggestion`
  , COUNT(DISTINCT CD.id)   AS `Total Duplicate`
  , COUNT(DISTINCT CND.id)  AS `Total Not a defect`
  , PA.review_time          AS `Hours`
  , COUNT(DISTINCT CASE WHEN CT.author  LIKE CONCAT('%', PA.name ,'%') THEN CT.id  END) AS `Technical`
  , COUNT(DISTINCT CASE WHEN CP.author  LIKE CONCAT('%', PA.name ,'%') THEN CP.id  END) AS `Process`
  , COUNT(DISTINCT CASE WHEN CN.author  LIKE CONCAT('%', PA.name ,'%') THEN CN.id  END) AS `Non-technical`
  , COUNT(DISTINCT CASE WHEN CS.author  LIKE CONCAT('%', PA.name ,'%') THEN CS.id  END) AS `Suggestion`
  , COUNT(DISTINCT CASE WHEN CD.author  LIKE CONCAT('%', PA.name ,'%') THEN CD.id  END) AS `Duplicate`
  , COUNT(DISTINCT CASE WHEN CND.author LIKE CONCAT('%', PA.name ,'%') THEN CND.id END) AS `Not a defect`
  , P.OversightType         AS `OversightType`
FROM
    packets P
        JOIN programs PR        ON PR.id = P.program_id
        JOIN participants PA    ON PA.packet_id = P.id
        LEFT JOIN comments CT   ON (P.id = CT.packet_id  AND  CT.IsDefectState = 'DefectTechnical')
        LEFT JOIN comments CP   ON (P.id = CP.packet_id  AND  CP.IsDefectState = 'DefectProcess')
        LEFT JOIN comments CN   ON (P.id = CN.packet_id  AND  CN.IsDefectState = 'DefectNonTechnical')
        LEFT JOIN comments CD   ON (P.id = CD.packet_id  AND  CD.IsDefectState = 'Duplicate')
        LEFT JOIN comments CS   ON (P.id = CS.packet_id  AND  CS.IsDefectState = 'Observation')
        LEFT JOIN comments CND  ON (P.id = CND.packet_id AND CND.IsDefectState IN ('Nondefect', 'NotDefect'))
WHERE
    PA.role LIKE '%Moderator%'
GROUP BY 
    P.id
ORDER BY 
    Moderator