Query Your Way to Insights: SQL for Data Science Exploration.

What is SQL ?

SQL - structured query language - is a classical tool for data analysis. Together with Excel and Python it builds modern must have package for successful Data Science carrier.

SQL aka SEQUEL is a programming language designed for managing data in a relational database, it helps to build complex models and perform analyses quickly and offers a great ability for data manipulation.

SQL DS cheat sheet.
SQL for Data Science meme.

Python Knowledge Base: Make coding great again.
- Updated: 2024-07-23 by Andrey BRATUS, Senior Data Analyst.

    Compared to Python using SQL to transform data inside the warehouse is usually much faster for many basic queries and aggregations as it moves code to data rather than data to code.

    Structured Query Language (SQL) is a vital skill for data scientists, as it enables efficient manipulation and analysis of large datasets stored in relational databases. To make the most of this powerful language, here are 10 essential SQL tips every data scientist should know.

    1. Familiarize yourself with the SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses, which are crucial for constructing complex SQL queries and extracting insightful data.

    2. Efficiently manage multiple tables using JOIN operations - INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN - to combine and retrieve related data seamlessly.

    3. Use aggregate functions like COUNT(), SUM(), AVG(), MIN(), and MAX() to perform calculations across sets of data, providing insightful summaries and statistical metrics.

    4. Optimize database performance by creating indexes on frequently queried columns, improving query response times and query execution efficiency.

    5. Make use of subqueries, also known as inner or nested queries, to filter or preprocess data, enabling more refined and targeted query results.

    6. Utilize the CASE statement for conditional logic, allowing the creation of dynamic and adaptive queries that adjust based on specific conditions within the dataset.

    7. Save time and reduce errors by implementing stored procedures and user-defined functions to encapsulate frequently used sections of SQL code.

    8. Manage data integrity with constraints like PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, and CHECK, ensuring data consistency and maintaining relationships between tables.

    9. Enhance query readability and maintainability by employing aliases, which temporarily rename tables and columns, helping simplify complex queries.

    10. Backup and restore database operations using SQL's built-in backup commands, safeguarding your valuable data and enabling recovery when needed.

    In conclusion, mastering these essential SQL tips will significantly improve your data manipulation and analysis capabilities, making you a more effective data scientist in the realm of SQL-driven data science projects.

  1. Simple calculations with SQL:

  2. Making a single calculation.

    SELECT 13+15  from dual;

    Making multiple calculations.

    SELECT 1+17, 11-3, 7*7, 18/6 from dual;

    Making calculations with multiple numbers.

    SELECT 5*2*3, 7+2+7 from dual;

    Naming results.

    SELECT 4*3 AS res1, 5+2+3 AS res2 from dua;

  3. Selecting tables, columns, and rows:

  4. Simple select - the whole table.

    SELECT * FROM  table1;

    Select columns from a table.

    SELECT column1, column2 FROM table2;

    Selecting the first 10 rows on a table.

    SELECT * from table3
    fetch first 10 rows only;
    -- LIMIT 10; --otherwise

    Selecting UNIQUE VALUES.

    SELECT DISTINCT(column1) FROM table_name;

    HAVING - returns only those results where the counted value is greater than certain value.

    SELECT column1, column2 FROM table_name 
    GROUP BY column1 
    HAVING COUNT(column2) > 100; 

  5. Adding comments to SQL code:

  6. Single-line comments.

    -- Useful comment 
    SELECT column1, column2, column3 -- Useful comment 2
    FROM table4; -- Useful comment 3 

    Block comments.

    /* Multiple 
    comment */
    SELECT column1, column2, column3 
    FROM table5; 


  8. Counting the number of rows.

    SELECT COUNT(*) FROM table_name; 

    Calculating the average (mean) of the values in column1.

    SELECT AVG(column1) FROM table_name; 

    Calculating the sum of the values in column1.

    SELECT SUM(column1) FROM table_name;  

    Calculating the MIN of the values in column1.

    SELECT MIN(column1) FROM table_name;  

    Calculating the MAXof the values in column1.

    SELECT MAX(column1) FROM table_name;  

    Calculating the MEDIAN of the values in column1.

    SELECT MEDIAN(column1) FROM table_name;  

  9. Using Joins & Complex Queries:

  10. Joining tables with INNER JOIN.

    SELECT * FROM table1 
    INNER JOIN table2 ON table1.column1 = table2.column1;   

    Joining tables using a LEFT JOIN.

    SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id;   

    Joining tables using a RIGHT JOIN.

    SELECT f.col1, c.col2
    FROM t1 c  RIGHT JOIN t2 f ON f.id = c.id;  

    Joining tables using a FULL OUTER JOIN.

    SELECT f.col1, c.col2
    FROM t1 c   FULL OUTER JOIN   t2 f ON f.id = c.id;  

    Sorting a column without specifying a column name.

    SELECT col1, col2
    FROM table
    ORDER BY 2 desc; -- descending order by 2 column  

    Using a join within a subquery.

    SELECT c.name, f.*
    FROM table f INNER JOIN 
    ( SELECT * FROM table2 WHERE name = 1 ) c ON c.id= f.id;  

  11. Common SQL operations:

  12. Concatinating columns and text into a single column.

    SELECT id, name, 
    "Original id is " || column1, 
    "Details are : " || column2 || column3
    FROM table  

    Matching part of a string.

    SELECT * FROM table 
    WHERE name LIKE "%Andrew%";  

    Applying if / then logic with CASE.

    WHEN age < 18 THEN 'teenager' 
    WHEN  age > 18  THEN 'adult'
    ELSE 'unknown' 
    END AS column_name  

    Using the WITH clause.

    WITH stuff AS 
    ( SELECT col1,  col2 , col3
    FROM table10) 
    SELECT * FROM stuff 
    WHERE  col1 = 15;    

    Creating a table from select.

    CREATE TABLE mytable2
    AS SELECT  col1,  col2  FROM fulltable;   

    Creating a table from scratch.

    create table mytable (
    start_date date , code varchar2 (200),
    salary number

    Inserting data into a table.

    INSERT INTO mytable  (start_date, code, salary )
    VALUES ('10.10.2010', '12345', 12000);  

    Dropping a table.

    DROP TABLE mytable;  

    Cleaning a table.

    truncate table  mytable2;  

    Creating a view.

    CREATE VIEW mytable3
    AS SELECT  col1,  col2 , col3  FROM fulltable;   

    Results from 2 different select statesments.

    select * from a
    select * from b;    

    Results that occur in both SELECT statements.

    select * from a
    select * from b;    

    Results that occur in the first SELECT statement but not the second SELECT statement.

    select * from a
    select * from b;    


  14. SQL is extremely sensitive to keyword order. So make sure you keep it right:

    1. SELECT
    2. FROM
    3. JOIN (ON)
    4. WHERE
    5. GROUP BY
    6. HAVING
    7. ORDER BY
    8. LIMIT or fetch first x rows only

See also related topics: