What is SQL ?
SQL - structured query language - is a classical tool for data analysis. Together with Excel and Python it builds modern must have package for successful Data Science carrier.
SQL aka SEQUEL is a programming language designed for managing data in a relational database, it helps to build complex models and perform analyses quickly and offers a great ability for data manipulation.
Python Knowledge Base: Make coding great again.
- Updated:
2024-12-20 by Andrey BRATUS, Senior Data Analyst.
Simple calculations with SQL:
Selecting tables, columns, and rows:
Adding comments to SQL code:
SQL FUNCTIONS FOR AGGREGATION:
Using Joins & Complex Queries:
Common SQL operations:
CORRECT KEYWORDS ORDER:
Compared to Python using SQL to transform data inside the warehouse is usually much faster for many basic queries and aggregations as it moves code to data rather than data to code.
Structured Query Language (SQL) is a vital skill for data scientists, as it enables efficient manipulation and analysis of large datasets stored in relational databases. To make the most of this powerful language, here are 10 essential SQL tips every data scientist should know.
1. Familiarize yourself with the SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses, which are crucial for constructing complex SQL queries and extracting insightful data.
2. Efficiently manage multiple tables using JOIN operations - INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN - to combine and retrieve related data seamlessly.
3. Use aggregate functions like COUNT(), SUM(), AVG(), MIN(), and MAX() to perform calculations across sets of data, providing insightful summaries and statistical metrics.
4. Optimize database performance by creating indexes on frequently queried columns, improving query response times and query execution efficiency.
5. Make use of subqueries, also known as inner or nested queries, to filter or preprocess data, enabling more refined and targeted query results.
6. Utilize the CASE statement for conditional logic, allowing the creation of dynamic and adaptive queries that adjust based on specific conditions within the dataset.
7. Save time and reduce errors by implementing stored procedures and user-defined functions to encapsulate frequently used sections of SQL code.
8. Manage data integrity with constraints like PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, and CHECK, ensuring data consistency and maintaining relationships between tables.
9. Enhance query readability and maintainability by employing aliases, which temporarily rename tables and columns, helping simplify complex queries.
10. Backup and restore database operations using SQL's built-in backup commands, safeguarding your valuable data and enabling recovery when needed.
In conclusion, mastering these essential SQL tips will significantly improve your data manipulation and analysis capabilities, making you a more effective data scientist in the realm of SQL-driven data science projects.
Making a single calculation.
SELECT 13+15 from dual;
Making multiple calculations.
SELECT 1+17, 11-3, 7*7, 18/6 from dual;
Making calculations with multiple numbers.
SELECT 5*2*3, 7+2+7 from dual;
Naming results.
SELECT 4*3 AS res1, 5+2+3 AS res2 from dua;
Simple select - the whole table.
SELECT * FROM table1;
Select columns from a table.
SELECT column1, column2 FROM table2;
Selecting the first 10 rows on a table.
SELECT * from table3
fetch first 10 rows only;
-- LIMIT 10; --otherwise
Selecting UNIQUE VALUES.
SELECT DISTINCT(column1) FROM table_name;
HAVING - returns only those results where the counted value is greater than certain value.
SELECT column1, column2 FROM table_name
GROUP BY column1
HAVING COUNT(column2) > 100;
Single-line comments.
-- Useful comment
SELECT column1, column2, column3 -- Useful comment 2
FROM table4; -- Useful comment 3
Block comments.
/* Multiple
lines
comment */
SELECT column1, column2, column3
FROM table5;
Counting the number of rows.
SELECT COUNT(*) FROM table_name;
Calculating the average (mean) of the values in column1.
SELECT AVG(column1) FROM table_name;
Calculating the sum of the values in column1.
SELECT SUM(column1) FROM table_name;
Calculating the MIN of the values in column1.
SELECT MIN(column1) FROM table_name;
Calculating the MAXof the values in column1.
SELECT MAX(column1) FROM table_name;
Calculating the MEDIAN of the values in column1.
SELECT MEDIAN(column1) FROM table_name;
Joining tables with INNER JOIN.
SELECT * FROM table1
INNER JOIN table2 ON table1.column1 = table2.column1;
Joining tables using a LEFT JOIN.
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id;
Joining tables using a RIGHT JOIN.
SELECT f.col1, c.col2
FROM t1 c RIGHT JOIN t2 f ON f.id = c.id;
Joining tables using a FULL OUTER JOIN.
SELECT f.col1, c.col2
FROM t1 c FULL OUTER JOIN t2 f ON f.id = c.id;
Sorting a column without specifying a column name.
SELECT col1, col2
FROM table
ORDER BY 2 desc; -- descending order by 2 column
Using a join within a subquery.
SELECT c.name, f.*
FROM table f INNER JOIN
( SELECT * FROM table2 WHERE name = 1 ) c ON c.id= f.id;
Concatinating columns and text into a single column.
SELECT id, name,
"Original id is " || column1,
"Details are : " || column2 || column3
FROM table
Matching part of a string.
SELECT * FROM table
WHERE name LIKE "%Andrew%";
Applying if / then logic with CASE.
CASE
WHEN age < 18 THEN 'teenager'
WHEN age > 18 THEN 'adult'
ELSE 'unknown'
END AS column_name
Using the WITH clause.
WITH stuff AS
( SELECT col1, col2 , col3
FROM table10)
SELECT * FROM stuff
WHERE col1 = 15;
Creating a table from select.
CREATE TABLE mytable2
AS SELECT col1, col2 FROM fulltable;
Creating a table from scratch.
create table mytable (
start_date date , code varchar2 (200),
salary number
);
Inserting data into a table.
INSERT INTO mytable (start_date, code, salary )
VALUES ('10.10.2010', '12345', 12000);
Dropping a table.
DROP TABLE mytable;
Cleaning a table.
truncate table mytable2;
Creating a view.
CREATE VIEW mytable3
AS SELECT col1, col2 , col3 FROM fulltable;
Results from 2 different select statesments.
select * from a
UNION ALL
select * from b;
Results that occur in both SELECT statements.
select * from a
INTERSECT
select * from b;
Results that occur in the first SELECT statement but not the second SELECT statement.
select * from a
EXCEPT
select * from b;
SQL is extremely sensitive to keyword order. So make sure you keep it right:
1. SELECT
2. FROM
3. JOIN (ON)
4. WHERE
5. GROUP BY
6. HAVING
7. ORDER BY
8. LIMIT or fetch first x rows only