What is SQL ?
SQL - structured query language - is a classical tool for data analysis. Together with Excel and Python it builds modern must have package for successful Data Science carrier.
SQL aka SEQUEL is a programming language designed for managing data in a relational database, it helps to build complex models and perform analyses quickly and offers a great ability for data manipulation.
Compared to Python using SQL to transform data inside the warehouse is usually much faster for many basic queries and aggregations as it moves code to data rather than data to code.
Structured Query Language (SQL) is a vital skill for data scientists, as it enables efficient manipulation and analysis of large datasets stored in relational databases. To make the most of this powerful language, here are 10 essential SQL tips every data scientist should know.
1. Familiarize yourself with the SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses, which are crucial for constructing complex SQL queries and extracting insightful data.
2. Efficiently manage multiple tables using JOIN operations - INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN - to combine and retrieve related data seamlessly.
3. Use aggregate functions like COUNT(), SUM(), AVG(), MIN(), and MAX() to perform calculations across sets of data, providing insightful summaries and statistical metrics.
4. Optimize database performance by creating indexes on frequently queried columns, improving query response times and query execution efficiency.
5. Make use of subqueries, also known as inner or nested queries, to filter or preprocess data, enabling more refined and targeted query results.
6. Utilize the CASE statement for conditional logic, allowing the creation of dynamic and adaptive queries that adjust based on specific conditions within the dataset.
7. Save time and reduce errors by implementing stored procedures and user-defined functions to encapsulate frequently used sections of SQL code.
8. Manage data integrity with constraints like PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, and CHECK, ensuring data consistency and maintaining relationships between tables.
9. Enhance query readability and maintainability by employing aliases, which temporarily rename tables and columns, helping simplify complex queries.
10. Backup and restore database operations using SQL's built-in backup commands, safeguarding your valuable data and enabling recovery when needed.
In conclusion, mastering these essential SQL tips will significantly improve your data manipulation and analysis capabilities, making you a more effective data scientist in the realm of SQL-driven data science projects.
Simple calculations with SQL:
Making a single calculation.
SELECT 13+15 from dual;
Making multiple calculations.
SELECT 1+17, 11-3, 7*7, 18/6 from dual;
Making calculations with multiple numbers.
SELECT 5*2*3, 7+2+7 from dual;
SELECT 4*3 AS res1, 5+2+3 AS res2 from dua;
Selecting tables, columns, and rows:
Simple select - the whole table.
SELECT * FROM table1;
Select columns from a table.
SELECT column1, column2 FROM table2;
Selecting the first 10 rows on a table.
SELECT * from table3 fetch first 10 rows only; -- LIMIT 10; --otherwise
Selecting UNIQUE VALUES.
SELECT DISTINCT(column1) FROM table_name;
HAVING - returns only those results where the counted value is greater than certain value.
SELECT column1, column2 FROM table_name GROUP BY column1 HAVING COUNT(column2) > 100;
Adding comments to SQL code:
-- Useful comment SELECT column1, column2, column3 -- Useful comment 2 FROM table4; -- Useful comment 3
/* Multiple lines comment */ SELECT column1, column2, column3 FROM table5;
SQL FUNCTIONS FOR AGGREGATION:
Counting the number of rows.
SELECT COUNT(*) FROM table_name;
Calculating the average (mean) of the values in column1.
SELECT AVG(column1) FROM table_name;
Calculating the sum of the values in column1.
SELECT SUM(column1) FROM table_name;
Calculating the MIN of the values in column1.
SELECT MIN(column1) FROM table_name;
Calculating the MAXof the values in column1.
SELECT MAX(column1) FROM table_name;
Calculating the MEDIAN of the values in column1.
SELECT MEDIAN(column1) FROM table_name;
Using Joins & Complex Queries:
Joining tables with INNER JOIN.
SELECT * FROM table1 INNER JOIN table2 ON table1.column1 = table2.column1;
Joining tables using a LEFT JOIN.
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id;
Joining tables using a RIGHT JOIN.
SELECT f.col1, c.col2 FROM t1 c RIGHT JOIN t2 f ON f.id = c.id;
Joining tables using a FULL OUTER JOIN.
SELECT f.col1, c.col2 FROM t1 c FULL OUTER JOIN t2 f ON f.id = c.id;
Sorting a column without specifying a column name.
SELECT col1, col2 FROM table ORDER BY 2 desc; -- descending order by 2 column
Using a join within a subquery.
SELECT c.name, f.* FROM table f INNER JOIN ( SELECT * FROM table2 WHERE name = 1 ) c ON c.id= f.id;
Common SQL operations:
Concatinating columns and text into a single column.
SELECT id, name, "Original id is " || column1, "Details are : " || column2 || column3 FROM table
Matching part of a string.
SELECT * FROM table WHERE name LIKE "%Andrew%";
Applying if / then logic with CASE.
CASE WHEN age < 18 THEN 'teenager' WHEN age > 18 THEN 'adult' ELSE 'unknown' END AS column_name
Using the WITH clause.
WITH stuff AS ( SELECT col1, col2 , col3 FROM table10) SELECT * FROM stuff WHERE col1 = 15;
Creating a table from select.
CREATE TABLE mytable2 AS SELECT col1, col2 FROM fulltable;
Creating a table from scratch.
create table mytable ( start_date date , code varchar2 (200), salary number );
Inserting data into a table.
INSERT INTO mytable (start_date, code, salary ) VALUES ('10.10.2010', '12345', 12000);
Dropping a table.
DROP TABLE mytable;
Cleaning a table.
truncate table mytable2;
Creating a view.
CREATE VIEW mytable3 AS SELECT col1, col2 , col3 FROM fulltable;
Results from 2 different select statesments.
select * from a UNION ALL select * from b;
Results that occur in both SELECT statements.
select * from a INTERSECT select * from b;
Results that occur in the first SELECT statement but not the second SELECT statement.
select * from a EXCEPT select * from b;
CORRECT KEYWORDS ORDER:
SQL is extremely sensitive to keyword order. So make sure you keep it right:
3. JOIN (ON)
5. GROUP BY
7. ORDER BY
8. LIMIT or fetch first x rows only