Extraction from Databases| SQL
Overview
SQL stands for Structured Query Language, which is used to interact with Relational Databases. SQL is a clause-based query language that can enable extracting large amounts of data within a short period of time. It is an important skill for Data Scientists and Data Analysts, who often query and extract data from databases.
Introduction
- SQL stands for Structured Query Language which is used to query, update, and manage Relational Databases (RDBMS) and extract data. It is an essential skill for Data Scientists and Analysts as Relational Databases are very common in organizations due to their simplicity and ease of maintenance.
- A Relational Database (RDBMS) is a collection of data items with pre-defined relationships with them. It organizes data in the form of tables, i.e. in the form of rows and columns. In RDBMS, the schema for each feature is pre-defined. A few of the most common Relational Databases are - MySQL, Oracle, etc.
- Relational Databases are managed and interacted with using SQL language. So to extract data from Relational Database(s), you need to write queries using SQL statements and clauses.
SQL Statements
One of the main advantages of SQL is its simple syntax and learning curve. To extract data from RDBMS, you need to write SQL statements which include _clauses, table names, field names, logical expressions,_ etc. As shown in the below table, there is a basic order for clauses you need to follow while writing SQL statements for data extraction.
Order | Clause | Description |
1 | SELECT | SELECT statement includes data that you want to display. It can include a list of columns or computed features. |
2 | FROM | FROM clause indicates the table name which you want to query for data extraction |
3 | WHERE | WHERE is used to filter data based on a logical expression |
4 | GROUP BY | GROUP BY clause is used to aggregate data |
5 | HAVING | HAVING is used to filter aggregated data |
6 | ORDER BY | ORDER BY clause is used to sort data in ascending or descending order based on one or multiple columns |
7 | LIMIT | LIMIT is used to limit the number of rows in the final data |
Let’s see some examples of SQL queries to understand how to use the above clauses. Suppose we have a table named _student_ as shown below -
Student ID | Name | Section | Marks | Grade |
1 | Chandler | A | 92 | A |
2 | Monica | B | 90 | A |
3 | Phoebe | C | 89 | B |
4 | Joey | C | 71 | C |
5 | Rachel | B | 65 | C |
6 | Ross | A | 97 | A |
Suppose, if you want to display student name, their marks, section when their grade is A, you can use below query -
The above query will display below table -
Name | Section | Marks |
Chandler | A | 92 |
Monica | B | 90 |
Ross | A | 97 |
If you want to display average marks for each section, you can use the GROUP BY clause as shown in the below query -
Above query will display below table -
Section | AverageMarks |
A | 94.5 |
B | 77.5 |
C | 80 |
Now, let’s say you want to display section where average marks are >= 80. Then you can use the below query -
Section | AverageMarks |
A | 94.5 |
C | 80 |
Now, if you want to display the above table in ascending orders of average marks, then you can use the below query -
Section | AverageMarks |
C | 80 |
A | 94.5 |
If you want to display the section with highest average marks, you can use the query below.
Section | AverageMarks |
A | 94.5 |
Filtering Data with SQL
In SQL, WHERE clause is used to filter data. Let’s get into some of the methods which you can use in the WHERE clause to filter data -
Using ‘=’ Expression
It is an equal sign which can be used to filter data when a column is equal to the specified value. Below is one example query for the equal operator -
Name | Marks | Grade |
Chandler | 92 | A |
Ross | 97 | A |
Using Relational Expressions
Below are the relational expressions which you can use in WHERE clause for data filtering -
- > or <: select rows with greater or less than specified value
- >=: select rows with greater than or equal to specified value
- <=: select rows with less than or equal to specified value
Let’s have a look at the example mentioned below -
Name | Section | Marks | Grade |
Chandler | A | 92 | A |
Monica | B | 90 | A |
Phoebe | C | 89 | B |
Ross | A | 97 | A |
Using Logical Expressions
Below are the logical expressions you can use in the WHERE clause to filter data -
- AND (&) - When two specified conditions are met
- OR (|) - When any of the two specified conditions is met
- NOT - When a specified condition is negated
Let’s have a look at below SQL query to understand how logical expression works -
Name | Section | Marks | Grade |
Chandler | A | 92 | A |
Monica | B | 90 | A |
Phoebe | C | 89 | B |
Joey | C | 71 | C |
Ross | A | 97 | A |
Using IN Operator
IN operator can be used to filter data when you want to specify a limited number of options to match for a column. Below is a query that uses IN operator to filter data
Name | Section | Marks | Grade |
Chandler | A | 92 | A |
Phoebe | C | 89 | B |
Joey | C | 71 | C |
Ross | A | 97 | A |
Using BETWEEN Operator
The BETWEEN operator is used when you want to specify a range in filtering condition.
Name | Section | Marks | Grade |
Monica | B | 90 | A |
Phoebe | C | 89 | B |
Using LIKE Operator
The LIKE operator is used on String columns when you want to match a substring to filter input data.
Name | Section | Marks | Grade |
Phoebe | C | 89 | B |
Joey | C | 71 | C |
Extraction From Multiple Tables
In a real-world scenario, data is stored in disparate databases, and you must extract data from multiple tables. One way of extracting data from multiple tables is by using JOIN clauses. JOIN clauses combine data from multiple tables based on matching conditions of specified key(s). There are various types of join -
- Left Join
- Right Join
- Inner Join
- Outer Join
Suppose we have another table named _student_details_ containing the student’s city and age details as shown below -
Student ID | City | Age |
1 | Chicago | 21 |
2 | New York | 23 |
3 | Seattle | 20 |
4 | Boston | 19 |
Now, if we want to fetch a student's city and age and combine it with another table, we can use a join operation. We can join these two tables based on _StudentID_ as it is a common key between these two tables. Let’s use INNER join for our case. Inner Join will display results only for those _StudentID_ which are present in both tables.
Name | Section | City | Age |
Chandler | A | Chicago | 21 |
Monica | B | New York | 23 |
Phoebe | C | Seattle | 20 |
Joey | C | Boston | 19 |
Combined Extraction
Another way to combine data from multiple tables is by using the UNION clause. The UNION operator combines data from multiple tables using multiple SELECT commands. While applying the UNION clause, you must ensure that columns in each SELECT statement should be of the same data types and in the same order.
General SQL Conventions
- SQL is not case sensitive. So it can be written in lower or upper case letters. However, it is good practice to write SQL clauses such as SELECT, WHERE, FROM, etc. in uppercase letters.
- Stored data in tables is case-sensitive. So while using LIKE conditions to match a substring, you must ensure that the specified string is in the required case order.
- You should avoid spaces in table and column names.
- SQL ignores white space in code, and no indentation is required while writing SQL queries.
- It is considered a good practice to write comments in your SQL queries.
Conclusion
- SQL stands for Structured Query Language, and it is used to extract data from Relational Databases quickly.
- Its main advantage is its simplified syntax, and you can combine its clauses to write a complex query.
- Using SQL, you can filter data by specifying multiple conditions, and pull out data from multiple tables using JOIN and UNION clauses.