SQL Queries – Order of Execution

Any aspirant data professional should be knowledgeable about the Order of Execution of SQL Queries because it is crucial in both interviews and the workplace. By allowing you to create effective queries and prevent unexpected outputs, it simplifies your life. As a data scientist, I attended interviews with numerous firms, and 2 out of 5 interviewers inquired about the SQL queries’ order of execution.

Order of execution in SQL

Although SQL may appear to be a simple language, as we dig deeper, we will understand how powerful it is. Its support for complex queries is simply incredible.. SQL queries mainly has following statements

  • Select
  • From
  • Join
  • Where
  • Group by
  • Having
  • Order by
  • Offset/Limit

With the following example we can understand the order of execution.

Let us consider 2 tables, a transaction table and product table.

Transaction Table Name : ‘transaction’

Product Table Name:  ‘Product’

The transaction table contains txn_id, prd_id, qty and msrp columns.
The product table contains prd_id and prd_cat columns.

FROM

The FROM statement, which identifies the table from which data is fetched, is the first statement to be run in a SQL query. The query in the example below has two problems: the table name is misspelled , and the column ’emp_id’ isn’t present in the transaction table. Still, we only receive the message that ‘statscribbles.statscribbles_db.transaction_1’ table cannot be located.

When I correct the table name  in below example we could see that SQL identifies the issue in column name specified. This demonstrates unequivocally that the first statement to be executed in a query is ‘FROM’.

JOIN CLAUSE

A Join clause will get executed second in the SQL Query ,it is used to combine rows from two or more tables based on the common columns between the tables. Even in the example below, I used the wrong column name in the join condition and in select statement. We could see  that SQL first identifies the incorrect column name in join condition rather than the column name in select statement. This demonstrates unequivocally that the ‘join’ statement is executed prior to the ‘select’ statement.

WHERE

The Where statement is the third to be run in a SQL query. By using the specified criteria, it is used to filter the output of the tables obtained from the “FROM” statement and join clause. In the below example also incorrect column in where condition is identified first rather than incorrect column name in select statement.

GROUP BY

The fourth statement to get executed in an SQL query is Group by clause. It is used to aggregate the data based on one or more columns. If the query contains a Where clause, Group by will run after the Where clause has filtered the data. Despite the fact that the select, where, and group by statements in the example below use wrong column names, SQL initially recognizes the error in the where condition.  This demonstrates unequivocally that Group by is executed only after where statement.

HAVING CLAUSE

The Having clause is the fifth statement that a SQL query executes. It is used to filter the data in the table similar to  Where clause, however the distinction is that the Where clause is executed before the Group by clause and the Having clause is executed after the Group by clause.When establishing a condition in a where clause, aggregate functions cannot be used; however, they may be used in a having clause. The below example clearly shows that aggregate functions cannot be used in where clause.

SELECT

The sixth statement to get executed in SQL query is select. In spite of being present at the very beginning Select statement is actually executed at sixth place. The above mentioned statements decides the records which has to be present in the result table where as Select statement specifies the columns that needs to be present in the result.

ORDER BY

The seventh statement to get executed in SQL query is Order by. Once the tables have been chosen ,necessary records and columns have been filtered using the statements mentioned above we can now order the result based on one or more columns using Order By clause. We can order the result either in Ascending or Descending order of specified columns.

Offset/Limit

The last statement to get executed is offset & limit statements with which we can retrieve a subset of data. In the below example we have taken  list of product category and number of products in that category provided that the ‘msrp’ of those products are  greater than 70.

From the above example, if we only wanted the product categories that have the second most number of products we can use offset and limit as mentioned in the query below. Offset 1 gets a subset of the table from row number 2 till the end and limit 1 gets only the first row from that subset.