5/7/11

SQL Joins

SQL Joins

Tables are joined in Oracle when data must be retrieved from more than one table. As a result of normalization the tables in a database are stored in their simplest forms with little or no redundancy. This results in separating data based on objects and still maintained relationships based on the values in the tables. Joining tables is an essential part of learning how to query the database, because most of the time all the data is not present in one table. Everytime you need to read information that is stored in multiple table, you will need to perform a join operation. In the screenshot below, you are looking at two tables called COUNTRIES and REGIONS. The two countries are related based on the Region_ID column. For example the Region whose ID is 1, and name is Europe, is made up of the countries Belgium, Switzerland, Germany, Denmark, and France in the Countries table. All these countries have their REGION_ID column set to 1.

If we want to see an output based on these two tables, displaying the Region_Name and Country_Name, you would need to join the two tables using the relationship created by the REGION_ID column.



There are different types of joins in Oracle, namely:

Cartesian Join

• Equi Join

• Non-Equi Join

• Outer Join

• Self Join

Joins can be written in two ways in Oracle. One method is the Traditional method, that uses Oracle Proprietary syntax. The other method is called the SQL-1999 syntax. In this article we will discuss the Traditional method where the table are joined using a join condition in the WHERE clause. In a subsequent article, the joining of tables using the SQL-1999 syntax will be discussed.

Basic Syntax when joining tables using traditional syntax:

SELECT column, column, [table1].column

FROM table1 , table2

WHERE [table1].column=[table2].column;

Observe the FROM clause, more than one table is mentioned, separated by commas.

Observe the WHERE clause, a join conditon is created that related a column of the first table, with a column of the second table.

The query to retrieve the REGION_NAME and its COUNTRY_NAME would be:

SELECT REGION_NAME, COUNTRY_NAME

FROM COUNTRIES, REGIONS

WHERE COUNTRIES.REGION_ID=REGIONS.REGION_ID;

Note: Prefixing the table name before the column name becomes mandatory only when the column name is the same in both the tables. If we were to retrieve the REGION_ID column also, then the query would have been written as:

SELECT REGION_NAME, COUNTRY_NAME, REGIONS.REGION_ID

FROM COUNTRIES, REGIONS

WHERE COUNTRIES.REGION_ID=REGIONS.REGION_ID;

Sometimes you can even use table aliases in the from clause. This is useful when table names are long and appear many times in the statement. Table aliases are used to simplify queries, improve readability and improve performance.

SELECT REGION_NAME, R.REGION_ID, COUNTRY_NAME

FROM COUNTRIES C, REGIONS R

WHERE C.REGION_ID=R.REGION_ID;

In the example, the alias C is given for the COUNTRIES table, and the alias R is given for the REGIONS table. The aliases are defined in the FROM clause, beside the table name separated by a space. Note that once you have defined a table alias in a query the alias must be used in that query whereever the name of the table would appear.

A basic important rule to remember when joining tables is:

Rule: When n tables are joined, a minimum of n-1 join conditions must be written for a correct output. If you do not create n-1 conditions to join the tables you will create a cartesian join, and the output might not be what you want.


In the image above, there are three tables being joined, LOCATIONS, COUNTRIES, REGIONS. Three tables, would require at least 2 valid join conditions.

The join condition for the LOCATIONS (alias L) table and COUNTRIES (alias C) table is L.COUNTRY_ID=C.COUNTRY_ID.

The join condition for the COUNTRIES (alias C) and the REGIONS (alias R) table is R.REGION_ID=C.REGION_ID.


Cartesian Joins

A cartesian join is formed when data is retrieved from multiple tables without writing necessary join conditions. In the example below there are 25 rows in the COUNTRIES table and 4 rows in the REGIONS table. The Cartesian join, will create a total of 100 rows. Every row in the COUNTRIES table will be matched with every row in the REGIONS table.

SELECT COUNTRY_NAME, REGION_NAME

FROM COUNTRIES, REGIONS;




Equi-Join

This type of join (non-equi join) is performed when data to be retrieved is in more than one table, and the data is related based a condition of equality. The example query that we started with is an equi-join:

SELECT REGION_NAME, COUNTRY_NAME

FROM COUNTRIES, REGIONS

WHERE COUNTRIES.REGION_ID=REGIONS.REGION_ID;

The REGION_ID in the COUNTRIES table, being equal to the REGION_ID column in the REGIONS table creates an equi-join condition. Equi-joins are the most common type of join that is performed.

Non-Equi Join

This type of join (non-equi join) is performed when data to be retrieved is in more than one table, but the data is not related based a condition of equality. The image below shows the grade details (SALGRADE table) and employee details (EMPLOYEES table). Employees are assigned grades based on their salary. Observe the data in the SALGRADE table. The grade A is assigned to employees whose salary is within the range of 1 and 10000. The grade B is assigned to employees whose salary is within the range of 10001 and 20000, etc. The query on the EMPLOYEES table displays the name, job_id and salary of the employees. If we have to retrieve the name of the employee and their grade, you will have to perform a join. But, the join will not be based on a condition of equality. It is based on a range, and is therefore called a Non-Equi Join.



The query to retrieve the first_name, and the grade would be:

SELECT FIRST_NAME, GRADE

FROM EMPLOYEES, SALGRADE

WHERE EMPLOYEES.SALARY BETWEEN MINSAL AND MAXSAL;

Outer Join

An outer join is based on an equi-join when additional information from a table needs to be retrieved. Consider the image below, showing the DEPARTMENTS table:


Now consider a query that is written based on an equi-join that would display the DEPARTMENT_NAME and the FIRST_NAME of the manager of the department. The departments table has only the MANAGER_ID. A manager of the department is an employee of the company. The employee details are in the EMPLOYEES table, therefore to display the name of the department and the manager's name, the query to write would be:

SELECT DEPARTMENT_NAME, FIRST_NAME

FROM DEPARTMENTS D, EMPLOYEES E

WHERE D. MANAGER_ID=E.EMPLOYEE_ID;

This query will display all the departments who have manager's and will accurately display the department name and its manager's name. (See image below)


What if we not only wanted to see in this output, the department names that have managers and the departments that do not have managers. This would mean that in addition to the output that you see in the image above, there should be department names that should appear, without a corresponding first_name (for the manager's first_name). This type of query, which needs to display additional information over and above an equi-join is called an outer join.

In the outer join type of query, the table without values to display must display nulls. Building on the example we are using, if we wish to see departments that do not have managers also, the DEPARTMENTS table will have data to display, but the EMPLOYEES table will have to display nulls (for first_names). To create an outer join type of query, we make use of an outer join operator. This operator is (+) and needs to be added to the query, in the join condition, against the table that has the null values to display.

As per our discussion, the EMPLOYEES table had the nulls to display and therefore, if an outer join had to be created, the equi-join query would be rewritten as:

SELECT DEPARTMENT_NAME, FIRST_NAME

FROM DEPARTMENTS D, EMPLOYEES E

WHERE D. MANAGER_ID=E.EMPLOYEE_ID(+);




Self Join (Inner Join)

A self join is a type of join where all the data to retreive is in one table, but not in a single row. Consider the image below and the query that is written.



The employee whose ID is 107, and first name is Diana, has a manager whose ID is 103. Based on the data, the employee whose ID is 103, has a firstname of Alexander. Therefore, we can conclude that Alexander is Diana's manager.

If we had to write a query to display the firstname of the employee and the firstname of the manager then all the data we want to retrieve is in one table, however not in a single row of this table. The data to retrieve is based on the two rows, one row which corresponds to the employee, and the second row which corresponds to the manager. This is an example where we would need to perform a self join or an inner join.

When performing a self join, we must imagine that we have two versions of the same table. That is we write the same table name twice in the FROM clause, but because we want to interpret them as separate, we must give them two different table aliases. The FROM clause would look like:

FROM EMPLOYEES E, EMPLOYEES R

One version of the EMPLOYEES table is called E. The second version of the EMPLOYEES table is called R.

Now, with the two separate versions, we can create a relationship between them, just as we did an equi-join. The relationship is based on the MANAGER_ID value for the Employee row, being equal to the EMPLOYEE_ID in the Manager row.


The relationship can be written as: E.MANAGER_ID=R.EMPLOYEE_ID

Therefore, to write out a query that will display the employee's firstname and their manager's firstname we would write:

SELECT E.FIRST_NAME, M.FIRST_NAME

FROM EMPLOYEES E, EMPLOYEES M

WHERE E.MANAGER_ID=R.MANAGER_ID;

So far we have discussed what a join in Oracle is, and the different types of joins that are available. To conclude we must note that a query may require additional restrictions to be performed. You may add additional restrictions on the data returned by a join by adding the condition to the WHERE clause using the AND operator. In the example below, we want to see all countries whose name begins with the letter B, along with the region name and region ID.

SELECT REGION_NAME, R.REGION_ID, COUNTRY_NAME

FROM COUNTRIES C, REGIONS R

WHERE C.REGION_ID=R.REGION_ID

AND COUNTRY_NAME LIKE 'B%';





0 comments:

Post a Comment