Saturday, March 19, 2022

Transposing data in SQL - Example Tutorial

What is Transposition?

A transposition is when data is rotated from one row or column to another in order to modify the data layout and make observations from a different perspective. Row to the column, column to column, and bidirectional transposition are examples of basic transposition algorithms. Others, such as dynamic transposition, transposition with inter-row computations, and join-based transposition, are more complicated. All are prevalent in data analytics and so worthy of investigation.


Transposition Types


Basic Transposition

The most basic transpositions are row-to-column and column-to-row. Each is the inverse computation of the other. To show an example, 



Transposing data in SQL - Example Tutorial



This would be transformed into as below : 








SQL Solutions for transposition

Because SQL didn't have the unique PIVOT function in its early days (and still doesn't in MySQL and HSQLDB), it managed to row to column transpositions by combining and coordinating various fundamental procedures. A computer problem might be solved in a variety of ways.

Solution 1: using CASE, WHEN + grouping, and aggregation

Select year, max(Q1)   'Q1', max(Q2)  'Q2', max (Q3) 'Q3', max   (Q4) 'Q4'
from (
  select year,
     CASE WHEN  quarter = 'Q1' then amount end Q1,
     CASE WHEN  quarter = 'Q2' then amount end Q2,
     CASE WHEN  quarter = 'Q3' then amount end Q3,
     CASE WHEN  quarter = 'Q4' then amount end Q4
  from QuarterResults
) A group by  year;


Solution 2: SUM, IF + grouping, and aggregation

SELECT year,
  MAX(IF(quarter = 'Q1',  amount, null)) AS 'Q1',
  MAX (IF(quarter = 'Q2',  amount, null)) AS 'Q2',
  MAX (IF(quarter = 'Q3',  amount, null)) AS 'Q3',
  MAX (IF(quarter = 'Q4', amount, null))   AS  'Q4'
FROM QuarterResults GROUP BY year;


The above two solutions generate the same result which is shown on top, transposing rows into columns.


Other options include WITH ROLLUP + grouping and aggregation, UNION + grouping and aggregation, and so forth. They are practically the same: after grouping, compute the year value, then construct additional columns Q1-Q4 by enumeration and aggregation.

Even for the most basic and simplest transposition, the SQL code is long. This is because each additional column must be enumerated. The longer the code becomes, the more additional columns there are. Consider what the code would look like if the additional columns were 12 months, states, and provinces of a country.

Only if the additional columns are known will the poor enumeration of the new columns affect the code length, not the code complexity. It's tough to enumerate additional columns if they can't be known ahead of time. Converting dynamic row-based VIP customers into field names is one example. 

It's difficult to accomplish this with SQL alone. To deal with it, we usually use a stored procedure or a high-level language like Java. However, the coding complexity and maintenance costs will skyrocket.


PIVOT/UNPIVOT functions

Database providers supplied custom methods to implement the techniques to make the transposition easier.
The PIVOT function is used to transpose rows to columns.

Now, considering one of the biggest SQL vendors, ORACLE, they provide an inbuilt function to pivot a table. let's see this in our example. (Note: this function is only available in oracle with this name)

select * from   QuarterResults
  pivot(
    max(amount)  for quarter in(
      Q1'as Q1,'Q2'as Q2,'Q3'as  Q3,'Q4' as Q4))


The PIVOT function shortens the code, but it misses the main issue. SQL's flaws are still present.
It is unable of dealing with the issue of dynamic columns. Java or a stored procedure is still required. It's also challenging to write and maintain the code.

It is unable to address the issue of set-based operations. All it can do is aggregate all situations, even if they aren't essential. That's the tough nut to crack for new users, and it takes a lot of extra effort.

The aggregation is required in some instances. To do a row to column transposition in order to obtain numerous rows for each quarter's record, and to determine the largest amount each quarter based on the grouped sales table.

The UNPIVOT function is easier than the PIVOT to write. This is because when you are performing the UNPIVOT function, SQL already knows that you have PIVOTED/TRANSPOSED the table earlier and now you want to redo/UNPIVOT it.

select * from   QuarterResults
  unpivot(
    amount for  quarter in(
      Q1,Q2,Q3,Q4))


Conclusion

Following the full explanations, you can see that there are only three sorts of basic transpositions that can be handled directly in huge databases using SQL PIVOT and UNPIVOT. Although the Pivot option was demonstrated to be the simplest, it was shown to be the least optimum due to its inability to accommodate dynamic columns. We will discuss more efficient options in upcoming articles.

Other SQL Articles and Tutorials you may like
Thanks for reading this article so far. If you like this Transposing data in SQL  tutorial and example then please share it with your friends and colleagues. If you have any questions or feedback then please drop a note. 

No comments:

Post a Comment