top of page

Here are some common data transformation queries using SQL

Tools like PowerBI and Tableau can connect directly to a database. If you need to transform your data, you can use Power Query or Tableau Data Prep.  SQL is used in certain cases to transform your data instead of using the said tools. Also, if you are working with a very large database, you may want to aggregate your data using SQL and then use Power BI or Tableau to display the to information.  Here are a few things that you can do with SQL to transform or aggregate your data.  Please see the footer where credit is give to on the author of the book: SQL for Data Analysis.
 

PIVOT / UNPIVOT

This is an example of what is sometimes know as a Pivot Table - a table that summarizes data in a table which will aggregate the specified column(s). Let say we have a simple table with transactional data:  customer id and order_amount.  Here is the Pivot:

SELECT customer_id , sum(order_amount) as total_amount
FROM orders GROUP BY customer_id;

Many times data is "pivoted" (but not aggregated) in order to create a more readable table.  However programs prefer to deal with less columns and more rows.  Therefore, many times there is a need to UN- Pivot the data. Let's say we want to eliminate multiple year columns and only have one year column with one population column that hold the data that was in the year column.   Here is the Unpivot:

SELECT country , '1980' as year, year_1980 as population
FROM county_populations
UNION ALL
SELECT country , '1990' as year, year_1990 as population
FROM county_populations
UNION ALL;

FIND ANOMALIES

A good way to find anomalies is to sort each column by ascending and/or descending order and just view and identify the records.  Then you can decide how to address.  Below is an example of a sort and then a way to select records that do not have a null for a specific column.

SELECT mag
FROM earthquakes
ORDER BY mag desc;

SELECT mag
FROM earthquakes
WHERE mag is not null
ORDER BY mag ;

SELECT mag
FROM earthquakes
WHERE mag > 0.0
ORDER BY mag ;
 

Clean/Replace Text

There is a vast collection of string functions and operators that SQL has to help you transform your data.  This includes the use of Regular Expressions.

SELECT TRIM(first_name) , TRIM(last_name);

SELECT REGEXP_REPLACE( name, '[[:punct:]]', '');
 

This page has examples and is inspired by the book "SQL for Data Analysis" by Cathy Tanimura, 978-1-492-08878-3

©2021 by Impactful Analytics.

bottom of page