top of page

Crosstab vs Pivot

Crosstab and Pivot are terms used quite interchangeably. But are they the same or is there a difference between them ? This post describes them and discusses the similarities and differences.


Crosstab

A crosstab (also called crosstabulation or a contingency table) is a table with frequency distribution of more than one variables in a single matrix like form (across rows and columns). It offers to quickly depict visually the inter-relations or interactions between the variables. The main requirement is that the data should be categorical (values of the variables should be mutually exclusive) and numerical (countable). Crosstabs are used a lot in survey-data analysis, business intelligence and scientific research.


For example, consider that we want to understand - How does the prevalence of anemia differ between men and women. A survey was conducted on a population of 100 men & women. 52 men and 48 women. Out of them 36 were found anemic. This gives us a rough idea that overall 36% of population is anemic. But suppose - we further want to understand does the propensity to be anemic depend on gender.


Let us put the values in the following contingency table or Crosstab. The variables here are Anemic and Gender


This table gives us a better idea in one glance that women to be more anemic then men.

Plotting the same table in terms of % of rows (gender) gives an even better view. For each cell calculate the % of the Total row.

67% of the total anemics in the survey are women as compared to 33% of men found anemic.


And if we plot the other way round. For each cell calculate the % of the Total column.

50% of all women who were surveyed were found anemic as compared to only 23% men.


We quickly see that women look to have a more propensity to be anemic then men.

But by how much ? So - Let us plot the "index" now.

For each cell calculate the following Proportion of the cell in the total / Proportion of the population of the total population. i.e: for Anemic Women = (24 / 48) / (24/36) and likewise all of them.



So - women are 1.39 times more likely to be anemic than general population. Men are 1.2 times more likely to be non-anemic than the general population.

So we see that a crosstab gives a very quick visual overview of the data. The above example was a very simplistic example. Let us suppose that each of these variables had a number of possible values (categories), like in the following example and the task was to understand the patterns, it could have been mentally difficult.



But just a mere look at this crosstab makes it effortless to understand the patterns.


The examples above show just the counts (or sums), percentages or index as the statistical parameters. There are a number of other statistical parameters are used with crosstabs. Some of the notable ones being Odds-Ratio, Phi Coefficient and Chi-Square.


Crosstabs usually refer to the basic two variable table with basic statistical features. While it is simple but its simplicity is its power.

As software and statistical tools became more advanced, features kept getting added to crosstabs, leading to what we call as Pivots


Pivot

A pivot is a more extensive table of data showing inter-relationships between variables or a group of variables. A pivot also consists of rows, columns and data (also called facts or values). But the feature set of a Pivot is much extensive. Features consist of filtering, a large number of different kinds of statistical values or data items, aggregations like sum, average, standard deviation etc., ability to handle much larger number of columns of rows. The data grids are interactive, advanced views as well as drill-downs are supported.


Most databases, spreadsheets, and statistical packages support Pivot tables. They allow easy creation by drag-drop of rows, columns, filters, values and aggregations. Some of them also provide connected pivot charts having bar charts, scatter charts, line charts etc.


A list of such popular spreadsheets and databases is given below


Spreadsheets
  1. Microsoft Excel - supports PivotTables and PivotCharts

  2. OpenOffice - supports pivot tables

  3. Google sheets - supports pivot tables

  4. Numbers (from Apple) - supports pivot tables

Databases

Databases also support pivot functions or querying capabilities that present you the results in a pivot table format format directly from the database itself.

  1. Postgresql - pivot tables - tablefunc module

  2. Microsoft Access - pivot queries - crosstab queries

  3. Microsoft SQL Server - pivot queries - FROM.. PIVOT keywords

  4. Oracle - supports PIVOT operation


Python

Programming languages, packages, like python, built for data science also support methods for both Crosstab as well as Pivot. The popular data analysis package pandas provides it.


Crosstab - pd.crosstab(<arrays of index/columns/values>) # pandas as pd

Pivot - pd.pivot_table(<dataframe>, index, columns, aggfun, fill_value, margins_name)


As it can be seen here also that the pivot_table has more options, but if you need a basic frequency table, crosstab is better.


crosstab has only the len aggfunc, while pivot_table has methods like np.mean,

margins_name is only in pivot_table. In pivot_table you can use Grouper for index and columns keywords.


However one of the options - normalize - is only available in crosstab. possible values of this argument are

  • 'index' - normalize by dividing each cell by the sum of its row

  • 'columns' - normalize by dividing each cell by the sum of its column

  • True - divide each cell by the total of all cells in the table


References


102 views0 comments

Recent Posts

See All

Harnessing the Power of TimescaleDB

TimescaleDB is an open-source database designed to make SQL scalable for time-series data. It is engineered up from PostgreSQL and packaged

Comentarios


bottom of page