PostgreSQL – CUME_DIST Function

The CUME_DIST() function is used to calculate the cumulative distribution of values within a result set. This function gives you the proportion of values that are less than or equal to the current value in the ordered result set. It’s often used to analyze the relative position of a value within a dataset, especially in the context of percentiles and cumulative distributions.

The syntax of the CUME_DIST() function is as follows:

CUME_DIST() OVER (PARTITION BY partition_expression ORDER BY sort_expression)
  • PARTITION BY: An optional clause that divides the result set into partitions or groups. The function operates separately within each partition. If omitted, the function operates on the entire result set.
  • ORDER BY: Specifies the column(s) by which you want to order the result set for calculating the cumulative distribution.

Here’s a simple example of how to use the CUME_DIST() function in PostgreSQL:

Suppose you have a table named “scores” with columns “student_name” and “score”, and you want to calculate the cumulative distribution of scores:

SELECT student_name, score, CUME_DIST() OVER (ORDER BY score) AS cumulative_distribution FROM scores;

In this example, the CUME_DIST() function calculates the cumulative distribution of scores in ascending order. The result might look like this:

student_name | score | cumulative_distribution

-------------+-------+-------------------------

Alice | 80 | 0.333

Bob | 85 | 0.500

Charlie | 90 | 0.666

David | 92 | 0.833

Emily | 95 | 1.000

In this output, the “cumulative_distribution” column contains the proportion of scores that are less than or equal to the current score. For example, for “Alice” with a score of 80, the cumulative distribution is approximately 0.333, indicating that about 33.3% of the scores are less than or equal to 80.

The CUME_DIST() function is useful when you want to understand the relative position of a value within a dataset’s distribution.

Similar Posts