Advanced Query Features

This section covers functions that are not widely standardized across SQL databases or that serve special needs. PyPika intends to support as many features across different platforms as possible. If there is a platform-specific feature that PyPika does not support, please create a GitHub issue requesting that it be added.

Handling different database platforms

Database vendors sometimes differ in how they implement SQL on their platforms, for example in which quote characters are used. To ensure that the correct SQL dialect is generated for your platform, use the platform-specific Query classes.

from pypika import MySQLQuery, MSSQLQuery, PostgreSQLQuery, OracleQuery, VerticaQuery

You can use these query classes as a drop-in replacement for the default Query class shown in the other examples. Again, if you encounter any platform-specific issues, please create a GitHub issue on this repository.
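
For example, the MySQL dialect quotes identifiers with backticks rather than double quotes. A minimal sketch (the products table is only illustrative, and the output shown is approximate):

from pypika import MySQLQuery, Table

products = Table('products')

query = MySQLQuery.from_(products).select(products.id, products.name)
SELECT `id`,`name` FROM `products`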

GROUP BY Modifiers

The ROLLUP modifier allows for aggregating to higher levels than the given groups, called super-aggregates.

from pypika import Query, Table, Rollup, functions as fn

products = Table('products')

query = Query.from_(products) \
    .select(products.id, products.category, fn.Sum(products.price)) \
    .rollup(products.id, products.category)
SELECT "id","category",SUM("price") FROM "products" GROUP BY ROLLUP("id","category")

Pseudo Column

A pseudo-column is a value assigned by the database (a pseudo-field) that is used in the same context as a column but is not stored on disk. Pseudo-columns can differ from database to database, so PyPika makes it possible to define them yourself.

from pypika import Query, PseudoColumn

CurrentDate = PseudoColumn('current_date')

Query.from_('products').select(CurrentDate)
SELECT current_date FROM "products"

Analytic Queries

The package pypika.analytics contains analytic function wrappers. These can be used in SELECT clauses when building queries for databases that support them. Different functions take different arguments, but all support some form of window partitioning.

NTILE and RANK

The NTILE function requires a constant integer argument, while the RANK function takes no arguments.

from pypika import Query, Table, analytics as an, functions as fn

store_sales_fact, date_dimension = Table('store_sales_fact', schema='store'), Table('date_dimension')

total_sales = fn.Sum(store_sales_fact.sales_quantity).as_('TOTAL_SALES')
calendar_month_name = date_dimension.calendar_month_name.as_('MONTH')
ntile = an.NTile(4).orderby(total_sales).as_('NTILE')

query = Query.from_(store_sales_fact) \
    .join(date_dimension).using('date_key') \
    .select(calendar_month_name, total_sales, ntile) \
    .groupby(calendar_month_name) \
    .orderby(ntile)
SELECT "date_dimension"."calendar_month_name" "MONTH",SUM("store_sales_fact"."sales_quantity") "TOTAL_SALES",NTILE(4) OVER(PARTITION BY  ORDER BY SUM("store_sales_fact"."sales_quantity")) "NTILE" FROM "store"."store_sales_fact" JOIN "date_dimension" USING ("date_key") GROUP BY "date_dimension"."calendar_month_name" ORDER BY NTILE(4) OVER(PARTITION BY  ORDER BY SUM("store_sales_fact"."sales_quantity"))

FIRST_VALUE and LAST_VALUE

FIRST_VALUE and LAST_VALUE both expect a single argument. They also support an additional IGNORE NULLS clause.

from pypika import Query, Table, analytics as an

t_month = Table('t_month')

first_month = an.FirstValue(t_month.month) \
    .over(t_month.season) \
    .orderby(t_month.id)

last_month = an.LastValue(t_month.month) \
    .over(t_month.season) \
    .orderby(t_month.id) \
    .ignore_nulls()

query = Query.from_(t_month) \
    .select(first_month, last_month)
SELECT FIRST_VALUE("month") OVER(PARTITION BY "season" ORDER BY "id"),LAST_VALUE("month" IGNORE NULLS) OVER(PARTITION BY "season" ORDER BY "id") FROM "t_month"

MEDIAN, AVG and STDDEV

These functions take a single argument, the column or expression to aggregate.

from pypika import Query, Table, analytics as an

customer_dimension = Table('customer_dimension')

median_income = an.Median(customer_dimension.annual_income).over(customer_dimension.customer_state).as_('MEDIAN')
avg_income = an.Avg(customer_dimension.annual_income).over(customer_dimension.customer_state).as_('AVG')
stddev_income = an.StdDev(customer_dimension.annual_income).over(customer_dimension.customer_state).as_('STDDEV')

query = Query.from_(customer_dimension) \
    .select(median_income, avg_income, stddev_income) \
    .where(customer_dimension.customer_state.isin(['DC', 'WI'])) \
    .orderby(customer_dimension.customer_state)
SELECT MEDIAN("annual_income") OVER(PARTITION BY "customer_state") "MEDIAN",AVG("annual_income") OVER(PARTITION BY "customer_state") "AVG",STDDEV("annual_income") OVER(PARTITION BY "customer_state") "STDDEV" FROM "customer_dimension" WHERE "customer_state" IN ('DC','WI') ORDER BY "customer_state"

Window Frames

Functions that use window aggregation expose the methods rows() and range() for defining the window frame. Both take one or two parameters which specify the offset boundaries. A boundary can be the current row, an.CURRENT_ROW, or a value preceding or following the current row, an.Preceding(constant_value) and an.Following(constant_value). A boundary can be made unbounded preceding or following the current row by omitting the constant_value parameter, as in an.Preceding() or an.Following().

FIRST_VALUE and LAST_VALUE also support window frames.

from pypika import Query, Table, analytics as an

t_transactions = Table('t_transactions')

rolling_7_sum = an.Sum(t_transactions.total) \
    .over(t_transactions.item_id) \
    .orderby(t_transactions.day) \
    .rows(an.Preceding(7), an.CURRENT_ROW)

query = Query.from_(t_transactions) \
    .select(rolling_7_sum)
SELECT SUM("total") OVER(PARTITION BY "item_id" ORDER BY "day" ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) FROM "t_customers"