Open source packages - Quarto, Shiny, and more Commercial enterprise offerings

Fifty Releases of Pointblank: a Year of Building Data Quality Tooling

Written by Rich Iannone
2022-09-26
Logo with the gold number 50 above the word Pointblank in a stylized dark red font on a cream background.

It is difficult to believe that we have reached the fiftieth release of the Python package Pointblank (from initial release v0.1.0 to the just-released v0.19.0). When this project began in late 2024, the goal was straightforward: create a Python library that makes it easy to validate tabular data and communicate results effectively. What has emerged over the past year is a comprehensive toolkit for data quality that spans multiple interfaces, supports dozens of spoken languages for reporting, and integrates with virtually any tabular data source you might encounter in your work. This post will take you through the journey of these fifty releases, highlighting the features and capabilities that have been built along the way.

The development of Pointblank has been guided by several core principles that have remained constant even as the feature set has expanded considerably. I have always held that excellent reporting should accompany comprehensive validation methods. And I have striven to make the library work with data in whatever form it takes, whether that means Polars DataFrames, Pandas DataFrames, database tables via Ibis, or even just paths to CSV and Parquet files. There are now myriad ways to perform validation checks because different contexts call for different approaches. Throughout it all, I have tried to be responsive to what users actually need, building features that data owners have requested and ensuring that the library integrates well into pipeline processes. Each of these themes warrants some exploration, so let us examine them in turn with concrete examples drawn from the documentation.

Comprehensive Validation Methods with Excellent Reporting

The foundation of any data quality library is its validation methods, and Pointblank has accumulated a substantial collection of them over these fifty releases. The validation workflow follows a consistent pattern: you create a Validate object with your data, chain together validation methods that define your data quality rules, and call interrogate() to execute the validation plan. The result is a validation report that provides detailed information about each step.

import pointblank as pb
import polars as pl

# Sample data for demonstration
data = pl.DataFrame({
    "id": range(1, 11),
    "value": [120, 85, 47, 210, 30, 155, 175, 95, 205, 140],
    "category": ["A", "B", "C", "A", "D", "B", "A", "E", "A", "C"],
    "ratio": [0.5, 0.7, 0.3, 1.2, 0.8, 0.9, 0.4, 1.5, 0.6, 0.2],
})

validation = (
    pb.Validate(
        data=data,
        tbl_name="sales_data",
        label="Example validation demonstrating multiple methods"
    )
    .col_vals_gt(columns="value", value=50)
    .col_vals_in_set(columns="category", set=["A", "B", "C"])
    .col_vals_le(columns="ratio", value=1.0)
    .rows_distinct()
    .col_exists(columns=["id", "value", "category"])
    .interrogate()
)

validation
Pointblank Validation
Example validation demonstrating multiple methods
Polarssales_data
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#4CA64C66 1
col_vals_gt
col_vals_gt()
value 50 10 8
0.80
2
0.20
#4CA64C66 2
col_vals_in_set
col_vals_in_set()
category A, B, C 10 8
0.80
2
0.20
#4CA64C66 3
col_vals_le
col_vals_le()
ratio 1.0 10 8
0.80
2
0.20
#4CA64C 4
rows_distinct
rows_distinct()
ALL COLUMNS 10 10
1.00
0
0.00
#4CA64C 5
col_exists
col_exists()
id 1 1
1.00
0
0.00
#4CA64C 6
col_exists
col_exists()
value 1 1
1.00
0
0.00
#4CA64C 7
col_exists
col_exists()
category 1 1
1.00
0
0.00
2026-04-13 12:47:02 UTC< 1 s2026-04-13 12:47:02 UTC

The validation methods span several categories that address different aspects of data quality. Column value validations like col_vals_gt(), col_vals_between(), and col_vals_in_set() check individual values against specified constraints. Row-based validations like rows_distinct() and rows_complete() examine entire rows for duplicates or missing values. Table structure validations like col_exists() and col_schema_match() verify that the data conforms to expected schemas. Aggregate validations like col_sum_gt() and col_sd_lt() check summary statistics. And the recently added data_freshness() method allows you to verify that your data is up to date.

The validation report itself is a Great Tables object, which means it inherits all the formatting and display capabilities that Great Tables provides. Each row in the report corresponds to a validation step, with columns showing the validation type, the columns involved, the parameters used, and the results. The results section shows the total number of test units, how many passed, how many failed, and whether any threshold levels were exceeded. The visual status indicators on the left edge of each row provide an immediate signal about the health of each validation step.

One feature that emerged relatively early in development was the step report, which provides a detailed view of a single validation step. When you encounter failures in your validation report, you often want to know exactly which rows failed and why. The get_step_report() method provides this information in a focused, customizable format.

# Get a detailed step report for the `col_vals_in_set()` validation (step 2)
step_report = validation.get_step_report(i=2)
step_report
Report for Validation Step 2
ASSERTION category ∈ {A, B, C}
2 / 10 TEST UNIT FAILURES IN COLUMN 3
EXTRACT OF ALL 2 ROWS (WITH TEST UNIT FAILURES IN RED):
id
Int64
value
Int64
category
String
ratio
Float64
5 5 30 D 0.8
8 8 95 E 1.5

Step reports highlight the failing rows and clearly indicate which column was under examination. They can be customized with parameters like columns_subset= to show only relevant columns, limit= to cap the number of rows displayed, and header= to provide custom context. The reports even support templating in the header, allowing you to include dynamic information about the validation step alongside your own explanatory text.

Schema validation step reports have a completely different structure, comparing expected versus actual column data types and presence. This format makes it easy to see exactly where the schema expectations differ from reality.

# Define an expected schema for the small_table dataset
schema = pb.Schema(
    columns=[
        ("date_time", "timestamp"),
        ("dates", "date"),
        ("a", "int64"),
        ("b",),
        ("c",),
        ("d", "float64"),
        ("e", ["bool", "boolean"]),
        ("f", "str"),
    ]
)

# Create a validation with a schema check
validation_schema = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="duckdb"),
        tbl_name="small_table",
        label="Schema validation example"
    )
    .col_schema_match(schema=schema)
    .interrogate()
)

# Display the step report for the schema validation
validation_schema.get_step_report(i=1)
Report for Validation Step 1
COLUMN SCHEMA MATCH
COMPLETE
IN ORDER
COLUMN ≠ column
DTYPE ≠ dtype
float ≠ float64
TARGET EXPECTED
COLUMN DATA TYPE COLUMN DATA TYPE
1 date_time timestamp(6) 1 date_time timestamp
2 date date 2 dates date
3 a int64 3 a int64
4 b string 4 b
5 c int64 5 c
6 d float64 6 d float64
7 e boolean 7 e bool | boolean
8 f string 8 f str
Supplied Column Schema:
[('date_time', 'timestamp'), ('dates', 'date'), ('a', 'int64'), ('b',), ('c',), ('d', 'float64'), ('e', ['bool', 'boolean']), ('f', 'str')]

The schema step report presents a side-by-side comparison of expected and actual column information, making mismatches immediately apparent. This is particularly valuable when working with data from external sources where the schema may drift over time, or, when onboarding new datasets where you need to verify that they conform to established standards.

The combination of comprehensive validation methods and detailed reporting creates a workflow where data quality issues can be identified quickly and communicated clearly. Whether you are checking individual values, examining row level patterns, or validating entire schemas, Pointblank provides the tools to understand your data and share that understanding with others.

Data in Many Forms

One of the design decisions that has paid dividends over time is the commitment to supporting data in whatever form it arrives. Pointblank does not require you to convert your data to a particular format before validation. If you have a Polars DataFrame, validate it directly. If you have a Pandas DataFrame, that works too! If your data lives in a database that Ibis supports, you can validate it there without materializing the entire table into memory. And if your data is sitting in a CSV or Parquet file, you can simply provide the path.

# Validate data from a file path directly
validation_from_file = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        tbl_name="small_table",
        label="Validation from loaded dataset"
    )
    .col_vals_gt(columns="a", value=0)
    .col_vals_between(columns="d", left=0, right=10000)
    .interrogate()
)

validation_from_file
Pointblank Validation
Validation from loaded dataset
Polarssmall_table
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#4CA64C 1
col_vals_gt
col_vals_gt()
a 0 13 13
1.00
0
0.00
#4CA64C 2
col_vals_between
col_vals_between()
d [0, 10000] 13 13
1.00
0
0.00
2026-04-13 12:47:02 UTC< 1 s2026-04-13 12:47:03 UTC

The underlying implementation uses Narwhals to provide a unified interface across DataFrame libraries, and Ibis to handle database connections. This architecture means that your validation code remains largely the same regardless of where your data lives. You write your validation plan once, and it works with Polars, Pandas, DuckDB, PostgreSQL, MySQL, SQLite, Spark, BigQuery, and many other backends that Ibis supports. The recent addition of native Spark DataFrame support through Narwhals (without requiring Ibis) further expanded the range of data sources that can be validated efficiently.

The preview() function exemplifies this data-agnostic philosophy. It provides a quick look at any supported table type with consistent formatting, showing row numbers and optionally an information header with details about the table.

# Preview a dataset with informative header
pb.preview(
    pb.load_dataset(dataset="game_revenue", tbl_type="polars"),
    n_head=5,
    n_tail=3,
    incl_header=True
)
PolarsRows2,000Columns11
player_id
String
session_id
String
session_start
Datetime
time
Datetime
item_type
String
item_name
String
item_revenue
Float64
session_duration
Float64
start_day
Date
acquisition
String
country
String
1 ECPANOIXLZHF896 ECPANOIXLZHF896-eol2j8bs 2015-01-01 01:31:03+00:00 2015-01-01 01:31:27+00:00 iap offer2 8.99 16.3 2015-01-01 google Germany
2 ECPANOIXLZHF896 ECPANOIXLZHF896-eol2j8bs 2015-01-01 01:31:03+00:00 2015-01-01 01:36:57+00:00 iap gems3 22.49 16.3 2015-01-01 google Germany
3 ECPANOIXLZHF896 ECPANOIXLZHF896-eol2j8bs 2015-01-01 01:31:03+00:00 2015-01-01 01:37:45+00:00 iap gold7 107.99 16.3 2015-01-01 google Germany
4 ECPANOIXLZHF896 ECPANOIXLZHF896-eol2j8bs 2015-01-01 01:31:03+00:00 2015-01-01 01:42:33+00:00 ad ad_20sec 0.76 16.3 2015-01-01 google Germany
5 ECPANOIXLZHF896 ECPANOIXLZHF896-hdu9jkls 2015-01-01 11:50:02+00:00 2015-01-01 11:55:20+00:00 ad ad_5sec 0.03 35.2 2015-01-01 google Germany
1998 RMOSWHJGELCI675 RMOSWHJGELCI675-vbhcsmtr 2015-01-21 02:39:48+00:00 2015-01-21 02:40:00+00:00 ad ad_5sec 0.03 8.4 2015-01-10 other_campaign France
1999 RMOSWHJGELCI675 RMOSWHJGELCI675-vbhcsmtr 2015-01-21 02:39:48+00:00 2015-01-21 02:47:12+00:00 iap offer5 26.09 8.4 2015-01-10 other_campaign France
2000 GJCXNTWEBIPQ369 GJCXNTWEBIPQ369-9elq67md 2015-01-21 03:59:23+00:00 2015-01-21 04:06:29+00:00 ad ad_5sec 0.12 18.5 2015-01-14 organic United States

The ability to work with data in many forms has implications beyond the core validation workflow. The preview() function, for instance, has found use in exploratory data analysis sessions where analysts want a quick look at unfamiliar datasets without writing boilerplate code. Two other functions help you get better acquainted with a new dataset: col_summary_tbl() provides a comprehensive statistical summary of each column, while missing_vals_tbl() gives you an immediate view of missing data patterns across all columns.

# Get a statistical summary of each column
pb.col_summary_tbl(pb.load_dataset(dataset="game_revenue", tbl_type="polars"))
PolarsRows2,000Columns11
Column NA UQ Mean SD Min P5 Q1 Med Q3 P95 Max IQR
string
player_id
String
0
0
76
0.04
15 0 15 15 15 15 15 15 15 0
string
session_id
String
0
0
618
0.31
24 0 24 24 24 24 24 24 24 0
date
session_start
Datetime(time_unit='us', time_zone='UTC')
0
0
618
0.31
- - 2015
01
01 01:31:03+00:00
- - - - - 2015
01
21 03:59:23+00:00
-
date
time
Datetime(time_unit='us', time_zone='UTC')
0
0
1989
0.99
- - 2015
01
01 01:31:27+00:00
- - - - - 2015
01
21 04:06:29+00:00
-
string
item_type
String
0
0
2
<.01
2.22 0.42 2 2 2 2 2 3 3 0
string
item_name
String
0
0
24
0.01
7.35 1.27 5 5 7 8 8 9 11 1
numeric
item_revenue
Float64
0
0
508
0.25
4.34 13.21 0.004 0.01 0.09 0.38 1.25 21.99 142.99 1.16
numeric
session_duration
Float64
0
0
310
0.15
25.63 9.55 3.2 4.2 18.5 26.5 33.82 39.5 41.0 15.32
date
start_day
Date
0
0
20
0.01
- - 2015
01
01
- - - - - 2015
01
21
-
string
acquisition
String
0
0
6
<.01
7.97 2.54 5 5 7 7 8 14 14 1
string
country
String
0
0
23
0.01
8.53 3.13 5 5 6 7 12 13 14 6
String columns statistics regard the string's length.

The column summary table provides a wealth of information at a glance: data types, counts of unique values, missing value percentages, and key statistics. This kind of overview is really valuable when you first encounter a dataset and need to understand its structure before diving into validation rules.

Complementing the column summary, the missing_vals_tbl() function focuses specifically on missing data patterns, which is often one of the first things you want to understand about a new dataset.

# View missing data patterns across all columns
pb.missing_vals_tbl(pb.load_dataset(dataset="game_revenue", tbl_type="polars"))
Missing Values
PolarsRows2,000Columns11
Column Row Sector
1 2 3 4 5 6 7 8 9 10
player_id
session_id
session_start
time
item_type
item_name
item_revenue
session_duration
start_day
acquisition
country
NO MISSING VALUES     PROPORTION MISSING:  
0%
100%
ROW SECTORS
  1. 1 – 200
  2. 201 – 400
  3. 401 – 600
  4. 601 – 800
  5. 801 – 1000
  6. 1001 – 1200
  7. 1201 – 1400
  8. 1401 – 1600
  9. 1601 – 1800
  10. 1801 – 2000

These utilities emerged from the same architectural foundation that supports validation, demonstrating how a commitment to data source flexibility creates opportunities for useful tooling beyond the original scope of the project. When you encounter a new dataset, these functions provide the quick orientation that helps you understand the data before you begin writing validation rules.

Flexible Approaches to Validation

Different contexts call for different approaches to data validation. A data scientist exploring a new dataset in a Jupyter notebook has different needs than a data engineer building an automated pipeline. Pointblank addresses this by providing multiple interfaces for performing validation: the Python API, YAML configuration files, and a command line utility.

The Python API offers the most flexibility and is ideal for complex validation logic. You have access to the full power of Python for preprocessing, custom expressions, and integration with other libraries. The interface makes validation plans readable and maintainable.

One of the most powerful features of the Python API is the pre= parameter, which enables preprocessing of data before validation without modifying the original dataset. This creates enormous flexibility in how you construct validation steps. You can create a computed column and validate that column in a single validation step, and the mutation of the table is completely isolated to that step. The original data remains untouched, and subsequent validation steps see the original table unless they define their own preprocessing.

# Python API with preprocessing: create and validate a computed column
validation_with_pre = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Validation with preprocessing"
    )
    .col_vals_gt(
        columns="a",
        value=5,
        pre=lambda df: df.with_columns(pl.col("a") * 2)  # Double values before checking
    )
    .col_vals_gt(
        columns="computed_ratio",
        value=0,
        pre=lambda df: df.with_columns((pl.col("d") / pl.col("c")).alias("computed_ratio"))
    )
    .interrogate()
)

validation_with_pre
Pointblank Validation
Validation with preprocessing
Polars
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#4CA64C66 1
col_vals_gt
col_vals_gt()
a 5 13 9
0.69
4
0.31
#4CA64C66 2
col_vals_gt
col_vals_gt()
computed_ratio 0 13 11
0.85
2
0.15
2026-04-13 12:47:03 UTC< 1 s2026-04-13 12:47:03 UTC

Notes

Step 1 (pre_applied) Precondition applied: no table dimension change.

Step 2 (pre_applied) Precondition applied: table dimensions [13 rows, 8 columns][13 rows, 9 columns].

Step 2 (syn_target_col) Synthetic target column computed_ratio created via preprocessing.

Notice the notes that appear in the validation report for steps that use pre=. These notes indicate that preprocessing was applied, helping readers understand that the validation was performed on transformed data rather than the original table. The notes system appears throughout Pointblank to provide contextual information when steps encounter conditions worth flagging, such as column selectors that do not resolve to any columns, schema comparison results, or other situations where additional explanation helps users understand what happened during validation.

This preprocessing approach is particularly valuable when you need to validate derived metrics, normalized values, or relationships between columns that require computation. Rather than preprocessing your entire dataset before validation, you can express the transformation and the validation together, making your validation plan self-documenting and easier to maintain.

YAML configuration files provide a declarative, readable approach that works well for teams. They can be version controlled alongside your data processing code, enable non-programmers to contribute to data quality definitions, and provide clear separation between validation logic and execution code. The yaml_interrogate() function executes these configurations. Here is an example YAML validation file:

demo_validation.yaml
tbl: small_table
df_library: polars
tbl_name: "YAML Validation Example"
label: "Validation defined in YAML"
steps:
  - rows_distinct
  - col_exists:
      columns: [a, b, c, d]
  - col_vals_not_null:
      columns: [a, b]
  - col_vals_gt:
      columns: a
      value: 0

Running this validation is straightforward with yaml_interrogate():

# Execute the YAML-defined validation
yaml_result = pb.yaml_interrogate("demo_validation.yaml")
yaml_result
Pointblank Validation
Validation defined in YAML
PolarsYAML Validation Example
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#4CA64C66 1
rows_distinct
rows_distinct()
ALL COLUMNS 13 9
0.69
2
0.15
#4CA64C 2
col_exists
col_exists()
a 1 1
1.00
0
0.00
#4CA64C 3
col_exists
col_exists()
b 1 1
1.00
0
0.00
#4CA64C 4
col_exists
col_exists()
c 1 1
1.00
0
0.00
#4CA64C 5
col_exists
col_exists()
d 1 1
1.00
0
0.00
#4CA64C 6
col_vals_not_null
col_vals_not_null()
a 13 13
1.00
0
0.00
#4CA64C 7
col_vals_not_null
col_vals_not_null()
b 13 13
1.00
0
0.00
#4CA64C 8
col_vals_gt
col_vals_gt()
a 0 13 13
1.00
0
0.00
2026-04-13 12:47:03 UTC< 1 s2026-04-13 12:47:03 UTC

The command line interface, accessed through the pb command, enables quick data quality checks directly in the terminal. This is particularly useful for CI/CD pipelines, shell scripts, and automation workflows where you need immediate feedback without starting a Python session. The pb validate command runs common validation checks with simple arguments, while pb run executes Python validation scripts from the command line.

The above demonstration shows some of the more essential CLI workflows: using pb info to get table dimensions, running pb validate with checks like rows-distinct and col-vals-not-null, and using the --show-extract option to see exactly which rows are problematic when validation fails. The CLI provides clear, color-coded output that makes it easy to interpret results at a glance. Failing rows can also be saved to CSV files for further investigation, and the --exit-code option makes it straightforward to integrate validation into pipelines that need to halt on data quality failures.

Communication is Key

Data quality is fundamentally a communication challenge. It is not enough to detect problems; you must also communicate those problems to the people who can fix them, in a form they can understand. Pointblank approaches this challenge from several angles.

The validation report itself is designed for communication. It provides a complete picture of data quality in a single view, with visual indicators that draw attention to problems. The report can be customized with labels and governance metadata including owner, consumers, and version information, which helps document data ownership and dependencies.

# Validation with governance metadata and automatic briefs
governance_validation = (
    pb.Validate(
        data=data,
        tbl_name="sales_data",
        label="Q4 Sales Data Quality Report",
        owner="data-platform-team",
        consumers=["ml-team", "analytics", "finance"],
        version="2.1.0",
        brief=True,
    )
    .col_vals_gt(columns="value", value=0)
    .col_vals_in_set(columns="category", set=["A", "B", "C", "D", "E"])
    .interrogate()
)

governance_validation
Pointblank Validation
Q4 Sales Data Quality Report
Polarssales_data
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#4CA64C 1
col_vals_gt
col_vals_gt()

Expect that values in value should be > 0.

value 0 10 10
1.00
0
0.00
#4CA64C 2
col_vals_in_set
col_vals_in_set()

Expect that values in category should be in the set of A, B, C, and 2 more.

category A, B, C, D, E 10 10
1.00
0
0.00
2026-04-13 12:47:03 UTC< 1 s2026-04-13 12:47:03 UTC
Owner: data-platform-teamConsumers: ml-team, analytics, financeVersion: 2.1.0

Notice the automatic brief text that appears in each row of the report, describing in plain language what the validation step is checking. Setting brief=True in the Validate() call enables these autobriefs for all steps, making the report immediately understandable to stakeholders who may not be familiar with validation method names like col_vals_gt() or col_vals_in_set(). You can also provide custom brief text on individual steps if the automatic description does not fit your needs.

Language localization is another dimension of communication that has received significant attention. Validation reports can be generated in dozens of languages, making data quality accessible to international teams. All official EU languages are supported, along with many others, including Arabic, Hindi, Greek, Traditional Chinese, Japanese, Korean, Thai, Hebrew, Indonesian, Ukrainian, Farsi, and more. The lang= parameter in Validate controls the report language.

# Validation report translated to French
french_validation = (
    pb.Validate(
        data=data,
        tbl_name="sales_data",
        label="Rapport de qualité des données",
        lang="fr",
        brief=True,
    )
    .col_vals_gt(columns="value", value=50)
    .col_vals_in_set(columns="category", set=["A", "B", "C"])
    .col_vals_le(columns="ratio", value=1.0)
    .interrogate()
)

french_validation
Validation Pointblank
Rapport de qualité des données
Polarssales_data
ÉTAPE COLONNES VALEURS TBL EVAL UNITÉ RÉUS ÉCHEC W E C EXT
#4CA64C66 1
col_vals_gt
col_vals_gt()

On s'attend à ce que les valeurs de value soient > 50.

value 50 10 8
0,80
2
0,20
#4CA64C66 2
col_vals_in_set
col_vals_in_set()

On s'attend à ce que les valeurs de category soient une des valeurs suivantes :A, B, C.

category A, B, C 10 8
0,80
2
0,20
#4CA64C66 3
col_vals_le
col_vals_le()

On s'attend à ce que les valeurs de ratio soient <= 1.0.

ratio 1.0 10 8
0,80
2
0,20
2026-04-13 12:47:03 UTC< 1 s2026-04-13 12:47:03 UTC

The actions system enables proactive communication when validation problems occur. You can define actions that trigger when specific threshold levels are exceeded, such as printing messages to the console or sending Slack notifications with the send_slack_notification() function. This transforms validation from a passive reporting exercise into an active monitoring system.

# Validation with threshold-based actions
action_validation = (
    pb.Validate(
        data=data,
        label="Validation with actions",
        thresholds=pb.Thresholds(warning=1, error=3)
    )
    .col_vals_gt(
        columns="value",
        value=100,
        actions=pb.Actions(
            warning="Warning: Some values below 100 threshold",
            error="Error: Many values below 100 threshold"
        )
    )
    .interrogate()
)

action_validation
Error: Many values below 100 threshold
Pointblank Validation
Validation with actions
PolarsWARNING1ERROR3CRITICAL
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#EBBC14 1
col_vals_gt
col_vals_gt()
value 100 10 6
0.60
4
0.40
2026-04-13 12:47:03 UTC< 1 s2026-04-13 12:47:03 UTC

The actions system is quite flexible beyond simple string messages. You can provide callable functions that execute when thresholds are exceeded, enabling custom logging, alerting, or remediation workflows. There is also a templating system that lets you include dynamic information in action messages, such as the step number, the column being validated, or the count of failing test units. This makes it possible to generate informative, context-aware notifications that help recipients understand exactly what went wrong without having to dig through the full validation report.

Integration with notebooks and Quarto is seamless because validation reports are Great Tables objects. They render beautifully in Jupyter notebooks, JupyterLab, and Quarto documents without any additional configuration. This makes it easy to include data quality reporting in analytical notebooks and published reports.

Responsive to User Needs

The development of Pointblank has been shaped substantially by user feedback and feature requests. The contributors who have joined the project over these fifty releases have brought valuable perspectives and concrete improvements. The changelog for each release tells the story of this collaborative development.

The DraftValidation class emerged from users who wanted help getting started with validation on new datasets. It uses large language models to analyze your data and generate a validation plan tailored to its structure and content. You provide your data and specify an LLM provider (Anthropic, OpenAI, Ollama, or Amazon Bedrock), and receive executable Python code that you can use directly or modify as needed.

The segmentation feature arose from users who needed to validate data across different groups without writing separate validation plans for each segment. With the segments= parameter, you can split a validation step by the values in a categorical column, generating separate results for each segment.

# Validation with segmentation
segmented_data = pl.DataFrame({
    "id": range(1, 10),
    "value": [10, 20, 3, 40, 50, 2, 6, 8, 60],
    "region": ["North", "North", "South", "South", "East", "East", "West", "West", "West"]
})

segmented_validation = (
    pb.Validate(data=segmented_data, tbl_name="regional_data")
    .col_vals_gt(
        columns="value",
        value=10,
        segments="region"
    )
    .interrogate()
)

segmented_validation
Pointblank Validation
2026-04-13|12:47:03
Polarsregional_data
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#4CA64C66 1
SEGMENT  region / East
col_vals_gt
col_vals_gt()
value 10 2 1
0.50
1
0.50
#4CA64C66 2
SEGMENT  region / North
col_vals_gt
col_vals_gt()
value 10 2 1
0.50
1
0.50
#4CA64C66 3
SEGMENT  region / South
col_vals_gt
col_vals_gt()
value 10 2 1
0.50
1
0.50
#4CA64C66 4
SEGMENT  region / West
col_vals_gt
col_vals_gt()
value 10 3 1
0.33
2
0.67
2026-04-13 12:47:03 UTC< 1 s2026-04-13 12:47:03 UTC

The assert_passing() and assert_below_threshold() methods were added for users integrating validation into test suites. They provide clean interfaces for asserting that validations pass, with informative error messages when they do not. The above_threshold() method complements these by returning a boolean indicating whether any steps exceeded a specified threshold level.

Pipeline integration has been a recurring theme. The write_file() and read_file() functions allow validation objects to be serialized to disk and restored later, which is useful for archiving results or passing validations between pipeline stages.

Looking Forward

Fifty releases represents a considerable milestone, and one that furnishes a solid foundation for continued endeavours. The architecture that supports multiple data sources, multiple interfaces, and multiple languages provides a stable base for adding new capabilities. Equally important is the investment in testing: Pointblank has many thousands of tests with high code coverage, and this rigor is deliberate. A data quality library must hold itself to the highest standards of correctness, because users who rely on such a tool to find problems in their data should not also have to worry about problems in the tool itself. The contributors who have participated in this project bring diverse perspectives and concrete improvements. Their involvement has made the library stronger in manifold ways that a single developer could not anticipate.

NoteFeatured Talk: Making Things Nice in Python

If you are interested in the design philosophy behind Pointblank, I gave a talk called Making Things Nice in Python that covers how Pointblank and Great Tables prioritize user experience in Python package design. The talk explores why convenient options, extensive documentation, and thoughtful API decisions matter for everyone, even when they challenge conventional Python patterns and practices.

The steadfast commitment to data quality as a discipline remains unchanged. I really do believe that organizations succeed when they can trust their data, and that trust is built through systematic validation and clear communication of results. Pointblank aims to make that validation and communication as straightforward as possible, so that data quality becomes a routine part of every data workflow rather than an afterthought.

Thank you to everyone who has used Pointblank, reported issues, contributed code, or simply provided feedback. The journey from the initial release to the fiftieth has been made richer by your participation. If you have thoughts on the library, whether feature ideas, questions, bug reports, or just general observations, please share them on GitHub Discussions or GitHub Issues. Any sort of feedback is welcome and helps shape the direction of future development. We look forward to the next fifty releases and the continued advancement of data quality tooling in Python.

Rich Iannone

Software Engineer at Posit, PBC
Richard is a software engineer and table enthusiast. He and R go way back and he's been getting better at writing code in Python too. For the most part, Rich enjoys creating open source packages in R and Python so that people can do great things in their own work.