You can include CSV or Parquet files directly in your packages. When you publish your package, these data files are published along with your models and become queryable via DuckDB.

Why Embed Data?

Embedding data files in your packages is valuable when you need to:
  • Package sample data - Example datasets for testing or demos
  • Build standalone models - Models that don’t require database connections
  • Version control data - Keep data synchronized with model changes in your package
Currently, embedded data files work best for standalone models. Support for querying embedded data alongside database connections (e.g., joining embedded lookup tables with warehouse data) is coming soon.

Adding Data Files

File Structure

Create a data/ folder in your package directory and add your CSV or Parquet files:
my-package/
├── publisher.json
├── ecommerce.malloy
└── data/
    ├── country_codes.csv
    ├── product_categories.parquet
    └── exchange_rates.csv

Supported File Formats

  • CSV files (.csv) - Comma-separated values with a header row; the header names become the column names you reference in your model
  • Parquet files (.parquet) - Columnar binary format, efficient for larger datasets

Referencing Embedded Data in Models

Use duckdb.table() to reference embedded files in your Malloy models. Paths such as 'data/country_codes.csv' point into your package's data/ folder:
// Reference a CSV file
source: country_codes is duckdb.table('data/country_codes.csv') extend {
  dimension:
    country_code is code
    country_name is name
    region is geographic_region
}

// Reference a Parquet file
source: product_categories is duckdb.table('data/product_categories.parquet') extend {
  dimension:
    category_id is id
    category_name is name
    parent_category is parent_id
}
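Once defined, these sources can be queried like any other Malloy source. The minimal sketch below assumes the country_codes source above. It adds a measure and a named view, country_count and countries_by_region, which are illustrative names and not part of the original model, and then runs the view against the embedded CSV:

```malloy
// Sketch: assumes the country_codes source defined above.
// country_count and countries_by_region are illustrative names.
source: country_codes_with_counts is country_codes extend {
  // Number of rows (countries) in the embedded CSV
  measure: country_count is count()

  // Countries per region, largest regions first
  view: countries_by_region is {
    group_by: region
    aggregate: country_count
    order_by: country_count desc
  }
}

// Execute the named view against the embedded file
run: country_codes_with_counts -> countries_by_region
```

Extending country_codes into a new source leaves the original definition untouched. You could also add the measure and view directly inside the original extend block.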
