[PULL REQUEST] add logic for employment point-based xref by bryce-sandag · Pull Request #204 · SANDAG/Estimates-Program

bryce-sandag · 2026-03-05T23:46:42Z

Describe this pull request. What changes are being made?

Add in sql script and update logic in employment module to handle employment point-based cross reference from block to mgra where possible, otherwise default to area based cross reference

What issues does this pull request address?

close #198

Copilot

Pull request overview

Adds a new GIS-server SQL crosswalk that prefers EDD point-based job locations for mapping Census blocks to Series 15 MGRAs, falling back to area-overlap allocations when point data isn’t available, and wires this crosswalk into the employment aggregation pipeline.

Changes:

Add sql/employment/edd_land_use_split.sql to generate block→MGRA allocations using EDD points with area-based fallback.
Update python/employment.py to query the new crosswalk from GIS_ENGINE and use pct_edd vs pct_area during LODES aggregation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
sql/employment/edd_land_use_split.sql	New SQL script to build EDD point-based + area-based fallback block→MGRA allocation percentages.
python/employment.py	Switches employment module to use the new crosswalk and applies EDD/area allocation logic in aggregation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sql/employment/edd_land_use_split.sql

python/employment.py

sql/employment/edd_land_use_split.sql

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

python/employment.py:175

Commented-out debug printing left in the module adds noise and is easy to accidentally re-enable. Please remove these lines (or replace with utils.logger.debug(...) behind a config flag if ongoing observability is needed).

                con=con,
                params={

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sql/employment/edd_land_use_split.sql

Eric-Liu-SANDAG

For next review, please provide a [run_id] containing a full run of the Estimates Program so I can take a look at the output data. Also provide an older [run_id] so I can compare how the employment data has changed with the new methodology

sql/employment/edd_land_use_split.sql

bryce-sandag · 2026-03-09T22:56:52Z

For next review, please provide a [run_id] containing a full run of the Estimates Program so I can take a look at the output data. Also provide an older [run_id] so I can compare how the employment data has changed with the new methodology

Have run production database for 2020-2023 with run_id = 186. Can compare with land base xref run_id = 181

Can you this code to check where jobs differences happen

SELECT [jobs].[run_id]
      ,[jobs].[year]
      ,[jobs].[mgra]
      ,[jobs].[naics_code]
      ,[jobs].[value] jobs_edd_xref
      ,X.[value] AS jobs_area_xref
FROM [EstimatesProgram].[outputs].[jobs]
INNER JOIN 
    (SELECT 
        [run_id]
        ,[year]
        ,[mgra]
        ,[naics_code]
        ,[value]
    FROM [EstimatesProgram].[outputs].[jobs]
    WHERE run_id = 181 -- area xref
    ) AS X
    ON [jobs].[year] = X.[year] 
        AND [jobs].[mgra] = X.[mgra]
        AND [jobs].[naics_code] = X.[naics_code]
WHERE [jobs].[run_id] = 186 -- edd based xref
    AND [jobs].[value] != X.[value]   
ORDER BY year, mgra, naics_code

Eric-Liu-SANDAG

Code changes, pretty much some weird formatting notes:

I thought we had to keep the [EMPCORE]?
Why is SUM(SUM([jobs])) OVER (PARTITION BY [CENSUSBLOCKS].[GEOID20]) AS [block_jobs] on two lines?
I would actually prefer FULL OUTER JOIN over FULL JOIN. I feel including OUTER better distinguishes it from LEFT or RIGHT

Otherwise, the data itself likely has some errors, as I think some of the changes I am seeing are too large for such a small methodology change. It's probable that either 181 had issues to begin with, or the changes between 181 and 186 caused the issues. See query below:

WITH [jobs] AS (
    SELECT [jobs].[run_id]
          ,[jobs].[year]
          ,[jobs].[mgra]
          ,[jobs].[naics_code]
          ,[jobs].[value] jobs_edd_xref
          ,X.[value] AS jobs_area_xref,
          [X].[value] - [jobs].[value] AS [diff]
    FROM [EstimatesProgram].[outputs].[jobs]
    INNER JOIN 
        (SELECT 
            [run_id]
            ,[year]
            ,[mgra]
            ,[naics_code]
            ,[value]
        FROM [EstimatesProgram].[outputs].[jobs]
        WHERE run_id = 181 -- area xref
        ) AS X
        ON [jobs].[year] = X.[year] 
            AND [jobs].[mgra] = X.[mgra]
            AND [jobs].[naics_code] = X.[naics_code]
    WHERE [jobs].[run_id] = 186 -- edd based xref
        AND [jobs].[value] != X.[value]   
), [agg] AS (
    SELECT 
        [year],
        [cpa],
        SUM([jobs_edd_xref]) AS [jobs_edd_xref],
        SUM([jobs_area_xref]) AS [jobs_area_xref],
        SUM([diff]) AS [diff],
        CASE
            WHEN SUM([jobs_edd_xref]) = 0 THEN NULL
            ELSE 100.0 * SUM([diff]) / SUM([jobs_edd_xref])
        END AS [pct_change]
    FROM [jobs]
    INNER JOIN [demographic_warehouse].[dim].[mgra_denormalize]
        ON [jobs].[mgra] = [mgra_denormalize].[mgra]
        AND [mgra_denormalize].[series] = 15
    GROUP BY [year], [cpa]
)

SELECT *
FROM [agg]
ORDER BY ABS([pct_change]) DESC

Especially focus on Del Mar Mesa and Torrey Highlands, but in general just verify the changes are expected

bryce-sandag · 2026-03-11T23:28:38Z

I investigated 3 differences and the edd point-based xref is working as intended. The bigggest changes come from large blocks and MGRAs where the largest or larger MGRAs that were in one cpa are large based on area but are parks and a smaller mgra that actually includes buildings where the edd businesses are located are in another cpa.

For example the differences in Del Mar Mesa and Torrey Highlands CPAs can be explained by this. Looking at the year 2020 from lehd jobs data 2,620 jobs are included in Census Block 60730083663008 (highleted in the larger blue outline in map below). The largest MGRA 12814 is larger area in the middle of the Census Block (in the green outline with no buildings) by area is 69.7% of the area of the block and this MGRA is in the Del Mar Mesa CPA and just using area would be assigned 1,825 jobs. But using edd split MGRA 12553 (highlighted by the blue smaller outline) is where most of the jobs are cited in this block and accounts for 91.3% of the jobs or 2,388 and this MGRA is within the Torrey Highlands CPA.

This similar finding can explain the other differences. I also checked two other cases in the Tijuana River Valley and Los Penasquitos Canyon Preserve CPAs and found similar findings.

GregorSchroeder · 2026-03-11T23:40:08Z

I investigated 3 differences and the edd point-based xref is working as intended. The bigggest changes come from large blocks and MGRAs where the largest or larger MGRAs that were in one cpa are large based on area but are parks and a smaller mgra that actually includes buildings where the edd businesses are located are in another cpa.

Oh wow. This definitely supports our use of EDD point-based xref over area-based.

bryce-sandag · 2026-03-11T23:42:16Z

Code changes, pretty much some weird formatting notes:

I thought we had to keep the [EMPCORE]?

Why is SUM(SUM([jobs])) OVER (PARTITION BY [CENSUSBLOCKS].[GEOID20]) AS [block_jobs] on two lines?

I would actually prefer FULL OUTER JOIN over FULL JOIN. I feel including OUTER better distinguishes it from LEFT or RIGHT

[EMPCORE] not needed as should be connecting straight to that database, but could add it back in incase connect to different database in GIS server.

Line length would be 90 if included on same line

I can make change to include OUTER in FULL OUTER JOIN

#198 add logic for employment based xref

1be889f

bryce-sandag requested a review from Copilot March 5, 2026 23:46

bryce-sandag self-assigned this Mar 5, 2026

Copilot started reviewing on behalf of bryce-sandag March 5, 2026 23:47 View session

Copilot AI reviewed Mar 5, 2026

View reviewed changes

bryce-sandag requested a review from Copilot March 6, 2026 00:28

Copilot started reviewing on behalf of bryce-sandag March 6, 2026 00:28 View session

#198 updates based on copilot pr feedback

901c91a

Copilot AI reviewed Mar 6, 2026

View reviewed changes

sql/employment/edd_land_use_split.sql Show resolved Hide resolved

sql/employment/edd_land_use_split.sql Outdated Show resolved Hide resolved

bryce-sandag requested review from Eric-Liu-SANDAG and GregorSchroeder March 6, 2026 00:32

Eric-Liu-SANDAG requested changes Mar 6, 2026

View reviewed changes

#198 address Eric's PR feedback

784d683

bryce-sandag requested a review from Eric-Liu-SANDAG March 9, 2026 22:59

Eric-Liu-SANDAG requested changes Mar 10, 2026

View reviewed changes

Conversation

bryce-sandag commented Mar 5, 2026

Describe this pull request. What changes are being made?

What issues does this pull request address?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Eric-Liu-SANDAG left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bryce-sandag commented Mar 9, 2026

Uh oh!

Eric-Liu-SANDAG left a comment

Choose a reason for hiding this comment

Uh oh!

bryce-sandag commented Mar 11, 2026

Uh oh!

GregorSchroeder commented Mar 11, 2026

Uh oh!

bryce-sandag commented Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants