Skip to content

[PULL REQUEST] add logic for employment point-based xref#204

Open
bryce-sandag wants to merge 3 commits intomainfrom
add-employment-based-cross-reference
Open

[PULL REQUEST] add logic for employment point-based xref#204
bryce-sandag wants to merge 3 commits intomainfrom
add-employment-based-cross-reference

Conversation

@bryce-sandag
Copy link
Contributor

Describe this pull request. What changes are being made?

Add in sql script and update logic in employment module to handle employment point-based cross reference from block to mgra where possible, otherwise default to area based cross reference

What issues does this pull request address?

close #198

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new GIS-server SQL crosswalk that prefers EDD point-based job locations for mapping Census blocks to Series 15 MGRAs, falling back to area-overlap allocations when point data isn’t available, and wires this crosswalk into the employment aggregation pipeline.

Changes:

  • Add sql/employment/edd_land_use_split.sql to generate block→MGRA allocations using EDD points with area-based fallback.
  • Update python/employment.py to query the new crosswalk from GIS_ENGINE and use pct_edd vs pct_area during LODES aggregation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
sql/employment/edd_land_use_split.sql New SQL script to build EDD point-based + area-based fallback block→MGRA allocation percentages.
python/employment.py Switches employment module to use the new crosswalk and applies EDD/area allocation logic in aggregation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

python/employment.py:175

  • Commented-out debug printing left in the module adds noise and is easy to accidentally re-enable. Please remove these lines (or replace with utils.logger.debug(...) behind a config flag if ongoing observability is needed).
                con=con,
                params={

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@Eric-Liu-SANDAG Eric-Liu-SANDAG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For next review, please provide a [run_id] containing a full run of the Estimates Program so I can take a look at the output data. Also provide an older [run_id] so I can compare how the employment data has changed with the new methodology

@bryce-sandag
Copy link
Contributor Author

For next review, please provide a [run_id] containing a full run of the Estimates Program so I can take a look at the output data. Also provide an older [run_id] so I can compare how the employment data has changed with the new methodology

Have run production database for 2020-2023 with run_id = 186. Can compare with land base xref run_id = 181

Can you this code to check where jobs differences happen

SELECT [jobs].[run_id]
      ,[jobs].[year]
      ,[jobs].[mgra]
      ,[jobs].[naics_code]
      ,[jobs].[value] jobs_edd_xref
      ,X.[value] AS jobs_area_xref
FROM [EstimatesProgram].[outputs].[jobs]
INNER JOIN 
    (SELECT 
        [run_id]
        ,[year]
        ,[mgra]
        ,[naics_code]
        ,[value]
    FROM [EstimatesProgram].[outputs].[jobs]
    WHERE run_id = 181 -- area xref
    ) AS X
    ON [jobs].[year] = X.[year] 
        AND [jobs].[mgra] = X.[mgra]
        AND [jobs].[naics_code] = X.[naics_code]
WHERE [jobs].[run_id] = 186 -- edd based xref
    AND [jobs].[value] != X.[value]   
ORDER BY year, mgra, naics_code

Copy link
Contributor

@Eric-Liu-SANDAG Eric-Liu-SANDAG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes, pretty much some weird formatting notes:

  • I thought we had to keep the [EMPCORE]?
  • Why is SUM(SUM([jobs])) OVER (PARTITION BY [CENSUSBLOCKS].[GEOID20]) AS [block_jobs] on two lines?
  • I would actually prefer FULL OUTER JOIN over FULL JOIN. I feel including OUTER better distinguishes it from LEFT or RIGHT

Otherwise, the data itself likely has some errors, as I think some of the changes I am seeing are too large for such a small methodology change. It's probable that either 181 had issues to begin with, or the changes between 181 and 186 caused the issues. See query below:

WITH [jobs] AS (
    SELECT [jobs].[run_id]
          ,[jobs].[year]
          ,[jobs].[mgra]
          ,[jobs].[naics_code]
          ,[jobs].[value] jobs_edd_xref
          ,X.[value] AS jobs_area_xref,
          [X].[value] - [jobs].[value] AS [diff]
    FROM [EstimatesProgram].[outputs].[jobs]
    INNER JOIN 
        (SELECT 
            [run_id]
            ,[year]
            ,[mgra]
            ,[naics_code]
            ,[value]
        FROM [EstimatesProgram].[outputs].[jobs]
        WHERE run_id = 181 -- area xref
        ) AS X
        ON [jobs].[year] = X.[year] 
            AND [jobs].[mgra] = X.[mgra]
            AND [jobs].[naics_code] = X.[naics_code]
    WHERE [jobs].[run_id] = 186 -- edd based xref
        AND [jobs].[value] != X.[value]   
), [agg] AS (
    SELECT 
        [year],
        [cpa],
        SUM([jobs_edd_xref]) AS [jobs_edd_xref],
        SUM([jobs_area_xref]) AS [jobs_area_xref],
        SUM([diff]) AS [diff],
        CASE
            WHEN SUM([jobs_edd_xref]) = 0 THEN NULL
            ELSE 100.0 * SUM([diff]) / SUM([jobs_edd_xref])
        END AS [pct_change]
    FROM [jobs]
    INNER JOIN [demographic_warehouse].[dim].[mgra_denormalize]
        ON [jobs].[mgra] = [mgra_denormalize].[mgra]
        AND [mgra_denormalize].[series] = 15
    GROUP BY [year], [cpa]
)

SELECT *
FROM [agg]
ORDER BY ABS([pct_change]) DESC

Especially focus on Del Mar Mesa and Torrey Highlands, but in general just verify the changes are expected

@bryce-sandag
Copy link
Contributor Author

I investigated 3 differences and the edd point-based xref is working as intended. The bigggest changes come from large blocks and MGRAs where the largest or larger MGRAs that were in one cpa are large based on area but are parks and a smaller mgra that actually includes buildings where the edd businesses are located are in another cpa.

For example the differences in Del Mar Mesa and Torrey Highlands CPAs can be explained by this. Looking at the year 2020 from lehd jobs data 2,620 jobs are included in Census Block 60730083663008 (highleted in the larger blue outline in map below). The largest MGRA 12814 is larger area in the middle of the Census Block (in the green outline with no buildings) by area is 69.7% of the area of the block and this MGRA is in the Del Mar Mesa CPA and just using area would be assigned 1,825 jobs. But using edd split MGRA 12553 (highlighted by the blue smaller outline) is where most of the jobs are cited in this block and accounts for 91.3% of the jobs or 2,388 and this MGRA is within the Torrey Highlands CPA.

This similar finding can explain the other differences. I also checked two other cases in the Tijuana River Valley and Los Penasquitos Canyon Preserve CPAs and found similar findings.

image

@GregorSchroeder
Copy link
Contributor

I investigated 3 differences and the edd point-based xref is working as intended. The bigggest changes come from large blocks and MGRAs where the largest or larger MGRAs that were in one cpa are large based on area but are parks and a smaller mgra that actually includes buildings where the edd businesses are located are in another cpa.

Oh wow. This definitely supports our use of EDD point-based xref over area-based.

@bryce-sandag
Copy link
Contributor Author

Code changes, pretty much some weird formatting notes:

  • I thought we had to keep the [EMPCORE]?
  • Why is SUM(SUM([jobs])) OVER (PARTITION BY [CENSUSBLOCKS].[GEOID20]) AS [block_jobs] on two lines?
  • I would actually prefer FULL OUTER JOIN over FULL JOIN. I feel including OUTER better distinguishes it from LEFT or RIGHT

[EMPCORE] not needed as should be connecting straight to that database, but could add it back in incase connect to different database in GIS server.

Line length would be 90 if included on same line

I can make change to include OUTER in FULL OUTER JOIN

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add employment point-based cross reference

4 participants