ValerioGiuffrida left a comment
@odai-saleh The changes suggested below should make the code for the data processing → indicator calculation step shorter and better scoped.
```r
  FCS_flag_low = ifelse(!is.na(FCS) & FCS < 14, 1L, 0L),
  FCS_flag_high = ifelse(!is.na(FCS) & FCS > 100, 1L, 0L)
) %>%
```
These scripts are for processing data, not for flagging data quality issues; data quality control is a separate step.
@odai-saleh please remove the rows above.
```r
# ------------------------------------------------------------
# Data Quality Diagnostics
# ------------------------------------------------------------

# Descriptives for FCS
summary(df$FCS)
sd(df$FCS, na.rm = TRUE)

# Flag frequencies
table(df$FCS_flag_low, useNA = "ifany")
table(df$FCS_flag_high, useNA = "ifany")

# Category distribution
table(df$FCSCat28, useNA = "ifany")

# Sample check of food group distributions (min/max/mean)
food_group_stats <- df %>%
  summarise(across(all_of(fcs_vars),
                   list(min = ~min(.x, na.rm = TRUE),
                        max = ~max(.x, na.rm = TRUE),
                        mean = ~mean(.x, na.rm = TRUE))))
food_group_stats
```
@odai-saleh Same rationale as above: data quality control is a separate step in the process, so it should live in a separate script.
```r
# ----------------------------------------------------------
# 3) Clean impossible FCS values (should be 0-112)
# ----------------------------------------------------------
mutate(
  FCS = ifelse(FCS < 0 | FCS >= 113, NA_real_, FCS)
) %>%
```
Given the cleaning done in section 1 of this script, this check is redundant: it tests for a mathematical impossibility.
@odai-saleh kindly remove it.
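For reference, the bound can be checked directly. This sketch assumes that step 1 caps each food-group day count $x_i$ at 0-7 and that the score uses the standard WFP FCS weights (staples 2, pulses 3, vegetables 1, fruit 1, meat/fish 4, milk 4, sugar 0.5, oil 0.5); the subscript names are illustrative, not the script's variable names.

```latex
\mathrm{FCS} = 2x_{\text{staples}} + 3x_{\text{pulses}} + x_{\text{veg}} + x_{\text{fruit}}
             + 4x_{\text{meat}} + 4x_{\text{milk}} + 0.5x_{\text{sugar}} + 0.5x_{\text{oil}},
\qquad 0 \le x_i \le 7

0 \le \mathrm{FCS} \le 7\,(2 + 3 + 1 + 1 + 4 + 4 + 0.5 + 0.5) = 7 \times 16 = 112
```

Because every weight is non-negative, the minimum is 0 and the maximum is 112, so `FCS < 0 | FCS >= 113` can never be true once the day counts are cleaned.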
```r
# ----------------------------------------------------------
# 3) Clean any impossible FCS values (should be 0-112)
# ----------------------------------------------------------
mutate(
  FCS = ifelse(FCS < 0 | FCS >= 113, NA_real_, FCS)
) %>%
```
Given the logical cleaning done in step 1 of this code, the condition above is a mathematical impossibility.
@odai-saleh kindly remove these lines.
```r
# ----------------------------------------------------------
# 4) Data quality flags
#    low if FCS < 14; high if FCS > 100
# ----------------------------------------------------------
mutate(
  FCS_flag_low = ifelse(!is.na(FCS) & FCS < 14, 1L, 0L),
  FCS_flag_high = ifelse(!is.na(FCS) & FCS > 100, 1L, 0L)
) %>%
```
Data quality control is a separate step in the process.
@odai-saleh kindly remove the rows above.
```r
# ------------------------------------------------------------
# Data Quality Diagnostics
# ------------------------------------------------------------

# Descriptives for FCS
summary(df$FCS)
sd(df$FCS, na.rm = TRUE)

# Frequencies of flags
table(df$FCS_flag_low, useNA = "ifany")
table(df$FCS_flag_high, useNA = "ifany")

# Distribution of final categories
table(df$FCSCat21, useNA = "ifany")

# Sample check of food group distributions: min/max/mean
food_group_stats <- df %>%
  summarise(across(all_of(fcs_vars),
                   list(min = ~min(.x, na.rm = TRUE),
                        max = ~max(.x, na.rm = TRUE),
                        mean = ~mean(.x, na.rm = TRUE))))
food_group_stats
```