Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
221 changes: 221 additions & 0 deletions Doc/DUPLICATION_ANALYSIS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,221 @@
# Code Duplication Analysis: `get_text` and Related Methods

## Executive Summary

This document analyzes the duplicate functionality between the **legacy** `MainPage` class in `mw_api/super/S_Page/super_page.py` and the **new** refactored architecture in `mw_api/repositories/page_repository.py` and related modules.

**Key Finding:** The duplications are **intentional** and part of an ongoing refactoring effort. The new `repositories/` and `core/` modules are designed to eventually replace the legacy implementation in `super/S_Page/`.

---

## 1. Identified Duplications

### 1.1 `get_text` Method

| Aspect | Legacy (`MainPage.get_text`) | New (`PageRepository.get_text`) |
|--------|------------------------------|--------------------------------|
| **Location** | `mw_api/super/S_Page/super_page.py:186` | `mw_api/repositories/page_repository.py:33` |
| **Lines** | ~70 lines | ~30 lines |
| **Signature** | `def get_text(self, redirects=False)` | `def get_text(self, page: Page) -> str` |
| **Side Effects** | Updates `self.text`, `self.user`, `self.ns`, `self.meta`, `self.revisions_data` | None (pure function) |
| **Testability** | Hard (requires full `MainPage` setup) | Easy (mock `LoginBotProtocol`) |
| **Type Hints** | Minimal | Full |

**Recommendation:** Keep both during migration. Eventually, `MainPage.get_text()` should delegate to `PageRepository.get_text()`.

---

### 1.2 `get_categories` Method

| Aspect | Legacy (`MainPage.get_categories`) | New (`PageRepository.get_categories`) |
|--------|-----------------------------------|--------------------------------------|
| **Location** | `super_page.py:496` | `page_repository.py:182` |
| **Implementation** | Returns cached data from `get_infos()` | Makes direct API call |
| **Side Effects** | Lazy loads via `get_infos()` | None |

**Recommendation:** The implementations differ in approach. Legacy uses batch loading; new uses individual queries. Consider adding a caching service wrapper for the new implementation.

---

### 1.3 `get_langlinks` Method

| Aspect | Legacy | New |
|--------|--------|-----|
| **Location** | `super_page.py:514` | `page_repository.py:263` |
| **Implementation** | Returns cached `self.langlinks` | Makes direct API call |

**Recommendation:** The implementations differ similarly to `get_categories`. Consider adding a caching service wrapper for the new implementation.

---

### 1.4 `get_templates` Method

| Aspect | Legacy (`get_templates_API`) | New (`get_templates`) |
|--------|------------------------------|----------------------|
| **Location** | `super_page.py:521` | `page_repository.py:209` |
| **Implementation** | Returns cached data | Makes direct API call |

---

### 1.5 `save` Method

| Aspect | Legacy (`MainPage.save`) | New (`PageRepository.save`) |
|--------|--------------------------|----------------------------|
| **Location** | `super_page.py:657` | `page_repository.py:118` |
| **Lines** | ~90 lines | ~40 lines |
| **Features** | Diff display, user prompt, error handling, state updates | Basic API call |

**Recommendation:** The legacy `save` has important user interaction features. Create a `PageEditor` service that combines `PageRepository.save` with the UX features.

---

### 1.6 `page_exists` / `exists`

| Aspect | Legacy | New |
|--------|--------|-----|
| **Location** | `super_page.py:634` | `page_repository.py:159` |
| **Implementation** | Lazy loads via `get_text()` | Direct query |

---

## 2. Architecture Comparison

### 2.1 Legacy Architecture (super/S_Page/)

```
MainPage (God Object - 987 lines)
├── Inherits from PAGE_APIS, ASK_BOT
├── Owns login_bot reference
├── Mixes data access, business logic, and UI concerns
├── Uses instance variables for caching
└── Side effects on every operation
```

**Files:**
- `super/S_Page/super_page.py` - Main god object
- `super/S_Page/bot.py` - API helpers
- `super/S_Page/data.py` - Data classes
- `super/S_Page/ar_err.py` - Arabic error handling

### 2.2 New Architecture (repositories/, core/, services/)

```
New Clean Architecture:
├── core/page.py - Page dataclass (pure data entity)
├── core/protocols.py - LoginBotProtocol interface
├── repositories/page_repository.py - Data access layer (API calls)
├── services/template_service.py - Template parsing logic
└── services/edit_validator.py - Edit permission logic
```

**Advantages:**
- Testable (dependency injection via protocols)
- Type-safe (full type hints)
- Single Responsibility Principle
- No side effects in data access

---

## 3. Refactoring Status (from refactor.md)

The `refactor.md` checklist shows the following relevant items:

### Completed ✅
- [x] `mw_api/core/config.py` - `BotConfig` dataclass
- [x] `mw_api/core/page.py` - `Page` and `PageMetadata` dataclasses
- [x] `mw_api/repositories/page_repository.py` - `PageRepository` class
- [x] `mw_api/services/template_service.py` - `TemplateService` class
- [x] `mw_api/services/edit_validator.py` - `EditValidator` class
- [x] `mw_api/core/protocols.py` - Interface definitions
- [x] `tests/test_repositories.py` - Unit tests for `PageRepository`

### Pending ⏳
- [ ] Update `MainPage` to delegate text retrieval to `PageRepository`
- [ ] Update `MainPage` to delegate template operations to `TemplateService`
- [ ] Update `MainPage` to delegate edit permission checks to `EditValidator`
- [ ] Remove duplicate implementations once delegation is complete

---

## 4. Recommendations

### 4.1 Short-Term (Do Not Remove)

**DO NOT remove either implementation yet.** The legacy code is still in active use via `ALL_APIS.MainPage()`. Removing it would break existing users.

### 4.2 Medium-Term (Integration Phase)

Modify `MainPage.get_text()` to delegate to `PageRepository`:

```python
# In super_page.py
def get_text(self, redirects=False):
"""Retrieves the current wikitext content."""
# Create Page entity
page = Page(title=self.title, lang=self.lang, family=self.family)

# Delegate to repository
repository = PageRepository(self.login_bot)
self.text = repository.get_text(page)

# Update legacy instance attributes for backward compatibility
# ... existing metadata updates ...

return self.text
```

### 4.3 Long-Term (New API)

Provide a new, clean API that uses the repository pattern directly:

```python
from mw_api.core.page import Page
from mw_api.repositories import PageRepository

# New clean usage
repo = PageRepository(login_bot)
page = Page(title="Example")
text = repo.get_text(page)
```

---

## 5. Files to Keep vs. Migrate

| File | Status | Action |
|------|--------|--------|
| `repositories/page_repository.py` | **NEW** | Keep - This is the target architecture |
| `core/page.py` | **NEW** | Keep - Core data entities |
| `core/protocols.py` | **NEW** | Keep - Interface definitions |
| `services/template_service.py` | **NEW** | Keep - Extracted from MainPage |
| `services/edit_validator.py` | **NEW** | Keep - Extracted from botEdit.py |
| `super/S_Page/super_page.py` | **LEGACY** | Keep for now, migrate incrementally |
| `super/S_Page/bot.py` | **LEGACY** | Migrate to repositories/services |
| `super/S_Page/data.py` | **LEGACY** | Replace with `core/page.py` dataclasses |

---

## 6. Testing Strategy

| Component | Tests | Status |
|-----------|-------|--------|
| `PageRepository` | `tests/test_repositories.py` | ✅ Complete |
| `TemplateService` | (needs tests) | ⏳ Pending |
| `EditValidator` | (needs tests) | ⏳ Pending |
| `MainPage` | `tests/test_MainPage.py` | ✅ Exists |

---

## 7. Conclusion

The duplicate `get_text` methods exist as part of a planned refactoring effort:

1. **`PageRepository.get_text`** is the **new, correct implementation** following clean architecture principles
2. **`MainPage.get_text`** is the **legacy implementation** that will eventually delegate to the repository

**Do not remove either file.** Instead, follow the migration path in `refactor.md` to gradually delegate legacy functionality to the new architecture.

---

*Analysis Date: 2026-01-26*
*Related: See `refactor.md` for the complete refactoring roadmap*
Loading