
Taceyes/JHU_AgenticAI_Project_1_Learners_Notebook


JHU AgenticAI Project 1 - DualLens Analytics

Overview

DualLens Analytics is an investment analysis tool that combines quantitative financial metrics with qualitative insights into companies' AI initiatives. It takes a dual-lens approach: stock growth data provides the financial view, and a Retrieval-Augmented Generation (RAG) pipeline over company strategy reports provides the strategic view, so both can be read together as a single picture of a company's potential.

Problem Statement

Traditional investment analysis often focuses solely on financial metrics (e.g., stock growth, revenue, market cap), missing the qualitative dimension of how prepared a company is for the future. On the other hand, qualitative documents like strategy PDFs contain valuable insights about innovation and AI initiatives, but they are difficult to structure, query, and integrate with numeric financial data.

Core Challenges Addressed

  1. Fragmented Data Sources: Financial data (stock prices) and strategic insights (PDFs) exist in silos
  2. Limited Analytical Scope: Manual analysis of growth trends and PDF reports is time-consuming and error-prone
  3. Decisional Blind Spots: Without integrating both quantitative (growth trends) and qualitative (AI initiatives) signals, investors may miss out on high-potential organizations

Features

  • Financial Data Analysis: Automated collection and visualization of stock market data for multiple companies (GOOGL, MSFT, IBM, NVDA, AMZN)
  • PDF Document Processing: Extraction and chunking of AI initiative documents from company reports
  • Vector Store Integration: ChromaDB vector store for semantic search and retrieval
  • RAG Pipeline: Retrieval-Augmented Generation system for querying company AI initiatives
  • Unified Analysis: Combined financial metrics and AI initiative insights for comprehensive investment decisions

Project Structure

.
├── JHU AgenticAI Project 1 Learners Notebook (1).ipynb  # Main notebook
├── AMZN.pdf                                               # Amazon AI initiatives document
├── GOOGL.pdf                                              # Google AI initiatives document
├── IBM.pdf                                                # IBM AI initiatives document
├── MSFT.pdf                                               # Microsoft AI initiatives document
├── NVDA.pdf                                               # NVIDIA AI initiatives document
└── README.md                                              # This file

Technologies Used

  • Python: Core programming language
  • Jupyter Notebook: Interactive development environment
  • yfinance: Stock market data collection
  • LangChain: RAG pipeline and document processing
  • OpenAI: LLM and embeddings (GPT-4o-mini, text-embedding-ada-002)
  • ChromaDB: Vector database for document storage and retrieval
  • Pandas: Data manipulation and analysis
  • Matplotlib: Data visualization

Key Components

1. Financial Data Collection

  • Automated stock price history retrieval
  • Financial metrics extraction (Market Cap, P/E Ratio, Dividend Yield, Beta, Total Revenue)
  • Data visualization and comparison across companies
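A minimal sketch of the financial-data step, assuming the notebook relies on yfinance's `Ticker(...).history()` and `Ticker(...).info` as listed under Technologies Used. The mapping from the README's metric names to Yahoo `info` keys, the `5y` period, and the helper names are assumptions for illustration, not the notebook's own code. The pure helpers run offline; only `fetch` calls Yahoo Finance.

```python
from typing import Mapping, Optional, Sequence

# Assumed mapping from the README's metric names to keys in yfinance's Ticker(...).info dict.
METRIC_KEYS = {
    "Market Cap": "marketCap",
    "P/E Ratio": "trailingPE",
    "Dividend Yield": "dividendYield",
    "Beta": "beta",
    "Total Revenue": "totalRevenue",
}


def extract_metrics(info: Mapping[str, object]) -> dict:
    """Pick the five README metrics out of an info dict; missing keys become None."""
    return {label: info.get(key) for label, key in METRIC_KEYS.items()}


def percent_growth(closes: Sequence[float]) -> Optional[float]:
    """Growth from the first to the last closing price, in percent."""
    if len(closes) < 2 or closes[0] == 0:
        return None
    return (closes[-1] / closes[0] - 1.0) * 100.0


def fetch(ticker: str, period: str = "5y") -> dict:
    """Network call. yfinance is imported lazily so the helpers above work without it."""
    import yfinance as yf

    t = yf.Ticker(ticker)
    closes = t.history(period=period)["Close"].tolist()
    return {"Ticker": ticker, "Growth %": percent_growth(closes), **extract_metrics(t.info)}
```

For example, `[fetch(t) for t in ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"]]` yields one row per company, which can be passed to `pandas.DataFrame` for side-by-side comparison.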

2. Document Processing

  • PDF text extraction from company AI initiative reports
  • Text chunking using RecursiveCharacterTextSplitter
  • Document vectorization using OpenAI embeddings
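The notebook uses LangChain's `RecursiveCharacterTextSplitter` (from `langchain_text_splitters`) for chunking. To show the idea behind it, here is a simplified, dependency-free sketch: try coarse separators first (paragraphs), then finer ones (lines, then words), and cut into fixed-size slices only as a last resort. It is not LangChain's implementation: there is no chunk overlap and the merge rules differ, so chunk boundaries will not match the library's output.

```python
from typing import List, Sequence

# Coarse-to-fine separators; the final "" means "cut anywhere".
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def recursive_split(
    text: str, chunk_size: int, separators: Sequence[str] = DEFAULT_SEPARATORS
) -> List[str]:
    """Split text into chunks of at most chunk_size (> 0) characters.

    The last separator must be "" so that an unbreakable run can still be cut.
    Separators are kept inside a chunk but dropped at chunk boundaries.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # No natural boundary left: fall back to fixed-size character slices.
        return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too big: split it with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

Usage: `chunks = recursive_split(pdf_text, 1000)`, where `pdf_text` is the extracted text of one report and 1000 is an arbitrary illustrative size.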

3. RAG System

  • Vector store creation with ChromaDB
  • Semantic search and retrieval
  • LLM-powered question answering based on retrieved context
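To illustrate what retrieval and answering involve, here is a toy, dependency-free sketch. In the notebook, ChromaDB stores OpenAI embeddings and performs the similarity search; here the "embeddings" are hand-made vectors, ranking is brute-force cosine similarity, and the prompt wording in the hypothetical `build_prompt` helper is an assumption. The retrieved chunks are placed into the prompt so the LLM answers from that context rather than from memory.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(
    query_vec: Sequence[float], store: List[Tuple[str, Sequence[float]]], k: int = 3
) -> List[str]:
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: List[str]) -> str:
    """Hypothetical prompt template: numbered context chunks, then the question."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}"
    )
```

A vector database replaces the brute-force `sorted` call with an index, which matters once the store holds many chunks; the ranking idea is the same.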

Setup Instructions

  1. Install Dependencies:

    pip install langchain-openai langchain-text-splitters langchain-community chromadb yfinance pandas matplotlib PyPDF2
  2. Configure API Keys:

    • Set your OpenAI API key using the configuration step included in the notebook
    • Do this before running any cell that creates embeddings or calls the LLM
  3. Upload PDF Documents:

    • Upload the 5 company PDF files (AMZN, GOOGL, IBM, MSFT, NVDA)
    • The notebook will process and extract text from these documents
  4. Run the Notebook:

    • Execute cells sequentially
    • The notebook will:
      • Fetch financial data for all companies
      • Process and chunk PDF documents
      • Create vector embeddings
      • Build the RAG system
      • Enable querying of company AI initiatives
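A preflight check for steps 2 and 3 could look like the sketch below. It assumes the key is supplied through the `OPENAI_API_KEY` environment variable, which the OpenAI client and `langchain_openai` read by default; the notebook's own configuration step may load the key differently. It also assumes the PDFs sit in the working directory under the file names shown in Project Structure. Both helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Mapping, Sequence

TICKERS = ("AMZN", "GOOGL", "IBM", "MSFT", "NVDA")


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI key from the environment, or fail early with a clear message."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; configure it before running the RAG cells.")
    return key


def missing_pdfs(folder: str = ".", tickers: Sequence[str] = TICKERS) -> List[str]:
    """List tickers whose <TICKER>.pdf file is not present in folder."""
    return [t for t in tickers if not (Path(folder) / f"{t}.pdf").is_file()]
```

Calling `require_api_key()` and checking that `missing_pdfs()` returns an empty list before the processing cells surfaces setup problems immediately instead of partway through embedding.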

Usage

  1. Financial Analysis: Run the financial data collection cells to view stock trends and metrics
  2. Document Processing: Execute the PDF processing cells to extract and chunk company documents
  3. Query AI Initiatives: Use the RAG function to ask questions about company AI initiatives:
    response = RAG("What are Google's main AI initiatives?")
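To ask the same question about every company and line the answers up with the financial numbers, a small loop around `RAG` is enough. The helper below is hypothetical: it takes the RAG function as an argument (so it can be tested with a stub), assumes `RAG` accepts a question string, and stores whatever it returns unchanged. `growth` is an optional mapping from ticker to growth percentage, for example computed in the financial-data step.

```python
from typing import Callable, Dict, List, Mapping, Optional, Sequence


def ask_each_company(
    rag: Callable[[str], object],
    template: str,
    tickers: Sequence[str],
    growth: Optional[Mapping[str, float]] = None,
) -> List[Dict[str, object]]:
    """Run one templated question per ticker and pair each answer with its growth figure."""
    rows = []
    for ticker in tickers:
        answer = rag(template.format(ticker=ticker))
        rows.append(
            {
                "Ticker": ticker,
                "Growth %": (growth or {}).get(ticker),  # None if no growth figure given
                "AI initiatives": answer,
            }
        )
    return rows
```

In the notebook this might be called as `ask_each_company(RAG, "What are {ticker}'s main AI initiatives?", ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"])`.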

Project Status

✅ All placeholders resolved
✅ Financial data collection implemented
✅ PDF processing pipeline complete
✅ RAG system functional
✅ Vector store integration working

Notes

  • All placeholders in the notebook have been filled in with working code
  • An active OpenAI API key must be configured before running the notebook; both the embeddings and the RAG system depend on it
  • Upload the PDF documents before running the document-processing cells

License

This project is part of the JHU AgenticAI course curriculum.

Author

Taceyes
