Retrieve financials with python
|

2. Python for Fundamental Analysis: Automate Financial Statement Retrieval with Python

Welcome back to the Python for Finance series! In our first post, we calculated the P/E ratio, seeing how Python could automate fetching specific data points. However, as insightful as ratios like P/E are, they are derived from the company’s core financial reports. To truly understand why a P/E ratio is high or low, or to assess a company’s overall health, we need the underlying data from the Income Statement, Balance Sheet, and Cash Flow Statement (the source of the ‘E’ in P/E, for instance!).

Manually downloading these statements for multiple companies or tracking them over time is exactly the kind of tedious work Python excels at automating. Using Python, the requests library for API communication, and the powerful pandaslibrary for data handling, we can programmatically pull down years’ worth of structured financial data. This lays the essential groundwork for calculating a vast array of ratios, visualizing financial trends, and building more sophisticated analyses.

In this post, we’ll use the Python script below, leveraging the Financial Modeling Prep (FMP) API, to automatically retrieve annual and quarterly financial statements. While we were using yfinance in my previous posts, in this post we use a financial API for retrieving the data, since yahoo finance may not have all financial statement data available. FMP offers a free plan with up to 200 calls per day.

The Three Pillars of Financial Reporting

Before coding, a quick refresher on the three essential statements:

1. The Income Statement (Profit & Loss Statement)

  • What it is: Reports financial performance over a period (quarter/year), showing Revenue, Expenses, and ultimately Net Income (Profit/Loss).
  • Why it’s important: Shows profitability, revenue growth, cost control, and allows margin analysis. The Earnings Per Share (EPS) we used for the P/E ratio in the prior post is derived from this statement’s Net Income.
  • Key Items: Revenue, COGS, Gross Profit, Operating Expenses (SG&A, R&D), Operating Income (EBIT), Interest, Taxes, Net Income, EPS.

2. The Balance Sheet

  • What it is: A snapshot of financial position at a specific point in time, following Assets = Liabilities + Equity.
  • Why it’s important: Shows financial health (what’s owned vs. owed), liquidity (short-term obligations), leverage (debt usage), and asset base.
  • Key Items: Current/Non-Current Assets (Cash, Receivables, PP&E), Current/Non-Current Liabilities (Payables, Debt), Shareholders’ Equity.

3. The Cash Flow Statement

  • What it is: Tracks the actual movement of cash over a period, categorized into Operating, Investing, and Financing activities. Crucially, it reconciles Net Income (from Income Statement) back to actual cash changes.
  • Why it’s important: Shows cash generation ability, quality of earnings (is profit turning into cash?), investment activities (CapEx), and financing activities (debt/equity changes, dividends paid).
  • Key Sections: Cash Flow from Operations (CFO), Investing (CFI), Financing (CFF), Net Change in Cash.

Accessing all three provides a comprehensive financial picture.

Python Implementation: Fetching Financial Statements

Let’s look at the Python code to retrieve these statements using the FMP API.

API Key & Libraries: You’ll need your free API key from Financial Modeling Prep (https://site.financialmodelingprep.com/developer/docs/). The code below attempts to load it from an environment variable (FMP_API_KEY) for better security, falling back to a placeholder you’ll need to replace if the environment variable isn’t set. You’ll also need the pandas and requests libraries (pip install pandas requests).

import requests
import pandas as pd
import os # Used to load API key from environment variable (more secure)

# --- Configuration ---
# Securely load API key from environment variable if available, otherwise use placeholder
# To set environment variable: export FMP_API_KEY='YOUR_KEY' (Linux/macOS) or set FMP_API_KEY=YOUR_KEY (Windows)
API_KEY = os.getenv('FMP_API_KEY', "YOUR_API_KEY") # Replace "YOUR_API_KEY" only if not using env var

if API_KEY == "YOUR_API_KEY":
    print("Warning: API Key not found in environment variables. Using placeholder.")
    print("Please replace 'YOUR_API_KEY' in the script or set the FMP_API_KEY environment variable.")

# Ticker symbol of the company to analyze
TICKER = "AAPL"
# Number of past periods (years/quarters) to retrieve
LIMIT = 5
# Base URL for FMP API version 3
BASE_URL = "https://financialmodelingprep.com/api/v3"

# --- Function to Fetch Data from FMP API (Error Handling Enhanced) ---
def get_financial_data(url):
    """
    Fetches data from the specified Financial Modeling Prep API endpoint.
    Designed to handle endpoints that return a list of records (like statements).

    Args:
        url (str): The API endpoint URL.

    Returns:
        list or None: A list of dictionaries containing the fetched data if successful,
                      otherwise None.
    """
    print(f"Attempting to fetch data from: {url}")
    try:
        response = requests.get(url, timeout=15) # Increased timeout slightly
        response.raise_for_status() # Check for HTTP errors (4xx or 5xx)
        data = response.json()

        # Check if the response is a list (expected for statements)
        if isinstance(data, list):
            if len(data) > 0:
                print(f"Successfully fetched {len(data)} periods of data.")
                return data # Return the list of dictionaries
            else:
                # Handles cases where the API correctly returns an empty list (e.g., no data for period/ticker)
                print(f"Warning: Received empty list from {url}. No data available for this period/ticker?")
                return None
        # Handle cases where the API might return an error message as a dictionary
        elif isinstance(data, dict) and 'Error Message' in data:
              print(f"Error from API at {url}: {data['Error Message']}")
              return None
        else:
            # Handle other unexpected data formats
            print(f"Warning: Received unexpected data format from {url}. Type: {type(data)}, Data: {str(data)[:200]}...") # Show snippet
            return None
            
    except requests.exceptions.Timeout as e: # Specific timeout error
         print(f"Error: API request timed out for {url}: {e}")
         return None
    except requests.exceptions.HTTPError as e: # Specific HTTP error
         print(f"Error: HTTP Error for {url}: {e}")
         # Optionally add more specific checks based on status code
         if response.status_code == 401 or response.status_code == 403:
             print("       (Check if your API key is valid, active, and has access to this endpoint)")
         return None
    except requests.exceptions.RequestException as e: # Catch other request errors (DNS, connection, etc.)
        print(f"Error: Network-related error fetching data from {url}: {e}")
        return None
    except requests.exceptions.JSONDecodeError as e: # Catch errors parsing JSON
        print(f"Error: Decoding JSON response from {url}: {e}")
        print(f"Response text that failed to parse: {response.text[:500]}") # Show more context
        return None
    except Exception as e: # Catch any other unexpected errors during fetch
        print(f"An unexpected error occurred while fetching {url}: {e}")
        return None

# --- Function to Fetch and Structure Statements ---
def fetch_statements(ticker, period='annual', limit=5):
    """
    Fetches Income Statement, Balance Sheet, and Cash Flow Statement
    for a given ticker and period type.

    Args:
        ticker (str): The stock ticker symbol.
        period (str): 'annual' or 'quarter'. Defaults to 'annual'.
        limit (int): The number of past periods to fetch. Defaults to 5.

    Returns:
        dict: A dictionary where keys are statement names ('income_statement', etc.)
              and values are pandas DataFrames containing the statement data,
              or None if fetching/conversion failed for that statement.
    """
    statements_data = {
        'income_statement': None,
        'balance_sheet': None,
        'cash_flow_statement': None # Corrected key name
    }
    print(f"\n--- Fetching {period.capitalize()} Statements for {ticker} (Limit: {limit}) ---")

    # Construct URLs based on period
    if period == 'annual':
        income_url = f"{BASE_URL}/income-statement/{ticker}?period=annual&limit={limit}&apikey={API_KEY}"
        balance_url = f"{BASE_URL}/balance-sheet-statement/{ticker}?period=annual&limit={limit}&apikey={API_KEY}"
        cashflow_url = f"{BASE_URL}/cash-flow-statement/{ticker}?period=annual&limit={limit}&apikey={API_KEY}"
    elif period == 'quarter':
        income_url = f"{BASE_URL}/income-statement/{ticker}?period=quarter&limit={limit}&apikey={API_KEY}"
        balance_url = f"{BASE_URL}/balance-sheet-statement/{ticker}?period=quarter&limit={limit}&apikey={API_KEY}"
        cashflow_url = f"{BASE_URL}/cash-flow-statement/{ticker}?period=quarter&limit={limit}&apikey={API_KEY}"
    else:
        print(f"Error: Invalid period '{period}'. Use 'annual' or 'quarter'.")
        return statements_data # Return empty structure

    # --- Fetch Income Statement ---
    print("\nFetching Income Statement...")
    income_data = get_financial_data(income_url)
    if income_data:
        try:
            # Convert list of dictionaries to pandas DataFrame
            statements_data['income_statement'] = pd.DataFrame(income_data)
            print("Income Statement DataFrame created.")
        except Exception as e:
             print(f"Error creating DataFrame for Income Statement: {e}")
             # Keep raw data if needed: statements_data['income_statement'] = income_data
    else:
        print("Failed to fetch or process Income Statement data.")

    # --- Fetch Balance Sheet ---
    print("\nFetching Balance Sheet Statement...")
    balance_data = get_financial_data(balance_url)
    if balance_data:
        try:
            statements_data['balance_sheet'] = pd.DataFrame(balance_data)
            print("Balance Sheet DataFrame created.")
        except Exception as e:
            print(f"Error creating DataFrame for Balance Sheet: {e}")
    else:
        print("Failed to fetch or process Balance Sheet data.")

    # --- Fetch Cash Flow Statement ---
    print("\nFetching Cash Flow Statement...")
    cashflow_data = get_financial_data(cashflow_url)
    if cashflow_data:
        try:
            # Corrected key name used here
            statements_data['cash_flow_statement'] = pd.DataFrame(cashflow_data)
            print("Cash Flow Statement DataFrame created.")
        except Exception as e:
            print(f"Error creating DataFrame for Cash Flow Statement: {e}")
    else:
        print("Failed to fetch or process Cash Flow Statement data.")

    return statements_data

# --- Main Execution Logic ---
if __name__ == "__main__":
    # Check if API key seems to be set before proceeding
    if API_KEY == "YOUR_API_KEY":
        print("\nCritical Error: Cannot proceed without a valid FMP API key.")
        print("Please set the FMP_API_KEY environment variable or replace the placeholder in the script.")
    else:
        # Fetch Annual Statements
        annual_statements = fetch_statements(TICKER, period='annual', limit=LIMIT)

        # Fetch Quarterly Statements
        quarterly_statements = fetch_statements(TICKER, period='quarter', limit=LIMIT)

        # --- Display Fetched Statements ---
        print(f"\n\n--- Displaying Statements for {TICKER} ---")

        print("\n\n=== Annual Statements ===")
        if annual_statements: # Check if dictionary itself is not None (it shouldn't be, but safety)
            for name, df in annual_statements.items():
                print(f"\n--- {name.replace('_', ' ').title()} (Annual) ---")
                # Check if the value associated with the key is actually a DataFrame and not empty
                if isinstance(df, pd.DataFrame) and not df.empty:
                    # Optional: Set index for better readability - uncomment desired lines
                    # if 'date' in df.columns:
                    #     df_display = df.set_index('date')
                    # elif 'calendarYear' in df.columns:
                    #     df_display = df.set_index('calendarYear')
                    # else:
                    #     df_display = df # Keep original if no suitable index found
                    
                    # Display the DataFrame (prints to console)
                    print(df.to_string()) # Use to_string() to print full DataFrame usually
                elif df is None:
                    print("No data fetched for this statement.")
                else: # Could be empty df or something else
                    print("Data fetched, but DataFrame is empty or invalid.")
        else:
            print("Failed to fetch annual statements dictionary.")


        print("\n\n=== Quarterly Statements ===")
        if quarterly_statements:
            for name, df in quarterly_statements.items():
                print(f"\n--- {name.replace('_', ' ').title()} (Quarterly) ---")
                if isinstance(df, pd.DataFrame) and not df.empty:
                    # Optional: Set index
                    # if 'date' in df.columns:
                    #     df_display = df.set_index('date')
                    # else:
                    #     df_display = df
                    
                    print(df.to_string())
                elif df is None:
                    print("No data fetched for this statement.")
                else:
                     print("Data fetched, but DataFrame is empty or invalid.")
        else:
             print("Failed to fetch quarterly statements dictionary.")


        print("\n\n--- Script Finished ---")

Code Explanation: Step-by-Step

  1. Import Libraries: Imports requestspandas, and os.
  2. Configuration:
    • Sets up the API_KEY by first checking the FMP_API_KEY environment variable (recommended for security) and falling back to the placeholder "YOUR_API_KEY" if not found. It prints a warning if the placeholder is used. Remember to replace the placeholder or set the environment variable.
    • Defines the target TICKERLIMIT for the number of periods, and the BASE_URL.
  3. get_financial_data(url) Function: This enhanced helper function fetches data from a given FMP URL.
    • It includes error handling for timeouts, HTTP errors (checking for common API key issues with 401/403 status codes), general request exceptions, and JSON decoding errors.
    • It expects the API to return a list of dictionaries for statement data. It checks the response type and returns the list if valid and non-empty, None otherwise. It also specifically checks for FMP’s dictionary-based error messages.
  4. Workspace_statements(ticker, period='annual', limit=5) Function: This is the core logic block.
    • It takes the ticker, desired period (‘annual’ or ‘quarter’), and limit.
    • It initializes a dictionary statements_data to hold the results, with keys for each statement type initially set to None.
    • It constructs the correct API URLs for the Income Statement (/income-statement/), Balance Sheet (/balance-sheet-statement/), and Cash Flow Statement (/cash-flow-statement/) based on the periodrequested, including the limit parameter.
    • It calls get_financial_data sequentially for each statement URL.
    • Pandas Conversion: If get_financial_data returns a valid list (income_databalance_datacashflow_data), it attempts to convert this list of dictionaries into a pandas DataFrame using pd.DataFrame(). This organizes the data into a table format. A try-except block handles potential errors during DataFrame creation.
    • The resulting DataFrame (or None if fetching/conversion failed) is stored in the statements_data dictionary under the appropriate key.
    • Returns the statements_data dictionary.
  5. Main Execution Logic (if __name__ == "__main__":):
    • Performs a crucial check to see if the API_KEY is still the placeholder; if so, it prints an error and exits to avoid making invalid API calls.
    • Calls Workspace_statements for both ‘annual’ and ‘quarterly’ periods.
    • Iterates through the returned dictionaries (annual_statementsquarterly_statements).
    • For each statement, it checks if a valid DataFrame exists (is not None and not df.empty).
    • If valid, it prints a header and displays the entire DataFrame using df.to_string() (which helps avoid truncation in the console). It also includes commented-out suggestions for setting the ‘date’ or ‘calendarYear’ column as the DataFrame index, which is often useful for analysis.
    • If no valid DataFrame was retrieved for a statement, it prints a corresponding message.

Interpreting and Using the Results

Running this script provides you with structured annual and quarterly financial statements directly in your console, formatted as pandas DataFrames.

  • Data Structure: In the resulting DataFrames, each dictionary from the API list typically becomes a row, representing a specific time period (year or quarter). The columns represent the financial line items (e.g., revenuenetIncometotalAssetscashAndCashEquivalents). You can easily explore this:
    • df.shape shows (rows, columns) – rows = number of periods fetched.
    • df.columns lists all available financial line items.
    • df.head() shows the most recent periods.
  • Next Steps – Analysis Powerhouse: This structured data is ready for analysis:
    • Direct Access: Select specific data easily, e.g., income_df['revenue'] to get a Series of revenues over time, or balance_df.loc[0, 'totalAssets'] to get total assets for the most recent period (index 0). (Note: You might want to set the ‘date’ column as the DataFrame index later for easier time-series selection: df.set_index('date', inplace=True))
    • Ratio Calculation: Compute ratios across periods (e.g., balance_df['totalDebt'] / balance_df['totalStockholdersEquity'] for Debt-to-Equity).
    • Trend Visualization: Plot key metrics over time using pandas plotting (df['netIncome'].plot()) or libraries like Matplotlib/Seaborn.
    • Common-Size Analysis: Divide Income Statement items by Revenue, or Balance Sheet items by Total Assets, to compare trends regardless of company size.

Conclusion: Building Your Data Foundation

We’ve successfully automated the retrieval of the three core financial statements using Python, requests, and pandas. Converting the API’s JSON response into pandas DataFrames makes the data immediately accessible and ready for sophisticated analysis. This script forms a crucial building block for almost any fundamental analysis task.

Having programmatic access to this data frees us from manual downloads and allows for scalable, repeatable analysis across different companies and time periods.

Next Up: In the next post, we’ll dive into calculating profitability ratios for last 10 quarters.


Disclaimer: This blog post is for educational purposes only. Financial data APIs provide data “as is”; always verify critical data from official sources (e.g., SEC filings) before making investment decisions. Ensure compliance with API provider terms of service.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.