Decades of Returns: A Look at the S&P 500, Nasdaq, and Sector Performance with Python

Understanding how different parts of the stock market have performed over the long term can provide valuable perspective for investors. Have tech stocks always outperformed? How have defensive sectors like utilities or consumer staples held up? To explore this, we’ll analyze the historical performance of major US market indices and various S&P 500 sectors using Python.

Our goal is to calculate the long-term growth rates for:

Broad Market Indices:
- S&P 500 (via SPY ETF)
- Nasdaq Composite Index (^IXIC)

S&P 500 Sector ETFs:
- Technology (XLK)
- Financials (XLF)
- Health Care (XLV)
- Consumer Discretionary (XLY)
- Consumer Staples (XLP)
- Energy (XLE)
- Industrials (XLI)
- Utilities (XLU)
- Materials (XLB)
- Communication Services (XLC)
- Real Estate (XLRE)

Methodology: Crunching the Numbers

To perform this analysis, we use historical daily stock price data obtained using the popular yfinance library in Python. Specifically, we focus on Adjusted Close Prices. This is crucial because adjusted prices account for stock splits and dividend distributions, giving a more accurate picture of total return compared to just looking at the nominal closing price.
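
To see why the adjustment matters, here is a minimal sketch with made-up numbers (not real market data). It compares a price-only return with a return that reinvests a single dividend. The function names and all inputs are hypothetical, and this simplified reinvestment model is only an illustration; it is not a description of how the adjusted close series from the data provider is derived.

```python
def price_return(start_price: float, end_price: float) -> float:
    """Return based on the nominal price change only."""
    return end_price / start_price - 1


def total_return_with_reinvestment(
    start_price: float,
    end_price: float,
    dividend: float,
    price_at_dividend: float,
) -> float:
    """Return when one cash dividend per share is reinvested at price_at_dividend."""
    # Start with one share and buy a fraction of a share with the dividend.
    shares = 1.0 + dividend / price_at_dividend
    return shares * end_price / start_price - 1


# Hypothetical inputs: price moves from 100 to 110, with a 2.00 dividend
# paid (and reinvested) while the price is still 100.
nominal = price_return(100.0, 110.0)
total = total_return_with_reinvestment(
    100.0, 110.0, dividend=2.0, price_at_dividend=100.0
)
print(f"Price-only return: {nominal:.2%}")
print(f"Total return:      {total:.2%}")
```

With these inputs the price-only return is 10%, while the reinvested return is about 12.2%. Over decades, that gap compounds, which is why the analysis relies on adjusted prices.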

We aimed to retrieve data for the longest possible period, up to a maximum of 80 years from the present day (the max_years_lookback value in the script). However, many of these assets have far less history than that: the SPY ETF launched in 1993, and the sector ETFs mostly launched in the late 1990s or later. The analysis therefore uses the earliest data point actually available for each asset within the target timeframe.

The key metric we calculate is the Compound Annual Growth Rate (CAGR). CAGR is the constant annual rate at which an investment would have had to grow to get from its beginning value to its ending value over a specified period, assuming the profits were reinvested. It provides a smoothed-out measure of return, which makes it well suited to comparing long-term performance across different assets and time horizons. The formula is:

CAGR = ((Ending Value / Beginning Value) ^ (1 / Number of Years)) - 1
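
As a quick worked example of the formula, the sketch below wraps it in a small function and applies it to a hypothetical investment that doubles over ten years. The function name cagr and the input values are illustrative assumptions, not part of the analysis script.

```python
def cagr(beginning_value: float, ending_value: float, years: float) -> float:
    """Compound annual growth rate as a decimal (0.07 means 7% per year)."""
    if beginning_value <= 0 or years <= 0:
        raise ValueError("beginning_value and years must be positive")
    return (ending_value / beginning_value) ** (1 / years) - 1


# Hypothetical example: an investment that doubles over 10 years.
rate = cagr(100.0, 200.0, 10.0)
print(f"CAGR: {rate:.2%}")  # about 7.18% per year

# Sanity check: compounding the rate for 10 years recovers the ending value.
assert abs(100.0 * (1 + rate) ** 10 - 200.0) < 1e-9
```

Note that doubling in ten years corresponds to roughly 7.18% per year, not 10%: because of compounding, CAGR is lower than the simple average of the total gain.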

Calculating S&P 500 Returns with Python

Below is the Python script used to fetch the data and calculate the CAGR for each asset. It defines the tickers, sets the maximum time frame, downloads the adjusted close prices using yfinance (including robustness checks for data format), calculates the CAGR based on the actual available data period for each asset, and prints a summary table.

(For non-programmers: This code automates the process of going to a financial data source, downloading daily price history for each stock/index symbol listed, finding the first and last price in that history, calculating the total time span in years, and then applying the CAGR formula to find the average annual return.)

import yfinance as yf
import pandas as pd
import datetime
import numpy as np

# --- Configuration ---
# Define the tickers for the assets to analyze
tickers = [
    "SPY",       # S&P 500 ETF
    "^IXIC",     # Nasdaq Composite Index
    "XLK",       # Technology Select Sector SPDR Fund
    "XLF",       # Financial Select Sector SPDR Fund
    "XLV",       # Health Care Select Sector SPDR Fund
    "XLY",       # Consumer Discretionary Select Sector SPDR Fund
    "XLP",       # Consumer Staples Select Sector SPDR Fund
    "XLE",       # Energy Select Sector SPDR Fund
    "XLI",       # Industrial Select Sector SPDR Fund
    "XLU",       # Utilities Select Sector SPDR Fund
    "XLB",       # Materials Select Sector SPDR Fund
    "XLC",       # Communication Services Select Sector SPDR Fund (Newer)
    "XLRE"       # Real Estate Select Sector SPDR Fund (Newer)
]

# Define the maximum lookback period (in years)
max_years_lookback = 80

# Calculate the earliest possible start date
end_date = datetime.datetime.now()
earliest_start_date = end_date - datetime.timedelta(days=max_years_lookback * 365.25)

# --- Data Fetching and Calculation ---
results = {}

print(f"Attempting to fetch data up to {max_years_lookback} years back from {end_date.strftime('%Y-%m-%d')}")
print(f"Calculated earliest start date target: {earliest_start_date.strftime('%Y-%m-%d')}\n")

for ticker in tickers:
    print(f"Processing {ticker}...")
    try:
        # Download historical data using yfinance
        # Setting auto_adjust=False to ensure 'Adj Close' is fetched.
        # This seems to return a MultiIndex column like ('Adj Close', 'TICKER')
        data = yf.download(ticker, start=earliest_start_date, end=end_date, progress=False, auto_adjust=False)

        # Check if data was successfully downloaded and is not empty
        if data is not None and not data.empty:
            # Get the actual start and end dates from the downloaded data
            actual_start_date = data.index.min()
            actual_end_date = data.index.max()

            # Define the expected column name tuple
            adj_close_col_tuple = ('Adj Close', ticker)
            # Also check for potential case variations from yfinance (though unlikely for Adj Close)
            adj_close_col_tuple_lower = ('adj close', ticker)

            # --- Robustness Check for Adjusted Close Column ---
            target_col = None
            if adj_close_col_tuple in data.columns:
                target_col = adj_close_col_tuple
            elif adj_close_col_tuple_lower in data.columns: # Check lowercase just in case
                target_col = adj_close_col_tuple_lower
            # If still not found, check if maybe it returned a *simple* index 'Adj Close'
            elif 'Adj Close' in data.columns and not isinstance(data.columns, pd.MultiIndex):
                target_col = 'Adj Close'


            if target_col is None:
                print(f"  -> Error: Adjusted Close column ('Adj Close', '{ticker}') or similar not found for {ticker}.")
                # Store results indicating the issue
                results[ticker] = {
                    "Start Date": actual_start_date.strftime('%Y-%m-%d'),
                    "End Date": actual_end_date.strftime('%Y-%m-%d'),
                    "Years": "N/A",
                    "CAGR (%)": "Missing Adj Close Col"
                }
                continue # Skip to the next ticker

            # --- Proceed with Calculation using the identified target column ---
            # Get the adjusted close prices for the start and end dates
            # Use .loc for safety with potential duplicate index dates (unlikely here but good practice)
            start_price = data.loc[actual_start_date, target_col]
            # Handle cases where start/end date might have multiple entries (take first/last)
            if isinstance(start_price, pd.Series):
                start_price = start_price.iloc[0]

            end_price = data.loc[actual_end_date, target_col]
            if isinstance(end_price, pd.Series):
                end_price = end_price.iloc[-1]


            # Calculate the number of years
            num_years = (actual_end_date - actual_start_date).days / 365.25

            # Calculate CAGR, handle potential division by zero or negative/invalid start price
            if num_years > 0 and pd.notna(start_price) and start_price > 0 and pd.notna(end_price):
                cagr = ((end_price / start_price) ** (1 / num_years)) - 1
                cagr_percent = cagr * 100
            else:
                # Handle cases like zero years, invalid prices, or NaN values
                print(f"  -> Warning: Could not calculate CAGR for {ticker} (Years: {num_years:.2f}, Start: {start_price}, End: {end_price}).")
                cagr_percent = np.nan # Assign NaN if calculation is not possible

            # Store the results
            results[ticker] = {
                "Start Date": actual_start_date.strftime('%Y-%m-%d'),
                "End Date": actual_end_date.strftime('%Y-%m-%d'),
                "Years": round(num_years, 2),
                "CAGR (%)": round(cagr_percent, 2) if not np.isnan(cagr_percent) else np.nan # Store actual NaN
            }
            print(f"  -> Success: Data processed from {results[ticker]['Start Date']} to {results[ticker]['End Date']}.")

        else:
            print(f"  -> Warning: No data found for {ticker} within the specified date range.")
            results[ticker] = {
                "Start Date": "N/A",
                "End Date": "N/A",
                "Years": "N/A",
                "CAGR (%)": "No Data"
            }

    # Catch specific expected errors and general exceptions
    except KeyError as ke:
        print(f"  -> Error (KeyError) processing data for {ticker}: {ke} - Likely column name issue.")
        results[ticker] = { "Start Date": "Error", "End Date": "Error", "Years": "Error", "CAGR (%)": f"KeyError: {ke}" }
    except Exception as e:
        print(f"  -> Error (General) fetching or processing data for {ticker}: {type(e).__name__} - {e}")
        results[ticker] = { "Start Date": "Error", "End Date": "Error", "Years": "Error", "CAGR (%)": f"Error: {e}" }

# --- Output Results ---
print("\n--- Historical Performance Summary ---")
# Convert results dictionary to DataFrame for pretty printing
results_df = pd.DataFrame.from_dict(results, orient='index')
# Ensure consistent column order
if not results_df.empty:
    # Replace actual NaN values with a string representation *before* printing
    results_df.fillna("N/A", inplace=True)
    results_df = results_df[["Start Date", "End Date", "Years", "CAGR (%)"]]

# Print as a Markdown table (DataFrame.to_markdown requires the 'tabulate' package)
print(results_df.to_markdown())
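
The core of the script (picking the adjusted close column, then applying the CAGR formula to the first and last price) can also be isolated into two small functions and checked offline. The sketch below is a simplified variation, not the script itself: the helper names pick_adj_close and cagr_from_prices, the ticker "FAKE", and the synthetic prices are all hypothetical, and unlike the script it drops missing prices before choosing the start and end points.

```python
import numpy as np
import pandas as pd


def pick_adj_close(data: pd.DataFrame, ticker: str) -> pd.Series:
    """Return the adjusted close series from either column layout."""
    if isinstance(data.columns, pd.MultiIndex):
        # Layout with two column levels, e.g. ('Adj Close', 'FAKE')
        return data[("Adj Close", ticker)]
    # Layout with a single column level
    return data["Adj Close"]


def cagr_from_prices(prices: pd.Series) -> float:
    """CAGR in percent between the first and last valid price, or NaN."""
    prices = prices.dropna().sort_index()
    if len(prices) < 2:
        return np.nan
    years = (prices.index[-1] - prices.index[0]).days / 365.25
    start, end = prices.iloc[0], prices.iloc[-1]
    if years <= 0 or start <= 0:
        return np.nan
    return ((end / start) ** (1 / years) - 1) * 100


# Synthetic data: a price that grows from 100 to 200 over about ten years.
dates = pd.to_datetime(["2010-01-04", "2015-01-02", "2020-01-03"])
flat = pd.DataFrame({"Adj Close": [100.0, 140.0, 200.0]}, index=dates)
multi = flat.copy()
multi.columns = pd.MultiIndex.from_tuples([("Adj Close", "FAKE")])

# Both column layouts should give the same answer.
demo_results = [
    cagr_from_prices(pick_adj_close(frame, "FAKE")) for frame in (flat, multi)
]
for value in demo_results:
    print(f"CAGR: {value:.2f}%")
```

Separating the calculation from the download step makes it possible to test the arithmetic without a network connection or a particular yfinance version.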

Results: Long-Term Growth Rates

After running the Python script, we obtained the following CAGR figures for the S&P 500 (via SPY), the Nasdaq Composite, and the sector ETFs. Note: These numbers reflect the performance from the earliest available date for each specific asset up to April 11, 2025. The actual start dates vary significantly.

The table below shows the script's output, lightly reformatted, with an Asset Description column added for reference:

| Ticker | Start Date | End Date | Years | CAGR (%) | Asset Description |
|---|---|---|---|---|---|
| SPY | 1993-01-29 | 2025-04-11 | 32.2 | 10.05 | S&P 500 ETF |
| ^IXIC | 1971-02-05 | 2025-04-11 | 54.18 | 9.91 | Nasdaq Composite Index |
| XLK | 1998-12-22 | 2025-04-11 | 26.3 | 8.36 | Technology Select Sector SPDR Fund |
| XLF | 1998-12-22 | 2025-04-11 | 26.3 | 5.53 | Financial Select Sector SPDR Fund |
| XLV | 1998-12-22 | 2025-04-11 | 26.3 | 8.23 | Health Care Select Sector SPDR Fund |
| XLY | 1998-12-22 | 2025-04-11 | 26.3 | 9.16 | Consumer Discretionary Select Sector SPDR Fund |
| XLP | 1998-12-22 | 2025-04-11 | 26.3 | 6.76 | Consumer Staples Select Sector SPDR Fund |
| XLE | 1998-12-22 | 2025-04-11 | 26.3 | 7.48 | Energy Select Sector SPDR Fund |
| XLI | 1998-12-22 | 2025-04-11 | 26.3 | 8.52 | Industrial Select Sector SPDR Fund |
| XLU | 1998-12-22 | 2025-04-11 | 26.3 | 7.29 | Utilities Select Sector SPDR Fund |
| XLB | 1998-12-22 | 2025-04-11 | 26.3 | 7.62 | Materials Select Sector SPDR Fund |
| XLC | 2018-06-19 | 2025-04-11 | 6.81 | 10.27 | Communication Services Select Sector SPDR Fund |
| XLRE | 2015-10-08 | 2025-04-11 | 9.51 | 6.3 | Real Estate Select Sector SPDR Fund |

Discussion & Insights

Looking at the results, we can draw several observations:

Broad Market Performance: The S&P 500 (SPY) and Nasdaq Composite (^IXIC) show similar long-term CAGRs of around 10%, reflecting the strong overall growth of the US market. The two figures are not strictly comparable, though: they cover different periods (about 32 versus 54 years), and ^IXIC is a price index, so its adjusted close does not include dividends the way SPY's does.
Sector Variability: Performance differs significantly across sectors since late 1998, when most of the sector ETFs launched:
- Growth-Oriented Sectors: Consumer Discretionary (XLY) stands out with a strong 9.16% CAGR. Technology (XLK) also performed well at 8.36%, slightly behind Industrials (XLI) at 8.52%. Health Care (XLV) showed robust growth at 8.23%.
- Cyclical Sectors: Industrials (XLI) led this group since 1998. Materials (XLB) and Energy (XLE) provided similar returns around 7.5-7.6%. Financials (XLF) significantly lagged with a 5.53% CAGR over this period, perhaps reflecting the impact of the 2008 financial crisis.
- Defensive Sectors: Utilities (XLU) at 7.29% and Consumer Staples (XLP) at 6.76% delivered more moderate returns, typical of their defensive nature.
- Newer Sectors: Communication Services (XLC) shows the highest CAGR (10.27%) but over a much shorter period (since mid-2018). Real Estate (XLRE) has a shorter history too (since late 2015) and showed a lower CAGR of 6.3%.
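
The sector comparison above can be reproduced directly from the results table. The sketch below hard-codes the CAGR figures for the nine sector ETFs that share the December 1998 start date, ranks them, and converts each CAGR into the cumulative growth of one dollar over the 26.3-year period. The variable names and the "Growth of $1" column are illustrative additions, not part of the original script.

```python
import pandas as pd

# CAGR (%) values copied from the results table for the ETFs that all
# start on 1998-12-22 and therefore cover the same 26.3 years.
YEARS = 26.3
sector_cagr = pd.Series(
    {
        "XLK": 8.36,
        "XLF": 5.53,
        "XLV": 8.23,
        "XLY": 9.16,
        "XLP": 6.76,
        "XLE": 7.48,
        "XLI": 8.52,
        "XLU": 7.29,
        "XLB": 7.62,
    },
    name="CAGR (%)",
)

ranked = sector_cagr.sort_values(ascending=False).to_frame()
# Invert the CAGR formula: ending value = beginning value * (1 + CAGR) ** years
ranked["Growth of $1"] = ((1 + ranked["CAGR (%)"] / 100) ** YEARS).round(2)

print(ranked)
```

Because of compounding, a gap of a few percentage points in CAGR becomes a large gap in ending wealth: at these rates, a dollar in XLY grows roughly ten-fold over the period, while a dollar in XLF grows roughly four-fold.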

It’s crucial to remember that past performance is not indicative of future results. Economic conditions, technological shifts, regulatory changes, and global events can all impact future returns. This analysis provides a historical perspective, not a prediction. Shorter timeframes, like those for XLC and XLRE, are also less indicative of long-term trends compared to the multi-decade results of others.

Conclusion

Analyzing historical data reveals the power of long-term investing and compounding, particularly in broad market indices like the S&P 500 and Nasdaq, which both delivered roughly 10% annualized returns over their respective histories in this analysis. However, it also highlights the diverse performance across different economic sectors. Since late 1998, Consumer Discretionary, Industrials, and Technology have been among the stronger performers, while Financials lagged. Understanding these long-term trends, while acknowledging their limitations for predicting the future, can help investors build diversified portfolios aligned with their goals and risk tolerance.

Disclaimer: The content provided in this article is for informational and educational purposes only. It does not constitute financial advice, investment advice, trading advice, or any other sort of advice, and you should not treat any of the article’s content as such. Always conduct your own research and consult with a qualified financial advisor before making any investment decisions.

Note: This article was drafted with the assistance of AI technology and has been reviewed and edited by the author. Please note that the returns shown were calculated by the Python script above and may not be accurate.
