Automating Data Processing & CI/CD on GitHub Pages

1. Prepare the Python Script

Make sure the non-trivial error in execute.py is fixed before wiring up the pipeline. The real script is not shown here, so the listing below is only an illustrative stand-in for the kind of fix involved, such as a misused pandas import or a broken data processing step:

# Fixed execute.py example
import pandas as pd

def main():
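    # read the CSV committed in step 2 (reading data.xlsx in CI would also need openpyxl)
    df = pd.read_csv('data.csv')
    # perform some data processing
    result = df.describe()
    # print the result as JSON so the workflow can redirect stdout into result.json
    print(result.to_json())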

if __name__ == '__main__':
    main()

Remember to replace this with your actual corrected code.

2. Convert data.xlsx to data.csv

Convert data.xlsx to data.csv with pandas or any spreadsheet tool. For example, a Python snippet (pandas needs the openpyxl package installed to read .xlsx files):

import pandas as pd
df = pd.read_excel('data.xlsx')
df.to_csv('data.csv', index=False)

Run this once locally, or keep it in the repo as a small script, then commit the generated data.csv.
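Before committing, it can help to confirm that the conversion produced a usable file. The sketch below is a minimal check using only the standard library; the function name summarize_csv is hypothetical and not part of the original project, and it assumes the first row of the CSV is a header.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (header, data_row_count) for a CSV file.

    Hypothetical helper for illustration; assumes the first row is a header.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header:
            # Covers both an empty file and a blank first line
            raise ValueError(f"{path} has no header row")
        # Count the remaining rows without loading them all into memory
        row_count = sum(1 for _ in reader)
    return header, row_count
```

Calling summarize_csv('data.csv') and comparing the column names and row count against the spreadsheet is a quick way to catch a truncated or mis-encoded export.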

3. Set Up GitHub Actions Workflow

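Create .github/workflows/ci.yml in your repo with the following content. On every push to main, this workflow installs Python and the dependencies, lints the code with ruff, runs execute.py to produce result.json, and publishes the output to GitHub Pages: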

name: CI/CD Pipeline

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # lets GITHUB_TOKEN push to the gh-pages branch

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pandas==2.3.0
          pip install ruff

      - name: Run ruff for linting
        run: |
          ruff check .

      - name: Run execution script
        # execute.py is expected to print JSON to stdout
        run: |
          python execute.py > result.json

      - name: Publish to GitHub Pages
        if: github.ref == 'refs/heads/main'
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./
          publish_branch: gh-pages
          user_name: GitHub Actions
          user_email: action@github.com
          cname: ''  # Optional custom domain

Adjust paths as needed. The publish_dir must contain result.json for GitHub Pages to serve it. With publish_dir: ./ the file that the execution step writes to the repository root is already included; if you publish a subdirectory instead, copy result.json into it first.
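Because the workflow captures stdout into result.json, a script that fails silently or prints nothing would still leave a file behind. The sketch below is one way to guard against that before publishing; the function name check_json_file is hypothetical, and running it as an extra workflow step between the execution and publish steps is an assumption, not part of the original pipeline.

```python
import json
from pathlib import Path


def check_json_file(path):
    """Parse a file as JSON and return the decoded value.

    Hypothetical helper for illustration. Raises ValueError if the file is
    empty or not valid JSON (json.JSONDecodeError is a ValueError subclass).
    """
    text = Path(path).read_text(encoding="utf-8")
    if not text.strip():
        raise ValueError(f"{path} is empty")
    return json.loads(text)
```

A call such as check_json_file('result.json') raises on an empty or malformed file, so a step that runs it would stop the job before anything is published.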

Notes