Loci is a Python script that backs up a directory to a server using rsync - it keeps track of the backups that have been done, and multiple backups may be kept. Because rsync handles the transfers, only what has changed is copied, and single files can be recovered from a backup if needed.

loci -b tag : Back up under the given tag (I used days of the week)
loci -l : List backups, showing which tags are unused, which backups are due to run, and which have been run more than 5 times. I refresh these.
loci -r tag : Refresh a tag’s backup - delete the files under that tag and its .backuplog entries, to prepare for a fresh backup with loci -b
~/.backuplog : A file in .csv format that keeps track of the backups done.
~/.config/loci/settings : The settings file. Fully commented (a sketch of the format is just below).
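In case the settings file is hard to picture: going by the commented header in the bash translation further down, it looks roughly like this. Every value here is a placeholder, not taken from a real config:

```ini
[backup]

server = backuphost
user = me
backup_root = backups
taglist = mon tue wed thu fri sat sun spc
exclude_files =
source_dir = /home/me/documents
```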
Nice work, but if I may suggest - it lacks hardlink support, so it’s quite wasteful in terms of disk space - the number of ‘tags’ (snapshots) you can afford to keep will be extremely limited.
At least two robust solutions that use rsync+hardlinks already exist: rsnapshot.org and dirvish.org (both written in perl). There’s definitely room for backup tools that produce plain copies, instead of packed chunk data like restic and Duplicacy, and a python or even bash-based tool might be nice, so keep at it.
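To illustrate the hardlink idea (just a sketch, not how loci works - the paths and names below are made up): rsync’s --link-dest hardlinks every unchanged file against the previous snapshot, so each additional snapshot only costs the space of what actually changed.

```bash
#!/bin/bash
# Hardlink-snapshot sketch (hypothetical paths, not part of loci).
src="/home/me/documents"        # directory being backed up
snap_root="/home/me/snapshots"  # snapshot directory on the backup server
new="$(date +%Y-%m-%d)"         # name of today's snapshot

# --link-dest is resolved relative to the destination directory, so
# ../latest points at the previous snapshot. On the very first run the
# link target doesn't exist yet; rsync warns and just does a full copy.
rsync -avh --delete \
      --link-dest="../latest" \
      "$src/" "me@server:$snap_root/$new/"

# Re-point 'latest' at the snapshot we just made.
ssh me@server "ln -snf '$new' '$snap_root/latest'"
```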
However, I liken backup software to encryption - extreme care must be taken when rolling and using your own. Whatever tool you use, test test test the backups. :)
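A cheap way to spot-check a plain rsync copy, for example, is a checksum dry run against the live data - it should report nothing left to transfer (paths are placeholders again):

```bash
# -n = dry run, -c = compare by checksum rather than size/mtime.
# Any file listed here differs between the source and the backup.
rsync -avhnc --delete /home/me/documents/ me@server:/home/me/backups/sun/
```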
@droolio@feddit.uk I see what you’re asking. You’re wondering if, instead of storing a duplicate file when another backup set already contains it, I could use a hardlink to point to the file already stored in that other set?
I have a system where I create a backup set for each day of the week. When I do a backup for that day, I update the set, or if it’s out of date, I replace it entirely with a fresh backup image (After 7 backups to that set). But if the backup sets became inter-dependent, removing or updating one set could lead to problems with others that rely on files in the first set.
Does that make sense? I am asking because I am not familiar with the utilities you mentioned and may be taking your post wrong.
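For reference, hardlinks don’t create that kind of dependency: each name is a separate directory entry pointing at the same inode, and the data is only freed when the last link goes away. A throwaway demo (made-up paths):

```bash
# Two "snapshot" directories sharing one file via a hardlink.
mkdir -p /tmp/hl-demo/sun /tmp/hl-demo/mon
echo "important data" > /tmp/hl-demo/sun/file.txt
ln /tmp/hl-demo/sun/file.txt /tmp/hl-demo/mon/file.txt

ls -li /tmp/hl-demo/*/file.txt   # same inode number, link count 2

# Removing the whole 'sun' snapshot leaves 'mon' intact:
rm -rf /tmp/hl-demo/sun
cat /tmp/hl-demo/mon/file.txt    # still prints "important data"
```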
A hilariously unnecessary Python script that could have easily been done in bash since it’s literally just a wrapper around rsync. 😅
When you’ve only got a Python-sized hammer in your toolbox, everything looks like a Python nail, I guess.
```bash
#!/bin/bash

# Function to read settings
# Settings file format:
# ~/.config/loci/settings
# [backup]
#
# server = <<Name of server>>
# user = <<server user login>>
# backup_root = <<Directory off user's home Directory>>
# taglist = mon tue wed thu fri sat sun spc
# exclude_files = <<not implemented yet - leave blank>>
# source_dir = <<the local directory we are backing up>>
read_settings() {
    settings_file="$HOME/.config/loci/settings"
    if [[ -f "$settings_file" ]]; then
        while IFS='=' read -r key value || [[ -n "$key" ]]; do
            if [[ ! -z "$key" && ! "$key" =~ ^# && ! "$key" =~ ^\[ ]]; then
                key=$(echo "$key" | xargs)
                value=$(echo "$value" | xargs)
                declare -g "$key"="$value"
            fi
        done < "$settings_file"
    else
        echo "Settings file not found: $settings_file"
        exit 1
    fi
}

# Function to perform the backup
backup() {
    local tag="$1"
    read_settings

    # Create backup directory if it doesn't exist
    backup_dest="$backup_root/$tag"
    mkdir -p "$backup_dest" 2>/dev/null

    # Rsync command for backup
    target="$user@$server:/home/$user/$backup_root/$tag"
    rsync_cmd="rsync -avh $source_dir $target"

    # If exclude_files is defined and not empty, add it to rsync command
    if [[ -n "$exclude_files" ]]; then
        rsync_cmd="rsync -avh --exclude='$exclude_files' $source_dir $target"
    fi

    echo "Command:$rsync_cmd"
    eval "$rsync_cmd"

    # Log the backup information
    log_path="$HOME/.backuplog"
    timestamp=$(date +"%Y-%m-%d %H:%M")
    echo "\"$tag\",$timestamp,$rsync_cmd,$timestamp" >> "$log_path"

    echo "Backup for '$tag' completed and logged."
}

# Function to remove the backup
remove_backup() {
    local tag="$1"
    read_settings

    # Rsync remove command
    rmfile="/home/$user/$backup_root/$tag"
    rm_cmd="ssh $user@$server rm -rf $rmfile"
    eval "$rm_cmd"

    # Remove log entries
    log_path="$HOME/.backuplog"
    if [[ -f "$log_path" ]]; then
        # Create a temporary file
        temp_file=$(mktemp)
        # Copy lines not starting with the tag to temp file
        grep -v "^\"$tag\"," "$log_path" > "$temp_file"
        # Replace the original with filtered content
        mv "$temp_file" "$log_path"
    fi

    echo "Backup '$tag' removed."
}

# Function to list the backups
list_backups() {
    read_settings
    log_path="$HOME/.backuplog"

    # Loop through each tag in the taglist
    for tag in $taglist; do
        # Count occurrences of the tag in the log
        count=0
        youngest=""
        if [[ -f "$log_path" ]]; then
            # Get count of tag occurrences
            count=$(grep -c "^\"$tag\"," "$log_path")
            # Get the newest backup date for this tag
            if [[ $count -gt 0 ]]; then
                # Extract dates and find the newest one
                dates=$(grep "^\"$tag\"," "$log_path" | cut -d',' -f2)
                youngest=$(echo "$dates" | sort -r | head -1)
            fi
        fi

        # Determine status
        if [[ $count -eq 0 ]]; then
            status="Missing"
        elif [[ $count -gt 5 ]]; then
            status="Needs renewal"
        elif [[ ! -z "$youngest" ]]; then
            # Calculate days since last backup
            youngest_seconds=$(date -d "$youngest" +%s)
            now_seconds=$(date +%s)
            days_diff=$(( (now_seconds - youngest_seconds) / 86400 ))
            if [[ $days_diff -gt 7 ]]; then
                status="Needs to be run"
            else
                status="Up to date"
            fi
        else
            status="Missing"
        fi

        echo "Tag: $tag, Status: $status, Count: $count, Last Backup: ${youngest:-N/A}"
    done
}

# Main function
main() {
    if [[ "$1" == "-b" || "$1" == "--backup" ]] && [[ ! -z "$2" ]]; then
        backup "$2"
    elif [[ "$1" == "-r" || "$1" == "--remove" ]] && [[ ! -z "$2" ]]; then
        remove_backup "$2"
    elif [[ "$1" == "-l" || "$1" == "--list" ]]; then
        list_backups
    else
        echo "Usage: loci -b <tag> | loci -r <tag> | loci -l"
    fi
}

# Execute the script
main "$@"
```
It’s also to help me learn python. And it works for me. : ^ )
Bash does seem like a better fit for this kind of script since it is a lot more portable.
I.e.: it comes by default on many Linux distributions. For Windows, a Git Bash install will get you most of the utilities needed for large, reliable scripts (grep, scp, find, sort, uniq, cat, tr, ls, etc.).
With that said, you should write it in whatever language you want, especially if it is for learning purposes - that’s where the fun comes from :)
Don’t mind him. Any time someone shares code, there’s always someone else who did nothing talking about how much better your code could have been. Just noise from the peanut gallery.
Yeah, no problem… I started out with just bare rsync - but I did the backup infrequently and needed my notes to remember the command. Then I wrote a simple shell script to run the rsync for me. Then I decided I needed more than one backup - redundancy is good. Then I wanted to keep track of the backups, so I had it write to .backuplog. Then that file started getting dated (every time I run a “sun” backup, the record of the previous one is useless), so finally, ta-da, loci is born.
lol, you’re braver than me. No one ever sees the “code” I’ve written.
That’s OK. Like any landing you can walk away from - any code that runs to spec is good, though much could be better.
Looks like a line-by-line translation from the Python. Will you use it to back up your home directory?
No.
It doesn’t really do anything I particularly need.
No need to be a dick
Do you wanna share a bash script, then?
Especially one that lets you know how long it’s been since you last took the time to run a backup, keeps track of which backup sets could be updated and which should be refreshed, and keeps a log file up to date in .csv format so you can mess with it in a spreadsheet?
```bash
#!/bin/bash

read_settings() {
    settings_file="$HOME/.config/loci/settings"
    if [[ -f "$settings_file" ]]; then
        while IFS='=' read -r key value || [[ -n "$key" ]]; do
            if [[ ! -z "$key" && ! "$key" =~ ^# && ! "$key" =~ ^\[ ]]; then
                key=$(echo "$key" | xargs)
                value=$(echo "$value" | xargs)
                declare -g "$key"="$value"
            fi
        done < "$settings_file"
    else
        echo "Settings file not found: $settings_file"
        exit 1
    fi
}

# Function to perform the backup
backup() {
    local tag="$1"
    read_settings

    log_path="$HOME/.backuplog"

    # Check if header exists in log file, if not, create it
    if [[ ! -f "$log_path" ]]; then
        echo "\"tag\",\"timestamp\",\"command\",\"completion_time\"" > "$log_path"
    elif [[ $(head -1 "$log_path") != "\"tag\",\"timestamp\",\"command\",\"completion_time\"" ]]; then
        # Add header if it doesn't exist
        temp_file=$(mktemp)
        echo "\"tag\",\"timestamp\",\"command\",\"completion_time\"" > "$temp_file"
        cat "$log_path" >> "$temp_file"
        mv "$temp_file" "$log_path"
    fi

    # Create backup directory if it doesn't exist
    backup_dest="$backup_root/$tag"
    mkdir -p "$backup_dest" 2>/dev/null

    # Rsync command for backup
    target="$user@$server:/home/$user/$backup_root/$tag"
    rsync_cmd="rsync -avh $source_dir $target"

    # If exclude_files is defined and not empty, add it to rsync command
    if [[ -n "$exclude_files" ]]; then
        rsync_cmd="rsync -avh --exclude='$exclude_files' $source_dir $target"
    fi

    echo "Starting backup for tag '$tag' at $(date '+%Y-%m-%d %H:%M:%S')"
    echo "Command: $rsync_cmd"

    # Record start time
    start_timestamp=$(date +"%Y-%m-%d %H:%M:%S")

    # Execute the backup
    eval "$rsync_cmd"
    backup_status=$?

    # Record completion time
    completion_timestamp=$(date +"%Y-%m-%d %H:%M:%S")

    # Calculate duration
    start_seconds=$(date -d "$start_timestamp" +%s)
    end_seconds=$(date -d "$completion_timestamp" +%s)
    duration=$((end_seconds - start_seconds))

    # Format duration
    if [[ $duration -ge 3600 ]]; then
        formatted_duration="$((duration / 3600))h $((duration % 3600 / 60))m $((duration % 60))s"
    elif [[ $duration -ge 60 ]]; then
        formatted_duration="$((duration / 60))m $((duration % 60))s"
    else
        formatted_duration="${duration}s"
    fi

    # Log the backup information as proper CSV
    echo "\"$tag\",\"$start_timestamp\",\"$rsync_cmd\",\"$completion_timestamp\"" >> "$log_path"

    if [[ $backup_status -eq 0 ]]; then
        echo -e "\e[32mBackup for '$tag' completed successfully\e[0m"
        echo "Duration: $formatted_duration"
        echo "Logged to: $log_path"
    else
        echo -e "\e[31mBackup for '$tag' failed with status $backup_status\e[0m"
    fi
}

# Function to remove the backup
remove_backup() {
    local tag="$1"
    read_settings

    echo "Removing backup for tag '$tag'..."

    # Rsync remove command
    rmfile="/home/$user/$backup_root/$tag"
    rm_cmd="ssh $user@$server rm -rf $rmfile"

    # Execute the removal command
    eval "$rm_cmd"
    rm_status=$?

    if [[ $rm_status -ne 0 ]]; then
        echo -e "\e[31mError: Failed to remove remote backup for tag '$tag'\e[0m"
        echo "Command failed: $rm_cmd"
        return 1
    fi

    # Remove log entries while preserving header
    log_path="$HOME/.backuplog"
    if [[ -f "$log_path" ]]; then
        # Create a temporary file
        temp_file=$(mktemp)
        # Copy header (first line) if it exists
        if [[ -s "$log_path" ]]; then
            head -1 "$log_path" > "$temp_file"
            # Only copy non-matching lines after header
            tail -n +2 "$log_path" | grep -v "^\"$tag\"," >> "$temp_file"
        else
            # If log is empty, add header
            echo "\"tag\",\"timestamp\",\"command\",\"completion_time\"" > "$temp_file"
        fi
        # Replace the original with filtered content
        mv "$temp_file" "$log_path"
        echo -e "\e[32mBackup '$tag' removed successfully\e[0m"
        echo "Log entries for '$tag' have been removed from $log_path"
    else
        echo -e "\e[32mBackup '$tag' removed successfully\e[0m"
        echo "No log file found at $log_path"
    fi
}

# Function to list the backups with detailed timing information
list_backups() {
    read_settings
    log_path="$HOME/.backuplog"

    echo "Backup Status Report ($(date '+%Y-%m-%d %H:%M:%S'))"
    echo "========================================================="
    printf "%-8s %-15s %-10s %-20s %-15s\n" "TAG" "STATUS" "COUNT" "LAST BACKUP" "DAYS AGO"
    echo "--------------------------------------------------------"

    # Check if header exists in log file, if not, create it
    if [[ ! -f "$log_path" ]]; then
        echo "\"tag\",\"timestamp\",\"command\",\"completion_time\"" > "$log_path"
        echo "Created new log file with CSV headers."
    elif [[ $(head -1 "$log_path") != "\"tag\",\"timestamp\",\"command\",\"completion_time\"" ]]; then
        # Add header if it doesn't exist
        temp_file=$(mktemp)
        echo "\"tag\",\"timestamp\",\"command\",\"completion_time\"" > "$temp_file"
        cat "$log_path" >> "$temp_file"
        mv "$temp_file" "$log_path"
        echo "Added CSV headers to existing log file."
    fi

    # Loop through each tag in the taglist
    for tag in $taglist; do
        # Count occurrences of the tag in the log
        count=0
        youngest=""
        days_ago="N/A"
        if [[ -f "$log_path" ]]; then
            # Skip header when counting
            count=$(grep -c "^\"$tag\"," "$log_path")
            # Get the newest backup date for this tag
            if [[ $count -gt 0 ]]; then
                # Extract dates and find the newest one
                dates=$(grep "^\"$tag\"," "$log_path" | cut -d',' -f2)
                youngest=$(echo "$dates" | sort -r | head -1)
                # Calculate days since last backup
                if [[ ! -z "$youngest" ]]; then
                    youngest_seconds=$(date -d "$youngest" +%s)
                    now_seconds=$(date +%s)
                    days_diff=$(( (now_seconds - youngest_seconds) / 86400 ))
                    days_ago="$days_diff days"
                fi
            fi
        fi

        # Determine status with colored output
        if [[ $count -eq 0 ]]; then
            status="Missing"
            status_color="\e[31m$status\e[0m"   # Red
        elif [[ $count -gt 5 ]]; then
            status="Needs renewal"
            status_color="\e[33m$status\e[0m"   # Yellow
        elif [[ ! -z "$youngest" ]]; then
            # Calculate days since last backup
            youngest_seconds=$(date -d "$youngest" +%s)
            now_seconds=$(date +%s)
            days_diff=$(( (now_seconds - youngest_seconds) / 86400 ))
            if [[ $days_diff -gt 7 ]]; then
                status="Needs to be run"
                status_color="\e[33m$status\e[0m"   # Yellow
            else
                status="Up to date"
                status_color="\e[32m$status\e[0m"   # Green
            fi
        else
            status="Missing"
            status_color="\e[31m$status\e[0m"   # Red
        fi

        printf "%-8s %-15b %-10s %-20s %-15s\n" "$tag" "$status_color" "$count" "${youngest:-N/A}" "$days_ago"
    done

    echo "--------------------------------------------------------"
    echo "CSV log file: $log_path"
    echo "Run 'loci -l' to refresh this status report"
}

# Function to show backup stats
show_stats() {
    read_settings
    log_path="$HOME/.backuplog"

    if [[ ! -f "$log_path" ]]; then
        echo "No backup log found at $log_path"
        return 1
    fi

    echo "Backup Statistics"
    echo "================="

    # Total number of backups
    total_backups=$(grep -v "^\"tag\"" "$log_path" | wc -l)
    echo "Total backups: $total_backups"

    # Backups per tag
    echo -e "\nBackups per tag:"
    for tag in $taglist; do
        count=$(grep "^\"$tag\"," "$log_path" | wc -l)
        echo " $tag: $count"
    done

    # Last backup time for each tag
    echo -e "\nLast backup time:"
    for tag in $taglist; do
        latest=$(grep "^\"$tag\"," "$log_path" | cut -d',' -f2 | sort -r | head -1)
        if [[ -z "$latest" ]]; then
            echo " $tag: Never"
        else
            # Calculate days ago
            latest_seconds=$(date -d "$latest" +%s)
            now_seconds=$(date +%s)
            days_diff=$(( (now_seconds - latest_seconds) / 86400 ))
            echo " $tag: $latest ($days_diff days ago)"
        fi
    done

    echo -e "\nBackup log file: $log_path"
    echo "To view in a spreadsheet: cp $log_path ~/backups.csv"
}

# Function to export log to CSV
export_csv() {
    read_settings
    log_path="$HOME/.backuplog"
    export_path="${1:-$HOME/backup_export.csv}"

    if [[ ! -f "$log_path" ]]; then
        echo "No backup log found at $log_path"
        return 1
    fi

    # Copy the log file to export location
    cp "$log_path" "$export_path"
    echo "Backup log exported to: $export_path"
    echo "You can now open this file in your spreadsheet application."
}

# Main function
main() {
    if [[ "$1" == "-b" || "$1" == "--backup" ]] && [[ ! -z "$2" ]]; then
        backup "$2"
    elif [[ "$1" == "-r" || "$1" == "--remove" ]] && [[ ! -z "$2" ]]; then
        remove_backup "$2"
    elif [[ "$1" == "-l" || "$1" == "--list" ]]; then
        list_backups
    elif [[ "$1" == "-s" || "$1" == "--stats" ]]; then
        show_stats
    elif [[ "$1" == "-e" || "$1" == "--export" ]]; then
        export_csv "$2"
    elif [[ "$1" == "-h" || "$1" == "--help" ]]; then
        echo "Loci Backup Management Tool"
        echo "Usage:"
        echo "  loci -b, --backup <tag>    Create a backup with the specified tag"
        echo "  loci -r, --remove <tag>    Remove a backup with the specified tag"
        echo "  loci -l, --list            List all backup statuses"
        echo "  loci -s, --stats           Show backup statistics"
        echo "  loci -e, --export [path]   Export backup log to CSV (default: ~/backup_export.csv)"
        echo "  loci -h, --help            Show this help message"
    else
        echo "Usage: loci -b <tag> | loci -r <tag> | loci -l | loci -s | loci -e [path] | loci -h"
    fi
}

# Execute the script
main "$@"
```
Ah, Improvements!
Can you please articulate why Python and Bash are so different in your eyes?
One needs to be ~~compiled~~ installed and the other is literally the de facto scripting language installed everywhere and intended for exactly this purpose.
My system came with Python3 installed. Debian 12.
Python does not need to be compiled, have you ever used it?
Saved for trying out later, ty!