feat(Hashing): added more hash checking for repeat download checks to avoid overlap

2025-10-07 17:39:57 +13:00
parent 8c293a4684
commit 15f14088be
8 changed files with 584 additions and 10 deletions


@@ -235,6 +235,15 @@ The following options apply only to the `download` command. This command downloa
- `--no-dupes`
  - This flag will not redownload files if they were already downloaded in the current run
  - This is calculated by MD5 hash
- `--simple-check`
  - **Works with persistent hash storage**
  - Enables fast URL-based duplicate detection for the `--no-dupes` functionality
  - When enabled, the downloader checks whether a submission URL has already been downloaded before it calculates the more expensive file hash
  - Creates enhanced hash files (`.bdfr_hashes.json`) with URL mappings for faster subsequent runs
  - Stores both hash-to-file and URL-to-hash mappings, so a duplicate can be found by either key
  - Falls back to full hash checking if the URL is not found in the hash file
  - Maintains backward compatibility with existing hash files
  - Most useful when downloading from sources with many duplicate URLs, since a URL match skips the hash calculation
- `--search-existing`
  - This will make the BDFR compile the hashes for every file in `directory`
  - The hashes are used to remove duplicates if `--no-dupes` is supplied or make hard links if `--make-hard-links` is supplied
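
The URL-first check with hash fallback described for `--simple-check` can be sketched roughly as follows. This is a minimal illustration, not the BDFR's actual code: the `HashIndex` class, the `is_duplicate` helper, the `fetch` callable, and the `"files"`/`"urls"` JSON layout are all hypothetical names chosen for the example, and the legacy format is assumed to be a flat hash-to-file mapping.

```python
import hashlib
import json
from pathlib import Path


class HashIndex:
    """Illustrative duplicate index; names and JSON layout are assumptions."""

    def __init__(self):
        self.hash_to_file = {}  # MD5 hex digest -> path of the saved file
        self.url_to_hash = {}   # submission URL -> MD5 hex digest

    @classmethod
    def from_dict(cls, data):
        index = cls()
        if "files" in data or "urls" in data:
            index.hash_to_file = dict(data.get("files", {}))
            index.url_to_hash = dict(data.get("urls", {}))
        else:
            # Assumed legacy layout: a flat hash -> file mapping with no URLs.
            index.hash_to_file = dict(data)
        return index

    def to_dict(self):
        return {"files": self.hash_to_file, "urls": self.url_to_hash}

    @classmethod
    def load(cls, path):
        path = Path(path)
        if not path.exists():
            return cls()
        return cls.from_dict(json.loads(path.read_text()))

    def save(self, path):
        Path(path).write_text(json.dumps(self.to_dict(), indent=2))

    def record(self, url, content, file_path):
        # Store both mappings so later runs can match by URL or by hash.
        digest = hashlib.md5(content).hexdigest()
        self.hash_to_file[digest] = str(file_path)
        self.url_to_hash[url] = digest


def is_duplicate(index, url, fetch, simple_check=True):
    """Return (duplicate, content); content is None when the fetch was skipped."""
    # Fast path: a known URL is treated as a duplicate without fetching or hashing.
    if simple_check and url in index.url_to_hash:
        return True, None
    # Fallback: fetch the content and compare its MD5 hash against known files.
    content = fetch(url)
    digest = hashlib.md5(content).hexdigest()
    if digest in index.hash_to_file:
        index.url_to_hash[url] = digest  # remember the URL for the next run
        return True, content
    return False, content
```

In this sketch the caller would call `record` after saving a new file and `save` at the end of the run, for example `HashIndex.load(".bdfr_hashes.json")` at startup and `index.save(".bdfr_hashes.json")` on exit. Trusting the URL alone trades some accuracy for speed: content that changes behind an unchanged URL would be skipped.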