## Change log
  
- YouTube support added
- Custom filenames feature added
- Custom folder structure feature added
- Unsaving downloaded posts option added
- Remove duplicate posts on different subreddits option added
- Skipping given domains option added
- Keeping track of already downloaded posts on a separate file option added (See --downloaded-posts in README)
- No audio on v.redd.it videos bug fixed (see README for details about ffmpeg)
- --default-directory option is added
- --default-options is added
- --use-local-config option is added
- Bug fixes
This commit is contained in:
Ali Parlakçı
2020-06-01 15:05:02 +03:00
committed by GitHub
parent 0e007abd64
commit fd4958c06a
26 changed files with 1805 additions and 1712 deletions

README.md

@@ -1,115 +1,178 @@
# 📥 Bulk Downloader for Reddit
Downloads reddit posts. Made by [u/aliparlakci](https://reddit.com/u/aliparlakci)
Please give feedback *(errors, feature requests, etc.)* on the [Issues](https://github.com/aliparlakci/bulk-downloader-for-reddit/issues) page. I will try to resolve them ASAP.
## [Download the latest release here](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)
## 🚀 How to use
If you run **Windows**, after you extract the zip file, double-click on *bulk-downloader-for-reddit.exe*. The program will guide you through. Also, take a look at the [Setting up the program](#🔨-setting-up-the-program) section. **However**, Bulk Downloader for Reddit has plenty of features which can only be activated via command-line arguments. See [Options](#⚙-Options) for them.
Unfortunately, there is no binary for **MacOS** or **Linux**. If you are a MacOS or Linux user, you must run the program from the source code. See the [Interpret from source code](docs/INTERPRET_FROM_SOURCE.md) page.
However, a binary version for Linux is in the works, so stay tuned.
OR, regardless of your operating system, you can fire up the program from the **source code**:
#### `python3 -m pip install -r requirements.txt`
#### `python3 script.py`
See the [Interpret from source code](docs/INTERPRET_FROM_SOURCE.md) page for more information.
## 🔨 Setting up the program
### 🖼 IMGUR API
See [here](docs/INTERPRET_FROM_SOURCE.md#finding-the-correct-keyword-for-python) if you have any trouble with this step.
You need to create an imgur developer app in order for the API to work. Go to https://api.imgur.com/oauth2/addclient and log in.
IMGUR will redirect you to the homepage instead of the API form page. After you log in, open the above link manually. Fill in the form (it does not really matter what you fill it with; you can write www.google.com for the callback URL).
After you send the form, it will redirect you to a page that shows your **imgur_client_id** and **imgur_client_secret**. Type those values into the program respectively.
### 📽 ffmpeg Library
The program needs the **ffmpeg software** to add audio to some video files. However, installing it is **voluntary**. The program still runs without errors without the ffmpeg library, but some video files might have no sound.
Install it through a package manager such as **Chocolatey** on Windows, **apt** on Linux or **Homebrew** on MacOS:
- **on Windows**: After you **[install Chocolatey](https://chocolatey.org/install)**, type **`choco install ffmpeg`** in either Command Prompt or Powershell.
- **on Linux**: Type **`sudo apt install ffmpeg`** in Terminal.
- **on MacOS**: After you **[install Homebrew](https://brew.sh/)**, type **`brew install ffmpeg`** in Terminal.
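For v.redd.it posts, reddit serves the video and audio as separate streams, which is why ffmpeg is needed to combine them. The merge step can be sketched roughly as below; the helper and file names are hypothetical and this is not the program's actual code:

```python
# Hypothetical sketch: mux a video-only stream and an audio-only stream
# into one file with ffmpeg, without re-encoding either stream.
import subprocess

def build_merge_command(video_path, audio_path, output_path):
    """Build the ffmpeg argument list for muxing audio into video."""
    return [
        "ffmpeg",
        "-i", video_path,    # video-only stream
        "-i", audio_path,    # audio-only stream
        "-c", "copy",        # copy streams as-is, no re-encoding
        output_path,
    ]

def merge_streams(video_path, audio_path, output_path):
    subprocess.run(build_merge_command(video_path, audio_path, output_path),
                   check=True)
```

For example, `merge_streams("video.mp4", "audio.mp4", "merged.mp4")` would produce one playable file with sound.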
OR, [download ffmpeg](https://www.ffmpeg.org/download.html) manually and [add the bin folder inside the downloaded folder to your system's `PATH`.](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/) However, the package manager option is recommended.
## ⚙ Options
Some of the below features are available only through the command line.
Open the [Command Prompt](https://youtu.be/bgSSJQolR0E?t=18), [Powershell](https://youtu.be/bgSSJQolR0E?t=18) or [Terminal](https://youtu.be/Pz4yHAB3G8w?t=31) in the folder that contains the bulk-downloader-for-reddit file (click on the links to see how).
After you type **`bulk-downloader-for-reddit.exe`**, type the preferred options.
Example: **`bulk-downloader-for-reddit.exe --subreddit pics --sort top --limit 10`**
## **`--subreddit`**
Downloads posts from given subreddit(s). Takes any number of subreddit names as parameters.
Example usage: **`--subreddit IAmA pics --sort hot --limit 10`**
## **`--multireddit`**
Downloads posts from the given multireddit. Takes a single multireddit name as a parameter. **`--user`** option is required.
Example usage: **`--multireddit myMulti --user me --sort top --time week`**
## **`--search`**
Searches for the given query in given subreddit(s) or multireddit. Takes a search query as a parameter. **`--subreddit`** or **`--multireddit`** option is required. **`--sort`** option is required.
Example usage: **`--search carter --subreddit funny`**
## **`--submitted`**
Downloads the given redditor's submitted posts. Does not take any parameter. **`--user`** option is required.
Example usage: **`--submitted --user spɛz --sort top --time week`**
## **`--upvoted`**
Downloads the given redditor's upvoted posts. Does not take any parameter. **`--user`** option is required.
Example usage: **`--upvoted --user spɛz`**
## **`--saved`**
Downloads the logged-in redditor's saved posts. Does not take any parameter. Example usage: **`--saved`**
## **`--link`**
Takes a reddit link as a parameter and downloads the posts in the link. Put the link in " " (double quotes).
Example usage: **`--link "https://www.reddit.com/r/funny/comments/25blmh/"`**
## **`--log`**
The program saves the found posts into a POSTS.json file and the failed posts to a FAILED.json file in the LOG_FILES folder. You can use those files to redownload the posts inside them.
Uses a .json file to redownload posts from. Takes the path of a .json file as a parameter.
Example usage: **`--log D:\pics\LOG_FILES\FAILED.json`**
---
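Since the log files are plain JSON, they can also be inspected by hand. A minimal sketch, assuming only that the file is valid JSON with one record per post (the exact layout of the records is not documented here):

```python
# Hypothetical sketch: load a POSTS.json / FAILED.json log file and count
# how many post records it holds. Assumes only that the file is valid JSON.
import json
from pathlib import Path

def load_log(path):
    """Parse a log file written by the program."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

def count_entries(log):
    """Number of post records, whether the log is a JSON object or array."""
    return len(log)
```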
## **`--user`**
Takes a reddit username as a parameter. Example usage: **`--user spɛz`**
## **`--sort`**
Takes a valid sorting type as a parameter. Valid sort types are `hot`, `top`, `new`, `rising`, `controversial` and `relevance` (the last only if you are using the `--search` option).
Example usage: **`--sort top`**
## **`--time`**
Takes a valid time as a parameter. Valid times are `hour`, `day`, `week`, `month`, `year` and `all`. Example usage: **`--time all`**
## **`--limit`**
Takes a number to specify how many posts the program should get. The upper bound is 1000 posts for **each** subreddit. For example, if you are downloading posts from pics and IAmA, the upper bound is 2000. Leave the option out to use the highest bound possible.
Example usage: **`--limit 500`**
---
## **`--skip`**
Takes any number of domains as parameters and skips posts from those domains. Use `self` to refer to text posts.
Example usage: **`--skip v.redd.it youtube.com youtu.be self`**
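The domain matching could work roughly like the sketch below; the helper name and the `www.` handling are assumptions for illustration, not the program's actual logic:

```python
# Hypothetical sketch: decide whether a post should be skipped based on its
# source domain. Text posts are treated as the pseudo-domain "self".
from urllib.parse import urlparse

def should_skip(post_url, is_self_post, skip_domains):
    """Return True if the post's domain is in the skip list."""
    domain = "self" if is_self_post else urlparse(post_url).netloc.lower()
    # Strip a leading "www." so www.youtube.com matches youtube.com.
    if domain.startswith("www."):
        domain = domain[4:]
    return domain in set(skip_domains)
```

For example, with `--skip youtube.com self`, a YouTube link or a text post would be skipped while an imgur link would not.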
## **`--quit`**
Automatically quits the application after it finishes. Otherwise, it will wait for an input to quit.
Example usage: **`--quit`**
## **`--directory`**
Takes the directory that the posts should be downloaded to. Overrides the default directory. Use `..\` to refer to the upper level and `.\` to refer to the current level.
Example usage: **`--directory D:\bdfr\`**
Example usage: **`--directory ..\images\`**
Example usage: **`-d ..\images\`**
Example usage: **`-d .\`**
## **`--set-filename`**
Starts the program to set a filename template to use for downloading posts. **Does not take any parameter.**
When the program starts, you will be prompted to type a filename template. Use `SUBREDDIT`, `REDDITOR`, `POSTID`, `TITLE`, `UPVOTES`, `FLAIR`, `DATE` in curly brackets `{ }` to refer to the corresponding property of a post.
❗ Do NOT change the filename structure frequently. If you do, the program cannot find duplicates and will download already-downloaded files again. This will not create any duplicates in the directory, but the program will not be as snappy as it should be.
The default filename template is **`{REDDITOR}_{TITLE}_{POSTID}`**
Example usage: **`--set-filename`**
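Expanding such a template amounts to simple placeholder substitution. A minimal sketch (the post dictionary shape here is hypothetical, not the program's internal data structure):

```python
# Hypothetical sketch: expand a filename template such as the default
# {REDDITOR}_{TITLE}_{POSTID} using a post's properties.
def fill_template(template, post):
    """Replace each {KEY} placeholder with the matching post property."""
    return template.format(**post)

post = {"REDDITOR": "example_user", "TITLE": "A_cute_cat", "POSTID": "b8ws37"}
name = fill_template("{REDDITOR}_{TITLE}_{POSTID}", post)
# name == "example_user_A_cute_cat_b8ws37"
```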
## **`--set-folderpath`**
Starts the program to set a folder structure to use for downloading posts. **Does not take any parameter.**
When the program starts, you will be prompted to type a folder structure template. Use `SUBREDDIT`, `REDDITOR`, `POSTID`, `TITLE`, `UPVOTES`, `FLAIR`, `DATE` in curly brackets `{ }` to refer to the corresponding property of a post. Do not put slashes `/` or backslashes `\` at either end. For instance, **`{REDDITOR}/{SUBREDDIT}/{FLAIR}`**
The default folder structure template is **`{SUBREDDIT}`**
Example usage: **`--set-folderpath`**
## **`--set-default-directory`**
Starts the program to set a default directory to use in case no directory is given. **Does not take any parameter.**
When the program starts, you will be prompted to type a default directory. You can use `{time}` in folder names to timestamp them. For instance, **`D:\bdfr\posts_{time}`**
Example usage: **`--set-default-directory`**
## **`--use-local-config`**
Sets the program to use the config.json file in the current directory, creating it if it does not exist. Useful for having different configurations. **Does not take any parameter.**
Example usage: **`--use-local-config`**
## **`--no-dupes`**
Skips the same posts in different subreddits. Does not take any parameter.
Example usage: **`--no-dupes`**
## **`--downloaded-posts`**
Takes a file path as a parameter and skips posts whose IDs match the post IDs inside the file. It also saves the IDs of newly downloaded posts to the given file.
Example usage: **`--downloaded-posts D:\bdfr\ALL_POSTS.txt`**
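A plain text file with one post ID per line is enough for such tracking. A sketch of how it might be read and appended to (the helper names and file format are assumptions, not the program's actual implementation):

```python
# Hypothetical sketch: keep a plain-text file of downloaded post IDs,
# one ID per line, so already-downloaded posts can be skipped.
from pathlib import Path

def read_ids(path):
    """Return the set of post IDs already recorded in the file."""
    p = Path(path)
    return set(p.read_text().split()) if p.exists() else set()

def record_id(path, post_id):
    """Append a newly downloaded post's ID to the file."""
    with open(path, "a") as f:
        f.write(post_id + "\n")
```

Before downloading a post, the downloader would check `post_id in read_ids(path)` and skip it if true.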
## ❔ FAQ
### I am running the script on a headless machine or on a remote server. How can I authenticate my reddit account?
- Download the script on your everyday computer and run it once.
@@ -128,21 +191,8 @@ Press enter to quit
### Getting posts takes too long.
- You can press *Ctrl+C* to interrupt it and start downloading.
### How do I open self post files?
- Self posts are held at reddit as styled with markdown. So, the script downloads them as they are in order not to lose their stylings.
However, there is a [great Chrome extension](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with its styling. Install it and open the files with [Chrome](https://www.google.com/intl/tr/chrome/).
However, they are basically text files. You can also view them with any text editor such as Notepad on Windows, gedit on Linux or Text Editor on MacOS.


@@ -1,86 +0,0 @@
# Changes on *master*
## [23/02/2019](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/4d385fda60028343be816eb7c4f7bc613a9d555d)
- Fixed v.redd.it links
## [27/01/2019](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/b7baf07fb5998368d87e3c4c36aed40daf820609)
- Clarified the instructions
## [28/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/d56efed1c6833a66322d9158523b89d0ce57f5de)
- Adjusted the algorithm used for extracting gfycat links because of gfycat's design change
- Ignore space at the end of the given directory
## [16/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/d56efed1c6833a66322d9158523b89d0ce57f5de)
- Fix the bug that prevents downloading imgur videos
## [15/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/adccd8f3ba03ad124d58643d78dab287a4123a6f)
- Prints out the titles of posts that are already downloaded
## [13/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/50cb7c15b9cb4befce0cfa2c23ab5de4af9176c6)
- Added alternative location of current directory for config file
- Fixed console prints on Linux
## [10/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/8f1ff10a5e11464575284210dbba4a0d387bc1c3)
- Added reddit username to config file
## [06/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/210238d0865febcb57fbd9f0b0a7d3da9dbff384)
- Sending headers when requesting a file in order not to be rejected by server
## [04/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/426089d0f35212148caff0082708a87017757bde)
- Disabled printing post types to console
## [30/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/af294929510f884d92b25eaa855c29fc4fb6dcaa)
- Now opens web browser and goes to Imgur when prompts for Imgur credentials
## [26/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/1623722138bad80ae39ffcd5fb38baf80680deac)
- Improved verbose mode
- Minimalized the console output
- Added quit option for auto quitting the program after process finishes
## [25/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/1623722138bad80ae39ffcd5fb38baf80680deac)
- Added verbose mode
- Stylized the console output
## [24/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7a68ff3efac9939f9574c2cef6184b92edb135f4)
- Added OP's name to file names (backwards compatible)
- Deleted # char from file names (backwards compatible)
- Improved exception handling
## [23/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7314e17125aa78fd4e6b28e26fda7ec7db7e0147)
- Split the download() function
- Added erome support
- Removed exclude feature
- Bug fixes
## [22/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/6e7463005051026ad64006a8580b0b5dc9536b8c)
- Put log files in a folder named "LOG_FILES"
- Fixed the bug that makes multireddit mode unusable
## [21/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/4a8c2377f9fb4d60ed7eeb8d50aaf9a26492462a)
- Added exclude mode
## [20/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7548a010198fb693841ca03654d2c9bdf5742139)
- "0" input for no limit
- Fixed the bug that recognizes non-image direct links as image links
## [19/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/41cbb58db34f500a8a5ecc3ac4375bf6c3b275bb)
- Added v.redd.it support
- Added custom exception descriptions to FAILED.json file
- Fixed the bug that prevents downloading some gfycat URLs
## [13/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/9f831e1b784a770c82252e909462871401a05c11)
- Changed config.json file's path to home directory
## [12/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/50a77f6ba54c24f5647d5ea4e177400b71ff04a7)
- Added binaries for Windows and Linux
- Wait on KeyboardInterrupt
- Accept multiple subreddit input
- Fixed the bug that prevents choosing "[0] exit" with typing "exit"
## [11/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/a28a7776ab826dea2a8d93873a94cd46db3a339b)
- Improvements on UX and UI
- Added logging errors to CONSOLE_LOG.txt
- Using current directory if directory has not been given yet.
## [10/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/ffe3839aee6dc1a552d95154d817aefc2b66af81)
- Added support for *self* post
- Now getting posts is quicker


@@ -1,101 +0,0 @@
# Using command-line arguments
See **[compiling from source](INTERPRET_FROM_SOURCE.md)** page first unless you are using an executable file. If you are using an executable file, see [using terminal](INTERPRET_FROM_SOURCE.md#using-terminal) and come back.
***Use*** `.\bulk-downloader-for-reddit.exe` ***or*** `./bulk-downloader-for-reddit` ***if you are using the executable***.
```console
$ python script.py --help
usage: script.py [-h] [--directory DIRECTORY] [--NoDownload] [--verbose]
[--quit] [--link link] [--saved] [--submitted] [--upvoted]
[--log LOG FILE] [--subreddit SUBREDDIT [SUBREDDIT ...]]
[--multireddit MULTIREDDIT] [--user redditor]
[--search query] [--sort SORT TYPE] [--limit Limit]
[--time TIME_LIMIT]
This program downloads media from reddit posts
optional arguments:
-h, --help show this help message and exit
--directory DIRECTORY, -d DIRECTORY
Specifies the directory where posts will be downloaded
to
--NoDownload Just gets the posts and stores them in a file for
downloading later
--verbose, -v Verbose Mode
--quit, -q Auto quit after the process finishes
--link link, -l link Get posts from link
--saved Triggers saved mode
--submitted Gets posts of --user
--upvoted Gets upvoted posts of --user
--log LOG FILE Takes a log file which created by itself (json files),
reads posts and tries downloading them again.
--subreddit SUBREDDIT [SUBREDDIT ...]
Triggers subreddit mode and takes subreddit's name
without r/. use "frontpage" for frontpage
--multireddit MULTIREDDIT
Triggers multireddit mode and takes multireddit's name
without m/
--user redditor reddit username if needed. use "me" for current user
--search query Searches for given query in given subreddits
--sort SORT TYPE Either hot, top, new, controversial, rising or
relevance default: hot
--limit Limit default: unlimited
--time TIME_LIMIT Either hour, day, week, month, year or all. default:
all
```
# Examples
- **Use `python3` instead of `python` if you are using *MacOS* or *Linux***
```console
python script.py
```
```console
.\bulk-downloader-for-reddit.exe
```
```console
python script.py
```
```console
.\bulk-downloader-for-reddit.exe --directory .\\NEW_FOLDER --search cats --sort new --time all --subreddit gifs pics --NoDownload
```
```console
./bulk-downloader-for-reddit --directory .\\NEW_FOLDER\\ANOTHER_FOLDER --saved --limit 1000
```
```console
python script.py --directory .\\NEW_FOLDER --sort new --time all --limit 10 --link "https://www.reddit.com/r/gifs/search?q=dogs&restrict_sr=on&type=link&sort=new&t=month"
```
```console
python script.py --directory .\\NEW_FOLDER --link "https://www.reddit.com/r/learnprogramming/comments/7mjw12/"
```
```console
python script.py --directory .\\NEW_FOLDER --search cats --sort new --time all --subreddit gifs pics --NoDownload
```
```console
python script.py --directory .\\NEW_FOLDER --user [USER_NAME] --submitted --limit 10
```
```console
python script.py --directory .\\NEW_FOLDER --multireddit good_subs --user [USER_NAME] --sort top --time week --limit 250
```
```console
python script.py --directory .\\NEW_FOLDER\\ANOTHER_FOLDER --saved --limit 1000
```
```console
python script.py --directory C:\\NEW_FOLDER\\ANOTHER_FOLDER --log UNNAMED_FOLDER\\FAILED.json
```
# FAQ
## I can't start up the script no matter what.
See **[finding the correct keyword for Python](INTERPRET_FROM_SOURCE.md#finding-the-correct-keyword-for-python)**


@@ -1,40 +1,35 @@
# Interpret from source code
## Requirements
### 🐍 Python 3 Interpreter
- Python 3 is required. See if it is already installed, [here](#finding-the-correct-keyword-for-python).
- If not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, selecting the **Add Python 3 to PATH** option when installing the software is **mandatory**.
### 📃 Source Code
[Download the repository](https://github.com/aliparlakci/bulk-downloader-for-reddit/archive/master.zip) and extract the zip into a folder.
## 💻 Using the command line
Open the [Command Prompt](https://youtu.be/bgSSJQolR0E?t=18), [Powershell](https://youtu.be/bgSSJQolR0E?t=18) or [Terminal](https://youtu.be/Pz4yHAB3G8w?t=31) in the folder that contains the script.py file (click on the links to see how).
### Finding the correct keyword for Python
Enter these lines into the terminal window until one prints out a version starting with **`3.`**:
- `python --version`
- `python3 --version`
- `py --version`
- `py -3 --version`
Once one does, your keyword is that command without the `--version` part.
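Alternatively, once inside any Python interpreter, you can confirm it is Python 3 directly:

```python
# Confirm that the running interpreter is Python 3.
import sys

def is_python3():
    return sys.version_info.major == 3

print(sys.version.split()[0])  # prints the version number, e.g. 3.8.10
```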
## 📦 Installing dependencies
Enter the line below into the terminal window while you are in the directory where script.py is, using your keyword instead of `python`:
```console
python -m pip install -r requirements.txt
```
## 🏃‍♂️ Running the code
Type the code below into the command line inside the program folder, using your keyword instead of `python`:
```console
python script.py
```
The program should guide you through. **However**, you can also use custom options. See [Options](../README.md#⚙-Options).


@@ -2,3 +2,4 @@ bs4
requests
praw
imgurpython
youtube-dl

script.py

@@ -13,6 +13,7 @@ import time
import webbrowser
from io import StringIO
from pathlib import Path, PurePath
from prawcore.exceptions import InsufficientScope
from src.downloaders.Direct import Direct
from src.downloaders.Erome import Erome
@@ -20,427 +21,33 @@ from src.downloaders.Gfycat import Gfycat
from src.downloaders.Imgur import Imgur
from src.downloaders.redgifs import Redgifs
from src.downloaders.selfPost import SelfPost
from src.downloaders.vreddit import VReddit
from src.downloaders.youtube import Youtube
from src.downloaders.gifDeliveryNetwork import GifDeliveryNetwork
from src.errors import ImgurLimitError, NoSuitablePost, FileAlreadyExistsError, ImgurLoginError, NotADownloadableLinkError, InvalidJSONFile, FailedToDownload, DomainInSkip, full_exc_info
from src.parser import LinkDesigner
from src.searcher import getPosts
from src.utils import (GLOBAL, createLogFile, nameCorrector,
                       printToFile)
from src.jsonHelper import JsonFile
from src.config import Config
from src.arguments import Arguments
from src.programMode import ProgramMode
from src.reddit import Reddit
from src.store import Store
__author__ = "Ali Parlakci"
__license__ = "GPL"
__version__ = "1.8.0"
__maintainer__ = "Ali Parlakci"
__email__ = "parlakciali@gmail.com"
def getConfig(configFileName):
"""Read credentials from config.json file"""
keys = ['imgur_client_id',
'imgur_client_secret']
if os.path.exists(configFileName):
FILE = jsonFile(configFileName)
content = FILE.read()
if "reddit_refresh_token" in content:
if content["reddit_refresh_token"] == "":
FILE.delete("reddit_refresh_token")
if not all(False if content.get(key,"") == "" else True for key in keys):
print(
"Go to this URL and fill the form: " \
"https://api.imgur.com/oauth2/addclient\n" \
"Enter the client id and client secret here:"
)
webbrowser.open("https://api.imgur.com/oauth2/addclient",new=2)
for key in keys:
try:
if content[key] == "":
raise KeyError
except KeyError:
FILE.add({key:input(" "+key+": ")})
return jsonFile(configFileName).read()
else:
FILE = jsonFile(configFileName)
configDictionary = {}
print(
"Go to this URL and fill the form: " \
"https://api.imgur.com/oauth2/addclient\n" \
"Enter the client id and client secret here:"
)
webbrowser.open("https://api.imgur.com/oauth2/addclient",new=2)
for key in keys:
configDictionary[key] = input(" "+key+": ")
FILE.add(configDictionary)
return FILE.read()
def parseArguments(arguments=[]):
"""Initialize argparse and add arguments"""
parser = argparse.ArgumentParser(allow_abbrev=False,
description="This program downloads " \
"media from reddit " \
"posts")
parser.add_argument("--directory","-d",
help="Specifies the directory where posts will be " \
"downloaded to",
metavar="DIRECTORY")
parser.add_argument("--NoDownload",
help="Just gets the posts and stores them in a file" \
" for downloading later",
action="store_true",
default=False)
parser.add_argument("--verbose","-v",
help="Verbose Mode",
action="store_true",
default=False)
parser.add_argument("--quit","-q",
help="Auto quit after the process finishes",
action="store_true",
default=False)
parser.add_argument("--link","-l",
help="Get posts from link",
metavar="link")
parser.add_argument("--saved",
action="store_true",
help="Triggers saved mode")
parser.add_argument("--submitted",
action="store_true",
help="Gets posts of --user")
parser.add_argument("--upvoted",
action="store_true",
help="Gets upvoted posts of --user")
parser.add_argument("--log",
help="Takes a log file which created by itself " \
"(json files), reads posts and tries downloadin" \
"g them again.",
# type=argparse.FileType('r'),
metavar="LOG FILE")
parser.add_argument("--subreddit",
nargs="+",
help="Triggers subreddit mode and takes subreddit's " \
"name without r/. use \"frontpage\" for frontpage",
metavar="SUBREDDIT",
type=str)
parser.add_argument("--multireddit",
help="Triggers multireddit mode and takes "\
"multireddit's name without m/",
metavar="MULTIREDDIT",
type=str)
parser.add_argument("--user",
help="reddit username if needed. use \"me\" for " \
"current user",
required="--multireddit" in sys.argv or \
"--submitted" in sys.argv,
metavar="redditor",
type=str)
parser.add_argument("--search",
help="Searches for given query in given subreddits",
metavar="query",
type=str)
parser.add_argument("--sort",
help="Either hot, top, new, controversial, rising " \
"or relevance default: hot",
choices=[
"hot","top","new","controversial","rising",
"relevance"
],
metavar="SORT TYPE",
type=str)
parser.add_argument("--limit",
help="default: unlimited",
metavar="Limit",
type=int)
parser.add_argument("--time",
help="Either hour, day, week, month, year or all." \
" default: all",
choices=["all","hour","day","week","month","year"],
metavar="TIME_LIMIT",
type=str)
if arguments == []:
return parser.parse_args()
else:
return parser.parse_args(arguments)
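The parser above can also be driven programmatically, since `parse_args` accepts an explicit argv list. A minimal sketch with a reduced flag set (the flag names mirror the real ones; everything else is illustrative):

```python
import argparse

def build_parser():
    # Reduced sketch of the parser above: a few representative flags only
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--subreddit", nargs="+", type=str)
    parser.add_argument("--sort", choices=["hot", "top", "new",
                                           "controversial", "rising",
                                           "relevance"])
    parser.add_argument("--limit", type=int)
    return parser

# Passing an argv list instead of reading sys.argv is how stored default
# options can later be fed back into the same parser
args = build_parser().parse_args(
    ["--subreddit", "pics", "funny", "--sort", "top", "--limit", "10"]
)
```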
def checkConflicts():
    """Check if command-line arguments are given correctly;
    if not, raise errors
    """
    if GLOBAL.arguments.user is None:
        user = 0
    else:
        user = 1

    search = 1 if GLOBAL.arguments.search else 0

    modes = [
        "saved","subreddit","submitted","log","link","upvoted","multireddit"
    ]

    values = {
        x: 0 if getattr(GLOBAL.arguments,x) is None or \
                getattr(GLOBAL.arguments,x) is False \
             else 1 \
        for x in modes
    }

    if not sum(values[x] for x in values) == 1:
        raise ProgramModeError("Invalid program mode")

    if search+values["saved"] == 2:
        raise SearchModeError("You cannot search in your saved posts")

    if search+values["submitted"] == 2:
        raise SearchModeError("You cannot search in submitted posts")

    if search+values["upvoted"] == 2:
        raise SearchModeError("You cannot search in upvoted posts")

    if search+values["log"] == 2:
        raise SearchModeError("You cannot search in log files")

    if values["upvoted"]+values["submitted"] == 1 and user == 0:
        raise RedditorNameError("No redditor name given")
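The core of the check above is that exactly one program mode may be active at a time. A standalone sketch of that exclusivity rule (names are illustrative, not the program's own API):

```python
def validate_modes(flags):
    # Exactly one program mode may be active; anything else is an error,
    # mirroring the sum-of-values check in checkConflicts above
    chosen = [name for name, active in flags.items() if active]
    if len(chosen) != 1:
        raise ValueError("Invalid program mode")
    return chosen[0]

mode = validate_modes({"saved": False, "subreddit": True, "link": False})
```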
class PromptUser:
    @staticmethod
    def chooseFrom(choices):
        print()
        choicesByIndex = list(str(x) for x in range(len(choices)+1))
        for i in range(len(choices)):
            print("{indent}[{order}] {mode}".format(
                indent=" "*4,order=i+1,mode=choices[i]
            ))
        print(" "*4+"[0] exit\n")

        choice = input("> ")
        while not choice.lower() in choices+choicesByIndex+["exit"]:
            print("Invalid input\n")
            choice = input("> ")

        if choice == "0" or choice == "exit":
            sys.exit()
        elif choice in choicesByIndex:
            return choices[int(choice)-1]
        else:
            return choice

    def __init__(self):
        print("select program mode:")
        programModes = [
            "search","subreddit","multireddit",
            "submitted","upvoted","saved","log"
        ]
        programMode = self.chooseFrom(programModes)

        if programMode == "search":
            GLOBAL.arguments.search = input("\nquery: ")
            GLOBAL.arguments.subreddit = input("\nsubreddit: ")

            print("\nselect sort type:")
            sortTypes = [
                "relevance","top","new"
            ]
            sortType = self.chooseFrom(sortTypes)
            GLOBAL.arguments.sort = sortType

            print("\nselect time filter:")
            timeFilters = [
                "hour","day","week","month","year","all"
            ]
            timeFilter = self.chooseFrom(timeFilters)
            GLOBAL.arguments.time = timeFilter

        if programMode == "subreddit":
            subredditInput = input("(type frontpage for all subscribed subreddits,\n" \
                                   " use plus to separate multiple subreddits:" \
                                   " pics+funny+me_irl etc.)\n\n" \
                                   "subreddit: ")
            GLOBAL.arguments.subreddit = subredditInput

            # while not (subredditInput == "" or subredditInput.lower() == "frontpage"):
            #     subredditInput = input("subreddit: ")
            #     GLOBAL.arguments.subreddit += "+" + subredditInput

            if " " in GLOBAL.arguments.subreddit:
                GLOBAL.arguments.subreddit = "+".join(GLOBAL.arguments.subreddit.split())

            # DELETE THE PLUS (+) AT THE END
            if not subredditInput.lower() == "frontpage" \
               and GLOBAL.arguments.subreddit[-1] == "+":
                GLOBAL.arguments.subreddit = GLOBAL.arguments.subreddit[:-1]

            print("\nselect sort type:")
            sortTypes = [
                "hot","top","new","rising","controversial"
            ]
            sortType = self.chooseFrom(sortTypes)
            GLOBAL.arguments.sort = sortType

            if sortType in ["top","controversial"]:
                print("\nselect time filter:")
                timeFilters = [
                    "hour","day","week","month","year","all"
                ]
                timeFilter = self.chooseFrom(timeFilters)
                GLOBAL.arguments.time = timeFilter
            else:
                GLOBAL.arguments.time = "all"

        elif programMode == "multireddit":
            GLOBAL.arguments.user = input("\nmultireddit owner: ")
            GLOBAL.arguments.multireddit = input("\nmultireddit: ")

            print("\nselect sort type:")
            sortTypes = [
                "hot","top","new","rising","controversial"
            ]
            sortType = self.chooseFrom(sortTypes)
            GLOBAL.arguments.sort = sortType

            if sortType in ["top","controversial"]:
                print("\nselect time filter:")
                timeFilters = [
                    "hour","day","week","month","year","all"
                ]
                timeFilter = self.chooseFrom(timeFilters)
                GLOBAL.arguments.time = timeFilter
            else:
                GLOBAL.arguments.time = "all"

        elif programMode == "submitted":
            GLOBAL.arguments.submitted = True
            GLOBAL.arguments.user = input("\nredditor: ")

            print("\nselect sort type:")
            sortTypes = [
                "hot","top","new","controversial"
            ]
            sortType = self.chooseFrom(sortTypes)
            GLOBAL.arguments.sort = sortType

            if sortType == "top":
                print("\nselect time filter:")
                timeFilters = [
                    "hour","day","week","month","year","all"
                ]
                timeFilter = self.chooseFrom(timeFilters)
                GLOBAL.arguments.time = timeFilter
            else:
                GLOBAL.arguments.time = "all"

        elif programMode == "upvoted":
            GLOBAL.arguments.upvoted = True
            GLOBAL.arguments.user = input("\nredditor: ")

        elif programMode == "saved":
            GLOBAL.arguments.saved = True

        elif programMode == "log":
            while True:
                GLOBAL.arguments.log = input("\nlog file directory: ")
                if Path(GLOBAL.arguments.log).is_file():
                    break

        while True:
            try:
                GLOBAL.arguments.limit = int(input("\nlimit (0 for none): "))
                if GLOBAL.arguments.limit == 0:
                    GLOBAL.arguments.limit = None
                break
            except ValueError:
                pass
def prepareAttributes():
    ATTRIBUTES = {}

    if GLOBAL.arguments.user is not None:
        ATTRIBUTES["user"] = GLOBAL.arguments.user

    if GLOBAL.arguments.search is not None:
        ATTRIBUTES["search"] = GLOBAL.arguments.search

        if GLOBAL.arguments.sort == "hot" or \
           GLOBAL.arguments.sort == "controversial" or \
           GLOBAL.arguments.sort == "rising":
            GLOBAL.arguments.sort = "relevance"

    if GLOBAL.arguments.sort is not None:
        ATTRIBUTES["sort"] = GLOBAL.arguments.sort
    else:
        if GLOBAL.arguments.submitted:
            ATTRIBUTES["sort"] = "new"
        else:
            ATTRIBUTES["sort"] = "hot"

    if GLOBAL.arguments.time is not None:
        ATTRIBUTES["time"] = GLOBAL.arguments.time
    else:
        ATTRIBUTES["time"] = "all"

    if GLOBAL.arguments.link is not None:
        GLOBAL.arguments.link = GLOBAL.arguments.link.strip("\"")
        ATTRIBUTES = LinkDesigner(GLOBAL.arguments.link)

        if GLOBAL.arguments.search is not None:
            ATTRIBUTES["search"] = GLOBAL.arguments.search

        if GLOBAL.arguments.sort is not None:
            ATTRIBUTES["sort"] = GLOBAL.arguments.sort

        if GLOBAL.arguments.time is not None:
            ATTRIBUTES["time"] = GLOBAL.arguments.time

    elif GLOBAL.arguments.subreddit is not None:
        if type(GLOBAL.arguments.subreddit) == list:
            GLOBAL.arguments.subreddit = "+".join(GLOBAL.arguments.subreddit)

        ATTRIBUTES["subreddit"] = GLOBAL.arguments.subreddit

    elif GLOBAL.arguments.multireddit is not None:
        ATTRIBUTES["multireddit"] = GLOBAL.arguments.multireddit

    elif GLOBAL.arguments.saved is True:
        ATTRIBUTES["saved"] = True

    elif GLOBAL.arguments.upvoted is True:
        ATTRIBUTES["upvoted"] = True

    elif GLOBAL.arguments.submitted is not None:
        ATTRIBUTES["submitted"] = True

        if GLOBAL.arguments.sort == "rising":
            raise InvalidSortingType("Invalid sorting type given")

    ATTRIBUTES["limit"] = GLOBAL.arguments.limit

    return ATTRIBUTES
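The defaulting behavior above (sort falls back to "hot", time to "all", subreddit lists join with "+") can be condensed into a hypothetical standalone helper; this is a sketch of the idea, not the program's own function:

```python
def build_attributes(args):
    # Condensed sketch of prepareAttributes above: apply the documented
    # defaults when a value is missing (args is a plain dict here)
    attributes = {
        "sort": args.get("sort") or "hot",
        "time": args.get("time") or "all",
    }
    subreddit = args.get("subreddit")
    if subreddit is not None:
        if isinstance(subreddit, list):
            subreddit = "+".join(subreddit)
        attributes["subreddit"] = subreddit
    return attributes

attrs = build_attributes({"subreddit": ["pics", "funny"]})
```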
def postFromLog(fileName):
    """Analyze a log file and return a list of dictionaries containing
    submissions
    """
    if Path.is_file(Path(fileName)):
        content = JsonFile(fileName).read()
    else:
        print("File not found")
        sys.exit()

@@ -453,63 +60,44 @@ def postFromLog(fileName):
    posts = []
    for post in content:
        if not content[post][-1]['TYPE'] == None:
            posts.append(content[post][-1])

    return posts
def isPostExists(POST,directory):
    """Figure out a file's name and checks if the file already exists"""

    filename = GLOBAL.config['filename'].format(**POST)

    possibleExtensions = [".jpg",".png",".mp4",".gif",".webm",".md",".mkv",".flv"]

    for extension in possibleExtensions:
        path = directory / Path(filename+extension)
        if path.exists():
            return True

    return False
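The duplicate check above probes every known extension against the computed filename stem. The same idea in a self-contained sketch (names are illustrative):

```python
import tempfile
from pathlib import Path

def post_exists(stem, directory, extensions=(".jpg", ".png", ".mp4")):
    # Same idea as isPostExists above: try each candidate extension and
    # report whether any of the resulting paths is already on disk
    return any((directory / (stem + ext)).exists() for ext in extensions)

tmp = Path(tempfile.mkdtemp())
(tmp / "abc123.png").touch()
```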
def downloadPost(SUBMISSION,directory):

    global lastRequestTime

    downloaders = {
        "imgur":Imgur,"gfycat":Gfycat,"erome":Erome,"direct":Direct,"self":SelfPost,
        "redgifs":Redgifs, "gifdeliverynetwork": GifDeliveryNetwork,
        "v.redd.it": VReddit, "youtube": Youtube
    }

    print()
    if SUBMISSION['TYPE'] in downloaders:

        # WORKAROUND FOR IMGUR API LIMIT
        if SUBMISSION['TYPE'] == "imgur":
            while int(time.time() - lastRequestTime) <= 2:
                pass

@@ -554,7 +142,7 @@ def downloadPost(SUBMISSION):
            raise ImgurLimitError('{} LIMIT EXCEEDED\n'.format(KEYWORD.upper()))

        downloaders[SUBMISSION['TYPE']] (directory,SUBMISSION)

    else:
        raise NoSuitablePost
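The function above is a dispatch table: each post `TYPE` maps to a downloader class, and unknown types raise. A hypothetical miniature of that pattern:

```python
# Miniature of the downloader dispatch above; the handler and post dict
# are illustrative stand-ins, not the program's real classes
def handle_direct(directory, submission):
    return f"direct:{submission['POSTID']}"

DOWNLOADERS = {"direct": handle_direct}

def dispatch(submission, directory="."):
    # Unknown TYPEs fail loudly, like the NoSuitablePost branch above
    if submission["TYPE"] not in DOWNLOADERS:
        raise KeyError("no suitable downloader")
    return DOWNLOADERS[submission["TYPE"]](directory, submission)

result = dispatch({"TYPE": "direct", "POSTID": "abc"})
```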
@@ -566,35 +154,61 @@ def download(submissions):
    to download each one, catch errors, update the log files
    """

    global lastRequestTime
    lastRequestTime = 0

    downloadedCount = 0
    duplicates = 0

    FAILED_FILE = createLogFile("FAILED")

    if GLOBAL.arguments.unsave:
        reddit = Reddit(GLOBAL.config['credentials']['reddit']).begin()

    submissions = list(filter(lambda x: x['POSTID'] not in GLOBAL.downloadedPosts(), submissions))
    subsLenght = len(submissions)

    for i in range(len(submissions)):
        print(f"\n({i+1}/{subsLenght})",end="")
        print(submissions[i]['POSTID'],
              f"r/{submissions[i]['SUBREDDIT']}",
              f"u/{submissions[i]['REDDITOR']}",
              submissions[i]['FLAIR'] if submissions[i]['FLAIR'] else "",
              sep=" ",
              end="")
        print(f" {submissions[i]['TYPE'].upper()}",end="",noPrint=True)

        details = {**submissions[i], **{"TITLE": nameCorrector(submissions[i]['TITLE'])}}
        directory = GLOBAL.directory / GLOBAL.config["folderpath"].format(**details)

        if isPostExists(details,directory):
            print()
            print(directory)
            print(GLOBAL.config['filename'].format(**details))
            print("It already exists")
            duplicates += 1
            continue

        if any(domain in submissions[i]['CONTENTURL'] for domain in GLOBAL.arguments.skip):
            print()
            print(submissions[i]['CONTENTURL'])
            print("Domain found in skip domains, skipping post...")
            continue

        try:
            downloadPost(details,directory)
            GLOBAL.downloadedPosts.add(details['POSTID'])

            try:
                if GLOBAL.arguments.unsave:
                    reddit.submission(id=details['POSTID']).unsave()
            except InsufficientScope:
                reddit = Reddit().begin()
                reddit.submission(id=details['POSTID']).unsave()

            downloadedCount += 1

        except FileAlreadyExistsError:
            print("It already exists")
            duplicates += 1

        except ImgurLoginError:
            print(
@@ -608,13 +222,12 @@ def download(submissions):
                "{class_name}: {info}".format(
                    class_name=exception.__class__.__name__,info=str(exception)
                ),
                details
            ]})

        except NotADownloadableLinkError as exception:
            print(
                "{class_name}: {info} See CONSOLE_LOG.txt for more information".format(
                    class_name=exception.__class__.__name__,info=str(exception)
                )
            )
@@ -624,60 +237,55 @@ def download(submissions):
                ),
                submissions[i]
            ]})

        except DomainInSkip:
            print()
            print(submissions[i]['CONTENTURL'])
            print("Domain found in skip domains, skipping post...")

        except NoSuitablePost:
            print("No match found, skipping...")

        except FailedToDownload:
            print("Failed to download the post, skipping...")

        except Exception as exc:
            print(
                "{class_name}: {info} See CONSOLE_LOG.txt for more information".format(
                    class_name=exc.__class__.__name__,info=str(exc)
                )
            )
            logging.error(sys.exc_info()[0].__name__,
                          exc_info=full_exc_info(sys.exc_info()))
            print(log_stream.getvalue(),noPrint=True)

            FAILED_FILE.add({int(i+1):[
                "{class_name}: {info}".format(
                    class_name=exc.__class__.__name__,info=str(exc)
                ),
                submissions[i]
            ]})

    if duplicates:
        print(f"\nThere {'were' if duplicates > 1 else 'was'} " \
              f"{duplicates} duplicate{'s' if duplicates > 1 else ''}")

    if downloadedCount == 0:
        print("Nothing was downloaded :(")
    else:
        print(f"Total of {downloadedCount} " \
              f"link{'s' if downloadedCount > 1 else ''} downloaded!")
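The `--skip` handling above is a simple substring match of each skipped domain against the post's content URL. A standalone sketch:

```python
def should_skip(url, skip_domains):
    # Mirrors the --skip check in download() above: skip the post when
    # any configured domain appears anywhere in its content URL
    return any(domain in url for domain in skip_domains)
```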
def printLogo():
    VanillaPrint(
        f"\nBulk Downloader for Reddit v{__version__}\n" \
        f"Written by Ali PARLAKCI  parlakciali@gmail.com\n\n" \
        f"https://github.com/aliparlakci/bulk-downloader-for-reddit/\n"
    )

def main():
    if not Path(GLOBAL.defaultConfigDirectory).is_dir():
        os.makedirs(GLOBAL.defaultConfigDirectory)

@@ -686,16 +294,64 @@ def main():
        GLOBAL.configDirectory = Path("config.json")
    else:
        GLOBAL.configDirectory = GLOBAL.defaultConfigDirectory / "config.json"

    try:
        GLOBAL.config = Config(GLOBAL.configDirectory).generate()
    except InvalidJSONFile as exception:
        VanillaPrint(str(exception.__class__.__name__),">>",str(exception))
        VanillaPrint("Resolve it or remove it to proceed")
        input("\nPress enter to quit")
        sys.exit()

    sys.argv = sys.argv + GLOBAL.config["options"].split()

    arguments = Arguments.parse()
    GLOBAL.arguments = arguments

    if arguments.set_filename:
        Config(GLOBAL.configDirectory).setCustomFileName()
        sys.exit()

    if arguments.set_folderpath:
        Config(GLOBAL.configDirectory).setCustomFolderPath()
        sys.exit()

    if arguments.set_default_directory:
        Config(GLOBAL.configDirectory).setDefaultDirectory()
        sys.exit()

    if arguments.set_default_options:
        Config(GLOBAL.configDirectory).setDefaultOptions()
        sys.exit()

    if arguments.use_local_config:
        JsonFile(".\\config.json").add(GLOBAL.config)
        sys.exit()

    if arguments.directory:
        GLOBAL.directory = Path(arguments.directory.strip())
    elif "default_directory" in GLOBAL.config and GLOBAL.config["default_directory"] != "":
        GLOBAL.directory = Path(GLOBAL.config["default_directory"].format(time=GLOBAL.RUN_TIME))
    else:
        GLOBAL.directory = Path(input("\ndownload directory: ").strip())

    if arguments.downloaded_posts:
        GLOBAL.downloadedPosts = Store(arguments.downloaded_posts)
    else:
        GLOBAL.downloadedPosts = Store()

    printLogo()
    print("\n"," ".join(sys.argv),"\n",noPrint=True)

    if arguments.log is not None:
        logDir = Path(arguments.log)
        download(postFromLog(logDir))
        sys.exit()

    programMode = ProgramMode(arguments).generate()

    try:
        posts = getPosts(programMode)
    except Exception as exc:
        logging.error(sys.exc_info()[0].__name__,
                      exc_info=full_exc_info(sys.exc_info()))
@@ -703,15 +359,11 @@ def main():
        print(exc)
        sys.exit()

    if posts is None:
        print("I could not find any posts in that URL")
        sys.exit()

    download(posts)

if __name__ == "__main__":
@@ -721,16 +373,19 @@ if __name__ == "__main__":
    try:
        VanillaPrint = print
        print = printToFile
        GLOBAL.RUN_TIME = str(time.strftime(
            "%d-%m-%Y_%H-%M-%S",
            time.localtime(time.time())
        ))
        main()

    except KeyboardInterrupt:
        if GLOBAL.directory is None:
            GLOBAL.directory = Path("..\\")

    except Exception as exception:
        if GLOBAL.directory is None:
            GLOBAL.directory = Path("..\\")

        logging.error(sys.exc_info()[0].__name__,
                      exc_info=full_exc_info(sys.exc_info()))
        print(log_stream.getvalue())
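The `Store` object used above backs the `--downloaded-posts` feature: a set of already-downloaded post ids, optionally persisted to a file. A hedged sketch of the idea (the real class lives elsewhere in the repository; this interface is an assumption):

```python
class Store:
    # Hypothetical minimal version of the downloaded-posts store above:
    # post ids live in a set, and an optional file persists them so
    # later runs can skip posts that were already downloaded
    def __init__(self, path=None):
        self.path = path
        self.ids = set()
        if path is not None:
            try:
                with open(path) as f:
                    self.ids = set(f.read().split())
            except FileNotFoundError:
                pass

    def __call__(self):
        # Calling the store returns the known ids, matching the
        # GLOBAL.downloadedPosts() usage in download() above
        return self.ids

    def add(self, post_id):
        self.ids.add(post_id)
        if self.path is not None:
            with open(self.path, "a") as f:
                f.write(post_id + "\n")

store = Store()
store.add("abc123")
```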


@@ -8,7 +8,7 @@ from script import __version__
options = {
    "build_exe": {
        "packages":[
            "idna","imgurpython", "praw", "requests", "multiprocessing"
        ]
    }
}

src/arguments.py (new file)

@@ -0,0 +1,148 @@
import argparse
import sys

class Arguments:
    @staticmethod
    def parse(arguments=[]):
        """Initialize argparse and add arguments"""

        parser = argparse.ArgumentParser(allow_abbrev=False,
                                         description="This program downloads " \
                                                     "media from reddit " \
                                                     "posts")

        parser.add_argument("--directory","-d",
                            help="Specifies the directory where posts will be " \
                                 "downloaded to",
                            metavar="DIRECTORY")

        parser.add_argument("--verbose","-v",
                            help="Verbose mode",
                            action="store_true",
                            default=False)

        parser.add_argument("--quit","-q",
                            help="Auto quit after the process finishes",
                            action="store_true",
                            default=False)

        parser.add_argument("--link","-l",
                            help="Get posts from link",
                            metavar="link")

        parser.add_argument("--saved",
                            action="store_true",
                            required="--unsave" in sys.argv,
                            help="Triggers saved mode")

        parser.add_argument("--unsave",
                            action="store_true",
                            help="Unsaves downloaded posts")

        parser.add_argument("--submitted",
                            action="store_true",
                            help="Gets posts of --user")

        parser.add_argument("--upvoted",
                            action="store_true",
                            help="Gets upvoted posts of --user")

        parser.add_argument("--log",
                            help="Takes a log file which was created by the " \
                                 "program (json file), reads the posts and " \
                                 "tries downloading them again",
                            # type=argparse.FileType('r'),
                            metavar="LOG FILE")

        parser.add_argument("--subreddit",
                            nargs="+",
                            help="Triggers subreddit mode and takes the subreddit's " \
                                 "name without r/. Use \"frontpage\" for the frontpage",
                            metavar="SUBREDDIT",
                            type=str)

        parser.add_argument("--multireddit",
                            help="Triggers multireddit mode and takes the "\
                                 "multireddit's name without m/",
                            metavar="MULTIREDDIT",
                            type=str)

        parser.add_argument("--user",
                            help="Reddit username if needed. Use \"me\" for the " \
                                 "current user",
                            required="--multireddit" in sys.argv or \
                                     "--submitted" in sys.argv,
                            metavar="redditor",
                            type=str)

        parser.add_argument("--search",
                            help="Searches for the given query in the given subreddits",
                            metavar="query",
                            type=str)

        parser.add_argument("--sort",
                            help="Either hot, top, new, controversial, rising " \
                                 "or relevance. default: hot",
                            choices=[
                                "hot","top","new","controversial","rising",
                                "relevance"
                            ],
                            metavar="SORT TYPE",
                            type=str)

        parser.add_argument("--limit",
                            help="default: unlimited",
                            metavar="Limit",
                            type=int)

        parser.add_argument("--time",
                            help="Either hour, day, week, month, year or all." \
                                 " default: all",
                            choices=["all","hour","day","week","month","year"],
                            metavar="TIME_LIMIT",
                            type=str)

        parser.add_argument("--skip",
                            nargs="+",
                            help="Skip posts with the given domains",
                            type=str,
                            default=[])

        parser.add_argument("--set-folderpath",
                            action="store_true",
                            help="Set a custom folder path")

        parser.add_argument("--set-filename",
                            action="store_true",
                            help="Set a custom filename")

        parser.add_argument("--set-default-directory",
                            action="store_true",
                            help="Set a default directory to be used in case no directory is given")

        parser.add_argument("--set-default-options",
                            action="store_true",
                            help="Set default options to use every time the program runs")

        parser.add_argument("--use-local-config",
                            action="store_true",
                            help="Creates a config file in the program's directory and uses it. Useful for having multiple configs")

        parser.add_argument("--no-dupes",
                            action="store_true",
                            help="Do not download duplicate posts on different subreddits")

        parser.add_argument("--downloaded-posts",
                            help="Use a hash file to keep track of downloaded files",
                            type=str)

        if arguments == []:
            return parser.parse_args()
        else:
            return parser.parse_args(arguments)
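A notable pattern above is making one flag conditionally required: `--saved` is only mandatory when `--unsave` appears on the command line, decided by inspecting argv at parser-construction time. A minimal sketch of that coupling:

```python
import argparse

def parse(argv):
    # --saved becomes mandatory only when --unsave is present in argv,
    # mirroring the required="--unsave" in sys.argv trick above
    parser = argparse.ArgumentParser()
    parser.add_argument("--saved", action="store_true",
                        required="--unsave" in argv)
    parser.add_argument("--unsave", action="store_true")
    return parser.parse_args(argv)

ns = parse(["--unsave", "--saved"])
```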

src/config.py (new file)

@@ -0,0 +1,151 @@
import os
import socket
import webbrowser
import random

from src.reddit import Reddit
from src.jsonHelper import JsonFile

class Config():
    def __init__(self,filename):
        self.filename = filename
        self.file = JsonFile(self.filename)

    def generate(self):
        self._validateCredentials()
        self._readCustomFileName()
        self._readCustomFolderPath()
        self._readDefaultOptions()
        return self.file.read()

    def setCustomFileName(self):
        print("""
IMPORTANT: Do not change the filename structure frequently.
If you do, the program cannot detect duplicates and
will download already downloaded files again.
This does not create duplicates in the directory, but
the program will not be as snappy as it should be.

Type a template filename for each post.
You can use SUBREDDIT, REDDITOR, POSTID, TITLE, UPVOTES, FLAIR, DATE in curly braces
The text in curly braces will be replaced with the corresponding property of each post

For example: {FLAIR}_{SUBREDDIT}_{REDDITOR}

Existing filename template:""", None if "filename" not in self.file.read() else self.file.read()["filename"])

        filename = input(">> ").upper()
        self.file.add({
            "filename": filename
        })

    def _readCustomFileName(self):
        content = self.file.read()
        if not "filename" in content:
            self.file.add({
                "filename": "{REDDITOR}_{TITLE}_{POSTID}"
            })
        content = self.file.read()

        if not "{POSTID}" in content["filename"]:
            self.file.add({
                "filename": content["filename"] + "_{POSTID}"
            })
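The filename template above is plain `str.format` substitution over the post's properties, and `_readCustomFileName` forces `{POSTID}` into custom templates so names stay unique. Both steps can be sketched with illustrative values:

```python
# Illustrative post properties; real posts carry these keys in the log
post = {"REDDITOR": "some_user", "TITLE": "A_Title", "POSTID": "ab1cd2"}

template = "{REDDITOR}_{TITLE}_{POSTID}"   # the default template above
filename = template.format(**post)

# _readCustomFileName appends {POSTID} when a custom template omits it
custom = "{TITLE}"
if "{POSTID}" not in custom:
    custom += "_{POSTID}"
```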
    def setCustomFolderPath(self):
        print("""
Type a folder structure (a generic folder path)
Use a slash or a DOUBLE backslash to separate folders
You can use SUBREDDIT, REDDITOR, POSTID, TITLE, UPVOTES, FLAIR, DATE in curly braces
The text in curly braces will be replaced with the corresponding property of each post

For example: {REDDITOR}/{SUBREDDIT}/{FLAIR}

Existing folder structure:""", None if "folderpath" not in self.file.read() else self.file.read()["folderpath"])

        folderpath = input(">> ").strip("\\").strip("/").upper()
        self.file.add({
            "folderpath": folderpath
        })

    def _readCustomFolderPath(self,path=None):
        content = self.file.read()
        if not "folderpath" in content:
            self.file.add({
                "folderpath": "{SUBREDDIT}"
            })
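The folder structure works the same way as the filename template: the formatted string becomes a relative path under the download directory. A sketch with illustrative values:

```python
from pathlib import Path

# Illustrative post properties and a custom folder template, as in the
# {REDDITOR}/{SUBREDDIT} example above
post = {"SUBREDDIT": "pics", "REDDITOR": "some_user", "FLAIR": "OC"}
folder = Path("downloads") / "{REDDITOR}/{SUBREDDIT}".format(**post)
```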
    def setDefaultOptions(self):
        print("""
Type the options to be used every time the script runs
For example: --no-dupes --quit --limit 100 --skip youtube.com

Existing default options:""", None if "options" not in self.file.read() else self.file.read()["options"])

        options = input(">> ").strip()
        self.file.add({
            "options": options
        })

    def _readDefaultOptions(self,path=None):
        content = self.file.read()
        if not "options" in content:
            self.file.add({
                "options": ""
            })

    def _validateCredentials(self):
        """Read credentials from the config.json file"""
        keys = ['imgur_client_id',
                'imgur_client_secret']

        try:
            content = self.file.read()["credentials"]
        except KeyError:
            self.file.add({
                "credentials":{}
            })
            content = self.file.read()["credentials"]

        if "reddit" in content and len(content["reddit"]) != 0:
            pass
        else:
            Reddit().begin()

        if not all(content.get(key,False) for key in keys):
            print(
                "---Setting up the Imgur API---\n\n" \
                "Go to this URL and fill in the form:\n" \
                "https://api.imgur.com/oauth2/addclient\n" \
                "Then enter the client id and client secret here\n" \
                "Press Enter to open the link in the browser"
            )
            input()
            webbrowser.open("https://api.imgur.com/oauth2/addclient",new=2)

            for key in keys:
                try:
                    if content[key] == "":
                        raise KeyError
                except KeyError:
                    self.file.add({key:input("\t"+key+": ")},
                                  "credentials")
            print()

    def setDefaultDirectory(self):
        print("""Set a default directory to use in case no directory is given
Leave it blank to reset. You can use {time} in folder names to timestamp them
For example: D:/archive/BDFR_{time}
""")
        print("Current default directory:", self.file.read()["default_directory"] if "default_directory" in self.file.read() else "")
        self.file.add({
            "default_directory": input(">> ")
        })
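The `{time}` placeholder above is filled with `GLOBAL.RUN_TIME`, which the script builds with `time.strftime("%d-%m-%Y_%H-%M-%S", ...)`. The substitution can be sketched as:

```python
import time

# Same format string the script uses for GLOBAL.RUN_TIME
run_time = time.strftime("%d-%m-%Y_%H-%M-%S", time.localtime())

# The example path from the prompt above; the drive letter is illustrative
directory = "D:/archive/BDFR_{time}".format(time=run_time)
```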


@@ -3,30 +3,16 @@ import os
from src.downloaders.downloaderUtils import getFile, getExtension

from src.errors import FileNameTooLong
from src.utils import GLOBAL
from src.utils import printToFile as print

class Direct:
    def __init__(self,directory,POST):
        POST['EXTENSION'] = getExtension(POST['CONTENTURL'])

        if not os.path.exists(directory): os.makedirs(directory)

        filename = GLOBAL.config['filename'].format(**POST)+POST["EXTENSION"]
        shortFilename = POST['POSTID']+POST['EXTENSION']

        getFile(filename,shortFilename,directory,POST['CONTENTURL'])


@@ -1,4 +1,6 @@
import os
import logging
import sys
import urllib.request

from html.parser import HTMLParser

@@ -6,14 +8,14 @@ from src.downloaders.downloaderUtils import getFile
from src.downloaders.downloaderUtils import getExtension

from src.errors import (FileNameTooLong, AlbumNotDownloadedCompletely,
                        NotADownloadableLinkError, FileAlreadyExistsError, full_exc_info)
from src.utils import GLOBAL
from src.utils import printToFile as print

class Erome:
    def __init__(self,directory,post):
        try:
            IMAGES = self.getLinks(post['CONTENTURL'])
        except urllib.error.HTTPError:
            raise NotADownloadableLinkError("Not a downloadable link")

@@ -27,59 +29,43 @@ class Erome:
            """Filenames are declared here"""
            filename = GLOBAL.config['filename'].format(**post)+post["EXTENSION"]
            shortFilename = post['POSTID'] + extension

            imageURL = IMAGES[0]
            if 'https://' not in imageURL and 'http://' not in imageURL:
                imageURL = "https://" + imageURL

            getFile(filename,shortFilename,directory,imageURL)

        else:
            filename = GLOBAL.config['filename'].format(**post)
            print(filename)

            folderDir = directory / filename

            try:
                if not os.path.exists(folderDir):
                    os.makedirs(folderDir)
            except FileNotFoundError:
                folderDir = directory / post['POSTID']
                os.makedirs(folderDir)

            for i in range(imagesLenght):
                extension = getExtension(IMAGES[i])
                filename = str(i+1)+extension

                imageURL = IMAGES[i]
                if 'https://' not in imageURL and 'http://' not in imageURL:
                    imageURL = "https://" + imageURL

                print("  ({}/{})".format(i+1,imagesLenght))
                print("  {}".format(filename))

                try:
                    getFile(filename,filename,folderDir,imageURL,indent=2)
                    print()
                except FileAlreadyExistsError:
                    print("  The file already exists" + " "*10,end="\n\n")


@@ -6,40 +6,25 @@ from bs4 import BeautifulSoup

from src.downloaders.downloaderUtils import getFile, getExtension
from src.errors import (FileNameTooLong, AlbumNotDownloadedCompletely,
                        NotADownloadableLinkError, FileAlreadyExistsError)
from src.utils import GLOBAL
from src.utils import printToFile as print
from src.downloaders.gifDeliveryNetwork import GifDeliveryNetwork

class Gfycat:
    def __init__(self,directory,POST):
        try:
            POST['MEDIAURL'] = self.getLink(POST['CONTENTURL'])
        except IndexError:
            raise NotADownloadableLinkError("Could not read the page source")

        POST['EXTENSION'] = getExtension(POST['MEDIAURL'])

        if not os.path.exists(directory): os.makedirs(directory)

        filename = GLOBAL.config['filename'].format(**POST)+POST["EXTENSION"]
        shortFilename = POST['POSTID']+POST['EXTENSION']

        getFile(filename,shortFilename,directory,POST['MEDIAURL'])

    @staticmethod
    def getLink(url):


@@ -13,7 +13,7 @@ class Imgur:
    def __init__(self,directory,post):
        self.imgurClient = self.initImgur()

        imgurID = self.getId(post['CONTENTURL'])
        content = self.getLink(imgurID)

        if not os.path.exists(directory): os.makedirs(directory)

@@ -21,38 +21,16 @@ class Imgur:
        if content['type'] == 'image':
            try:
                post['MEDIAURL'] = content['object'].mp4
            except AttributeError:
                post['MEDIAURL'] = content['object'].link

            post['EXTENSION'] = getExtension(post['MEDIAURL'])

            filename = GLOBAL.config['filename'].format(**post)+post["EXTENSION"]
            shortFilename = post['POSTID']+post['EXTENSION']

            getFile(filename,shortFilename,directory,post['MEDIAURL'])

        elif content['type'] == 'album':
            images = content['object'].images

@@ -60,18 +38,17 @@ class Imgur:
            howManyDownloaded = imagesLenght
            duplicates = 0

            filename = GLOBAL.config['filename'].format(**post)

            print(filename)

            folderDir = directory / filename

            try:
                if not os.path.exists(folderDir):
                    os.makedirs(folderDir)
            except FileNotFoundError:
                folderDir = directory / post['POSTID']
                os.makedirs(folderDir)

            for i in range(imagesLenght):

@@ -82,42 +59,24 @@ class Imgur:
                images[i]['Ext'] = getExtension(imageURL)

                filename = (str(i+1)
                            + "_"
                            + nameCorrector(str(images[i]['title']))
                            + "_"
                            + images[i]['id'])
                shortFilename = (str(i+1) + "_" + images[i]['id'])

                print("\n  ({}/{})".format(i+1,imagesLenght))

                try:
                    getFile(filename,shortFilename,folderDir,imageURL,indent=2)
                    print()
                except FileAlreadyExistsError:
                    print("  The file already exists" + " "*10,end="\n\n")
                    duplicates += 1
                    howManyDownloaded -= 1

                except Exception as exception:
                    print("\n  Could not get the file")
                    print(

@@ -143,8 +102,8 @@ class Imgur:
        config = GLOBAL.config
        return imgurpython.ImgurClient(
            config["credentials"]['imgur_client_id'],
            config["credentials"]['imgur_client_secret']
        )

    def getId(self,submissionURL):
        """Extract imgur post id


@@ -1,9 +1,14 @@
import sys
import os
import time
from urllib.error import HTTPError
import urllib.request
from pathlib import Path
import hashlib

from src.utils import nameCorrector, GLOBAL
from src.utils import printToFile as print
from src.errors import FileAlreadyExistsError, FileNameTooLong, FailedToDownload, DomainInSkip

def dlProgress(count, blockSize, totalSize):
    """Function for writing download progress to console

@@ -30,16 +35,10 @@ def getExtension(link):
    else:
        return '.mp4'

def getFile(filename,shortFilename,folderDir,imageURL,indent=0,silent=False):

    if any(domain in imageURL for domain in GLOBAL.arguments.skip):
        raise DomainInSkip

    headers = [
        ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " \

@@ -58,20 +57,45 @@ def getFile(filename,shortFilename,folderDir,imageURL,indent=0,silent=False):
    opener.addheaders = headers
    urllib.request.install_opener(opener)

    filename = nameCorrector(filename)

    if not silent: print(" "*indent + str(folderDir),
                         " "*indent + str(filename),
                         sep="\n")

    for i in range(3):
        fileDir = Path(folderDir) / filename
        tempDir = Path(folderDir) / (filename+".tmp")

        if not (os.path.isfile(fileDir)):
            try:
                urllib.request.urlretrieve(imageURL,
                                           tempDir,
                                           reporthook=dlProgress)

                if GLOBAL.arguments.no_dupes:
                    fileHash = createHash(tempDir)
                    if fileHash in GLOBAL.hashList:
                        os.remove(tempDir)
                        raise FileAlreadyExistsError
                    GLOBAL.hashList.add(fileHash)

                os.rename(tempDir,fileDir)
                if not silent: print(" "*indent+"Downloaded"+" "*10)
                return None
            except ConnectionResetError as exception:
                if not silent: print(" "*indent + str(exception))
                if not silent: print(" "*indent + "Trying again\n")
            except FileNotFoundError:
                filename = shortFilename
        else:
            raise FileAlreadyExistsError

    raise FailedToDownload

def createHash(filename):
    hash_md5 = hashlib.md5()
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()
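The new `createHash` helper drives the `--no-dupes` check: each downloaded file's MD5 digest is compared against the set of digests seen so far, and a match means the file is a duplicate and gets removed. A self-contained sketch of that flow, using temporary files for illustration:

```python
import hashlib
import os
import tempfile

def createHash(filename):
    # Same chunked-MD5 logic as downloaderUtils.createHash
    hash_md5 = hashlib.md5()
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

hashList = set()
with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, "first.jpg")
    duplicate = os.path.join(folder, "duplicate.jpg")
    for path in (first, duplicate):
        with open(path, "wb") as f:
            f.write(b"identical bytes")

    hashList.add(createHash(first))
    # A second download with the same content is detected by its digest
    print(createHash(duplicate) in hashList)  # True
```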


@@ -6,39 +6,24 @@ from bs4 import BeautifulSoup

from src.downloaders.downloaderUtils import getFile, getExtension
from src.errors import (FileNameTooLong, AlbumNotDownloadedCompletely,
                        NotADownloadableLinkError, FileAlreadyExistsError)
from src.utils import GLOBAL
from src.utils import printToFile as print

class GifDeliveryNetwork:
    def __init__(self,directory,POST):
        try:
            POST['MEDIAURL'] = self.getLink(POST['CONTENTURL'])
        except IndexError:
            raise NotADownloadableLinkError("Could not read the page source")

        POST['EXTENSION'] = getExtension(POST['MEDIAURL'])

        if not os.path.exists(directory): os.makedirs(directory)

        filename = GLOBAL.config['filename'].format(**POST)+POST["EXTENSION"]
        shortFilename = POST['POSTID']+POST['EXTENSION']

        getFile(filename,shortFilename,directory,POST['MEDIAURL'])

    @staticmethod
    def getLink(url):


@@ -6,39 +6,24 @@ from bs4 import BeautifulSoup

from src.downloaders.downloaderUtils import getFile, getExtension
from src.errors import (FileNameTooLong, AlbumNotDownloadedCompletely,
                        NotADownloadableLinkError, FileAlreadyExistsError)
from src.utils import GLOBAL
from src.utils import printToFile as print

class Redgifs:
    def __init__(self,directory,POST):
        try:
            POST['MEDIAURL'] = self.getLink(POST['CONTENTURL'])
        except IndexError:
            raise NotADownloadableLinkError("Could not read the page source")

        POST['EXTENSION'] = getExtension(POST['MEDIAURL'])

        if not os.path.exists(directory): os.makedirs(directory)

        filename = GLOBAL.config['filename'].format(**POST)+POST["EXTENSION"]
        shortFilename = POST['POSTID']+POST['EXTENSION']

        getFile(filename,shortFilename,directory,POST['MEDIAURL'])

    def getLink(self, url):
        """Extract direct link to the video from page's source


@@ -3,7 +3,7 @@ import os
from pathlib import Path

from src.errors import FileAlreadyExistsError
from src.utils import GLOBAL
VanillaPrint = print
from src.utils import printToFile as print

@@ -12,15 +12,12 @@ class SelfPost:
    def __init__(self,directory,post):
        if not os.path.exists(directory): os.makedirs(directory)

        filename = GLOBAL.config['filename'].format(**post)
        fileDir = directory / (filename+".md")

        print(filename+".md")

        if Path.is_file(fileDir):
            raise FileAlreadyExistsError

@@ -28,7 +25,7 @@ class SelfPost:
        try:
            self.writeToFile(fileDir,post)
        except FileNotFoundError:
            fileDir = post['POSTID']+".md"
            fileDir = directory / fileDir
            self.writeToFile(fileDir,post)

@@ -38,20 +35,20 @@ class SelfPost:
        """Self posts are formatted here"""
        content = ("## ["
                   + post["TITLE"]
                   + "]("
                   + post["CONTENTURL"]
                   + ")\n"
                   + post["CONTENT"]
                   + "\n\n---\n\n"
                   + "submitted to [r/"
                   + post["SUBREDDIT"]
                   + "](https://www.reddit.com/r/"
                   + post["SUBREDDIT"]
                   + ") by [u/"
                   + post["REDDITOR"]
                   + "](https://www.reddit.com/user/"
                   + post["REDDITOR"]
                   + ")")

        with io.open(directory,"w",encoding="utf-8") as FILE:


@@ -0,0 +1,57 @@
import os
import subprocess

from src.downloaders.downloaderUtils import getFile, getExtension
from src.errors import FileNameTooLong
from src.utils import GLOBAL
from src.utils import printToFile as print

class VReddit:
    def __init__(self,directory,post):
        extension = ".mp4"
        if not os.path.exists(directory): os.makedirs(directory)

        filename = GLOBAL.config['filename'].format(**post)+extension
        shortFilename = post['POSTID']+extension

        try:
            FNULL = open(os.devnull, 'w')
            subprocess.call("ffmpeg", stdout=FNULL, stderr=subprocess.STDOUT)
        except:
            getFile(filename,shortFilename,directory,post['CONTENTURL'])
            print("FFMPEG library not found, skipping merging video and audio")
        else:
            videoName = post['POSTID'] + "_video"
            videoURL = post['CONTENTURL']
            audioName = post['POSTID'] + "_audio"
            audioURL = videoURL[:videoURL.rfind('/')] + '/audio'

            print(directory,filename,sep="\n")

            getFile(videoName,videoName,directory,videoURL,silent=True)
            getFile(audioName,audioName,directory,audioURL,silent=True)

            try:
                self._mergeAudio(videoName,
                                 audioName,
                                 filename,
                                 shortFilename,
                                 directory)
            except KeyboardInterrupt:
                os.remove(directory / filename)
                os.remove(directory / audioName)
                os.rename(directory / videoName, directory / filename)

    @staticmethod
    def _mergeAudio(video,audio,filename,shortFilename,directory):
        inputVideo = str(directory / video)
        inputAudio = str(directory / audio)

        FNULL = open(os.devnull, 'w')
        cmd = f"ffmpeg -i {inputAudio} -i {inputVideo} -c:v copy -c:a aac -strict experimental {str(directory / filename)}"
        subprocess.call(cmd, stdout=FNULL, stderr=subprocess.STDOUT)

        os.remove(directory / video)
        os.remove(directory / audio)
View File

@@ -0,0 +1,51 @@
import os
import youtube_dl
import sys

from src.downloaders.downloaderUtils import getExtension, dlProgress, createHash
from src.utils import GLOBAL
from src.utils import printToFile as print
from src.errors import FileAlreadyExistsError

class Youtube:
    def __init__(self,directory,post):
        if not os.path.exists(directory): os.makedirs(directory)

        filename = GLOBAL.config['filename'].format(**post)
        print(filename)

        self.download(filename,directory,post['CONTENTURL'])

    def download(self,filename,directory,url):
        ydl_opts = {
            "format": "best",
            "outtmpl": str(directory / (filename + ".%(ext)s")),
            "progress_hooks": [self._hook],
            "playlistend": 1,
            "nooverwrites": True,
            "quiet": True
        }
        with youtube_dl.YoutubeDL(ydl_opts) as ydl:
            ydl.download([url])

        location = directory/(filename+".mp4")

        if GLOBAL.arguments.no_dupes:
            try:
                fileHash = createHash(location)
            except FileNotFoundError:
                return None
            if fileHash in GLOBAL.hashList:
                os.remove(location)
                raise FileAlreadyExistsError
            GLOBAL.hashList.add(fileHash)

    @staticmethod
    def _hook(d):
        if d['status'] == 'finished': return print("Downloaded")
        downloadedMbs = int(d['downloaded_bytes'] * (10**(-6)))
        fileSize = int(d['total_bytes']*(10**(-6)))
        sys.stdout.write("{}Mb/{}Mb\r".format(downloadedMbs,fileSize))
        sys.stdout.flush()


@@ -1,12 +1,8 @@
import sys

def current_stack(skip=0):
    try: 1/0
    except ZeroDivisionError:
        f = sys.exc_info()[2].tb_frame

@@ -18,14 +14,20 @@ def current_stack(skip=0):
        f = f.f_back
    return lst

def extend_traceback(tb, stack):
    """Extend traceback with stack info."""
    class FauxTb(object):
        def __init__(self, tb_frame, tb_lineno, tb_next):
            self.tb_frame = tb_frame
            self.tb_lineno = tb_lineno
            self.tb_next = tb_next

    head = tb
    for tb_frame, tb_lineno in stack:
        head = FauxTb(tb_frame, tb_lineno, head)
    return head

def full_exc_info(exc_info):
    """Like sys.exc_info, but includes the full traceback."""
    t, v, tb = exc_info
    full_tb = extend_traceback(tb, current_stack(1))

@@ -87,3 +89,15 @@ class NoSuitablePost(Exception):
class ImgurLimitError(Exception):
    pass

class DirectLinkNotFound(Exception):
    pass

class InvalidJSONFile(Exception):
    pass

class FailedToDownload(Exception):
    pass

class DomainInSkip(Exception):
    pass

src/jsonHelper.py (new file, 58 lines)

@@ -0,0 +1,58 @@
import json
from os import path, remove

from src.errors import InvalidJSONFile

class JsonFile:
    """Write and read JSON files

    Use add(self,toBeAdded) to add to files
    Use delete(self,*deletedKeys) to delete keys
    """

    FILEDIR = ""

    def __init__(self,FILEDIR):
        self.FILEDIR = FILEDIR
        if not path.exists(self.FILEDIR):
            self.__writeToFile({},create=True)

    def read(self):
        try:
            with open(self.FILEDIR, 'r') as f:
                return json.load(f)
        except json.decoder.JSONDecodeError:
            raise InvalidJSONFile(f"{self.FILEDIR} cannot be read")

    def add(self,toBeAdded,sub=None):
        """Takes a dictionary and merges it with json file.
        It uses new key's value if a key already exists.
        Returns the new content as a dictionary.
        """
        data = self.read()
        if sub: data[sub] = {**data[sub], **toBeAdded}
        else: data = {**data, **toBeAdded}
        self.__writeToFile(data)
        return self.read()

    def delete(self,*deleteKeys):
        """Delete given keys from JSON file.
        Returns the new content as a dictionary.
        """
        data = self.read()
        found = False  # the flag must start False or the check below fails
        for deleteKey in deleteKeys:
            if deleteKey in data:
                del data[deleteKey]
                found = True
        if not found:
            return False
        self.__writeToFile(data)

    def __writeToFile(self,content,create=False):
        if not create:
            remove(self.FILEDIR)
        with open(self.FILEDIR, 'w') as f:
            json.dump(content, f, indent=4)
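`JsonFile.add` merges dictionaries so that incoming keys overwrite existing values, optionally inside a named sub-dictionary such as `"credentials"`. The merge semantics in isolation — the data values here are made up:

```python
# {**old, **new} keeps every old key but lets new values win on conflicts,
# which is exactly what JsonFile.add(toBeAdded, sub=...) relies on.
data = {"credentials": {"imgur_client_id": "old_id"}}
toBeAdded = {"imgur_client_id": "new_id", "reddit": "refresh_token"}

sub = "credentials"
data[sub] = {**data[sub], **toBeAdded}

print(data)
# {'credentials': {'imgur_client_id': 'new_id', 'reddit': 'refresh_token'}}
```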

src/programMode.py (new file, 270 lines)

@@ -0,0 +1,270 @@
from src.errors import SearchModeError, RedditorNameError, ProgramModeError, InvalidSortingType
from src.utils import GLOBAL
from src.parser import LinkDesigner

from pathlib import Path
import sys

class ProgramMode:

    def __init__(self,arguments):
        self.arguments = arguments

    def generate(self):
        try:
            self._validateProgramMode()
        except ProgramModeError:
            self._promptUser()

        programMode = {}

        if self.arguments.user is not None:
            programMode["user"] = self.arguments.user

        if self.arguments.search is not None:
            programMode["search"] = self.arguments.search
            if self.arguments.sort == "hot" or \
               self.arguments.sort == "controversial" or \
               self.arguments.sort == "rising":
                self.arguments.sort = "relevance"

        if self.arguments.sort is not None:
            programMode["sort"] = self.arguments.sort
        else:
            if self.arguments.submitted:
                programMode["sort"] = "new"
            else:
                programMode["sort"] = "hot"

        if self.arguments.time is not None:
            programMode["time"] = self.arguments.time
        else:
            programMode["time"] = "all"

        if self.arguments.link is not None:
            self.arguments.link = self.arguments.link.strip("\"")
            programMode = LinkDesigner(self.arguments.link)

            if self.arguments.search is not None:
                programMode["search"] = self.arguments.search
            if self.arguments.sort is not None:
                programMode["sort"] = self.arguments.sort
            if self.arguments.time is not None:
                programMode["time"] = self.arguments.time

        elif self.arguments.subreddit is not None:
            if type(self.arguments.subreddit) == list:
                self.arguments.subreddit = "+".join(self.arguments.subreddit)
            programMode["subreddit"] = self.arguments.subreddit

        elif self.arguments.multireddit is not None:
            programMode["multireddit"] = self.arguments.multireddit

        elif self.arguments.saved is True:
            programMode["saved"] = True

        elif self.arguments.upvoted is True:
            programMode["upvoted"] = True

        elif self.arguments.submitted is not None:
            programMode["submitted"] = True
            if self.arguments.sort == "rising":
                raise InvalidSortingType("Invalid sorting type was given")

        programMode["limit"] = self.arguments.limit

        return programMode

    @staticmethod
    def _chooseFrom(choices):
        print()
        choicesByIndex = list(str(x) for x in range(len(choices)+1))
        for i in range(len(choices)):
            print("{indent}[{order}] {mode}".format(
                indent=" "*4,order=i+1,mode=choices[i]
            ))
        print(" "*4+"[0] exit\n")
        choice = input("> ")
        while not choice.lower() in choices+choicesByIndex+["exit"]:
            print("Invalid input\n")
            choice = input("> ")

        if choice == "0" or choice == "exit":
            sys.exit()
        elif choice in choicesByIndex:
            return choices[int(choice)-1]
        else:
            return choice

    def _promptUser(self):
        print("select program mode:")
        programModes = [
            "search","subreddit","multireddit",
            "submitted","upvoted","saved","log"
        ]
        programMode = self._chooseFrom(programModes)

        if programMode == "search":
            self.arguments.search = input("\nquery: ")
            self.arguments.subreddit = input("\nsubreddit: ")

            print("\nselect sort type:")
            sortTypes = [
                "relevance","top","new"
            ]
            sortType = self._chooseFrom(sortTypes)
            self.arguments.sort = sortType

            print("\nselect time filter:")
            timeFilters = [
                "hour","day","week","month","year","all"
            ]
            timeFilter = self._chooseFrom(timeFilters)
            self.arguments.time = timeFilter

        if programMode == "subreddit":
            subredditInput = input("(type frontpage for all subscribed subreddits,\n" \
                                   " use plus to separate multi subreddits:" \
                                   " pics+funny+me_irl etc.)\n\n" \
                                   "subreddit: ")
            self.arguments.subreddit = subredditInput

            # while not (subredditInput == "" or subredditInput.lower() == "frontpage"):
            #     subredditInput = input("subreddit: ")
            #     self.arguments.subreddit += "+" + subredditInput

            if " " in self.arguments.subreddit:
                self.arguments.subreddit = "+".join(self.arguments.subreddit.split())

            # DELETE THE PLUS (+) AT THE END
            if not subredditInput.lower() == "frontpage" \
               and self.arguments.subreddit[-1] == "+":
                self.arguments.subreddit = self.arguments.subreddit[:-1]

            print("\nselect sort type:")
            sortTypes = [
                "hot","top","new","rising","controversial"
            ]
            sortType = self._chooseFrom(sortTypes)
            self.arguments.sort = sortType

            if sortType in ["top","controversial"]:
                print("\nselect time filter:")
                timeFilters = [
                    "hour","day","week","month","year","all"
                ]
                timeFilter = self._chooseFrom(timeFilters)
                self.arguments.time = timeFilter
            else:
                self.arguments.time = "all"

        elif programMode == "multireddit":
            self.arguments.user = input("\nmultireddit owner: ")
            self.arguments.multireddit = input("\nmultireddit: ")

            print("\nselect sort type:")
            sortTypes = [
                "hot","top","new","rising","controversial"
            ]
            sortType = self._chooseFrom(sortTypes)
            self.arguments.sort = sortType

            if sortType in ["top","controversial"]:
                print("\nselect time filter:")
                timeFilters = [
                    "hour","day","week","month","year","all"
                ]
                timeFilter = self._chooseFrom(timeFilters)
                self.arguments.time = timeFilter
            else:
                self.arguments.time = "all"

        elif programMode == "submitted":
            self.arguments.submitted = True
            self.arguments.user = input("\nredditor: ")

            print("\nselect sort type:")
            sortTypes = [
                "hot","top","new","controversial"
            ]
            sortType = self._chooseFrom(sortTypes)
            self.arguments.sort = sortType

            if sortType == "top":
                print("\nselect time filter:")
                timeFilters = [
                    "hour","day","week","month","year","all"
                ]
                timeFilter = self._chooseFrom(timeFilters)
                self.arguments.time = timeFilter
            else:
                self.arguments.time = "all"

        elif programMode == "upvoted":
            self.arguments.upvoted = True
            self.arguments.user = input("\nredditor: ")

        elif programMode == "saved":
            self.arguments.saved = True

        elif programMode == "log":
            while True:
                self.arguments.log = input("\nlog file directory:")
                if Path(self.arguments.log).is_file():
                    break

        while True:
            try:
                self.arguments.limit = int(input("\nlimit (0 for none): "))
                if self.arguments.limit == 0:
                    self.arguments.limit = None
                break
            except ValueError:
                pass

    def _validateProgramMode(self):
        """Check if command-line arguments are given correctly;
        if not, raise errors
        """
        if self.arguments.user is None:
            user = 0
        else:
            user = 1

        search = 1 if self.arguments.search else 0

        modes = [
            "saved","subreddit","submitted","log","link","upvoted","multireddit"
        ]

        values = {
            x: 0 if getattr(self.arguments,x) is None or \
                    getattr(self.arguments,x) is False \
                 else 1 \
            for x in modes
        }

        if not sum(values[x] for x in values) == 1:
            raise ProgramModeError("Invalid program mode")

        if search+values["saved"] == 2:
            raise SearchModeError("You cannot search in your saved posts")

        if search+values["submitted"] == 2:
            raise SearchModeError("You cannot search in submitted posts")

        if search+values["upvoted"] == 2:
            raise SearchModeError("You cannot search in upvoted posts")

        if search+values["log"] == 2:
            raise SearchModeError("You cannot search in log files")

        if values["upvoted"]+values["submitted"] == 1 and user == 0:
            raise RedditorNameError("No redditor name given")
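`_validateProgramMode` enforces that exactly one download mode is active by mapping each mode flag to 0 or 1 and summing. The check in isolation — the argument values here are invented for the demonstration:

```python
modes = ["saved","subreddit","submitted","log","link","upvoted","multireddit"]

# Stand-in for parsed command-line arguments: only one mode is set
arguments = {"subreddit": "pics", "saved": False, "link": None}

# Unset flags (None or False) count 0, anything else counts 1
values = {
    x: 0 if arguments.get(x) is None or arguments.get(x) is False else 1
    for x in modes
}
print(sum(values.values()) == 1)  # True -> a single, unambiguous mode
```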

src/reddit.py (new file, 98 lines)

@@ -0,0 +1,98 @@
import praw
import random
import socket
import webbrowser

from prawcore.exceptions import NotFound, ResponseException, Forbidden

from src.utils import GLOBAL
from src.jsonHelper import JsonFile
from src.errors import RedditLoginFailed

class Reddit:
    def __init__(self,refresh_token=None):
        self.SCOPES = ['identity','history','read','save']
        self.PORT = 7634
        self.refresh_token = refresh_token
        self.redditInstance = None
        self.arguments = {
            "client_id":GLOBAL.reddit_client_id,
            "client_secret":GLOBAL.reddit_client_secret,
            "user_agent":str(socket.gethostname())
        }

    def begin(self):
        if self.refresh_token:
            self.arguments["refresh_token"] = self.refresh_token
            self.redditInstance = praw.Reddit(**self.arguments)
            try:
                self.redditInstance.auth.scopes()
                return self.redditInstance
            except ResponseException:
                self.arguments["redirect_uri"] = "http://localhost:" + str(self.PORT)
                self.redditInstance = praw.Reddit(**self.arguments)
                reddit, refresh_token = self.getRefreshToken(*self.SCOPES)
        else:
            self.arguments["redirect_uri"] = "http://localhost:" + str(self.PORT)
            self.redditInstance = praw.Reddit(**self.arguments)
            reddit, refresh_token = self.getRefreshToken(*self.SCOPES)

        JsonFile(GLOBAL.configDirectory).add({
            "reddit_username": str(reddit.user.me()),
            "reddit": refresh_token
        },"credentials")

        return self.redditInstance

    def recieve_connection(self):
        """Wait for and then return a connected socket.

        Opens a TCP connection on self.PORT and waits for a single client.
        """
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(('localhost', self.PORT))
        server.listen(1)
        client = server.accept()[0]
        server.close()
        return client

    def send_message(self, client, message):
        """Send message to client and close the connection."""
        client.send(
            'HTTP/1.1 200 OK\r\n\r\n{}'.format(message).encode('utf-8')
        )
        client.close()

    def getRefreshToken(self,*scopes):
        state = str(random.randint(0, 65000))
        url = self.redditInstance.auth.url(scopes, state, 'permanent')
        print("---Setting up the Reddit API---\n")
        print("Go to this URL and login to reddit:\n",url,sep="\n",end="\n\n")
        webbrowser.open(url,new=2)

        client = self.recieve_connection()
        data = client.recv(1024).decode('utf-8')
        param_tokens = data.split(' ', 2)[1].split('?', 1)[1].split('&')
        params = {
            key: value for (key, value) in [token.split('=') \
                                            for token in param_tokens]
        }

        if state != params['state']:
            self.send_message(
                client, 'State mismatch. Expected: {} Received: {}'
                .format(state, params['state'])
            )
            raise RedditLoginFailed
        elif 'error' in params:
            self.send_message(client, params['error'])
            raise RedditLoginFailed

        refresh_token = self.redditInstance.auth.authorize(params['code'])
        self.send_message(client,
            "<script>" \
            "alert(\"You can go back to terminal window now.\");" \
            "</script>"
        )
        return (self.redditInstance,refresh_token)
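During the OAuth flow above, the redirect from reddit arrives as a raw HTTP request line, and `getRefreshToken` recovers the query parameters by splitting that line. The parsing step in isolation — the request line below is fabricated for illustration:

```python
# What the browser sends back to the local socket after the user logs in
data = "GET /?state=123&code=abc HTTP/1.1\r\nHost: localhost:7634\r\n"

# Take the request path, drop everything before '?', split on '&'
param_tokens = data.split(' ', 2)[1].split('?', 1)[1].split('&')
params = {key: value for (key, value) in
          [token.split('=') for token in param_tokens]}

print(params)  # {'state': '123', 'code': 'abc'}
```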


@@ -2,6 +2,7 @@ import os
 import sys
 import random
 import socket
+import time
 import webbrowser
 import urllib.request
 from urllib.error import HTTPError
@@ -9,477 +10,330 @@ from urllib.error import HTTPError
 import praw
 from prawcore.exceptions import NotFound, ResponseException, Forbidden

-from src.utils import GLOBAL, createLogFile, jsonFile, printToFile
+from src.reddit import Reddit
+from src.utils import GLOBAL, createLogFile, printToFile
+from src.jsonHelper import JsonFile
 from src.errors import (NoMatchingSubmissionFound, NoPrawSupport,
                         NoRedditSupport, MultiredditNotFound,
                         InvalidSortingType, RedditLoginFailed,
-                        InsufficientPermission)
+                        InsufficientPermission, DirectLinkNotFound)
 print = printToFile

-def beginPraw(config,user_agent = str(socket.gethostname())):
-    class GetAuth:
-        def __init__(self,redditInstance,port):
-            self.redditInstance = redditInstance
-            self.PORT = int(port)
-
-        def recieve_connection(self):
-            """Wait for and then return a connected socket..
-            Opens a TCP connection on port 8080, and waits for a single client.
-            """
-            server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-            server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
-            server.bind(('localhost', self.PORT))
-            server.listen(1)
-            client = server.accept()[0]
-            server.close()
-            return client
-
-        def send_message(self, client, message):
-            """Send message to client and close the connection."""
-            client.send(
-                'HTTP/1.1 200 OK\r\n\r\n{}'.format(message).encode('utf-8')
-            )
-            client.close()
-
-        def getRefreshToken(self,*scopes):
-            state = str(random.randint(0, 65000))
-            url = self.redditInstance.auth.url(scopes, state, 'permanent')
-            print("Go to this URL and login to reddit:\n\n",url)
-            webbrowser.open(url,new=2)
-            client = self.recieve_connection()
-            data = client.recv(1024).decode('utf-8')
-            str(data)
-            param_tokens = data.split(' ', 2)[1].split('?', 1)[1].split('&')
-            params = {
-                key: value for (key, value) in [token.split('=') \
-                                                for token in param_tokens]
-            }
-            if state != params['state']:
-                self.send_message(
-                    client, 'State mismatch. Expected: {} Received: {}'
-                    .format(state, params['state'])
-                )
-                raise RedditLoginFailed
-            elif 'error' in params:
-                self.send_message(client, params['error'])
-                raise RedditLoginFailed
-            refresh_token = self.redditInstance.auth.authorize(params['code'])
-            self.send_message(client,
-                              "<script>" \
-                              "alert(\"You can go back to terminal window now.\");" \
-                              "</script>"
-            )
-            return (self.redditInstance,refresh_token)
-
-    """Start reddit instance"""
-    scopes = ['identity','history','read']
-    port = "1337"
-    arguments = {
-        "client_id":GLOBAL.reddit_client_id,
-        "client_secret":GLOBAL.reddit_client_secret,
-        "user_agent":user_agent
-    }
-    if "reddit_refresh_token" in GLOBAL.config:
-        arguments["refresh_token"] = GLOBAL.config["reddit_refresh_token"]
-        reddit = praw.Reddit(**arguments)
-        try:
-            reddit.auth.scopes()
-        except ResponseException:
-            arguments["redirect_uri"] = "http://localhost:" + str(port)
-            reddit = praw.Reddit(**arguments)
-            authorizedInstance = GetAuth(reddit,port).getRefreshToken(*scopes)
-            reddit = authorizedInstance[0]
-            refresh_token = authorizedInstance[1]
-            jsonFile(GLOBAL.configDirectory).add({
-                "reddit_username":str(reddit.user.me()),
-                "reddit_refresh_token":refresh_token
-            })
-    else:
-        arguments["redirect_uri"] = "http://localhost:" + str(port)
-        reddit = praw.Reddit(**arguments)
-        authorizedInstance = GetAuth(reddit,port).getRefreshToken(*scopes)
-        reddit = authorizedInstance[0]
-        refresh_token = authorizedInstance[1]
-        jsonFile(GLOBAL.configDirectory).add({
-            "reddit_username":str(reddit.user.me()),
-            "reddit_refresh_token":refresh_token
-        })
-    return reddit
-
-def getPosts(args):
-    """Call PRAW regarding to arguments and pass it to redditSearcher.
-    Return what redditSearcher has returned.
+def getPosts(programMode):
+    """Call PRAW regarding to arguments and pass it to extractDetails.
+    Return what extractDetails has returned.
     """
-    config = GLOBAL.config
-    reddit = beginPraw(config)
+    reddit = Reddit(GLOBAL.config["credentials"]["reddit"]).begin()
-    if args["sort"] == "best":
+    if programMode["sort"] == "best":
         raise NoPrawSupport("PRAW does not support that")

-    if "subreddit" in args:
-        if "search" in args:
-            if args["subreddit"] == "frontpage":
-                args["subreddit"] = "all"
+    if "subreddit" in programMode:
+        if "search" in programMode:
+            if programMode["subreddit"] == "frontpage":
+                programMode["subreddit"] = "all"

-    if "user" in args:
-        if args["user"] == "me":
-            args["user"] = str(reddit.user.me())
+    if "user" in programMode:
+        if programMode["user"] == "me":
+            programMode["user"] = str(reddit.user.me())

-    if not "search" in args:
-        if args["sort"] == "top" or args["sort"] == "controversial":
+    if not "search" in programMode:
+        if programMode["sort"] == "top" or programMode["sort"] == "controversial":
             keyword_params = {
-                "time_filter":args["time"],
-                "limit":args["limit"]
+                "time_filter":programMode["time"],
+                "limit":programMode["limit"]
             }
         # OTHER SORT TYPES DON'T TAKE TIME_FILTER
         else:
             keyword_params = {
-                "limit":args["limit"]
+                "limit":programMode["limit"]
             }
     else:
         keyword_params = {
-            "time_filter":args["time"],
-            "limit":args["limit"]
+            "time_filter":programMode["time"],
+            "limit":programMode["limit"]
         }

-    if "search" in args:
-        if GLOBAL.arguments.sort in ["hot","rising","controversial"]:
+    if "search" in programMode:
+        if programMode["sort"] in ["hot","rising","controversial"]:
             raise InvalidSortingType("Invalid sorting type has given")

-        if "subreddit" in args:
+        if "subreddit" in programMode:
             print (
                 "search for \"{search}\" in\n" \
                 "subreddit: {subreddit}\nsort: {sort}\n" \
                 "time: {time}\nlimit: {limit}\n".format(
-                    search=args["search"],
-                    limit=args["limit"],
-                    sort=args["sort"],
-                    subreddit=args["subreddit"],
-                    time=args["time"]
+                    search=programMode["search"],
+                    limit=programMode["limit"],
+                    sort=programMode["sort"],
+                    subreddit=programMode["subreddit"],
+                    time=programMode["time"]
                 ).upper(),noPrint=True
             )
-            return redditSearcher(
-                reddit.subreddit(args["subreddit"]).search(
-                    args["search"],
-                    limit=args["limit"],
-                    sort=args["sort"],
-                    time_filter=args["time"]
+            return extractDetails(
+                reddit.subreddit(programMode["subreddit"]).search(
+                    programMode["search"],
+                    limit=programMode["limit"],
+                    sort=programMode["sort"],
+                    time_filter=programMode["time"]
                 )
             )
-        elif "multireddit" in args:
+        elif "multireddit" in programMode:
             raise NoPrawSupport("PRAW does not support that")
-        elif "user" in args:
+        elif "user" in programMode:
             raise NoPrawSupport("PRAW does not support that")
-        elif "saved" in args:
+        elif "saved" in programMode:
             raise ("Reddit does not support that")

-    if args["sort"] == "relevance":
+    if programMode["sort"] == "relevance":
         raise InvalidSortingType("Invalid sorting type has given")

-    if "saved" in args:
+    if "saved" in programMode:
         print(
             "saved posts\nuser:{username}\nlimit={limit}\n".format(
                 username=reddit.user.me(),
-                limit=args["limit"]
+                limit=programMode["limit"]
             ).upper(),noPrint=True
         )
-        return redditSearcher(reddit.user.me().saved(limit=args["limit"]))
+        return extractDetails(reddit.user.me().saved(limit=programMode["limit"]))

-    if "subreddit" in args:
-        if args["subreddit"] == "frontpage":
+    if "subreddit" in programMode:
+        if programMode["subreddit"] == "frontpage":
             print (
                 "subreddit: {subreddit}\nsort: {sort}\n" \
                 "time: {time}\nlimit: {limit}\n".format(
-                    limit=args["limit"],
-                    sort=args["sort"],
-                    subreddit=args["subreddit"],
-                    time=args["time"]
+                    limit=programMode["limit"],
+                    sort=programMode["sort"],
+                    subreddit=programMode["subreddit"],
+                    time=programMode["time"]
                 ).upper(),noPrint=True
             )
-            return redditSearcher(
-                getattr(reddit.front,args["sort"]) (**keyword_params)
+            return extractDetails(
+                getattr(reddit.front,programMode["sort"]) (**keyword_params)
             )
         else:
             print (
                 "subreddit: {subreddit}\nsort: {sort}\n" \
                 "time: {time}\nlimit: {limit}\n".format(
-                    limit=args["limit"],
-                    sort=args["sort"],
-                    subreddit=args["subreddit"],
-                    time=args["time"]
+                    limit=programMode["limit"],
+                    sort=programMode["sort"],
+                    subreddit=programMode["subreddit"],
+                    time=programMode["time"]
                 ).upper(),noPrint=True
             )
-            return redditSearcher(
+            return extractDetails(
                 getattr(
-                    reddit.subreddit(args["subreddit"]),args["sort"]
+                    reddit.subreddit(programMode["subreddit"]),programMode["sort"]
                 ) (**keyword_params)
             )

-    elif "multireddit" in args:
+    elif "multireddit" in programMode:
         print (
             "user: {user}\n" \
             "multireddit: {multireddit}\nsort: {sort}\n" \
             "time: {time}\nlimit: {limit}\n".format(
-                user=args["user"],
-                limit=args["limit"],
-                sort=args["sort"],
-                multireddit=args["multireddit"],
-                time=args["time"]
+                user=programMode["user"],
+                limit=programMode["limit"],
+                sort=programMode["sort"],
+                multireddit=programMode["multireddit"],
+                time=programMode["time"]
             ).upper(),noPrint=True
         )
         try:
-            return redditSearcher(
+            return extractDetails(
                 getattr(
                     reddit.multireddit(
-                        args["user"], args["multireddit"]
-                    ),args["sort"]
+                        programMode["user"], programMode["multireddit"]
+                    ),programMode["sort"]
                 ) (**keyword_params)
             )
         except NotFound:
             raise MultiredditNotFound("Multireddit not found")

-    elif "submitted" in args:
+    elif "submitted" in programMode:
         print (
             "submitted posts of {user}\nsort: {sort}\n" \
             "time: {time}\nlimit: {limit}\n".format(
-                limit=args["limit"],
-                sort=args["sort"],
-                user=args["user"],
-                time=args["time"]
+                limit=programMode["limit"],
+                sort=programMode["sort"],
+                user=programMode["user"],
+                time=programMode["time"]
             ).upper(),noPrint=True
         )
-        return redditSearcher(
+        return extractDetails(
             getattr(
-                reddit.redditor(args["user"]).submissions,args["sort"]
+                reddit.redditor(programMode["user"]).submissions,programMode["sort"]
             ) (**keyword_params)
         )

-    elif "upvoted" in args:
+    elif "upvoted" in programMode:
         print (
             "upvoted posts of {user}\nlimit: {limit}\n".format(
-                user=args["user"],
-                limit=args["limit"]
+                user=programMode["user"],
+                limit=programMode["limit"]
             ).upper(),noPrint=True
         )
         try:
-            return redditSearcher(
-                reddit.redditor(args["user"]).upvoted(limit=args["limit"])
+            return extractDetails(
+                reddit.redditor(programMode["user"]).upvoted(limit=programMode["limit"])
             )
         except Forbidden:
             raise InsufficientPermission("You do not have permission to do that")

-    elif "post" in args:
-        print("post: {post}\n".format(post=args["post"]).upper(),noPrint=True)
-        return redditSearcher(
-            reddit.submission(url=args["post"]),SINGLE_POST=True
+    elif "post" in programMode:
+        print("post: {post}\n".format(post=programMode["post"]).upper(),noPrint=True)
+        return extractDetails(
+            reddit.submission(url=programMode["post"]),SINGLE_POST=True
         )
-def redditSearcher(posts,SINGLE_POST=False):
+def extractDetails(posts,SINGLE_POST=False):
     """Check posts and decide if it can be downloaded.
     If so, create a dictionary with post details and append them to a list.
     Write all of posts to file. Return the list
     """

-    subList = []
-    global subCount
-    subCount = 0
-    global orderCount
-    orderCount = 0
-    global gfycatCount
-    gfycatCount = 0
-    global redgifsCount
-    redgifsCount = 0
-    global imgurCount
-    imgurCount = 0
-    global eromeCount
-    eromeCount = 0
-    global gifDeliveryNetworkCount
-    gifDeliveryNetworkCount = 0
-    global directCount
-    directCount = 0
-    global selfCount
-    selfCount = 0
+    postList = []
+    postCount = 0

     allPosts = {}

     print("\nGETTING POSTS")
-    if GLOBAL.arguments.verbose: print("\n")

     postsFile = createLogFile("POSTS")

     if SINGLE_POST:
         submission = posts
-        subCount += 1
+        postCount += 1

         try:
-            details = {'postId':submission.id,
-                       'postTitle':submission.title,
-                       'postSubmitter':str(submission.author),
-                       'postType':None,
-                       'postURL':submission.url,
-                       'postSubreddit':submission.subreddit.display_name}
+            details = {'POSTID':submission.id,
+                       'TITLE':submission.title,
+                       'REDDITOR':str(submission.author),
+                       'TYPE':None,
+                       'CONTENTURL':submission.url,
+                       'SUBREDDIT':submission.subreddit.display_name,
+                       'UPVOTES': submission.score,
+                       'FLAIR':submission.link_flair_text,
+                       'DATE':str(time.strftime(
+                           "%Y-%m-%d_%H-%M",
+                           time.localtime(submission.created_utc)
+                       ))}
         except AttributeError:
             pass

-        result = checkIfMatching(submission)
+        result = matchWithDownloader(submission)

         if result is not None:
-            details = result
-            orderCount += 1
-            if GLOBAL.arguments.verbose:
-                printSubmission(submission,subCount,orderCount)
-            subList.append(details)
+            details = {**details, **result}
+            postList.append(details)

-        postsFile.add({subCount:[details]})
+        postsFile.add({postCount:details})

     else:
         try:
             for submission in posts:
-                subCount += 1
+                postCount += 1

-                if subCount % 100 == 0 and not GLOBAL.arguments.verbose:
+                if postCount % 100 == 0:
                     sys.stdout.write("")
                     sys.stdout.flush()

-                if subCount % 1000 == 0:
+                if postCount % 1000 == 0:
                     sys.stdout.write("\n"+" "*14)
                     sys.stdout.flush()

                 try:
-                    details = {'postId':submission.id,
-                               'postTitle':submission.title,
-                               'postSubmitter':str(submission.author),
-                               'postType':None,
-                               'postURL':submission.url,
-                               'postSubreddit':submission.subreddit.display_name}
+                    details = {'POSTID':submission.id,
+                               'TITLE':submission.title,
+                               'REDDITOR':str(submission.author),
+                               'TYPE':None,
+                               'CONTENTURL':submission.url,
+                               'SUBREDDIT':submission.subreddit.display_name,
+                               'UPVOTES': submission.score,
+                               'FLAIR':submission.link_flair_text,
+                               'DATE':str(time.strftime(
+                                   "%Y-%m-%d_%H-%M",
+                                   time.localtime(submission.created_utc)
+                               ))}
                 except AttributeError:
                     continue

-                result = checkIfMatching(submission)
+                result = matchWithDownloader(submission)

                 if result is not None:
-                    details = result
-                    orderCount += 1
-                    if GLOBAL.arguments.verbose:
-                        printSubmission(submission,subCount,orderCount)
-                    subList.append(details)
+                    details = {**details, **result}
+                    postList.append(details)

-                allPosts[subCount] = [details]
+                allPosts[postCount] = details
         except KeyboardInterrupt:
             print("\nKeyboardInterrupt",noPrint=True)

         postsFile.add(allPosts)

-    if not len(subList) == 0:
-        if GLOBAL.arguments.NoDownload or GLOBAL.arguments.verbose:
-            print(
-                f"\n\nTotal of {len(subList)} submissions found!"
-            )
-            print(
-                f"{gfycatCount} GFYCATs, {imgurCount} IMGURs, " \
-                f"{eromeCount} EROMEs, {directCount} DIRECTs " \
-                f"and {selfCount} SELF POSTS",noPrint=True
-            )
-        else:
-            print()
-        return subList
+    if not len(postList) == 0:
+        print()
+        return postList
     else:
         raise NoMatchingSubmissionFound("No matching submission was found")
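The refactor above has `matchWithDownloader` return only the downloader-specific keys, which `extractDetails` overlays on the base dict with `{**details, **result}`; on duplicate keys the right-hand mapping wins, so the matcher can override `TYPE` and `CONTENTURL`. A standalone illustration with hypothetical values:

```python
# base dict as built by extractDetails (abridged)
details = {'TYPE': None, 'CONTENTURL': 'https://v.redd.it/abc'}
# keys returned by matchWithDownloader for a v.redd.it post
result = {'TYPE': 'v.redd.it', 'CONTENTURL': 'https://v.redd.it/abc/DASH_720'}

# right-hand mapping wins on duplicate keys
details = {**details, **result}
```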
-def checkIfMatching(submission):
-    global gfycatCount
-    global redgifsCount
-    global imgurCount
-    global eromeCount
-    global directCount
-    global gifDeliveryNetworkCount
-    global selfCount
-
-    try:
-        details = {'postId':submission.id,
-                   'postTitle':submission.title,
-                   'postSubmitter':str(submission.author),
-                   'postType':None,
-                   'postURL':submission.url,
-                   'postSubreddit':submission.subreddit.display_name}
-    except AttributeError:
-        return None
+def matchWithDownloader(submission):
+
+    if 'v.redd.it' in submission.domain:
+        bitrates = ["DASH_1080","DASH_720","DASH_600", \
+                    "DASH_480","DASH_360","DASH_240"]
+
+        for bitrate in bitrates:
+            videoURL = submission.url+"/"+bitrate
+            try:
+                responseCode = urllib.request.urlopen(videoURL).getcode()
+            except urllib.error.HTTPError:
+                responseCode = 0
+
+            if responseCode == 200:
+                return {'TYPE': 'v.redd.it', 'CONTENTURL': videoURL}

     if 'gfycat' in submission.domain:
-        details['postType'] = 'gfycat'
-        gfycatCount += 1
-        return details
+        return {'TYPE': 'gfycat'}
+
+    if 'youtube' in submission.domain \
+       and 'watch' in submission.url:
+        return {'TYPE': 'youtube'}
+
+    if 'youtu.be' in submission.domain:
+        url = urllib.request.urlopen(submission.url).geturl()
+        if 'watch' in url:
+            return {'TYPE': 'youtube'}

     elif 'imgur' in submission.domain:
-        details['postType'] = 'imgur'
-        imgurCount += 1
-        return details
+        return {'TYPE': 'imgur'}

     elif 'erome' in submission.domain:
-        details['postType'] = 'erome'
-        eromeCount += 1
-        return details
+        return {'TYPE': 'erome'}

     elif 'redgifs' in submission.domain:
-        details['postType'] = 'redgifs'
-        redgifsCount += 1
-        return details
+        return {'TYPE': 'redgifs'}

     elif 'gifdeliverynetwork' in submission.domain:
-        details['postType'] = 'gifdeliverynetwork'
-        gifDeliveryNetworkCount += 1
-        return details
+        return {'TYPE': 'gifdeliverynetwork'}

-    elif submission.is_self:
-        details['postType'] = 'self'
-        details['postContent'] = submission.selftext
-        selfCount += 1
-        return details
-
-    directLink = isDirectLink(submission.url)
-    if directLink is not False:
-        details['postType'] = 'direct'
-        details['postURL'] = directLink
-        directCount += 1
-        return details
-
-def printSubmission(SUB,validNumber,totalNumber):
-    """Print post's link, title and media link to screen"""
-    print(validNumber,end=") ")
-    print(totalNumber,end=" ")
-    print(
-        "https://www.reddit.com/"
-        +"r/"
-        +SUB.subreddit.display_name
-        +"/comments/"
-        +SUB.id
-    )
-    print(" "*(len(str(validNumber))
-               +(len(str(totalNumber)))+3),end="")
-    try:
-        print(SUB.title)
-    except:
-        SUB.title = "unnamed"
-        print("SUBMISSION NAME COULD NOT BE READ")
-        pass
-    print(" "*(len(str(validNumber))+(len(str(totalNumber)))+3),end="")
-    print(SUB.url,end="\n\n")
-
-def isDirectLink(URL):
+    elif submission.is_self and 'self' not in GLOBAL.arguments.skip:
+        return {'TYPE': 'self',
+                'CONTENT': submission.selftext}
+
+    try:
+        return {'TYPE': 'direct',
+                'CONTENTURL': extractDirectLink(submission.url)}
+    except DirectLinkNotFound:
+        return None
+
+def extractDirectLink(URL):
     """Check if link is a direct image link.
     If so, return URL,
     if not, return False
@@ -508,10 +362,10 @@ def isDirectLink(URL):
         return videoURL
     else:
-        return False
+        raise DirectLinkNotFound

     for extension in imageTypes:
         if extension in URL.split("/")[-1]:
             return URL
     else:
-        return False
+        raise DirectLinkNotFound
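The final fallback in `extractDirectLink` boils down to checking whether the last path segment of the URL contains a known media extension. A hypothetical standalone version of just that check — `has_direct_extension` and its `IMAGE_TYPES` list are illustrative, not names from the repository, and the real function also probes video hosts before reaching this point:

```python
# Assumed extension list for illustration; the actual imageTypes list
# lives outside the excerpt shown above.
IMAGE_TYPES = ["jpg", "jpeg", "png", "gif", "webm", "mp4"]


def has_direct_extension(url):
    """Return True if the URL's last path segment names a media file."""
    filename = url.split("/")[-1]
    return any(extension in filename for extension in IMAGE_TYPES)
```

The substring test (rather than `endswith`) mirrors the original `extension in URL.split("/")[-1]`, so query strings such as `photo.png?width=640` still match.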

src/store.py Normal file

@@ -0,0 +1,24 @@
from os import path


class Store:
    def __init__(self, directory=None):
        self.directory = directory

        if self.directory:
            if path.exists(directory):
                with open(directory, 'r') as f:
                    self.list = f.read().split("\n")
            else:
                with open(self.directory, 'a'):
                    pass
                self.list = []
        else:
            self.list = []

    def __call__(self):
        return self.list

    def add(self, filehash):
        self.list.append(filehash)

        if self.directory:
            with open(self.directory, 'a') as f:
                f.write("{filehash}\n".format(filehash=filehash))
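The new `Store` class keeps an in-memory list mirrored to an append-only file, which is presumably what backs the new downloaded-posts tracking mentioned in the change log. A condensed, self-contained sketch of the same idea (note one deliberate tweak: `splitlines()` here avoids the trailing empty entry that `split("\n")` leaves after the final newline):

```python
import os
import tempfile


class Store:  # condensed copy of src/store.py, for illustration only
    def __init__(self, directory=None):
        self.directory = directory
        self.list = []
        if directory and os.path.exists(directory):
            with open(directory) as f:
                self.list = f.read().splitlines()

    def add(self, item):
        # append in memory and persist immediately, mirroring Store.add
        self.list.append(item)
        if self.directory:
            with open(self.directory, "a") as f:
                f.write(f"{item}\n")


# entries written in one run are visible when the store is reopened
store_path = os.path.join(tempfile.mkdtemp(), "downloaded_posts.txt")
Store(store_path).add("abc123")
reloaded = Store(store_path)
```

Appending on every `add` means a crash mid-run loses at most the post currently being recorded, at the cost of one file open per entry.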


@@ -1,91 +1,41 @@
 import io
 import json
 import sys
-import time
 from os import makedirs, path, remove
 from pathlib import Path

+from src.jsonHelper import JsonFile
 from src.errors import FileNotFoundError

 class GLOBAL:
     """Declare global variables"""

-    RUN_TIME = 0
+    RUN_TIME = ""
     config = {'imgur_client_id':None, 'imgur_client_secret': None}
     arguments = None
     directory = None
     defaultConfigDirectory = Path.home() / "Bulk Downloader for Reddit"
     configDirectory = ""

-    reddit_client_id = "BSyphDdxYZAgVQ"
-    reddit_client_secret = "bfqNJaRh8NMh-9eAr-t4TRz-Blk"
+    reddit_client_id = "U-6gk4ZCh3IeNQ"
+    reddit_client_secret = "7CZHY6AmKweZME5s50SfDGylaPg"
+
+    hashList = set()
+    downloadedPosts = lambda: []

     printVanilla = print

-class jsonFile:
-    """ Write and read JSON files
-
-    Use add(self,toBeAdded) to add to files
-    Use delete(self,*deletedKeys) to delete keys
-    """
-
-    FILEDIR = ""
-
-    def __init__(self,FILEDIR):
-        self.FILEDIR = FILEDIR
-        if not path.exists(self.FILEDIR):
-            self.__writeToFile({},create=True)
-
-    def read(self):
-        with open(self.FILEDIR, 'r') as f:
-            return json.load(f)
-
-    def add(self,toBeAdded):
-        """Takes a dictionary and merges it with json file.
-        It uses new key's value if a key already exists.
-        Returns the new content as a dictionary.
-        """
-        data = self.read()
-        data = {**data, **toBeAdded}
-        self.__writeToFile(data)
-        return self.read()
-
-    def delete(self,*deleteKeys):
-        """Delete given keys from JSON file.
-        Returns the new content as a dictionary.
-        """
-        data = self.read()
-        for deleteKey in deleteKeys:
-            if deleteKey in data:
-                del data[deleteKey]
-                found = True
-        if not found:
-            return False
-        self.__writeToFile(data)
-
-    def __writeToFile(self,content,create=False):
-        if not create:
-            remove(self.FILEDIR)
-        with open(self.FILEDIR, 'w') as f:
-            json.dump(content, f, indent=4)
-
 def createLogFile(TITLE):
     """Create a log file with given name
     inside a folder time stampt in its name and
     put given arguments inside \"HEADER\" key
     """

-    folderDirectory = GLOBAL.directory / "LOG_FILES" / \
-                      str(time.strftime(
-                          "%d-%m-%Y_%H-%M-%S",time.localtime(GLOBAL.RUN_TIME)
-                      ))
+    folderDirectory = GLOBAL.directory / "LOG_FILES" / GLOBAL.RUN_TIME
     logFilename = TITLE.upper()+'.json'

     if not path.exists(folderDirectory):
         makedirs(folderDirectory)

-    FILE = jsonFile(folderDirectory / Path(logFilename))
+    FILE = JsonFile(folderDirectory / Path(logFilename))
     HEADER = " ".join(sys.argv)
     FILE.add({"HEADER":HEADER})

@@ -96,9 +46,7 @@ def printToFile(*args, noPrint=False,**kwargs):
     CONSOLE LOG file in a folder time stampt in the name
     """

-    TIME = str(time.strftime("%d-%m-%Y_%H-%M-%S",
-                             time.localtime(GLOBAL.RUN_TIME)))
-    folderDirectory = GLOBAL.directory / "LOG_FILES" / TIME
+    folderDirectory = GLOBAL.directory / Path("LOG_FILES") / Path(GLOBAL.RUN_TIME)

     if not noPrint or \
        GLOBAL.arguments.verbose or \

@@ -134,12 +82,11 @@ def nameCorrector(string):
             spacesRemoved.append(string[b])

     string = ''.join(spacesRemoved)
-    correctedString = []

     if len(string.split('\n')) > 1:
         string = "".join(string.split('\n'))

-    BAD_CHARS = ['\\','/',':','*','?','"','<','>','|','.','#']
+    BAD_CHARS = ['\\','/',':','*','?','"','<','>','|','#']

     if any(x in string for x in BAD_CHARS):
         for char in string: