Reddit scraping.
Below are several use cases showing how to implement Reddit scraping with the Python Reddit API Wrapper (PRAW), a Python library designed for this purpose. PRAW wraps the Reddit API and lets you scrape data from subreddits, create bots, and much more.
Python Knowledge Base: Make coding great again.
Updated: 2024-11-20 by Andrey BRATUS, Senior Data Analyst.
- Scrape Reddit post and comments
- Scrape fresh Subreddit posts into a text file
- Submit new post to Subreddit
- Bot replying in new Subreddit posts that contain a certain word in the title and in the comment body
IMPORTANT: don't forget to install PRAW with pip install praw or conda install -c conda-forge praw, depending on your environment.
First, you need a Reddit account and a "script" type application, which you can create at https://www.reddit.com/prefs/apps.
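After that, you are ready to run the Python scripts below. Remember to fill in your own data: user agent, client ID and client secret, plus username and password for the scripts that post or reply.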
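Rather than pasting secrets into every script, you might read them from environment variables. The sketch below is an assumption-based illustration, not part of PRAW. The function name reddit_config_from_env is hypothetical, and so are the variable names REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT, REDDIT_USERNAME and REDDIT_PASSWORD. The function only builds the keyword arguments; you would then pass them on with praw.Reddit(**config). The default user agent is a placeholder in the "<platform>:<app ID>:<version> (by /u/<username>)" style that Reddit's API rules recommend.

```python
import os


def reddit_config_from_env(env=None):
    """Build keyword arguments for praw.Reddit from environment variables.

    All variable names here are hypothetical; rename them to suit your setup.
    """
    env = os.environ if env is None else env
    config = {
        "client_id": env["REDDIT_CLIENT_ID"],          # required
        "client_secret": env["REDDIT_CLIENT_SECRET"],  # required
        # Placeholder user agent: "<platform>:<app ID>:<version> (by /u/<username>)"
        "user_agent": env.get(
            "REDDIT_USER_AGENT", "python:my-reddit-script:v0.1 (by /u/your_username)"
        ),
    }
    # Username and password are only needed by scripts that post or reply
    if "REDDIT_USERNAME" in env and "REDDIT_PASSWORD" in env:
        config["username"] = env["REDDIT_USERNAME"]
        config["password"] = env["REDDIT_PASSWORD"]
    return config
```

Usage, assuming the variables are set: config = reddit_config_from_env(), then reddit = praw.Reddit(**config). Passing a plain dict as env makes the function easy to try without touching your real environment.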
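Scrape Reddit post and comments:
import praw

reddit = praw.Reddit(user_agent="your agent name", client_id="your client ID",
                     client_secret="your app secret")
url = "https://www.reddit.com/r/dadjokes/comments/sm6ikx/a_young_woman_was_standing_outside_her_car/"
post = reddit.submission(url=url)
print(post.title)
print(post.selftext)
# Remove "load more comments" placeholders (MoreComments), which have no .body
post.comments.replace_more(limit=0)
print(len(post.comments))  # number of top-level comments
for comment in post.comments:
    print(comment.body)
OUT: the post title, its text, the number of top-level comments and the text of each top-level comment.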
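post.comments holds only top-level comments. Replies are nested under each comment's replies attribute. After replace_more(), PRAW's post.comments.list() flattens the whole tree for you.

The sketch below shows what such a flattening does. It walks the tree breadth-first. To run without network access, it uses a hypothetical stand-in class, FakeComment, instead of real PRAW objects. The helper name flatten_comments is also hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeComment:
    """Hypothetical stand-in for a PRAW Comment: a body plus nested replies."""
    body: str
    replies: list = field(default_factory=list)


def flatten_comments(top_level):
    """Return every comment in the tree, breadth-first (top-level comments first)."""
    queue = deque(top_level)
    flat = []
    while queue:
        comment = queue.popleft()
        flat.append(comment)
        queue.extend(comment.replies)  # enqueue the next level of replies
    return flat


tree = [
    FakeComment("first", [FakeComment("reply to first", [FakeComment("deep reply")])]),
    FakeComment("second"),
]
print(len(tree))  # 2, like len(post.comments): top level only
print([c.body for c in flatten_comments(tree)])
# ['first', 'second', 'reply to first', 'deep reply']
```

With real PRAW objects, call post.comments.replace_more(limit=0) before walking the tree, because MoreComments placeholders are not real comments.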
Scrape fresh Subreddit posts into a text file:
import praw
from datetime import datetime, timedelta, timezone

reddit = praw.Reddit(user_agent="your agent name", client_id="your client ID",
                     client_secret="your app secret")
subreddit = reddit.subreddit("HistoryMemes")
posts24h = []
# Timezone-aware UTC time (datetime.utcnow() is deprecated since Python 3.12)
current_time = datetime.now(timezone.utc)
with open('postoutput.txt', 'w', encoding='utf-8') as file:
    # new() yields the newest posts, up to 100 by default; pass limit=None for more
    for post in subreddit.new():
        post_time = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
        delta_time = current_time - post_time
        if delta_time <= timedelta(hours=24):
            posts24h.append((post.title, post.selftext, post_time))
            file.write(f'{post.title}\n{post.selftext}\n\n')
print('Subreddit posts for last 24H are saved to text file !!!')
OUT: Subreddit posts for last 24H are saved to text file !!!
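The 24-hour check can be pulled out into a small function that is easy to test without calling Reddit. This sketch assumes that each post exposes a created_utc Unix timestamp, as PRAW submissions do, and it uses timezone-aware datetimes. The helper name filter_recent is hypothetical, and SimpleNamespace objects stand in for real submissions.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def filter_recent(posts, hours=24, now=None):
    """Keep the posts created within the last `hours` hours (UTC)."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    recent = []
    for post in posts:
        post_time = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
        if post_time >= cutoff:
            recent.append(post)
    return recent


now = datetime(2024, 11, 20, 12, 0, tzinfo=timezone.utc)
posts = [
    SimpleNamespace(title="fresh", created_utc=(now - timedelta(hours=2)).timestamp()),
    SimpleNamespace(title="stale", created_utc=(now - timedelta(hours=30)).timestamp()),
]
print([p.title for p in filter_recent(posts, hours=24, now=now)])  # ['fresh']
```

With PRAW, you would call something like filter_recent(subreddit.new(limit=None), hours=24). Passing now explicitly keeps the function deterministic in tests.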
Submit new post to Subreddit:
import praw

# Posting requires a logged-in script app, so username and password are needed
reddit = praw.Reddit(user_agent="your agent name", client_id="YOUR REDDIT APP ID",
                     client_secret="YOUR REDDIT APP SECRET", username='YOUR REDDIT USERNAME',
                     password='YOUR REDDIT ACCOUNT PASSWORD')
# validate_on_submit is a setting of the Reddit instance, not of the subreddit
reddit.validate_on_submit = True
subreddit = reddit.subreddit("WeirdJokes")
title = 'It should be allowed )'
content = """
People who make sound while eating food must be slapped without asking why.
"""
subreddit.submit(title=title, selftext=content)
print('New post was submitted to selected Subreddit !!!')
OUT: New post was submitted to selected Subreddit !!!
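When testing a posting script, it can help to validate the input and do a dry run before anything reaches Reddit. The sketch below wraps any object that has a submit(title=..., selftext=...) method, such as a PRAW Subreddit. The names submit_post, dry_run and FakeSubreddit are hypothetical. The 300-character title limit is an assumption about Reddit's current rules, so verify it before relying on it.

```python
MAX_TITLE_LENGTH = 300  # assumed Reddit title limit; verify against current site rules


def submit_post(subreddit, title, selftext="", dry_run=True):
    """Validate a text post and submit it unless dry_run is True."""
    title = title.strip()
    if not title:
        raise ValueError("title must not be empty")
    if len(title) > MAX_TITLE_LENGTH:
        raise ValueError(f"title is longer than {MAX_TITLE_LENGTH} characters")
    if dry_run:
        print(f"[dry run] would submit: {title!r}")
        return None
    return subreddit.submit(title=title, selftext=selftext.strip())


class FakeSubreddit:
    """Hypothetical stand-in that records submissions instead of calling Reddit."""

    def __init__(self):
        self.submitted = []

    def submit(self, title, selftext):
        self.submitted.append((title, selftext))
        return f"fake-id-{len(self.submitted)}"


fake = FakeSubreddit()
submit_post(fake, "It should be allowed )", "Some text")  # dry run by default
print(fake.submitted)  # [] because nothing is sent on a dry run
print(submit_post(fake, "It should be allowed )", "Some text", dry_run=False))  # fake-id-1
```

Defaulting to dry_run=True is a deliberate choice: the script has to opt in before it can actually post. With PRAW, you would pass reddit.subreddit("WeirdJokes") in place of the fake.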
Bot replying in new Subreddit posts that contain a certain word in the title and in the comment body:
import praw
from datetime import datetime, timedelta, timezone

reddit = praw.Reddit(user_agent="your agent name", client_id="YOUR REDDIT APP ID",
                     client_secret="YOUR REDDIT APP SECRET", username='YOUR REDDIT USERNAME',
                     password='YOUR REDDIT ACCOUNT PASSWORD')
subreddit = reddit.subreddit("Joker")
current_time = datetime.now(timezone.utc)
for post in subreddit.new():
    post_time = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
    delta_time = current_time - post_time
    # Only handle posts from the last 48 hours whose title mentions "joke"
    if delta_time <= timedelta(hours=48):
        if "joke" in post.title.lower():
            # print(post.title)
            # post.reply("Here's another joke !")
            post.comments.replace_more(limit=0)  # skip MoreComments placeholders
            for comment in post.comments:
                if "joke" in comment.body.lower():
                    comment.reply("Here's another joke !")
print('Replies are provided when needed !!!')
OUT: Replies are provided when needed !!!
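Two practical refinements for the bot are sketched below, under stated assumptions.

1. The substring test "joke" in text also matches words such as "joker" or "jokes". A word-boundary regular expression may be closer to what you want.
2. The script replies again every time it runs. You may want to remember which comment IDs have already been answered.

The helper names and the plain-text ID file (one ID per line) are hypothetical choices, not part of PRAW.

```python
import re
from pathlib import Path

KEYWORD = re.compile(r"\bjoke\b", re.IGNORECASE)  # whole word "joke", any case


def mentions_keyword(text):
    """True if the text contains the keyword as a whole word."""
    return KEYWORD.search(text) is not None


def load_replied_ids(path):
    """Read previously answered comment IDs (one per line) into a set."""
    path = Path(path)
    if not path.exists():
        return set()
    return set(path.read_text(encoding="utf-8").split())


def remember_reply(path, comment_id, replied_ids):
    """Record a comment ID both in memory and on disk."""
    replied_ids.add(comment_id)
    with open(path, "a", encoding="utf-8") as f:
        f.write(comment_id + "\n")


def should_reply(comment_id, body, replied_ids):
    """Reply only to new comments that mention the keyword."""
    return comment_id not in replied_ids and mentions_keyword(body)


# Inside the PRAW loop, roughly:
#     replied = load_replied_ids("replied_ids.txt")
#     if should_reply(comment.id, comment.body, replied):
#         comment.reply("Here's another joke !")
#         remember_reply("replied_ids.txt", comment.id, replied)

print(mentions_keyword("Tell me a JOKE!"))    # True
print(mentions_keyword("The Joker returns"))  # False
```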