Scrape, Analyze, Conquer: Python for Reddit Insights!

Reddit scraping.


Below are several use cases showing how to scrape Reddit with the Python Reddit API Wrapper (PRAW), a Python library built for that purpose. PRAW wraps the Reddit API and lets you scrape data from subreddits, create bots, and much more.


Python Knowledge Base: Make coding great again.
- Updated: 2024-07-26 by Andrey BRATUS, Senior Data Analyst.




    IMPORTANT: don't forget to install PRAW with pip install praw or conda install praw, depending on your environment.
    Before you start, you need a Reddit account and a script-type application created at https://www.reddit.com/settings/privacy. After that you are ready to run the Python scripts below; don't forget to fill in your own credentials.
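
Hard-coding the client ID and secret in every script makes them easy to leak. The following is a minimal sketch, assuming the credentials are exported as environment variables. The variable names (REDDIT_CLIENT_ID and so on) and the helper reddit_kwargs are hypothetical choices for illustration, not something PRAW requires.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
}
OPTIONAL = {
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
}


def reddit_kwargs(env=None):
    """Build keyword arguments for praw.Reddit from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    kwargs = {key: env[var] for key, var in REQUIRED.items()}
    # Username and password are only needed by scripts that post or reply.
    kwargs.update({key: env[var] for key, var in OPTIONAL.items() if env.get(var)})
    return kwargs
```

With this in place, a script could create its client with reddit = praw.Reddit(**reddit_kwargs()) instead of pasting the secrets inline.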


  1. Scrape Reddit post and comments:


  2. 
    import praw
    
    reddit = praw.Reddit(user_agent="your agent name", client_id="your client ID", 
    client_secret="your app secret")
    
    url = "https://www.reddit.com/r/dadjokes/comments/sm6ikx/a_young_woman_was_standing_outside_her_car/"
    
    post = reddit.submission(url=url)
    print(post.title)
    print(post.selftext)
    
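    # Drop the "load more comments" placeholders, which have no body attribute
    post.comments.replace_more(limit=0)
    print(len(post.comments))
    for comment in post.comments:
      print(comment.body)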
    

    OUT: the title, text, comment count, and top-level comments of the selected post.
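
The script above only prints top-level comments. To include nested replies as well, one option is a small recursive walk over the comment tree. This is a sketch, assuming post.comments.replace_more(limit=0) has been called first and that each comment exposes body and replies, as PRAW comments do. The helper name walk_comments is made up for this example.

```python
def walk_comments(comments, depth=0):
    """Yield (depth, body) pairs for a comment tree, depth first."""
    for comment in comments:
        # Skip anything without a body, e.g. unresolved "load more" placeholders.
        body = getattr(comment, "body", None)
        if body is None:
            continue
        yield depth, body
        # Replies are walked recursively, one level deeper than their parent.
        yield from walk_comments(getattr(comment, "replies", []), depth + 1)
```

Usage would look like: for depth, body in walk_comments(post.comments): print("  " * depth + body), which indents each reply under its parent.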



  3. Scrape fresh subreddit posts into a text file:


  4. 
    import praw
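    from datetime import datetime, timedelta, timezone
    
    reddit = praw.Reddit(user_agent="your agent name", client_id="your client ID", 
    client_secret="your app secret")
    
    subreddit = reddit.subreddit("HistoryMemes")
    
    posts24h = []
    
    # utf-8 keeps emoji and non-Latin titles from failing on write
    with open('postoutput.txt', 'w', encoding='utf-8') as file:
      for post in subreddit.new():
        current_time = datetime.now(timezone.utc)
        # created_utc is the post's creation time as a UTC Unix timestamp
        post_time = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)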
    
        delta_time = current_time - post_time
        # print(delta_time)
        if delta_time <= timedelta(hours=24):
          posts24h.append((post.title, post.selftext, post_time))
          file.write(f'{post.title}\n{post.selftext}\n\n')
    
    print('Subreddit posts for last 24H are saved to text file !!!')
    

    OUT: Subreddit posts for last 24H are saved to text file !!!
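
If the posts are meant for later analysis, a CSV file is often easier to load than free text. The sketch below separates the 24-hour filter from the file writing so each part can be checked on its own. It assumes each post exposes created_utc, title, and selftext, as PRAW submissions do. The function names recent_rows and write_csv are illustrative only.

```python
import csv
from datetime import datetime, timedelta, timezone


def recent_rows(posts, hours=24, now=None):
    """Return (created, title, selftext) rows for posts newer than `hours`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    rows = []
    for post in posts:
        created = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
        if created >= cutoff:
            rows.append((created.isoformat(), post.title, post.selftext))
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["created_utc", "title", "selftext"])
        writer.writerows(rows)
```

A call such as write_csv(recent_rows(subreddit.new()), "postoutput.csv") would then replace the loop and the text file.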



  5. Submit new post to Subreddit:


  6. 
    import praw
    
    
    # user_agent must be a descriptive string, not a boolean
    reddit = praw.Reddit(user_agent="your agent name", client_id="YOUR REDDIT APP ID", 
      client_secret="YOUR REDDIT APP SECRET", username='YOUR REDDIT USERNAME', password='YOUR REDDIT ACCOUNT PASSWORD')
    
    subreddit = reddit.subreddit("WeirdJokes")
    # validate_on_submit is a setting on the Reddit instance, not on the subreddit
    reddit.validate_on_submit = True
    
    title = 'It should be allowed )'
    content = """
    People who make sound while eating food must be slapped without asking why.
    """
    
    subreddit.submit(title=title, selftext=content)
    
    print('New post was submitted to selected Subreddit !!!')
    

    OUT: New post was submitted to selected Subreddit !!!


  7. Bot that replies to comments on new subreddit posts when a certain word appears in the post title and the comment body:


  8. 
    import praw
    from datetime import datetime, timedelta, timezone
    
    # user_agent must be a descriptive string, not a boolean
    reddit = praw.Reddit(user_agent="your agent name", client_id="YOUR REDDIT APP ID", 
      client_secret="YOUR REDDIT APP SECRET", username='YOUR REDDIT USERNAME', password='YOUR REDDIT ACCOUNT PASSWORD')
    
    subreddit = reddit.subreddit("Joker")
    
    
    for post in subreddit.new():
      current_time = datetime.now(timezone.utc)
      # created_utc is the post's creation time as a UTC Unix timestamp
      post_time = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
      delta_time = current_time - post_time
      if delta_time <= timedelta(hours=48):
        if "joke" in post.title.lower():
          # print(post.title)
          # post.reply("Here's another joke !")
          # Drop "load more comments" placeholders, which have no body attribute
          post.comments.replace_more(limit=0)
          for comment in post.comments:
            if "joke" in comment.body.lower():
              comment.reply("Here's another joke !")
    
    print('Replies are provided when needed !!!')          
    

    OUT: Replies are provided when needed !!!
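
Run twice, the bot above would reply to the same comments again, and the substring test also fires on words like "Joker". The sketch below shows one way to handle both: a whole-word match and a small JSON file that remembers which comment IDs were already answered. Note that the whole-word match is a deliberate change from the original substring check (it no longer matches "jokes" either). The helper names and the file layout are assumptions made for this example.

```python
import json
import re
from pathlib import Path


def mentions(text, keyword):
    """True if `keyword` appears in `text` as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def load_seen(path):
    """Load the set of comment IDs already replied to (empty if no file yet)."""
    file = Path(path)
    return set(json.loads(file.read_text())) if file.exists() else set()


def save_seen(path, seen):
    """Persist the replied-to IDs so the next run can skip them."""
    Path(path).write_text(json.dumps(sorted(seen)))
```

Inside the reply loop, the bot would skip a comment when comment.id is in the loaded set, add comment.id to the set after a successful comment.reply(...), and call save_seen once at the end of the run.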




