## Twitter scraper
Scrape a user's tweets :D

## Usage:

### Unauthenticated
Example:
```
scraper = TweetScraper()
tweets = scraper.get_tweets_anonymous("<user_id>")
```

Without credentials, only the anonymous method `get_tweets_anonymous()` can be used; other methods will fail.

The anonymous method returns a list of the user's tweets as seen from a logged-out session. It returns only 100 tweets, which are not necessarily the most recent.
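
Since the anonymous results are not guaranteed to be the most recent, it can help to sort them yourself. The following is a minimal sketch, assuming each tweet exposes a `date` that is either a `datetime` or an ISO 8601 string; `FakeTweet`, `as_datetime` and `newest_first` are hypothetical names used for illustration only and are not part of this library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FakeTweet:
    """Hypothetical stand-in for the library's Tweet object."""
    text: str
    date: str  # assumed ISO 8601, e.g. "2025-01-20T01:53:57+00:00"


def as_datetime(value):
    """Accept either a datetime or an ISO 8601 string (assumption about `date`)."""
    if isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value)


def newest_first(tweets):
    """Return a new list of tweets sorted from newest to oldest."""
    return sorted(tweets, key=lambda t: as_datetime(t.date), reverse=True)


sample = [
    FakeTweet("older", "2025-01-19T10:00:00+00:00"),
    FakeTweet("newest", "2025-01-20T01:53:57+00:00"),
    FakeTweet("oldest", "2024-12-31T23:59:59+00:00"),
]
ordered = newest_first(sample)
print([t.text for t in ordered])
```

In real use you would pass the list returned by `get_tweets_anonymous()` instead of `sample`.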

### Authenticated
Example:
```
import os

import dotenv

# Load AUTH_TOKEN and CSRF_TOKEN from a .env file into the environment
dotenv.load_dotenv()

auth_token = os.environ["AUTH_TOKEN"]
csrf_token = os.environ["CSRF_TOKEN"]

scraper = TweetsScraper(auth_token, csrf_token)

# Resolve the handle to a user id, then fetch that user's tweets
user_id = scraper.get_id_from_handle("pobnellion")
user_tweets = scraper.get_tweets(user_id, 100)
```

This lets you fetch tweets as a logged-in user. Twitter only makes roughly the 2000 most recent tweets available, but that should be more than enough.

You can either pass the user id directly to `get_tweets()`, or use `get_id_from_handle()` to look up the id if you don't have it.

To use dotenv, include a `.env` file in the directory with the following contents (no quotes around the values):
```
AUTH_TOKEN=<auth token>
CSRF_TOKEN=<csrf token>
```

You can find your auth and CSRF tokens in Twitter's cookies (F12 in your browser > Storage tab > Cookies).
The auth token cookie is called `auth_token` and the CSRF token cookie is called `ct0`.
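
If either variable is missing, `os.environ[...]` raises a bare `KeyError`. A small helper can give a clearer message. This is only a sketch: `read_tokens` is a hypothetical helper, not something the library provides, and it takes a mapping so it can be tried without touching the real environment.

```python
import os


def read_tokens(env=os.environ):
    """Return (auth_token, csrf_token), or raise a readable error if one is unset."""
    missing = [name for name in ("AUTH_TOKEN", "CSRF_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["AUTH_TOKEN"], env["CSRF_TOKEN"]


# Demonstration with a plain dict standing in for os.environ
print(read_tokens({"AUTH_TOKEN": "abc", "CSRF_TOKEN": "def"}))
```

Called with no arguments after `dotenv.load_dotenv()`, it reads the same two variables as the example above.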

#### Include replies
```
user_id = scraper.get_id_from_handle("@pobnellion")
user_tweets = scraper.get_tweets_and_replies(user_id, 100)
```

This is equivalent to viewing the 'Replies' tab on Twitter. Replies show up as Conversation objects, which contain a list of tweets.
The last tweet in the conversation will always be by the currently viewed user, even if there are more replies in the chain.
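
One way to use that guarantee is to reduce the results to only the viewed user's own posts. The sketch below assumes the returned list mixes plain tweets with Conversation objects that expose an `items` list; `FakeTweet`, `FakeConversation` and `own_tweets` are hypothetical stand-ins for illustration, not the library's classes.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTweet:
    """Hypothetical stand-in for a Tweet."""
    text: str
    user: str


@dataclass
class FakeConversation:
    """Hypothetical stand-in for a Conversation: just a list of tweets."""
    items: list = field(default_factory=list)


def own_tweets(results):
    """Keep standalone tweets and the last tweet of each conversation."""
    own = []
    for entry in results:
        items = getattr(entry, "items", None)
        if isinstance(items, list):
            # Conversation: the last tweet is by the viewed user
            if items:
                own.append(items[-1])
        else:
            # Plain tweet
            own.append(entry)
    return own


results = [
    FakeTweet("standalone", "me"),
    FakeConversation([FakeTweet("question", "other"), FakeTweet("my reply", "me")]),
]
print([t.text for t in own_tweets(results)])
```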

### Tweet object
Contains the text of the tweet, along with the timestamp and some stats (like count, repost count, views, etc.).

#### Fields:
- id : tweet id
- views : view count
- text : tweet content
- likes : like count
- replies : reply count
- retweets : retweet count
- quotes : quote tweet count
- date : post date
- is_retweet : tweet is a retweet
- is_quote : tweet is a quote tweet
- user : user who sent the tweet (this is useful in conversations)

Printing a tweet object results in an overview:

`L:52 RT:2 Q:1 R:3 V:1032 2025-01-20T01:53:57+00:00 Example tweet text`
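
To make the layout of that line explicit, here is a rough sketch of how such a string could be produced. It is not the library's actual implementation: `TweetSketch` is a hypothetical class, the mapping of `L`, `RT`, `Q`, `R` and `V` to likes, retweets, quotes, replies and views is inferred from the field list, and `date` is assumed to be a timezone-aware `datetime`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TweetSketch:
    """Hypothetical stand-in with only the fields used in the overview."""
    likes: int
    retweets: int
    quotes: int
    replies: int
    views: int
    date: datetime
    text: str

    def __str__(self):
        # Stats first, then the ISO 8601 timestamp, then the tweet text
        return (
            f"L:{self.likes} RT:{self.retweets} Q:{self.quotes} "
            f"R:{self.replies} V:{self.views} {self.date.isoformat()} {self.text}"
        )


tweet = TweetSketch(
    likes=52,
    retweets=2,
    quotes=1,
    replies=3,
    views=1032,
    date=datetime(2025, 1, 20, 1, 53, 57, tzinfo=timezone.utc),
    text="Example tweet text",
)
print(tweet)
```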

### Conversation object

Container for a list of tweets as shown when viewing the replies tab. It does not hold any other information.

#### Fields
- items : list of tweets in the conversation

### User object

Twitter user

#### Fields

- id : user id
- handle : user handle (without @)
- display_name : display name
- description : profile description (bio)
- join_date : date the account was created
- location : profile location
- tweets_count : number of tweets
- blue_verified : whether the account is Blue verified
- follower_count : number of followers