sokomishalov / Skraper

Licence: apache-2.0

Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Facebook, Instagram, Twitter, Youtube, Tiktok, Telegram, Twitch, Reddit, 9GAG, Pinterest, Flickr, Tumblr, IFunny, VK, Pikabu)

Skraper

~~Here should be some fancy logo~~

Overview

Kotlin/Java library and CLI tool for scraping and downloading posts, attachments, and other metadata from more than 10 sources without authorization or full page rendering. Based on jsoup and coroutines.

Repository contains:

Cli tool
Kotlin library
Telegram bot

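Current list of implemented sources: Facebook, Instagram, Twitter, Youtube, Tiktok, Telegram, Twitch, Reddit, 9GAG, Pinterest, Flickr, Tumblr, IFunny, VK, Pikabu.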

Bugs

Unfortunately, any website may change without notice, which can cause the tool to work incorrectly. If that happens, please let me know via an issue.

Cli tool

The CLI tool lets you:

download media from almost all supported sources with the --media-only flag
scrape post meta information

Requirements:

Java: 1.8+
Maven (optional)

Build the tool:

./mvnw clean package -DskipTests=true

Usage:

./skraper --help

usage: [-h] PROVIDER PATH [-n LIMIT] [-t TYPE] [-o OUTPUT] [-m]
       [--parallel-downloads PARALLEL_DOWNLOADS]

optional arguments:
  -h, --help                                show this help message and exit

  -n LIMIT, --limit LIMIT                   posts limit (50 by default)

  -t TYPE, --type TYPE                      output type, options: [log, csv, json, xml, yaml]

  -o OUTPUT, --output OUTPUT                output path

  -m, --media-only                          scrape media only

  --parallel-downloads PARALLEL_DOWNLOADS   amount of parallel downloads for media items if
                                            enabled flag --media-only (4 by default)


positional arguments:
  PROVIDER                                  skraper provider, options: [facebook, instagram,
                                            twitter, youtube, twitch, reddit, ninegag, pinterest,
                                            flickr, tumblr, ifunny, vk, pikabu]

  PATH                                      path to user/community/channel/topic/trend

Examples:

./skraper ninegag /hot 
./skraper reddit /r/memes -n 5 -t csv -o ./reddit/posts
./skraper youtube /user/JetBrainsTV/videos --media-only -n 2

Kotlin Library

Distribution

Maven:

<dependency>
    <groupId>ru.sokomishalov.skraper</groupId>
    <artifactId>skrapers</artifactId>
    <version>0.7.0</version>
</dependency>

Gradle kotlin dsl:

implementation("ru.sokomishalov.skraper:skrapers:0.7.0")

Usage

Demo

For examples of library usage, take a look at the Android sample app or the Telegram bot.

Instantiate specific scraper

A provider implementation exists for each of the sources mentioned above. Instantiating one is as simple as:

val skraper = InstagramSkraper(client = OkHttpSkraperClient())

Important: using DefaultBlockingSkraperClient is strongly discouraged. There are more efficient, non-blocking, and resource-friendly implementations of SkraperClient. To use one of them, just put the required dependencies on the classpath.

Current http-client implementation list:

DefaultBlockingSkraperClient: simple java.net.* blocking API implementation
OkHttpSkraperClient: okhttp3 implementation
SpringReactiveSkraperClient: spring-webflux client implementation
KtorSkraperClient: ktor-client-jvm implementation
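
As a rough sketch of what "putting the required dependencies on the classpath" could look like for the okhttp3-based client with the Gradle Kotlin DSL: the okhttp coordinates below are the standard ones, but the version is an assumption, so check which release fits your project and this library.

```kotlin
dependencies {
    implementation("ru.sokomishalov.skraper:skrapers:0.7.0")
    // Assumed version, shown for illustration only.
    implementation("com.squareup.okhttp3:okhttp:4.9.0")
}
```

With okhttp3 available, a call such as InstagramSkraper(client = OkHttpSkraperClient()) should be able to resolve its client.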

Available methods

Each scraper is a class that implements the Skraper interface:

interface Skraper {
    val baseUrl: URLString
    val client: SkraperClient get() = DefaultBlockingSkraperClient
    suspend fun getProviderInfo(): ProviderInfo?
    suspend fun getPageInfo(path: String): PageInfo?
    suspend fun getPosts(path: String, limit: Int = DEFAULT_POSTS_LIMIT): List<Post>
    suspend fun resolve(media: Media): Media
}

There are also provider-specific Kotlin extensions for the implementations. You can find them in each provider's implementation package.

Usage from plain Java

Kotlin coroutines are implemented in continuation-passing style (CPS), i.e. with callbacks under the hood. Here is a quite good Java-side example of how to call Kotlin suspend functions from plain Java.
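
One alternative is to bridge on the Kotlin side and hand Java a CompletableFuture. The sketch below uses only the Kotlin standard library to show the continuation callback explicitly. The names fetchTitles, asFuture and fetchTitlesAsync are hypothetical and not part of Skraper; fetchTitles stands in for a real suspend call such as getPosts().

```kotlin
import java.util.concurrent.CompletableFuture
import kotlin.coroutines.Continuation
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical stand-in for a suspend call such as skraper.getPosts(path, limit).
suspend fun fetchTitles(path: String, limit: Int): List<String> =
    (1..limit).map { "$path#$it" }

// Starts the suspend block and completes a future from the continuation callback.
fun <T> asFuture(block: suspend () -> T): CompletableFuture<T> {
    val future = CompletableFuture<T>()
    block.startCoroutine(object : Continuation<T> {
        override val context: CoroutineContext = EmptyCoroutineContext
        override fun resumeWith(result: Result<T>) {
            result.fold(
                onSuccess = { future.complete(it) },
                onFailure = { future.completeExceptionally(it) }
            )
        }
    })
    return future
}

// Plain (non-suspend) entry point that Java code can call directly.
fun fetchTitlesAsync(path: String, limit: Int): CompletableFuture<List<String>> =
    asFuture { fetchTitles(path, limit) }

fun main() {
    println(fetchTitlesAsync("/r/memes", 2).get()) // [/r/memes#1, /r/memes#2]
}
```

If this lived in a file named Bridge.kt (an assumed name), Java could call BridgeKt.fetchTitlesAsync("/r/memes", 2).get(). With no dispatcher in the context, the block runs on the calling thread until its first suspension. Projects that already depend on kotlinx.coroutines may prefer its future builder instead.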

Scrape user/community/channel/topic/trend posts

To scrape the latest posts for a specific user, channel, or trend, use a skraper like this:

suspend fun main() {
    val skraper = FacebookSkraper()
    val posts = skraper.getUserPosts(username = "memes", limit = 2) // extension for getPosts()
    println(JsonMapper().writerWithDefaultPrettyPrinter().writeValueAsString(posts))
}

The returned data structure is similar for every provider. Output data example:

[
  {
    "id": "5029851093699104",
    "text": "gotta love em!",
    "publishedAt": 1580744400000,
    "rating": 79,
    "commentsCount": 3,
    "media": [
      {
        "url": "https://facebook.com/memes/posts/5029851093699104?__xts__%5B0%5D=68.ARA2yRI2YnlXQRKX7Pdphh8ztgvnP11aYE_bZFPNmqLpJZLhwJaG24gDPUTiKDLv-J_E09u2vLjCXalpmEuGSmVR0BkVtcng_i6QV8x5e-aZUv0Mkn1wwKLlhp5NNH6zQWKlqDqRjZrwvcKeUi0unzzulRCHRvDIrbz2leM6PLescFySwMYbMmKFc7ctqaC_F7nJ09Ya0lz9Pqaq_Rh6UsNKom6fqdgHAuoHV894a3QRuyY0BC6fQuXZLOLbRIfEVK3cF9Z5UQiXUYruCySF-WpQEV0k72x6DIjT6B3iovYFnBGHaji9VAx2PByZ-MDs33D1Hz96Mk-O1Pj7zBwO6FvXGhkUJgepiwUOVd0q-pV83rS5EhjtPFDylNoNO2xkDUSIi483p49vumVPWtmab8LX1V6w2anf55kh6pedCXcH3D8rBjz8DaTBnv995u9kk5im-1-HdAGQHyKrCZpaA0QyC-I4oGsCoIJGck3RO8u_SoHcfe2tKjTgPe6j9p1D&__tn__=-R",
        "aspectRatio": 0.864,
        "duration": 10860.000000000
      }
    ]
  },
  {
    "id": "4990218157662398",
    "text": "Interesting",
    "publishedAt": 1580742000000,
    "rating": 3092,
    "commentsCount": 514,
    "media": [
      {
        "url": "https://scontent.fhrk1-1.fna.fbcdn.net/v/t1.0-0/p526x296/52333452_10157743612509879_529328953723191296_n.png?_nc_cat=1&_nc_ohc=oNMb8_mCbD8AX-w9zeY&_nc_ht=scontent.fhrk1-1.fna&oh=ca8a719518ecfb1a24f871282b860124&oe=5E910D0C",
        "aspectRatio": 0.8960573476702509
      }
    ]
  }
]

You can see the full model structure for posts and others here
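
To make the shape of the example output concrete, here is a minimal sketch of classes mirroring the JSON fields shown above. ScrapedPost and MediaItem are hypothetical names, not the library's own model classes, and treating publishedAt as epoch milliseconds is an assumption based on the magnitude of the sample values.

```kotlin
import java.time.Instant

// Hypothetical mirror of one "media" entry from the sample output.
data class MediaItem(val url: String, val aspectRatio: Double? = null)

// Hypothetical mirror of one post from the sample output.
data class ScrapedPost(
    val id: String,
    val text: String?,
    val publishedAt: Long?, // assumed to be epoch milliseconds
    val rating: Int?,
    val commentsCount: Int?,
    val media: List<MediaItem> = emptyList()
) {
    // Converts the raw timestamp to an Instant, if one is present.
    val publishedInstant: Instant?
        get() = publishedAt?.let { Instant.ofEpochMilli(it) }
}

fun main() {
    val post = ScrapedPost(
        id = "5029851093699104",
        text = "gotta love em!",
        publishedAt = 1580744400000,
        rating = 79,
        commentsCount = 3
    )
    println(post.publishedInstant) // 2020-02-03T15:40:00Z
}
```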

Scrape user/community/channel/topic/trend info

You can also scrape information about a user, channel, or trend page:

suspend fun main() {
    val skraper = TwitterSkraper()
    val pageInfo = skraper.getUserInfo(username = "memes") // extension for `getPageInfo()`
    println(JsonMapper().writerWithDefaultPrettyPrinter().writeValueAsString(pageInfo))
}

Output:

{
  "nick": "memes",
  "name": "Memes.com",
  "description": "http://memes.com is your number one website for the funniest content on the web. You will find funny pictures, funny memes and much more.",
  "postsCount": 10848,
  "followersCount": 154718,
  "avatar": {
    "url": "https://pbs.twimg.com/profile_images/824808708332941313/mJ4xM6PH_normal.jpg"
  },
  "cover": {
    "url": "https://abs.twimg.com/images/themes/theme1/bg.png"
  }
}

Resolve provider relative url

Sometimes you need the direct link to a media item:

suspend fun main() {
    val skraper = InstagramSkraper()
    val info = skraper.resolve(Video(url = "https://www.instagram.com/p/B-flad2F5o7/"))
    println(JsonMapper().writerWithDefaultPrettyPrinter().writeValueAsString(info))
}

Output:

{
  "url": "https://scontent-amt2-1.cdninstagram.com/v/t50.2886-16/91508191_213297693225472_2759719910220905597_n.mp4?_nc_ht=scontent-amt2-1.cdninstagram.com&_nc_cat=104&_nc_ohc=27bC52qar_oAX-7J2Zh&oe=5EC0BC52&oh=0aafee2860c540452b76e7b8e336147d",
  "aspectRatio": 0.8010012515644556,
  "thumbnail": {
    "url": "https://scontent-amt2-1.cdninstagram.com/v/t51.2885-15/e35/91435498_533808773845524_5302421141680378393_n.jpg?_nc_ht=scontent-amt2-1.cdninstagram.com&_nc_cat=100&_nc_ohc=8gPAcByc6YAAX_kDBWm&oh=5edf6b9d90d606f9c0e055b7dbcbfa45&oe=5EC0DDE8",
    "aspectRatio": 0.8010012515644556
  }
}

Download media

There is a "static" method that downloads any media from all implemented sources:

suspend fun main() {
    val tmpDir = Files.createTempDirectory("skraper").toFile()

    val testVideo = Skrapers.download(
        media = Video("https://youtu.be/fjUO7xaUHJQ"),
        destDir = tmpDir,
        filename = "Gandalf"
    )

    val testImage = Skrapers.download(
        media = Image("https://www.pinterest.ru/pin/89509111320495523/"),
        destDir = tmpDir,
        filename = "Do_no_harm"
    )

    println(testVideo)
    println(testImage)
}

Output:

/var/folders/sf/hm2h5chx5fl4f70bj77xccsc0000gp/T/skraper8377953374796527777/Gandalf.mp4
/var/folders/sf/hm2h5chx5fl4f70bj77xccsc0000gp/T/skraper8377953374796527777/Do_no_harm.jpg

Scrape provider logo

You can also scrape information about the provider itself:

suspend fun main() {
    val skraper = InstagramSkraper()
    val info = skraper.getProviderInfo()
    println(JsonMapper().writerWithDefaultPrettyPrinter().writeValueAsString(info))
}

Output:

{
  "name": "Instagram",
  "logo": {
    "url": "https://instagram.com/favicon.ico"
  }
}

Telegram bot

To use the bot, follow the link.
