
soskek / arxiv_leaks

Licence: other
Whisper of the arxiv: read comments in tex of papers

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives to or similar to arxiv_leaks

Arxivscraper
A python module to scrape arxiv.org for specific date range and categories
Stars: ✭ 121 (+450%)
Mutual labels:  scraper, arxiv
web-scraping-engine
A simple web scraping engine supporting concurrent and anonymous scraping
Stars: ✭ 27 (+22.73%)
Mutual labels:  scraper
site-audit-seo
Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv, xlsx, Google Drive.
Stars: ✭ 91 (+313.64%)
Mutual labels:  scraper
document-dl
Command line program to download documents from web portals.
Stars: ✭ 14 (-36.36%)
Mutual labels:  scraper
stock-market-scraper
Scraps historical stock market data from Yahoo Finance (https://finance.yahoo.com/)
Stars: ✭ 110 (+400%)
Mutual labels:  scraper
aliexscrape
Get Aliexpress product details in JSON
Stars: ✭ 80 (+263.64%)
Mutual labels:  scraper
PDAP-Scrapers
Code relating to scraping public police data.
Stars: ✭ 72 (+227.27%)
Mutual labels:  scraper
newspaperjs
News extraction and scraping. Article Parsing
Stars: ✭ 59 (+168.18%)
Mutual labels:  scraper
crawlkit
A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers.
Stars: ✭ 23 (+4.55%)
Mutual labels:  scraper
yt-videos-list
Create and **automatically** update a list of all videos on a YouTube channel (in txt/csv/md form) via YouTube bot with end-to-end web scraping - no API tokens required. Multi-threaded support for YouTube videos list updates.
Stars: ✭ 64 (+190.91%)
Mutual labels:  scraper
scraper
A web scraper starter project
Stars: ✭ 18 (-18.18%)
Mutual labels:  scraper
OLX Scraper
📻 An OLX Scraper using Scrapy + MongoDB. It Scrapes recent ads posted regarding requested product and dumps to NOSQL MONGODB.
Stars: ✭ 15 (-31.82%)
Mutual labels:  scraper
tieba-zhuaqu
A distributed crawler for Baidu Tieba, used for Tieba data mining. Analyzes data from both the forum dimension and the user dimension.
Stars: ✭ 56 (+154.55%)
Mutual labels:  scraper
InstagramLocationScraper
No description or website provided.
Stars: ✭ 13 (-40.91%)
Mutual labels:  scraper
Spydan
A web spider for shodan.io without using the Developer API.
Stars: ✭ 30 (+36.36%)
Mutual labels:  scraper
google-this
🔎 A simple yet powerful module to retrieve organic search results and much more from Google.
Stars: ✭ 88 (+300%)
Mutual labels:  scraper
youtube-unofficial
Access parts of your account unavailable through normal YouTube API access.
Stars: ✭ 33 (+50%)
Mutual labels:  scraper
OpenScraper
An open source webapp for scraping: towards a public service for webscraping
Stars: ✭ 80 (+263.64%)
Mutual labels:  scraper
WaGpScraper
A Python Oriented tool to Scrap WhatsApp Group Link using Google Dork it Scraps Whatsapp Group Links From Google Results And Gives Working Links.
Stars: ✭ 18 (-18.18%)
Mutual labels:  scraper
Mimo-Crawler
A web crawler that uses Firefox and js injection to interact with webpages and crawl their content, written in nodejs.
Stars: ✭ 22 (+0%)
Mutual labels:  scraper

ArxivLeaks

Most papers on arXiv come with LaTeX source files, which often contain commented-out text. Dig up the valuable comments!

For example, you can extract a secret comment from "Attention Is All You Need", as shown below:

\paragraph{Symbol Dropout} In the source and target embedding layers, we replace a random subset of the token ids with a sentinel id. For the base model, we use a rate of $symbol\_dropout\_rate=0.1$. Note that this applies only to the auto-regressive use of the target ids - not their use in the cross-entropy loss.

We found "Symbol Dropout", which does not appear in the paper (PDF).
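The core idea is simple: scan each line of the LaTeX source for an unescaped `%` and keep whatever follows it. Here is a minimal illustrative sketch of that idea (`extract_comments` is a hypothetical helper, not part of this project's API):

```python
import re

# Match an unescaped "%" (a "\%" is a literal percent sign, not a comment)
# and capture the rest of the line.
COMMENT_RE = re.compile(r'(?<!\\)%(.*)')

def extract_comments(tex: str) -> list[str]:
    """Return the non-empty %-comments found in a LaTeX source string."""
    comments = []
    for line in tex.splitlines():
        m = COMMENT_RE.search(line)
        if m:
            text = m.group(1).strip()
            if text:  # skip bare "%" line-continuation markers
                comments.append(text)
    return comments

source = r"""We use a rate of $P_{drop}=0.1$.  % TODO: tune this
Accuracy improved by 50\% overall."""
print(extract_comments(source))
```

Real LaTeX has more edge cases (e.g. `%` inside `\verb` or listings environments), so a robust extractor needs more care than this regex.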

Run

Feed a text file or HTML page containing arXiv URLs with the -t option.

python -u run.py -t deepmind.html -s arxiv_dir

To test, run sh test.sh. This pre-downloads a publications page from DeepMind.

You can also read only selected papers with -i, by feeding their arXiv IDs.

python -u run.py -i 1709.04905 1706.03762 -s arxiv_dir
  • -s: Downloaded arXiv pages and files are stored in this directory.
  • -o: Output is printed and saved as a JSON file at this file path. Default is ./comments.json.
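The -o output can be post-processed like any JSON file. A minimal sketch, assuming (the actual layout is not documented here, so inspect your own output file) that comments.json maps each arXiv ID to a list of extracted comment strings:

```python
import json

def summarize(comments: dict) -> list[str]:
    """Return one summary line per paper: '<id>: <n> comments'."""
    return [f"{pid}: {len(cs)} comments" for pid, cs in comments.items()]

# Inline data mimicking the assumed layout; in practice you would use
# json.load(open("comments.json")).
data = json.loads('{"1706.03762": ["Symbol Dropout note", "TODO: rerun"]}')
print(summarize(data))
```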

Requirement

  • requests
  • lxml

For Writers

You can remove %-comments from your file as follows:

perl -pe 's/(^|[^\\])%.*/\1%/' < old.tex > new.tex

This one-line command is provided by arXiv.
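If you prefer Python, the same substitution can be sketched as follows (an illustrative equivalent, not the official arXiv tool). It removes everything after an unescaped %, but keeps a bare "%" so TeX's end-of-line behaviour is unchanged:

```python
import re

def strip_comments(line: str) -> str:
    """Replace an unescaped %-comment with a bare '%', like the perl one-liner."""
    # (^|[^\\]) ensures the % is at line start or not preceded by a backslash,
    # so literal "\%" percent signs are left alone.
    return re.sub(r'(^|[^\\])%.*', r'\g<1>%', line)

print(strip_comments(r"x = 1 % secret note"))  # -> "x = 1 %"
print(strip_comments(r"50\% done"))            # -> "50\% done"
```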

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].