
youhusky / Search_Ads_Web_Service

Licence: other
Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]

Programming Languages

java
python

Projects that are alternatives to or similar to Search Ads Web Service

openverse-catalog
Identifies and collects data on cc-licensed content across web crawl data and public apis.
Stars: ✭ 27 (-10%)
Mutual labels:  search-engine, spark
Blast
Blast is a full text search and indexing server, written in Go, built on top of Bleve.
Stars: ✭ 934 (+3013.33%)
Mutual labels:  search-engine, grpc
Gnes
GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network.
Stars: ✭ 1,178 (+3826.67%)
Mutual labels:  search-engine, grpc
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+1106.67%)
Mutual labels:  search-engine, spark
Bayard
A full-text search and indexing server written in Rust.
Stars: ✭ 1,555 (+5083.33%)
Mutual labels:  search-engine, grpc
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (+66.67%)
Mutual labels:  spark
lolita
Microservice OpenTracing integration based on gin
Stars: ✭ 13 (-56.67%)
Mutual labels:  grpc
openverse-api
The Openverse API allows programmatic access to search for CC-licensed and public domain digital media.
Stars: ✭ 41 (+36.67%)
Mutual labels:  search-engine
grpcoin
API-driven cryptocurrency paper trading game. Write a bot and play!
Stars: ✭ 53 (+76.67%)
Mutual labels:  grpc
spark-util
low-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-46.67%)
Mutual labels:  spark
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (+216.67%)
Mutual labels:  spark
httpbook
Quickly and easily send REST, Soap, GraphQL, GRPC, MQTT and WebSocket requests directly within Visual Studio Code
Stars: ✭ 18 (-40%)
Mutual labels:  grpc
spark-gradle-template
Apache Spark in your IDE with gradle
Stars: ✭ 39 (+30%)
Mutual labels:  spark
pulseha
PulseHA is an active-passive high availability cluster daemon that uses GRPC and is written in GO.
Stars: ✭ 15 (-50%)
Mutual labels:  grpc
yuzhouwan
Code Library for My Blog
Stars: ✭ 39 (+30%)
Mutual labels:  spark
page-content-tester
Paco is a Java based framework for non-blocking and highly parallelized Dom testing.
Stars: ✭ 13 (-56.67%)
Mutual labels:  jsoup
dalal-street-server
Server for Pragyan's Dalal Street
Stars: ✭ 65 (+116.67%)
Mutual labels:  grpc
memcached
Memcached Operator for Kubernetes
Stars: ✭ 18 (-40%)
Mutual labels:  memcached
mnemosyne
Session management service with RPC API based on protobuf.
Stars: ✭ 15 (-50%)
Mutual labels:  grpc
tron-rpc
TRON wallet node integration
Stars: ✭ 58 (+93.33%)
Mutual labels:  grpc

Search Ads Web Service

Online search advertisement platform & Realtime Campaign Monitoring

Project Description

  • Designed and developed a web crawler that crawled data on 500,000 products from Amazon (Java, Jsoup, proxies)
  • Developed the search ads workflow: query understanding, ad selection from an inverted index (with Memcached), ad ranking, ad filtering, ad pricing, and ad allocation
  • Designed and implemented a feature engineering pipeline that generates features for query understanding and click prediction with Spark MapReduce

Crawler

Used Jsoup to crawl product information from Amazon.

  • Finished
    • extract price, product detail URL, product image URL, and category from each web page
    • convert each product to an ad
    • store ads in a file, each ad in JSON format
    • support paging
    • log all exceptions
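A minimal sketch of the "convert each product to an ad, store it as JSON" steps, using only the Java standard library. The `Ad` record, its field names and the `adId` field are assumptions for illustration, not the project's actual schema; in the real crawler the field values would come from Jsoup selectors on the product page.

```java
import java.util.Locale;

/** Sketch: one crawled product becomes an ad serialized as a single JSON object. */
final class AdJson {

    /** Hypothetical ad record holding the fields the crawler extracts. */
    record Ad(long adId, String category, double price, String detailUrl, String imageUrl) {}

    /** Escapes the characters JSON requires to be escaped inside a string value. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    // Other control characters must be written as \\uXXXX escapes.
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    /** Serializes an ad; Locale.ROOT keeps the decimal separator a '.' regardless of JVM locale. */
    static String toJson(Ad ad) {
        return String.format(Locale.ROOT,
                "{\"adId\":%d,\"category\":\"%s\",\"price\":%.2f,\"detailUrl\":\"%s\",\"imageUrl\":\"%s\"}",
                ad.adId(), escape(ad.category()), ad.price(), escape(ad.detailUrl()), escape(ad.imageUrl()));
    }
}
```

Writing one object per ad keeps the output file easy to append to while paging through results.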

Avoid Bot Detection

  • Proxy IPs and rotating browsers
  • Distributed crawler
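Proxy and browser rotation can be sketched as a thread-safe round-robin over a proxy pool and a list of User-Agent strings. The class name `RequestIdentityRotator` and the `host:port` input format are hypothetical; the actual proxy addresses and User-Agent values would come from whatever pool the crawler is configured with.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin rotation of HTTP proxies and browser User-Agent strings. */
final class RequestIdentityRotator {

    /** One proxy plus the User-Agent header to send through it. */
    record Identity(Proxy proxy, String userAgent) {}

    private final List<Proxy> proxies;
    private final List<String> userAgents;
    private final AtomicInteger counter = new AtomicInteger();

    RequestIdentityRotator(List<String> hostPorts, List<String> userAgents) {
        if (hostPorts.isEmpty() || userAgents.isEmpty()) {
            throw new IllegalArgumentException("need at least one proxy and one user agent");
        }
        this.proxies = hostPorts.stream().map(RequestIdentityRotator::toProxy).toList();
        this.userAgents = List.copyOf(userAgents);
    }

    /** Parses "host:port" into an HTTP proxy without resolving DNS up front. */
    private static Proxy toProxy(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    /** Returns the next identity; the atomic counter lets several crawler threads share one rotator. */
    Identity next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(counter.getAndIncrement(), Integer.MAX_VALUE);
        return new Identity(proxies.get(i % proxies.size()), userAgents.get(i % userAgents.size()));
    }
}
```

With Jsoup, each request could then use `Jsoup.connect(url).proxy(id.proxy()).userAgent(id.userAgent())`. Round-robin is the simplest policy; a distributed crawler would likely also retire proxies that start getting blocked.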

Online Search Ads Platform

Search advertising places online advertisements on the front-end pages that show users the results of their search engine queries. This search ads server takes thousands of products as ad candidates and, when a search query comes in, selects, filters, ranks, allocates, and prices the ads. The selection and ranking of search ads are based on ad quality and the bid price offered by advertisers.


Query Understanding

  • clean the text with Lucene
  • train a word2vec model on the ad keyword corpus and use synonyms to rewrite the query
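The two query-understanding steps can be approximated in plain Java. This is an assumption-laden stand-in: the regex tokenizer and the small stop-word list only approximate what a Lucene analyzer does, and the synonym map is assumed to have been precomputed offline from the word2vec model's nearest neighbours (no word2vec training is shown).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Sketch: clean a query into terms, then rewrite it by substituting precomputed synonyms. */
final class QueryRewriter {
    // Small illustrative stop-word list, not Lucene's actual list.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "for", "of", "and", "in", "on", "with");

    // Assumed to be built offline from word2vec nearest neighbours of each keyword.
    private final Map<String, List<String>> synonyms;

    QueryRewriter(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    /** Lowercases, splits on anything that is not a letter or digit, and drops stop words. */
    static List<String> clean(String query) {
        List<String> tokens = new ArrayList<>();
        for (String t : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns the cleaned query first, then one rewrite per single-term synonym substitution. */
    List<String> rewrite(String query) {
        List<String> tokens = clean(query);
        Set<String> rewrites = new LinkedHashSet<>();  // keeps order, drops duplicate rewrites
        rewrites.add(String.join(" ", tokens));
        for (int i = 0; i < tokens.size(); i++) {
            for (String syn : synonyms.getOrDefault(tokens.get(i), List.of())) {
                List<String> copy = new ArrayList<>(tokens);
                copy.set(i, syn);
                rewrites.add(String.join(" ", copy));
            }
        }
        return new ArrayList<>(rewrites);
    }
}
```

For example, with the synonym map `{shoes: [sneakers]}`, the query "The Running Shoes for Men!" is rewritten to `running shoes men` and `running sneakers men`.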

Query Relevancy Matching

Ad candidates are first evaluated and filtered by relevance score, which measures how relevant the query is to the ad's keywords: relevance score = number of keyword words matched by the query / total number of words in the ad's keywords. For quick retrieval of ad information, an inverted index of ad keywords is built and stored in a cache.
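The relevance formula can be written directly. This is a sketch under the assumption that both the query and the ad keywords have already been cleaned into lowercase terms; the `minRelevance` threshold and the `AdKeywords` record are hypothetical, since the source does not give a cutoff value or a data model.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of relevance scoring and filtering of ad candidates. */
final class Relevance {
    /** Hypothetical pairing of an ad ID with its cleaned keyword terms. */
    record AdKeywords(long adId, List<String> keywords) {}

    record ScoredAd(long adId, double relevance) {}

    /** relevance = (ad keyword terms that appear in the query) / (total ad keyword terms). */
    static double score(List<String> queryTerms, List<String> adKeywordTerms) {
        if (adKeywordTerms.isEmpty()) {
            return 0.0;
        }
        Set<String> query = new HashSet<>(queryTerms);
        long matched = adKeywordTerms.stream().filter(query::contains).count();
        return (double) matched / adKeywordTerms.size();
    }

    /** Keeps ads whose relevance reaches minRelevance, preserving input order. */
    static List<ScoredAd> filter(List<String> queryTerms, List<AdKeywords> ads, double minRelevance) {
        List<ScoredAd> kept = new ArrayList<>();
        for (AdKeywords ad : ads) {
            double r = score(queryTerms, ad.keywords());
            if (r >= minRelevance) {
                kept.add(new ScoredAd(ad.adId(), r));
            }
        }
        return kept;
    }
}
```

For example, query terms [running, shoes] against ad keywords [nike, running, shoes, men] match 2 of 4 keyword terms, giving a relevance score of 0.5.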

The data layer supporting the online system:

  • Forward index for Ad detail information (MySQL)
  • Inverted index for Ad keywords (Memcached)
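To illustrate how the two stores cooperate, the sketch below uses in-memory maps as stand-ins: `invertedIndex` plays the role of Memcached (keyword to ad IDs) and `forwardIndex` plays the role of MySQL (ad ID to ad detail). The `AdDetail` fields are assumptions. In a real deployment the posting list for each keyword would be the cached value stored under that keyword's key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: inverted index lookup for candidate IDs, then forward index lookup for ad details. */
final class AdIndex {
    /** Hypothetical ad detail row as it might be stored in the forward index. */
    record AdDetail(long adId, List<String> keywords, double bid, long campaignId) {}

    private final Map<String, Set<Long>> invertedIndex = new HashMap<>();  // stand-in for Memcached
    private final Map<Long, AdDetail> forwardIndex = new HashMap<>();      // stand-in for MySQL

    /** Indexes one ad under each of its keyword terms. */
    void add(AdDetail ad) {
        forwardIndex.put(ad.adId(), ad);
        for (String kw : ad.keywords()) {
            invertedIndex.computeIfAbsent(kw, k -> new TreeSet<>()).add(ad.adId());
        }
    }

    /** Returns ads sharing at least one term with the query, in ascending ad-ID order. */
    List<AdDetail> candidates(List<String> queryTerms) {
        Set<Long> ids = new TreeSet<>();
        for (String term : queryTerms) {
            ids.addAll(invertedIndex.getOrDefault(term, Set.of()));
        }
        List<AdDetail> out = new ArrayList<>();
        for (long id : ids) {
            out.add(forwardIndex.get(id));
        }
        return out;
    }
}
```

Keeping only IDs in the inverted index keeps the cached posting lists small, while the full ad rows stay in the forward index.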

P-Click Prediction

The probability of user click (p-click) plays an important role in ads ranking.

Spark ML is used to process simulated user click log data and generate the prediction model.

  • Click log

log: Device IP, Device ID, Session ID, Query, AdId, CampaignId, Ad_category_Query_category (0/1), clicked (0/1)

  • Feature space

pClick features are extracted from the search log and stored in a key-value store.

  • Model

Logistic Regression

Gradient Boosting Tree
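The project trained its models with Spark ML (in Spark these would be `LogisticRegression` and `GBTClassifier` from `org.apache.spark.ml.classification`). The plain-Java sketch below only illustrates the logistic-regression math on one parsed click-log line: pClick = sigmoid(w · x), trained by stochastic gradient steps on log loss. The two-element feature vector (a bias term plus the `Ad_category_Query_category` flag) is a hypothetical simplification, not the project's real feature space, and the parser assumes the query field contains no commas.

```java
/** Sketch: parse a click-log line and score it with logistic regression. */
final class PClickSketch {
    /** Mirrors the log fields listed above. */
    record ClickLog(String deviceIp, String deviceId, String sessionId, String query,
                    long adId, long campaignId, int categoryMatch, int clicked) {}

    /** Parses one comma-separated log line; assumes no commas inside the query text. */
    static ClickLog parse(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 8) {
            throw new IllegalArgumentException("expected 8 fields: " + line);
        }
        return new ClickLog(f[0].trim(), f[1].trim(), f[2].trim(), f[3].trim(),
                Long.parseLong(f[4].trim()), Long.parseLong(f[5].trim()),
                Integer.parseInt(f[6].trim()), Integer.parseInt(f[7].trim()));
    }

    /** Hypothetical feature vector: x[0] = 1 (bias), x[1] = category-match flag. */
    static double[] features(ClickLog log) {
        return new double[] {1.0, log.categoryMatch()};
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** pClick = sigmoid(w . x). */
    static double predict(double[] w, double[] x) {
        double z = 0.0;
        for (int i = 0; i < w.length; i++) {
            z += w[i] * x[i];
        }
        return sigmoid(z);
    }

    /** One stochastic-gradient step on log loss: w += learningRate * (y - p) * x. */
    static void sgdStep(double[] w, double[] x, int y, double learningRate) {
        double err = y - predict(w, x);
        for (int i = 0; i < w.length; i++) {
            w[i] += learningRate * err * x[i];
        }
    }
}
```

Usage: parse each log line, build `features(log)`, call `sgdStep(w, x, log.clicked(), rate)` over the training data, then use `predict(w, x)` as pClick at serving time.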

Online Ads Ranking and Pricing

Quality Score = 0.25 * Relevance Score + 0.75 * pClick

Rank Score = Quality Score * Bid

Price (Cost Per Click) = next ad's rank score / current ad's quality score + 0.01
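The three formulas above translate into a short ranking-and-pricing routine. Two assumptions not stated in the source: quality scores are positive (so the division is defined), and the last-ranked ad, which has no next ad, is priced at the 0.01 floor by treating the next rank score as 0.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: QS = 0.25 * relevance + 0.75 * pClick, rank = QS * bid, CPC = nextRank / QS + 0.01. */
final class RankAndPrice {
    record Candidate(long adId, double relevance, double pClick, double bid) {}

    record PricedAd(long adId, double qualityScore, double rankScore, double costPerClick) {}

    static double qualityScore(Candidate c) {
        return 0.25 * c.relevance() + 0.75 * c.pClick();
    }

    static double rankScore(Candidate c) {
        return qualityScore(c) * c.bid();
    }

    /** Orders candidates by rank score (highest first) and prices each against the ad ranked just below it. */
    static List<PricedAd> rankAndPrice(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> rankScore(c)).reversed());
        List<PricedAd> result = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Candidate current = sorted.get(i);
            double qs = qualityScore(current);
            // Assumption: the last ad has no next ad, so its next rank score is taken as 0.
            double nextRank = (i + 1 < sorted.size()) ? rankScore(sorted.get(i + 1)) : 0.0;
            double cpc = nextRank / qs + 0.01;  // assumes qs > 0
            result.add(new PricedAd(current.adId(), qs, rankScore(current), cpc));
        }
        return result;
    }
}
```

With ad A (relevance 0.8, pClick 0.4, bid 2.0) and ad B (relevance 0.4, pClick 0.2, bid 3.0), A has quality score 0.5 and rank score 1.0, B has 0.25 and 0.75, so A ranks first and pays 0.75 / 0.5 + 0.01 = 1.51 per click. Because the next ad's rank score never exceeds the current one's, the price before the 0.01 increment is at most the current ad's bid.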

System

When a search query is received, the system matches the rewritten query against ad keywords using the inverted index to get the relevance score, and predicts the click probability with the regression model trained on 50GB of historical click data. Ad quality is determined by both the relevance score and the click probability. The ads engine calculates the quality score and combines it with the ad's bid price for final ranking and pricing.


Real Time Campaign Monitor

The real-time campaign monitoring system collects the ad-related events generated by the online ads server and visualizes campaign trends.

Join Event Streams

The real-time campaign monitoring system is a streaming pipeline that collects and processes the ad events generated by the online search ads engine. Ad chance events, impression events, and click events are published to a message queue and processed as a stream before being stored in a database. The front-end dashboard visualizes budget status and the dynamic impression, click, and pricing trends of campaigns.
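The join of impression and click events can be sketched as a single-process stream consumer that keeps per-campaign counters. Everything here is an assumption for illustration: the `Event` shape, the idea that a click carries the same `eventId` as the impression it belongs to, and the fixed join window. A production pipeline would consume from the message queue and keep this state in the stream processor rather than in a `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: join click events to impression events within a time window and aggregate per campaign. */
final class CampaignMonitor {
    /** Hypothetical event: type is "chance", "impression" or "click"; a click reuses its impression's eventId. */
    record Event(String type, String eventId, long campaignId, double cost, long timestampMs) {}

    /** Running per-campaign counters that a dashboard could poll. */
    static final class Stats {
        long chances;
        long impressions;
        long clicks;
        double spend;
    }

    private final long joinWindowMs;
    private final Map<String, Event> pendingImpressions = new HashMap<>();
    private final Map<Long, Stats> byCampaign = new HashMap<>();

    CampaignMonitor(long joinWindowMs) {
        this.joinWindowMs = joinWindowMs;
    }

    void onEvent(Event e) {
        // Evict impressions too old to be joined (simple O(n) scan, fine for a sketch).
        pendingImpressions.values().removeIf(imp -> e.timestampMs() - imp.timestampMs() > joinWindowMs);
        Stats s = byCampaign.computeIfAbsent(e.campaignId(), id -> new Stats());
        switch (e.type()) {
            case "chance" -> s.chances++;
            case "impression" -> {
                s.impressions++;
                pendingImpressions.put(e.eventId(), e);
            }
            case "click" -> {
                // Count the click and its cost only if the impression is still in the window;
                // removing it also prevents double-counting repeated clicks.
                if (pendingImpressions.remove(e.eventId()) != null) {
                    s.clicks++;
                    s.spend += e.cost();
                }
            }
            default -> throw new IllegalArgumentException("unknown event type: " + e.type());
        }
    }

    Stats stats(long campaignId) {
        return byCampaign.getOrDefault(campaignId, new Stats());
    }
}
```

Spend per campaign can then be compared with the campaign budget to drive the dashboard's budget status, and clicks divided by impressions gives the click-through trend.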

Streaming Pipeline

[Figure: streaming pipeline]

Dashboard Visualization

[Figure: dashboard visualization]
