INTL
Freelancer
Expert
Outsourcing
Remote possible
Robust Web Scraping for Product Data Extraction
Budget
$200~$350 USD
Estimated duration
2~3 months
Difficulty
Expert
Tech stack
PHP
Python
Web Scraping
MySQL
Laravel
Data Extraction
Selenium
Playwright
Puppeteer
API Integration
Next.js
FastAPI
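AI analysis summary
This project extracts data for more than 240,000 products from the G2A website, reconciles it with the existing G2A API data, and integrates it into a Laravel-based backend. In particular, the category structure, detailed descriptions, alert boxes, and activation guides that are inaccurate or missing in the API must be scraped accurately, which calls for the ability to build a scalable web scraping system with a high degree of data consistency.
Original project description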
We are looking for an experienced web scraping developer to build a robust and scalable scraping system for extracting product data from G2A and integrating it into our Laravel-based platform.
We already have an integration with the G2A API. However, some critical data is NOT available or NOT accurate via the API, which is why scraping is required.
---
Project Scope:
* Scrape and process data for 240,000+ products
* Integrate scraping output with an existing Laravel backend
* Ensure data consistency and correct mapping with existing API data
---
Important Clarification (API vs Scraping):
* We already receive the following from G2A API:
  * Product ID (same as G2A)
  * Slug (same as G2A)
  * Price, currency
  * Images
  * Platform, region
  * System requirements
* However:
  * Categories from the API are NOT accurate and do not match G2A website structure
  * Some important content (description sections, alert boxes, activation guides) is NOT available via API
Therefore, scraping is required to:
* Rebuild the correct categories structure exactly as on G2A
* Extract missing product content
All scraped data must be correctly matched with API data using the same product ID and slug.
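
As an illustration of the matching rule above, here is a minimal sketch of joining scraper output to API data on the product ID and slug pair. It assumes both sides are lists of dictionaries with hypothetical `id` and `slug` keys; the real field names depend on the existing Laravel schema, which the posting does not describe.

```python
from collections import Counter


def align_records(api_rows, scraped_rows):
    """Match scraped rows to API rows on the (id, slug) pair.

    Returns (matched, problems): matched maps each key to a merged record,
    problems lists (reason, key) tuples for rows that need manual review.
    The "id" and "slug" field names are assumptions for this sketch.
    """
    problems = []

    # Index API rows by (id, slug) for constant-time lookup.
    api_index = {(row["id"], row["slug"]): row for row in api_rows}

    # Any key that appears more than once in the scrape is a duplicate.
    counts = Counter((row["id"], row["slug"]) for row in scraped_rows)

    matched = {}
    for row in scraped_rows:
        key = (row["id"], row["slug"])
        if counts[key] > 1:
            # Report each duplicated key once and keep none of its copies.
            if ("duplicate", key) not in problems:
                problems.append(("duplicate", key))
            continue
        if key not in api_index:
            problems.append(("not_in_api", key))
            continue
        # Scraped fields are layered on top of the API fields.
        matched[key] = {**api_index[key], **row}

    # API products that never appeared in the scrape are reported as well.
    for key in api_index:
        if counts[key] == 0:
            problems.append(("not_scraped", key))
    return matched, problems
```

Keeping the rejected rows in a separate `problems` list, rather than silently dropping them, makes it possible to review mismatches before anything is written to the Laravel database.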
---
Core Requirements:
1. Categories Mapping (Very Important)
* Extract the full categories structure exactly as on G2A website
* Include main categories, subcategories, and full hierarchy
* Replace API categories completely
* Each product must be mapped accurately to its category path
2. Product Data Extraction
* Basic product information
* Full product description (HTML)
* Yellow alert / warning box under “About this item” (must be extracted in full, with formatting)
* Activation guide (especially for software products)
3. Data Alignment
* Must correctly match scraped data with API data (using product ID and slug)
* No duplicates or mismatched records
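
The category requirement could be approached by recording each product's category path as it appears on the product page and folding those paths into one tree. The sketch below assumes the scraper already yields an ordered list of category names per product; the function names and the category names used in the usage note are hypothetical examples, not G2A's actual structure.

```python
def build_category_tree(product_paths):
    """Build a nested category tree from per-product category paths.

    product_paths maps a product ID to its ordered path, root first.
    Each node has "children" (name -> node) and "products" (IDs whose
    path ends at that node).
    """
    root = {"children": {}, "products": []}
    for product_id, path in product_paths.items():
        node = root
        for name in path:
            # Reuse the existing child node or create it on first sight.
            node = node["children"].setdefault(
                name, {"children": {}, "products": []}
            )
        if node is not root:
            # Products with an empty path are left unassigned.
            node["products"].append(product_id)
    return root


def list_paths(node, prefix=()):
    """Yield every category path in the tree, parents before children."""
    for name, child in node["children"].items():
        path = prefix + (name,)
        yield path
        yield from list_paths(child, path)
```

For example, `build_category_tree({101: ["Games", "PC"], 102: ["Software"]})` produces a tree with two top-level nodes, and `list_paths` on it yields `("Games",)`, `("Games", "PC")`, and `("Software",)`, which can be compared against the site navigation during validation.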
---
Technical Requirements:
* Python (preferred with FastAPI or similar)
* Experience with headless browsers (Playwright / Selenium / Puppeteer)
* Ability to handle dynamic content and anti-bot protections
* Experience with scalable scraping (parallel workers, batching, queues)
* Strong error handling, retry logic, and logging
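
To make the retry and logging requirement concrete, here is a minimal sketch of exponential backoff with jitter around a single page fetch. The `fetch` callable stands in for whatever the scraper uses to load a page (for example a headless browser call); it, the function name, and the default attempt count and delays are assumptions for illustration.

```python
import logging
import random
import time

logger = logging.getLogger("scraper")


def fetch_with_retry(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying failures with exponential backoff and jitter.

    The final failure is logged and re-raised so the caller can record
    the URL for a later pass.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == max_attempts:
                logger.error(
                    "giving up on %s after %d attempts: %s", url, attempt, exc
                )
                raise
            # Delay doubles each attempt; jitter spreads out parallel workers.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            logger.warning(
                "attempt %d for %s failed (%s); retrying in %.1fs",
                attempt, url, exc, delay,
            )
            sleep(delay)
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.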
---
Infrastructure:
* Scraper will run on a dedicated VPS (8 CPU / 32GB RAM)
* Must support parallel execution
* Must not affect the main Laravel application
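
One way to support parallel execution with a fixed resource ceiling is to cap the number of in-flight scrapes and process the product list in batches. The sketch below uses `asyncio`; `scrape_one` is a hypothetical coroutine that scrapes a single product, and the default concurrency and batch size are placeholder assumptions that would need tuning on the actual VPS.

```python
import asyncio


def batched(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def scrape_all(product_ids, scrape_one, max_concurrency=8):
    """Run scrape_one(product_id) for every ID with a cap on parallel tasks.

    Returns (product_id, result, error) tuples in input order; a failure
    is captured per product instead of aborting the whole run.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(product_id):
        async with semaphore:
            try:
                return product_id, await scrape_one(product_id), None
            except Exception as exc:
                return product_id, None, exc

    return await asyncio.gather(*(worker(pid) for pid in product_ids))


async def scrape_in_batches(product_ids, scrape_one, batch_size=500, max_concurrency=8):
    """Process IDs batch by batch so only one batch of tasks exists at a time."""
    results = []
    for batch in batched(product_ids, batch_size):
        results.extend(await scrape_all(batch, scrape_one, max_concurrency))
    return results
```

Batching also gives a natural checkpoint: results can be written out after each batch, so an interrupted run over the full catalogue can resume without starting over.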
---
Milestones:
1. Initial Test (Mandatory)
* Scrape 50 varied products (games, software, gift cards)
* Display results on the live website
* Validate:
  * Data accuracy
  * Categories mapping (must match G2A exactly)
  * Alert box extraction
  * Activation guide
2. Scaling Phase
* Gradual scaling after validation
* Full scraping of 240K+ products
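
The checks in the initial test could be partly automated before manual review. The following is a rough sketch that assumes each scraped record is a dictionary with hypothetical field names (`category_path`, `description_html`, `alert_html`, `activation_guide_html`); it treats the alert box and activation guide as optional, on the assumption that not every product page has them.

```python
# Field names are assumptions for this sketch, not a confirmed schema.
REQUIRED_FIELDS = ("category_path", "description_html")
OPTIONAL_FIELDS = ("alert_html", "activation_guide_html")


def validate_sample(records):
    """Summarise field coverage for a small test batch of scraped records."""
    report = {
        "total": len(records),
        "missing_required": [],   # (product id, [missing field names])
        "optional_coverage": {},  # field name -> number of records with it
    }
    for record in records:
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            report["missing_required"].append((record.get("id"), missing))
    for field in OPTIONAL_FIELDS:
        report["optional_coverage"][field] = sum(
            1 for record in records if record.get(field)
        )
    return report
```

A report like this only shows whether fields are present; whether the categories and alert text match G2A exactly still has to be confirmed by comparing against the live pages.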
---
Payment Terms:
* No upfront payment without results
* Payment after successful delivery of the initial test (50 products)
* Further payments based on validated milestones
---
To Apply:
Please include:
* Examples of similar scraping projects (especially large-scale)
* Technologies used
* Your approach for handling this project (short explanation)
---
We are looking for someone who can deliver real, scalable results — not partial implementations.