Robust Web Scraping for Product Data Extraction

2026.06.24 1 view Data

Budget

$200-$350 USD

Estimated duration

2-3 months

Difficulty

Expert

Tech stack

PHP, Python, Web Scraping, MySQL, Laravel, Data Extraction, Selenium, Playwright, Puppeteer, API Integration, Next.js, FastAPI

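AI analysis summary

This project extracts data for more than 240,000 products from the G2A website, reconciles it with the existing G2A API data, and integrates it into a Laravel-based backend. In particular, the category structure, detailed descriptions, alert boxes, and activation guides that are inaccurate or missing in the API must be scraped accurately, so the work calls for strong data-consistency skills and the ability to build a scalable web scraping system.

Original project description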

We are looking for an experienced web scraping developer to build a robust and scalable scraping system for extracting product data from G2A and integrating it into our Laravel-based platform.

We already have an integration with the G2A API. However, some critical data is NOT available or NOT accurate via the API, which is why scraping is required.

---

Project Scope:

* Scrape and process data for 240,000+ products
* Integrate scraping output with an existing Laravel backend
* Ensure data consistency and correct mapping with existing API data

---

Important Clarification (API vs Scraping):

* We already receive the following from the G2A API:

  * Product ID (same as G2A)
  * Slug (same as G2A)
  * Price, currency
  * Images
  * Platform, region
  * System requirements

* However:

  * Categories from the API are NOT accurate and do not match the G2A website structure
  * Some important content (description sections, alert boxes, activation guides) is NOT available via the API

Therefore, scraping is required to:

* Rebuild the correct category structure exactly as on G2A
* Extract missing product content

All scraped data must be correctly matched with API data using the same product ID and slug.

---

Core Requirements:

1. Categories Mapping (Very Important)

* Extract the full category structure exactly as on the G2A website
* Include main categories, subcategories, and full hierarchy
* Replace API categories completely
* Each product must be mapped accurately to its category path
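
One way to read the category requirements above is to store each product's full category path and derive the hierarchy from those paths. The following is only a minimal sketch under that assumption: it supposes each product page exposes a breadcrumb-like path (root first), and the function names and sample data are hypothetical, not taken from G2A.

```python
from typing import Any, Dict, Iterable, Tuple


def build_category_tree(paths: Iterable[Tuple[str, ...]]) -> Dict[str, Any]:
    """Merge full category paths (root first) into one nested dict tree."""
    tree: Dict[str, Any] = {}
    for path in paths:
        node = tree
        for name in path:
            # setdefault creates the child on first sight and reuses it afterwards
            node = node.setdefault(name, {})
    return tree


def map_products(scraped: Dict[str, Tuple[str, ...]]) -> Dict[str, str]:
    """Map product ID -> full category path joined with ' > '."""
    return {pid: " > ".join(path) for pid, path in scraped.items()}


# Hypothetical sample: product ID -> category path as scraped from the page.
scraped_paths = {
    "1001": ("Gaming", "PC Games", "Action"),
    "1002": ("Gaming", "PC Games", "RPG"),
    "1003": ("Software", "Antivirus"),
}

tree = build_category_tree(scraped_paths.values())
product_paths = map_products(scraped_paths)
```

Keeping the full path per product, rather than only a leaf name, makes it possible to tell apart subcategories that share a name under different parents.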

2. Product Data Extraction

* Basic product information
* Full product description (HTML)
* Yellow alert / warning box under “About this item” (must be extracted in full, with formatting)
* Activation guide (especially for software products)
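
Extracting a block "in full, with formatting" suggests capturing the element's inner HTML rather than its text. The sketch below shows one possible approach using only the Python standard library. The class name `alert-box` and the sample markup are assumptions for illustration; the real G2A markup is not described in this brief, and a rendered page would first have to be obtained with a headless browser.

```python
from html.parser import HTMLParser
from typing import Optional

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BlockExtractor(HTMLParser):
    """Capture the inner HTML of the first element carrying target_class."""

    def __init__(self, target_class: str) -> None:
        super().__init__(convert_charrefs=False)  # keep entities such as &amp; as-is
        self.target_class = target_class
        self.depth = 0      # nesting depth inside the target; 0 means outside
        self.parts = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.done:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_startendtag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True          # closing tag of the target itself
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_block(html: str, target_class: str) -> Optional[str]:
    """Return the inner HTML of the first matching element, or None."""
    parser = BlockExtractor(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None


# Hypothetical page fragment; "alert-box" is an assumed class name.
page = (
    '<section><h2>About this item</h2>'
    '<div class="alert-box"><p><strong>Note:</strong> EU accounts only.<br>'
    'See <a href="/help">help</a> &amp; terms.</p></div>'
    '<p>Other content</p></section>'
)
alert_html = extract_block(page, "alert-box")
```

Returning `None` when the block is absent lets the caller distinguish "product has no alert box" from "alert box was empty", which matters when validating the 50-product test.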

3. Data Alignment

* Must correctly match scraped data with API data (using product ID and slug)
* No duplicates or mismatched records
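
The alignment rule above (match on product ID and slug, no duplicates, no mismatches) can be sketched as a keyed join. This is an illustrative sketch only: the field names `id`, `slug`, and `category` and the function `align` are assumptions, not the actual schema of the API or the Laravel backend.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (product_id, slug)


def align(api_rows: List[dict], scraped_rows: List[dict]):
    """Join scraped rows onto API rows by (product ID, slug).

    Returns (merged, unmatched, duplicates):
      merged      API rows enriched with the scraped fields
      unmatched   scraped rows with no API counterpart
      duplicates  keys seen more than once in the scraped input
    """
    api_by_key: Dict[Key, dict] = {(r["id"], r["slug"]): r for r in api_rows}
    seen = set()
    merged, unmatched, duplicates = [], [], []
    for row in scraped_rows:
        key = (row["id"], row["slug"])
        if key in seen:
            duplicates.append(key)   # report instead of silently overwriting
            continue
        seen.add(key)
        api_row = api_by_key.get(key)
        if api_row is None:
            unmatched.append(row)
        else:
            # Scraped fields win, so the scraped category replaces the API one.
            merged.append({**api_row, **row})
    return merged, unmatched, duplicates


# Hypothetical sample data.
api_rows = [
    {"id": "1001", "slug": "game-a", "price": 9.99, "category": "Old"},
    {"id": "1002", "slug": "tool-b", "price": 19.5, "category": "Old"},
]
scraped_rows = [
    {"id": "1001", "slug": "game-a", "category": "Gaming > PC Games"},
    {"id": "1001", "slug": "game-a", "category": "Gaming > PC Games"},  # duplicate
    {"id": "9999", "slug": "ghost", "category": "Unknown"},             # no API match
]
merged, unmatched, duplicates = align(api_rows, scraped_rows)
```

Requiring both ID and slug to match means a record whose slug changed on the site lands in `unmatched` for review instead of being merged onto the wrong product.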

---

Technical Requirements:

* Python (preferably with FastAPI or similar)
* Experience with headless browsers (Playwright / Selenium / Puppeteer)
* Ability to handle dynamic content and anti-bot protections
* Experience with scalable scraping (parallel workers, batching, queues)
* Strong error handling, retry logic, and logging
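
The requirements for parallel workers, retry logic, and logging could be combined roughly as follows. This is a hedged sketch, not a prescribed design: `fetch` stands in for whatever page-fetching coroutine the scraper uses (for example a headless-browser call), and `with_retry`, `scrape_all`, `fake_fetch`, and the default limits are hypothetical.

```python
import asyncio
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")


async def with_retry(fetch, slug, attempts=3, base_delay=0.01):
    """Call fetch(slug), retrying with exponential backoff plus jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return await fetch(slug)
        except Exception as exc:
            if attempt == attempts:
                log.error("giving up on %s: %s", slug, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            log.warning("retry %d for %s in %.3fs: %s", attempt, slug, delay, exc)
            await asyncio.sleep(delay)


async def scrape_all(fetch, slugs, concurrency=8):
    """Scrape every slug with at most `concurrency` fetches in flight."""
    sem = asyncio.Semaphore(concurrency)
    results, failed = {}, []

    async def worker(slug):
        async with sem:
            try:
                results[slug] = await with_retry(fetch, slug)
            except Exception:
                failed.append(slug)   # keep going; failures are collected

    await asyncio.gather(*(worker(s) for s in slugs))
    return results, failed


calls = {}


async def fake_fetch(slug):
    # Stand-in for a real page fetch: fails once per slug, always for "bad".
    calls[slug] = calls.get(slug, 0) + 1
    if slug == "bad" or calls[slug] == 1:
        raise RuntimeError("simulated failure")
    return {"slug": slug}


results, failed = asyncio.run(scrape_all(fake_fetch, ["a", "b", "bad"]))
```

For a catalogue of 240K+ products, the slug list would more plausibly be fed to `scrape_all` in batches from a queue, so that failed slugs can be re-queued and progress survives a restart.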

---

Infrastructure:

* Scraper will run on a dedicated VPS (8 CPU / 32GB RAM)
* Must support parallel execution
* Must not affect the main Laravel application

---

Milestones:

1. Initial Test (Mandatory)

* Scrape 50 varied products (games, software, gift cards)
* Display results on the live website
* Validate:

  * Data accuracy
  * Categories mapping (must match G2A exactly)
  * Alert box extraction
  * Activation guide

2. Scaling Phase

* Gradual scaling after validation
* Full scraping of 240K+ products

---

Payment Terms:

* No upfront payment without results
* Payment after successful delivery of the initial test (50 products)
* Further payments based on validated milestones

---

To Apply:

Please include:

* Examples of similar scraping projects (especially large-scale)
* Technologies used
* Your approach for handling this project (short explanation)

---

We are looking for someone who can deliver real, scalable results — not partial implementations.
