INTL
Freelancer
Medium
Contract
Remote OK
Clean & Extract Product Descriptions from HTML (~5,000 Rows)
Budget
$30-$250 USD
Estimated duration
1-3 days
Difficulty
Medium
Tech stack
Python
Data Processing
Data Extraction
Excel
HTML
BeautifulSoup
Pandas
Regular Expressions
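AI analysis summary
This project is a data-processing task: extract and clean the actual product descriptions from HTML-based eBay listing templates for about 5,000 products. The text inside a specific HTML block (<div class="desc-rd desc-text">) must be parsed, with all HTML tags removed and the paragraph structure (new lines, blank lines) preserved; the inventory tag at the end is then deleted and the result is saved to an Excel file. The job calls for a developer proficient in HTML parsing and Excel data processing with Python, BeautifulSoup, and Pandas.
Original project description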
I have a spreadsheet (~5,000 product rows) where each row contains a full HTML eBay listing template.
Each row includes:
ID
SKU
Description
Short description
The Description field contains a large block of HTML (decorative listing template), but the actual product description is embedded inside it.
Your job is to extract the correct text.
1. Extract the Correct Description
In every row, the real product description is located inside this HTML block:
<div class="desc-rd desc-text">
Requirements:
Extract only the content inside this div
Ignore all other HTML content in the row (menus, images, headers, shipping info, footer, etc.)
Do not use the rest of the HTML outside this block
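A minimal sketch of the extraction step, assuming BeautifulSoup with Python's built-in "html.parser" backend. The function name extract_description_html is hypothetical, and the sketch assumes each row holds at most one matching div; it returns an empty string when the block is missing so such rows can be reviewed by hand.

```python
from bs4 import BeautifulSoup


def extract_description_html(listing_html: str) -> str:
    """Return the inner HTML of the first <div class="desc-rd desc-text">.

    Hypothetical helper: returns "" when the block is absent.
    """
    soup = BeautifulSoup(listing_html or "", "html.parser")
    # The CSS selector matches a div carrying both classes, in any order.
    block = soup.select_one("div.desc-rd.desc-text")
    if block is None:
        return ""
    # decode_contents() gives the markup inside the div, without the div itself.
    return block.decode_contents()
```

Everything outside the selected div (menus, images, shipping info, footer) is never touched, because only the matched element is serialized.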
2. Clean the Extracted Text
The content inside the div typically contains HTML such as:
<p> tags
<span> tags
Requirements:
Remove all HTML tags
Preserve paragraph structure:
Each <p> should become a new line
<br> should become a new line
Output clean, readable plain text
Example result format:
Paragraph 1
(blank line)
Paragraph 2
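One possible way to produce that format, sketched with BeautifulSoup. The function name html_to_plain_text is hypothetical. It assumes paragraphs should be separated by one blank line (as in the example format) and that <br> maps to a single newline; line breaks that already exist in the raw markup are kept as newlines, which may or may not be wanted for a given template.

```python
import re

from bs4 import BeautifulSoup


def html_to_plain_text(fragment: str) -> str:
    """Strip tags from an HTML fragment while keeping paragraph breaks."""
    soup = BeautifulSoup(fragment or "", "html.parser")
    # <br> becomes a single newline.
    for br in soup.find_all("br"):
        br.replace_with("\n")
    # Each <p> gets a trailing blank line so paragraphs stay separated.
    for p in soup.find_all("p"):
        p.append("\n\n")
    # get_text() drops every remaining tag (<span>, <p>, ...).
    text = soup.get_text().replace("\xa0", " ")
    # Squeeze runs of spaces/tabs and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Collapse three or more newlines into exactly one blank line.
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return text.strip()
```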
3. Remove Trailing Inventory Tags
Each description ends with an internal tag such as:
BTG-6772
BTG-10284
Requirements:
Remove this tag from the final text
Clean up any leftover spacing
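A short sketch of the tag removal using only the standard re module. It assumes every inventory tag has the form BTG- followed by digits, as in the two examples above, and sits at the very end of the text; other prefixes would need a wider pattern. The name strip_inventory_tag is hypothetical.

```python
import re

# Assumed pattern: "BTG-" plus digits at the very end, with any surrounding whitespace.
TRAILING_TAG = re.compile(r"\s*\bBTG-\d+\s*$")


def strip_inventory_tag(text: str) -> str:
    """Remove a trailing BTG-<digits> tag and any leftover trailing whitespace."""
    return TRAILING_TAG.sub("", text).rstrip()
```

Because the pattern is anchored at the end of the string, a BTG code mentioned in the middle of a description is left alone.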
4. Final Output
Write the cleaned text into the Description column
Completely clear the Short description column for all rows
Do not modify the SKU column
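The output step could look roughly like this with pandas. The column names come from the row layout described above; the function names and the clean callable (whatever function turns one row's HTML into final text) are hypothetical. Reading and writing .xlsx files assumes the openpyxl package is installed, and dtype=str is an assumption made here so that ID and SKU values are read as text rather than coerced to numbers.

```python
import pandas as pd


def apply_final_output(df: pd.DataFrame, clean) -> pd.DataFrame:
    """Return a copy with Description cleaned and Short description emptied."""
    out = df.copy()
    # Missing cells become "" so the cleaner always receives a string.
    out["Description"] = out["Description"].fillna("").astype(str).map(clean)
    # Clear the column for every row; ID and SKU are not touched.
    out["Short description"] = ""
    return out


def clean_workbook(src_path: str, dst_path: str, clean) -> None:
    """Read the spreadsheet, apply the cleanup, and write a new Excel file."""
    df = pd.read_excel(src_path, dtype=str)
    apply_final_output(df, clean).to_excel(dst_path, index=False)
```

Writing to a new file instead of overwriting the source keeps the original spreadsheet available for spot checks against the cleaned rows.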
Deliverable
One cleaned Excel file with:
Cleaned Description column
Empty Short description column
Requirements
Must be completed programmatically (Python preferred)
Experience parsing HTML (e.g., BeautifulSoup or similar)
Strong attention to detail