Machine Learning Strategy
Sport Tech Club - Artificial Intelligence and Machine Learning
Overview
This document defines the Machine Learning strategy for Sport Tech Club, covering use cases, MLOps architecture, models, and metrics.
1. ML Use Cases
1.1 Use Case Overview
```yaml
casos_de_uso:
  recomendacao:
    nome: Court Recommendation System
    prioridade: P0
    impacto: High
    complexidade: Medium
  previsao_demanda:
    nome: Demand Forecasting by Time Slot
    prioridade: P0
    impacto: High
    complexidade: High
  precificacao_dinamica:
    nome: Dynamic Pricing
    prioridade: P1
    impacto: High
    complexidade: High
  matchmaking:
    nome: Player Matchmaking
    prioridade: P1
    impacto: Medium
    complexidade: Medium
  deteccao_fraude:
    nome: Payment Fraud Detection
    prioridade: P2
    impacto: Medium
    complexidade: High
  churn_prediction:
    nome: Churn Prediction
    prioridade: P2
    impacto: Medium
    complexidade: Medium
```

1.2 Implementation Roadmap
Q1 2024: Court Recommendations + Demand Forecasting
Q2 2024: Dynamic Pricing + Matchmaking
Q3 2024: Fraud Detection
Q4 2024: Churn Prediction + Optimizations

2. Recommendation System
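The recommendation system blends collaborative filtering with content-based signals in a hybrid ensemble. As a minimal sketch of how such a blend could rank candidate arenas, here is a weighted combination; `hybrid_score`, `alpha`, and the sample scores are illustrative assumptions, not tuned values from this strategy:

```python
def hybrid_score(cf_score: float, cb_score: float, alpha: float = 0.6) -> float:
    """Blend a collaborative-filtering score with a content-based score.

    alpha is a hypothetical weight; in practice it would be tuned offline
    (e.g., against NDCG@10) or learned by a meta-model.
    """
    return alpha * cf_score + (1 - alpha) * cb_score


# Rank candidate arenas by the blended score (sample scores for illustration)
candidates = {"arena_a": (0.9, 0.4), "arena_b": (0.5, 0.8)}
ranked = sorted(
    candidates,
    key=lambda a: hybrid_score(*candidates[a]),
    reverse=True,
)
```

In a production ensemble the blend weight would typically come from offline evaluation rather than being hard-coded.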
2.1 Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Recommendation System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Collaborative│ │ Content- │ │ Hybrid │ │
│ │ Filtering │ + │ Based │ = │ Ensemble │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Feature Store (Feast) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ User │ │ Arena │ │ Interaction│ │
│ │ Features │ │ Features │ │ Features │ │
│ └───────────┘ └───────────┘ └───────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

2.2 Input Features
```python
from typing import Dict, List, Optional

# User features
user_features = {
    # Demographics
    "age_bucket": ["18-25", "26-35", "36-45", "46+"],
    "gender": ["M", "F", "O"],
    "location_city": str,
    "location_lat": float,
    "location_lng": float,
    # Behavioral
    "preferred_sports": List[str],
    "skill_levels": Dict[str, float],  # sport -> rating
    "avg_booking_value": float,
    "booking_frequency_weekly": float,
    "preferred_time_slots": List[str],
    "preferred_days": List[str],
    # Engagement
    "total_bookings": int,
    "total_hours_played": float,
    "days_since_last_booking": int,
    "app_sessions_weekly": float,
    # Social
    "frequent_partners": List[str],
    "teams_count": int,
}

# Arena features
arena_features = {
    # Attributes
    "sports_offered": List[str],
    "amenities": List[str],
    "price_range": str,  # "low", "medium", "high"
    "avg_rating": float,
    "total_reviews": int,
    # Location
    "city": str,
    "neighborhood": str,
    "lat": float,
    "lng": float,
    # Capacity
    "courts_count": int,
    "avg_availability_rate": float,
    # Performance
    "booking_rate_7d": float,
    "repeat_customer_rate": float,
}

# Interaction features
interaction_features = {
    "user_arena_bookings": int,
    "user_arena_rating": Optional[float],
    "user_arena_last_visit_days": int,
    "user_arena_cancellation_rate": float,
    "user_sport_preference_match": float,
    "distance_km": float,
}
```

2.3 Recommendation Model
```python
import tensorflow as tf
from tensorflow import keras
import tensorflow_recommenders as tfrs


class ArenaRecommender(tfrs.Model):
    def __init__(self, user_model, arena_model, task):
        super().__init__()
        self.user_model = user_model
        self.arena_model = arena_model
        self.task = task

    def compute_loss(self, features, training=False):
        user_embeddings = self.user_model(features["user_id"])
        arena_embeddings = self.arena_model(features["arena_id"])
        return self.task(user_embeddings, arena_embeddings)


# User tower
user_model = keras.Sequential([
    keras.layers.StringLookup(vocabulary=user_ids),
    keras.layers.Embedding(len(user_ids) + 1, 64),
    # User features
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
])

# Arena tower
arena_model = keras.Sequential([
    keras.layers.StringLookup(vocabulary=arena_ids),
    keras.layers.Embedding(len(arena_ids) + 1, 64),
    # Arena features
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
])

# Two-tower retrieval task
task = tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        candidates=arena_dataset.batch(128).map(arena_model)
    )
)

model = ArenaRecommender(user_model, arena_model, task)
model.compile(optimizer=keras.optimizers.Adam(0.001))
```

2.4 Recommendation API
```python
from typing import List, Optional, Tuple

from fastapi import FastAPI, Depends
from pydantic import BaseModel

app = FastAPI()


class RecommendationRequest(BaseModel):
    user_id: str
    sport: Optional[str] = None
    location: Optional[Tuple[float, float]] = None
    limit: int = 10


class ArenaRecommendation(BaseModel):
    arena_id: str
    score: float
    reasons: List[str]


@app.post("/recommendations/arenas")
async def get_arena_recommendations(
    request: RecommendationRequest,
    model: ArenaRecommender = Depends(get_model),
    feature_store: FeatureStore = Depends(get_feature_store),
) -> List[ArenaRecommendation]:
    # Fetch the user's features
    user_features = await feature_store.get_user_features(request.user_id)

    # Generate candidates
    candidates = model.predict(user_features)

    # Apply filters (location, sport)
    if request.location:
        candidates = filter_by_distance(candidates, request.location)
    if request.sport:
        candidates = filter_by_sport(candidates, request.sport)

    # Generate explanations
    recommendations = []
    for arena_id, score in candidates[:request.limit]:
        reasons = generate_explanation(user_features, arena_id)
        recommendations.append(
            ArenaRecommendation(
                arena_id=arena_id,
                score=score,
                reasons=reasons,
            )
        )
    return recommendations
```

3. Demand Forecasting
3.1 Forecasting Model
```python
import pandas as pd
from prophet import Prophet


class DemandForecaster:
    def __init__(self):
        self.model = Prophet(
            yearly_seasonality=True,
            weekly_seasonality=True,
            daily_seasonality=True,
            seasonality_mode='multiplicative',
        )
        # Brazilian holidays
        self.model.add_country_holidays(country_name='BR')

        # External regressors
        self.model.add_regressor('temperature')
        self.model.add_regressor('rain_probability')
        self.model.add_regressor('is_holiday')
        self.model.add_regressor('local_event')

        # One hourly model per arena, trained lazily
        self.hourly_models = {}

    def prepare_data(self, bookings_df: pd.DataFrame) -> pd.DataFrame:
        """Shape the data into Prophet's expected format."""
        df = bookings_df.groupby('date').agg({
            'booking_id': 'count',
            'temperature': 'mean',
            'rain_probability': 'mean',
            'is_holiday': 'max',
            'local_event': 'max',
        }).reset_index()
        df.columns = ['ds', 'y', 'temperature', 'rain_probability',
                      'is_holiday', 'local_event']
        return df

    def train(self, df: pd.DataFrame):
        """Train the model."""
        prepared_df = self.prepare_data(df)
        self.model.fit(prepared_df)

    def predict(self, periods: int = 30) -> pd.DataFrame:
        """Forecast demand for the next N days."""
        future = self.model.make_future_dataframe(periods=periods)
        # Add future regressors (from weather APIs, calendars, etc.)
        future = self.add_future_regressors(future)
        forecast = self.model.predict(future)
        return forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]

    def predict_hourly(self, date: str, arena_id: str) -> pd.DataFrame:
        """Forecast hourly demand for a specific arena."""
        # Dedicated hourly model per arena
        hourly_model = self.hourly_models.get(arena_id)
        if not hourly_model:
            hourly_model = self.train_hourly_model(arena_id)
        return hourly_model.predict(date)
```

3.2 Temporal Features
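The sin/cos encoding used below exists so that cyclical values wrap correctly: hour 23 lands next to hour 0 in feature space, which a raw integer column cannot capture. A quick stdlib-only illustration (the helper names are ours, for demonstration):

```python
import math


def encode_hour(hour: int) -> tuple[float, float]:
    """Map an hour (0-23) onto the unit circle."""
    angle = 2 * math.pi * hour / 24
    return (math.sin(angle), math.cos(angle))


def encoded_distance(h1: int, h2: int) -> float:
    """Euclidean distance between two encoded hours."""
    return math.dist(encode_hour(h1), encode_hour(h2))


# 23:00 is adjacent to 00:00 on the circle, but maximally far as raw integers
assert encoded_distance(23, 0) < encoded_distance(12, 0)
```

The same reasoning applies to the day-of-week and month encodings in the feature builder.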
```python
import numpy as np
import pandas as pd


def create_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build temporal features for forecasting."""
    df = df.copy()

    # Basic extractions
    df['hour'] = df['datetime'].dt.hour
    df['day_of_week'] = df['datetime'].dt.dayofweek
    df['day_of_month'] = df['datetime'].dt.day
    df['month'] = df['datetime'].dt.month
    df['year'] = df['datetime'].dt.year
    df['week_of_year'] = df['datetime'].dt.isocalendar().week

    # Cyclical features (sin/cos encoding)
    df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
    df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)
    df['day_sin'] = np.sin(2 * np.pi * df['day_of_week'] / 7)
    df['day_cos'] = np.cos(2 * np.pi * df['day_of_week'] / 7)
    df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
    df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)

    # Binary features
    df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)
    df['is_morning'] = df['hour'].between(6, 11).astype(int)
    df['is_afternoon'] = df['hour'].between(12, 17).astype(int)
    df['is_evening'] = df['hour'].between(18, 22).astype(int)
    df['is_peak_hour'] = df['hour'].isin([18, 19, 20]).astype(int)

    # Lag features
    for lag in [1, 7, 14, 28]:
        df[f'demand_lag_{lag}d'] = df['demand'].shift(lag * 24)

    # Rolling features
    for window in [7, 14, 28]:
        df[f'demand_rolling_mean_{window}d'] = (
            df['demand'].rolling(window * 24).mean()
        )
        df[f'demand_rolling_std_{window}d'] = (
            df['demand'].rolling(window * 24).std()
        )

    return df
```

4. Dynamic Pricing
4.1 Pricing Model
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PricingContext:
    arena_id: str
    court_id: str
    date: str
    hour: int
    sport: str
    base_price: float
    # Forecast demand
    predicted_demand: float
    demand_percentile: float
    # External context
    weather_score: float  # 0-1 (1 = perfect)
    is_holiday: bool
    local_event: Optional[str]
    # History
    avg_occupancy_rate: float
    similar_bookings_7d: int


class DynamicPricingModel:
    def __init__(self, config: PricingConfig):
        self.config = config
        self.min_multiplier = 0.8  # -20%
        self.max_multiplier = 1.5  # +50%

    def calculate_price(self, context: PricingContext) -> float:
        """Compute the dynamic price from the context."""
        multiplier = 1.0

        # Demand factor (0.9 - 1.3)
        demand_factor = self.demand_multiplier(context.demand_percentile)
        multiplier *= demand_factor

        # Weather factor (0.95 - 1.1)
        weather_factor = self.weather_multiplier(context.weather_score)
        multiplier *= weather_factor

        # Occupancy factor (0.85 - 1.2)
        occupancy_factor = self.occupancy_multiplier(context.avg_occupancy_rate)
        multiplier *= occupancy_factor

        # Time-of-day factor (0.9 - 1.2)
        time_factor = self.time_multiplier(context.hour, context.date)
        multiplier *= time_factor

        # Clamp to the allowed range
        multiplier = max(self.min_multiplier, min(self.max_multiplier, multiplier))
        return round(context.base_price * multiplier, 2)

    def demand_multiplier(self, percentile: float) -> float:
        """Multiplier based on forecast demand."""
        if percentile > 0.9:
            return 1.3   # High demand
        elif percentile > 0.7:
            return 1.15
        elif percentile < 0.3:
            return 0.9   # Low demand
        return 1.0

    def time_multiplier(self, hour: int, date: str) -> float:
        """Multiplier based on the time slot."""
        day_of_week = datetime.strptime(date, '%Y-%m-%d').weekday()

        # Peak hours
        if day_of_week < 5:  # Weekdays
            if hour in [18, 19, 20]:
                return 1.2   # Evening peak
            elif hour in [6, 7, 8]:
                return 1.1   # Early morning
        else:  # Weekends
            if hour in [9, 10, 11, 16, 17, 18]:
                return 1.15

        # Off-peak hours
        if hour in [14, 15]:  # Weekday afternoon
            return 0.9
        return 1.0
```

4.2 Business Rules
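To make the booking-window rules below concrete, here is a hedged sketch of how the last-minute and early-bird adjustments might compose with the dynamically priced amount. The thresholds mirror the rule table; the function name, signature, and composition order are illustrative assumptions, not part of the strategy:

```python
def apply_booking_window_rules(
    price: float,
    hours_until_slot: float,
    occupancy_rate: float,
) -> float:
    """Apply the booking-window discounts from the pricing rules.

    Hypothetical helper: the thresholds follow the rule table below;
    how the rules engine composes with the price is an assumption.
    """
    if hours_until_slot < 2 and occupancy_rate < 0.5:
        return round(price * 0.85, 2)  # last-minute: -15%
    if hours_until_slot > 7 * 24:
        return round(price * 0.90, 2)  # early-bird: -10%
    return price
```

A real rules engine would also need to decide whether discounts stack and enforce the global -20%/+50% bounds after composition.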
```yaml
pricing_rules:
  # Global limits
  min_discount: -20%
  max_premium: +50%

  # Booking window
  last_minute:  # < 2 hours
    discount: -15%
    condition: occupancy < 50%
  early_bird:  # > 7 days
    discount: -10%
    condition: always

  # Special events
  holidays:
    premium: +20%
  rain_forecast:
    discount: -10%
    condition: probability > 70%

  # Loyalty
  frequent_customer:  # > 10 bookings/month
    discount: -5%
  first_booking:
    discount: -20%
```

5. Player Matchmaking
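The matchmaking service below weighs skill, proximity, schedule, and play style. Its proximity term relies on a `calculate_distance` helper that is referenced but not defined in this document; a standard haversine implementation that could back it (stdlib only, Earth radius rounded to 6371 km):

```python
import math


def haversine_km(
    loc1: tuple[float, float],
    loc2: tuple[float, float],
) -> float:
    """Great-circle distance in km between two (lat, lng) pairs."""
    lat1, lng1 = map(math.radians, loc1)
    lat2, lng2 = map(math.radians, loc2)
    dlat, dlng = lat2 - lat1, lng2 - lng1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(a))


# São Paulo to Rio de Janeiro: roughly 360 km in a straight line
distance = haversine_km((-23.55, -46.63), (-22.91, -43.17))
```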
5.1 Matching Algorithm
```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Player:
    id: str
    skill_rating: float  # 0-100
    preferred_intensity: str  # "casual", "competitive"
    preferred_position: Optional[str]
    available_times: List[str]
    location: Tuple[float, float]
    play_style: List[str]  # ["aggressive", "defensive", etc.]


class MatchmakingService:
    def __init__(self, config: MatchmakingConfig):
        self.config = config
        self.skill_weight = 0.4
        self.location_weight = 0.2
        self.time_weight = 0.2
        self.style_weight = 0.2

    def find_matches(
        self,
        player: Player,
        candidates: List[Player],
        team_size: int = 2,
    ) -> List[Tuple[Player, float]]:
        """Find the best matches for a player."""
        scores = []
        for candidate in candidates:
            if candidate.id == player.id:
                continue
            score = self.calculate_match_score(player, candidate)
            scores.append((candidate, score))

        # Sort by descending score
        scores.sort(key=lambda x: x[1], reverse=True)
        return scores[:10]  # Top 10 matches

    def calculate_match_score(
        self,
        player1: Player,
        player2: Player,
    ) -> float:
        """Compute a compatibility score between two players."""
        # Skill similarity (similar levels preferred)
        skill_diff = abs(player1.skill_rating - player2.skill_rating)
        skill_score = max(0, 1 - skill_diff / 30)  # 30-point tolerance

        # Location proximity
        distance = self.calculate_distance(player1.location, player2.location)
        location_score = max(0, 1 - distance / 20)  # 20 km tolerance

        # Time overlap
        time_overlap = len(
            set(player1.available_times) & set(player2.available_times)
        )
        time_score = min(time_overlap / 5, 1.0)

        # Play style compatibility
        style_overlap = len(
            set(player1.play_style) & set(player2.play_style)
        )
        style_score = style_overlap / max(
            len(player1.play_style), len(player2.play_style), 1
        )

        # Weighted score
        total_score = (
            skill_score * self.skill_weight +
            location_score * self.location_weight +
            time_score * self.time_weight +
            style_score * self.style_weight
        )
        return total_score

    def form_balanced_teams(
        self,
        players: List[Player],
        team_size: int = 2,
    ) -> List[List[Player]]:
        """Form skill-balanced teams."""
        # Sort by skill
        sorted_players = sorted(
            players,
            key=lambda p: p.skill_rating,
            reverse=True,
        )

        teams = []
        num_teams = len(players) // team_size

        # Serpentine (snake) draft for balance
        for team_idx in range(num_teams):
            team = []
            for pick in range(team_size):
                if pick % 2 == 0:
                    player_idx = team_idx + (pick // 2) * num_teams
                else:
                    player_idx = (num_teams - 1 - team_idx) + (pick // 2) * num_teams
                if player_idx < len(sorted_players):
                    team.append(sorted_players[player_idx])
            teams.append(team)

        return teams
```

6. MLOps Infrastructure
6.1 MLOps Architecture
┌─────────────────────────────────────────────────────────────────┐
│ MLOps Pipeline │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Data │──▶│ Feature │──▶│ Training│──▶│ Model │ │
│ │ Ingestion│ │ Store │ │ Pipeline│ │ Registry│ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────────┐ ┌─────────┐ │
│ │ Data │ │ MLflow │ │ Model │ │
│ │ Quality │ │ Tracking │ │ Serving │ │
│ │ Checks │ │ │ │ (TF Srv)│ │
│ └─────────┘ └─────────────┘ └─────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Monitoring │ │
│ │ & Alerting │ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘

6.2 Technology Stack
```yaml
mlops_stack:
  feature_store:
    tool: Feast
    storage: Redis (online) + PostgreSQL (offline)
  experiment_tracking:
    tool: MLflow
    storage: S3 + PostgreSQL
  model_registry:
    tool: MLflow Model Registry
    versioning: semantic
  orchestration:
    tool: Apache Airflow
    scheduler: Kubernetes
  model_serving:
    tool: TensorFlow Serving / FastAPI
    infrastructure: Kubernetes
    autoscaling: HPA
  monitoring:
    tool: Prometheus + Grafana
    alerts: PagerDuty
  data_quality:
    tool: Great Expectations
    validation: pre-training
```

6.3 Training Pipeline
```python
# dags/training_pipeline.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    'owner': 'ml-team',
    'depends_on_past': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}

with DAG(
    'recommendation_model_training',
    default_args=default_args,
    schedule_interval='0 2 * * 0',  # Sundays at 2 am
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:

    def extract_features():
        """Extract features from the Feature Store."""
        from feast import FeatureStore

        store = FeatureStore(repo_path="feature_repo/")
        training_df = store.get_historical_features(
            entity_df=get_entity_df(),
            features=[
                "user_features:booking_frequency",
                "user_features:avg_rating_given",
                "arena_features:avg_rating",
                "interaction_features:visit_count",
            ],
        ).to_df()
        return training_df

    def validate_data(training_df):
        """Validate data quality."""
        import great_expectations as ge

        ge_df = ge.from_pandas(training_df)
        ge_df.expect_column_values_to_not_be_null("user_id")
        ge_df.expect_column_values_to_be_between(
            "skill_rating", min_value=0, max_value=100
        )

        validation_result = ge_df.validate()
        if not validation_result.success:
            raise ValueError("Data validation failed")

    def train_model(training_df):
        """Train the model."""
        import mlflow

        with mlflow.start_run():
            model = ArenaRecommender()
            model.fit(training_df)

            # Log metrics
            mlflow.log_metrics({
                "ndcg@10": model.evaluate_ndcg(test_df, k=10),
                "precision@10": model.evaluate_precision(test_df, k=10),
                "recall@10": model.evaluate_recall(test_df, k=10),
            })

            # Log the model
            mlflow.tensorflow.log_model(
                model,
                "recommendation_model",
                registered_model_name="arena-recommender",
            )

    def deploy_model():
        """Deploy the model to production."""
        from mlflow.tracking import MlflowClient

        client = MlflowClient()

        # Most recently registered version, not yet staged
        latest_version = client.get_latest_versions(
            "arena-recommender", stages=["None"]
        )[0].version

        # Promote the model to production
        client.transition_model_version_stage(
            name="arena-recommender",
            version=latest_version,
            stage="Production",
        )

        # Update serving
        update_serving_model()

    # NOTE: in a real DAG, the DataFrame would be passed between tasks
    # via XCom or the TaskFlow API rather than as a direct argument.
    extract_task = PythonOperator(
        task_id='extract_features',
        python_callable=extract_features,
    )
    validate_task = PythonOperator(
        task_id='validate_data',
        python_callable=validate_data,
    )
    train_task = PythonOperator(
        task_id='train_model',
        python_callable=train_model,
    )
    deploy_task = PythonOperator(
        task_id='deploy_model',
        python_callable=deploy_model,
    )

    extract_task >> validate_task >> train_task >> deploy_task
```

7. Model Monitoring
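Among the offline metrics tracked below, precision@k and recall@k are simple enough to compute without a library; a minimal sketch (function names and the sample data are illustrative):

```python
def precision_at_k(recommended: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    top_k = recommended[:k]
    return len(set(top_k) & relevant) / k


def recall_at_k(recommended: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of all relevant items recovered in the top-k."""
    if not relevant:
        return 0.0
    top_k = recommended[:k]
    return len(set(top_k) & relevant) / len(relevant)


# Two of the four relevant arenas appear in the top-4 recommendations
p = precision_at_k(["a", "b", "c", "d"], {"a", "c", "x"}, k=4)  # 0.5
```

NDCG@10 and MAP@10 additionally weight results by rank position; in practice all four would come from the evaluation harness rather than hand-rolled code.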
7.1 Performance Metrics
```yaml
model_metrics:
  recommendation:
    online:
      - click_through_rate
      - conversion_rate
      - avg_session_duration
      - recommendations_per_session
    offline:
      - ndcg@10
      - precision@10
      - recall@10
      - map@10
  demand_forecast:
    - mape (Mean Absolute Percentage Error)
    - rmse (Root Mean Square Error)
    - mae (Mean Absolute Error)
    - forecast_bias
  pricing:
    - revenue_lift
    - occupancy_rate
    - price_elasticity
    - customer_satisfaction
  matchmaking:
    - match_acceptance_rate
    - game_completion_rate
    - skill_balance_score
    - player_satisfaction
```

7.2 Drift Detection
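Alongside a full drift report, teams often track a lightweight per-feature score such as the Population Stability Index (PSI); conventional rules of thumb treat PSI below 0.1 as stable and above 0.25 as significant drift. A stdlib-only sketch, where the equal-width binning and the smoothing constant are implementation choices of this example, not part of the strategy:

```python
import math
from bisect import bisect_right


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between two numeric samples."""
    lo, hi = min(expected), max(expected)
    # Equal-width bin edges derived from the reference sample
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bin_fractions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # Small epsilon avoids log(0) on empty bins
        return [(c + 1e-6) / (len(sample) + bins * 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

This complements the Evidently report and KS test used below: PSI is cheap enough to compute on every batch.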
```python
import pandas as pd
from scipy.stats import ks_2samp

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset


class ModelMonitor:
    def __init__(self, reference_data: pd.DataFrame):
        self.reference_data = reference_data
        self.column_mapping = ColumnMapping(
            target='target',
            numerical_features=['skill_rating', 'booking_count'],
            categorical_features=['sport', 'time_slot'],
        )

    def check_data_drift(
        self,
        current_data: pd.DataFrame,
        threshold: float = 0.15,
    ) -> Report:
        """Detect drift in the input data."""
        report = Report(metrics=[DataDriftPreset()])
        report.run(
            reference_data=self.reference_data,
            current_data=current_data,
            column_mapping=self.column_mapping,
        )

        drift_detected = report.as_dict()['metrics'][0]['result']['dataset_drift']
        if drift_detected:
            self.trigger_alert('data_drift_detected')
            self.schedule_retraining()

        return report

    def check_prediction_drift(
        self,
        predictions: pd.DataFrame,
        threshold: float = 0.1,
    ) -> bool:
        """Detect drift in the prediction distribution."""
        ks_statistic, p_value = ks_2samp(
            predictions['score'],
            self.reference_predictions['score'],
        )
        return p_value < threshold

    def monitor_performance(self):
        """Monitor performance in production."""
        # Compute recent metrics
        recent_metrics = self.calculate_metrics(window='7d')

        # Compare against the baseline
        for metric, value in recent_metrics.items():
            baseline = self.baseline_metrics[metric]
            degradation = (baseline - value) / baseline

            if degradation > 0.1:  # 10% degradation
                self.trigger_alert(
                    f'performance_degradation_{metric}',
                    details={
                        'metric': metric,
                        'current': value,
                        'baseline': baseline,
                        'degradation': degradation,
                    }
                )
```

8. Metrics and KPIs
8.1 Business Metrics
```yaml
business_kpis:
  recomendacao:
    # Direct impact
    conversion_rate_lift: +15%
    avg_booking_value_lift: +10%
    user_engagement_lift: +20%
    # Satisfaction
    recommendation_rating: 4.2/5
    click_through_rate: 25%
  previsao_demanda:
    # Accuracy
    forecast_accuracy: 85%
    mape: <15%
    # Impact
    overbooking_reduction: -50%
    understaffing_reduction: -40%
  precificacao:
    # Revenue
    revenue_lift: +12%
    yield_improvement: +8%
    # Balance
    off_peak_bookings_lift: +25%
    peak_hour_satisfaction: ">4.0"
  matchmaking:
    # Engagement
    match_acceptance_rate: 75%
    return_player_rate: +15%
    # Quality
    game_balance_score: 0.85
    player_satisfaction: 4.3/5
```

9. Roadmap and Next Steps
9.1 Model Evolution
```yaml
evolucao:
  v1_mvp:
    - Rule-based recommendations
    - Simple forecasting (Prophet)
    - Skill-based matchmaking
  v2_ml:
    - Two-tower recommendation
    - Forecasting with external features
    - Multi-factor matchmaking
  v3_advanced:
    - Deep learning recommendations
    - Reinforcement learning pricing
    - Real-time personalization
  v4_autonomous:
    - AutoML for model selection
    - Continuous training
    - Self-healing pipelines
```

9.2 Implementation Checklist
- [ ] Feature Store configured (Feast)
- [ ] MLflow set up for tracking
- [ ] Training pipeline (Airflow)
- [ ] Model serving (TF Serving)
- [ ] Drift monitoring
- [ ] A/B testing framework
- [ ] Performance alerts
- [ ] Model documentation

This document is the guide for Sport Tech Club's ML strategy and will be updated as the models evolve.