Recent studies utilizing electrophysiological speech envelope reconstruction have sparked renewed interest in the cocktail party effect by showing that auditory neurons entrain to selectively attended speech. Yet, the neural networks of attention to speech in naturalistic audiovisual settings with multiple sound sources remain poorly understood. We collected functional brain imaging data while participants viewed audiovisual video clips of lifelike dialogues with concurrent distracting speech in the background. Dialogues were presented in a full-factorial design, comprising task (listen to the dialogues vs. ignore them), audiovisual quality and semantic predictability. We used univariate analyses in combination with multivariate pattern analysis (MVPA) to study modulations of brain activity related to attentive processing of audiovisual speech. We found attentive speech processing to cause distinct spatiotemporal modulation profiles in distributed cortical areas including sensory and frontal-control networks. Semantic coherence modulated attention-related activation patterns in the earliest stages of auditory cortical processing, suggesting that the auditory cortex is involved in high-level speech processing. Our results corroborate views that emphasize the dynamic nature of attention, with task-specificity and context as cornerstones of the underlying neuro-cognitive mechanisms.