The extract function finds specific pieces of information in its input and returns them as a typed list. For text input it maps str → list[T], which makes it easy to collect:

  • Contact details (“email: alice@example.com” → [“alice@example.com”])
  • Measurements (“The room is 20.5 feet wide” → [20.5])
  • Dates (“Meeting on Jan 15th” → [datetime(2024, 1, 15)], where the year is inferred because the text omits it)
  • Structured data (“Alice (25) and Bob (30)” → [Person(name="Alice", age=25), Person(name="Bob", age=30)])
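
The Person type in the last item is not defined on this page. A minimal sketch of what such a target type could look like, assuming a plain dataclass is acceptable as a target (the Contact example under Structured Data suggests it is). The helper name extract_people is hypothetical, and marvin is imported lazily so the type definition can be tried without an API key:

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical target type matching the Person(...) example above."""
    name: str
    age: int


def extract_people(text: str) -> list[Person]:
    """Ask Marvin for every Person mentioned in `text` (this calls an LLM)."""
    import marvin  # lazy import; requires marvin and a configured provider

    return marvin.extract(text, Person)


# The shape of the result that the list item above describes.
expected = [Person(name="Alice", age=25), Person(name="Bob", age=30)]
```

Calling extract_people("Alice (25) and Bob (30)") should produce a list shaped like expected, although the exact result depends on the model.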

The extract function is a convenience wrapper around Marvin’s task system. For complex extraction patterns, consider creating a custom task; see Tasks for more details.

Usage

Pull email addresses from text:

import marvin

emails = marvin.extract(
    "Contact us at support@example.com or sales@example.com",
    str,
    instructions="Find email addresses"
)
print(emails)
# Example output: ['support@example.com', 'sales@example.com']

Parameters

  • data: The input data to extract from (any type)
  • target: The type of data to extract (defaults to str)
  • instructions: Required when target is str to specify what to extract
  • agent: Optional custom agent to use
  • thread: Optional thread for conversation context
  • context: Optional additional context
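
As a rough illustration of how these parameters combine, the sketch below passes a target type, instructions, and context in one call. The helper name extract_prices, the instruction wording, and the assumption that context accepts a dict are illustrative and not confirmed by this page. marvin is imported inside the function, so nothing runs until the helper is called:

```python
def extract_prices(text: str, currency: str = "USD") -> list[float]:
    """Hypothetical helper: extract prices from `text` as floats."""
    import marvin  # lazy import; the call below needs a configured LLM provider

    return marvin.extract(
        text,                            # data: the input to extract from
        float,                           # target: each extracted item is a float
        instructions="Extract prices only, ignoring quantities",
        context={"currency": currency},  # assumption: context is a dict
    )
```

Calling extract_prices("2 shirts at $19.99 each") would then return a list of floats; the exact values depend on the model.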

Async Support

An async version, extract_async, is also available:

import marvin
import asyncio

async def main():
    result = await marvin.extract_async(
        "The temperature is 72°F today",
        float
    )
    print(result)  # [72.0]

asyncio.run(main())
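
One reason to reach for the async version is to run several extractions concurrently. A minimal sketch using asyncio.gather, assuming extract_async accepts the same arguments as extract; the helper name extract_many is hypothetical:

```python
import asyncio


async def extract_many(texts: list[str]) -> list[list[float]]:
    """Hypothetical helper: extract floats from each text concurrently."""
    import marvin  # lazy import; each call below needs a configured LLM provider

    # Start one extraction per text and wait for all of them together.
    # gather() returns results in the same order as `texts`.
    return await asyncio.gather(
        *(marvin.extract_async(text, float) for text in texts)
    )
```

It could be driven with asyncio.run(extract_many(["High of 75°F", "Low of 62°F"])), which yields one list of floats per input text.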

Examples

Numeric Values

Find all numbers in text:

import marvin

temperatures = marvin.extract(
    "Today's high is 75°F with a low of 62°F",
    float
)
print(temperatures)
# Example output: [75.0, 62.0]

Named Entities

Extract named entities, such as city names, by describing them in the instructions:

import marvin

cities = marvin.extract(
    "Flights from New York to London",
    str,
    instructions="Extract city names"
)
print(cities)
# Example output: ['New York', 'London']

Dates and Times

Find temporal information in text:

import marvin
from datetime import datetime

dates = marvin.extract(
    "The meeting is on March 15, 2024 at 2:30 PM",
    datetime,
    instructions="Extract date and time information"
)
print(dates)
# Example output: [datetime.datetime(2024, 3, 15, 14, 30)]

Structured Data

Pull complex information into structured types:

import marvin
from dataclasses import dataclass

@dataclass
class Contact:
    name: str
    email: str
    phone: str | None

contacts = marvin.extract(
    """
    John Smith (john@example.com, 555-0123)
    Mary Johnson (mary@example.com)
    """,
    Contact
)
print(f"{contacts[0].name}: {contacts[0].phone}")
print(f"{contacts[1].name}: {contacts[1].phone}")
# Example output:
# John Smith: 555-0123
# Mary Johnson: None