Athena automatically generates deep-linked citations when extracting information from documents, spreadsheets, videos, and other assets. These citations provide precise references to the exact source location of every piece of information, enabling traceability and verification across all extraction workflows.

Overview

When Athena performs information extraction—whether summarizing documents, analyzing spreadsheets, transcribing meetings, or processing any asset—it automatically embeds citations that link directly to the specific location where the information originated. This citation system works across all file types and ensures that every extracted insight is traceable to its source.

How Athena Uses Citations

Automatic Citation Generation

Citations are automatically generated during all information extraction workflows:
  • Document Analysis: When summarizing or extracting data from PDFs, Word documents, or PowerPoint presentations
  • Spreadsheet Processing: When analyzing data from Excel or CSV files
  • Meeting Transcription: When transcribing and summarizing video or audio recordings
  • Code Analysis: When analyzing source code files or Jupyter notebooks
  • Multi-Asset Research: When synthesizing information across multiple assets

Citation Format

All citations follow a consistent Markdown link format, with the excerpt as the link text and the asset reference as the link target:
[excerpt text](asset_id?type=ASSET_TYPE&location_params)
Each citation includes:
  • Asset ID: Unique identifier for the source asset
  • Asset Type: The type of asset (PDF, DOCX, XLSX, VIDEO, etc.)
  • Location Parameters: Specific location within the asset (page, cell range, timestamp, etc.)
  • Excerpt Text: The actual text or content being cited
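The four parts listed above can be pulled out of a citation with standard string handling. The following is a minimal sketch, not Athena's own parser: the function name `parse_citation` and the regular expression are illustrative assumptions, and the sketch assumes the link target never contains spaces or closing parentheses.

```python
import re
from urllib.parse import parse_qs

# Matches a Markdown-style link: [excerpt](target).
# Assumption: the target contains no whitespace or ")" characters.
CITATION_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def parse_citation(text):
    """Return (excerpt, asset_id, asset_type, location) for the first citation, or None."""
    match = CITATION_RE.search(text)
    if match is None:
        return None
    excerpt, target = match.groups()
    # Everything before "?" is the asset ID; the rest is the query string.
    asset_id, _, query = target.partition("?")
    params = {key: values[0] for key, values in parse_qs(query).items()}
    # "type" is the asset type; the remaining keys are location parameters.
    asset_type = params.pop("type", None)
    return excerpt, asset_id, asset_type, params


example = (
    "Revenue increased by 15% in Q4 "
    "[according to the quarterly report](asset_123?type=PDF&page=12)"
)
print(parse_citation(example))
# ('according to the quarterly report', 'asset_123', 'PDF', {'page': '12'})
```

Location values are kept as strings here, since their meaning depends on the asset type.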

Citation Types by Asset

PDF and PowerPoint Documents

Location Format: Page-based citations
Citations reference specific pages or page ranges within PDF files and PowerPoint presentations:
  • Single Page: page=5 - Links to page 5
  • Page Range: pages=3-7 - Links to pages 3 through 7
  • With Excerpt: The citation text contains the relevant excerpt from that page
Example in Extraction:
Revenue increased by 15% in Q4 [according to the quarterly report](asset_123?type=PDF&page=12)
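A consumer of these citations may want the cited pages as a list of numbers. The sketch below assumes the location parameters have already been read into a dictionary of strings; the function name `cited_pages` is hypothetical, and the range is treated as inclusive, matching "pages 3 through 7" above.

```python
def cited_pages(params):
    """Return the page numbers named by a 'page' or 'pages' parameter."""
    if "page" in params:
        return [int(params["page"])]
    # 'pages' has the form "first-last"; both ends are included.
    first, _, last = params["pages"].partition("-")
    return list(range(int(first), int(last) + 1))


print(cited_pages({"page": "5"}))     # [5]
print(cited_pages({"pages": "3-7"}))  # [3, 4, 5, 6, 7]
```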

Word Documents and HTML

Location Format: Node-based citations
Citations use data-id attributes to reference specific sections or nodes within structured documents:
  • Node ID: node=abc123 - Links to a specific section identified by its node ID
  • Precise Sections: References exact paragraphs, headers, or document sections

Excel Spreadsheets

Location Format: Cell and range citations
Citations reference specific cells, cell ranges, or sheets within spreadsheet files:
  • Single Cell: range=A1 - Links to cell A1
  • Cell Range: range=A1:D10 - Links to a range of cells
  • Specific Sheet: sheet=Sheet1&range=B5:E20 - Links to a range on a named sheet
  • Sheet Index: sheetIndex=0&range=C3:F15 - Links to a range on the first sheet (zero-indexed)
Example in Extraction:
The total revenue of $2.5M [is shown in cell E10](asset_456?type=XLSX&sheet=Revenue&range=E10)
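The `range` values shown above use the familiar A1 spreadsheet notation. As an illustration of how a client might resolve such a value into numeric coordinates, here is a small sketch; the helper names `column_index` and `parse_range` are assumptions, and only uppercase references without `$` anchors are handled.

```python
import re

# One A1-style cell reference: column letters followed by a row number.
CELL_RE = re.compile(r"^([A-Z]+)([0-9]+)$")


def column_index(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_range(value):
    """Split 'A1' or 'A1:D10' into ((col, row), (col, row)), 1-based."""
    start, _, end = value.partition(":")
    corners = []
    # A single cell is treated as a range whose two corners coincide.
    for cell in (start, end or start):
        match = CELL_RE.match(cell)
        if match is None:
            raise ValueError(f"not an A1-style cell reference: {cell!r}")
        letters, row = match.groups()
        corners.append((column_index(letters), int(row)))
    return corners[0], corners[1]


print(parse_range("E10"))     # ((5, 10), (5, 10))
print(parse_range("A1:D10"))  # ((1, 1), (4, 10))
```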

Video and Audio Files

Location Format: Time-based citations
Citations reference specific timestamps or time ranges within video and audio recordings:
  • Single Timestamp: time=125 - Links to 125 seconds (2:05) from the start
  • Time Range: startTime=135&endTime=345 - Links to a segment from 2:15 to 5:45
  • Format: Time is specified in seconds from the beginning of the recording
Example in Extraction:
The CEO mentioned the new product launch [during the meeting](asset_789?type=VIDEO&time=245)
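Because times are given in seconds, displaying them to readers usually means converting to a clock-style label and back. The two helpers below are an illustrative sketch (the names `format_timestamp` and `to_seconds` are not part of Athena) that reproduces the conversions quoted in the bullets above.

```python
def format_timestamp(seconds):
    """Render a second count as M:SS, or H:MM:SS once it reaches an hour."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def to_seconds(timestamp):
    """Convert 'M:SS' or 'H:MM:SS' back into a second count."""
    total = 0
    for part in timestamp.split(":"):
        total = total * 60 + int(part)
    return total


print(format_timestamp(125))                         # 2:05
print(format_timestamp(135), format_timestamp(345))  # 2:15 5:45
print(to_seconds("5:30"))                            # 330
```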

Code Files

Location Format: Line-based citations
Citations reference specific lines or line ranges within source code files:
  • Single Line: line=42 - Links to line 42
  • Line Range: startLine=10&endLine=25 - Links to lines 10 through 25
  • File Types: Works with .py, .js, .sql, .json, and other text-based code files
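To show what a line-based citation resolves to, the sketch below returns the cited lines from a file's text. It assumes line numbers are 1-based and that ranges include both endpoints (the source describes `startLine=10&endLine=25` as "lines 10 through 25"); the function name `cited_lines` is hypothetical.

```python
def cited_lines(source, params):
    """Return the lines of `source` selected by line or startLine/endLine."""
    lines = source.splitlines()
    if "line" in params:
        start = end = int(params["line"])
    else:
        start, end = int(params["startLine"]), int(params["endLine"])
    # Assumption: line numbers are 1-based and the range is inclusive.
    return lines[start - 1:end]


source = "import os\n\ndef main():\n    print(os.getcwd())\n"
print(cited_lines(source, {"line": "1"}))                       # ['import os']
print(cited_lines(source, {"startLine": "3", "endLine": "4"}))  # ['def main():', '    print(os.getcwd())']
```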

Jupyter Notebooks

Location Format: Cell-based citations
Citations reference specific cells within Jupyter notebook files:
  • Cell ID: cellId=abc-123-def - Links to a specific notebook cell
  • Cell Index: Can reference cells by their position in the notebook

Extraction Workflows with Citations

Document Summarization

When Athena summarizes a document, every key point includes citations to the specific pages or sections:
Executive Summary:
- Q4 revenue reached $2.5M [page 3], representing 15% growth [page 12]
- Customer acquisition costs decreased by 8% [page 15]
- New product launch scheduled for Q2 [page 18]

Spreadsheet Analysis

When analyzing spreadsheet data, citations link to the exact cells containing the data:
Key Metrics from Sales Data:
- Total Q4 sales: $1.2M [Sheet1, E45]
- Top performing region: West Coast with $450K [Sheet2, B12]
- Average deal size: $12,500 [Summary, D8]

Meeting Transcription

When transcribing meetings, citations reference the exact timestamps:
Action Items from Leadership Meeting:
- Launch marketing campaign [5:30] - Assigned to Sarah
- Complete budget review [12:45] - Due by end of week
- Schedule client demo [18:20] - Coordinate with sales team

Multi-Asset Research

When synthesizing information across multiple assets, each fact includes its source citation:
Competitive Analysis Summary:
- Competitor A launched new feature [Q3-Report.pdf, page 8]
- Market share data shows 23% growth [Market-Data.xlsx, Sheet1, C15]
- Customer feedback indicates price sensitivity [Survey-Results.csv, row 45]
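When a synthesis cites several assets, it can be useful to see which source supports which statements. The following sketch groups the citations found in a piece of text by asset ID. It is an illustration under assumptions: the sample text and its page numbers are made up, the asset IDs reuse the placeholder IDs from earlier examples, and the function name `citations_by_asset` is hypothetical.

```python
import re
from collections import defaultdict

# [excerpt](asset_id?query); assumes no spaces or ")" inside the link target.
CITATION_RE = re.compile(r"\[([^\]]+)\]\(([^)\s?]+)\?([^)\s]*)\)")


def citations_by_asset(text):
    """Map each asset ID to the (excerpt, query string) pairs that cite it."""
    grouped = defaultdict(list)
    for excerpt, asset_id, query in CITATION_RE.findall(text):
        grouped[asset_id].append((excerpt, query))
    return dict(grouped)


# Made-up sample text for illustration only.
summary = (
    "Competitor A launched a new feature [Q3 report](asset_123?type=PDF&page=8). "
    "Market share grew 23% [market data](asset_456?type=XLSX&sheet=Sheet1&range=C15). "
    "Pricing is discussed [later in the report](asset_123?type=PDF&page=9)."
)
for asset_id, cites in citations_by_asset(summary).items():
    print(asset_id, len(cites))
# asset_123 2
# asset_456 1
```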

Use Cases

Compliance & Audit

Every extracted data point is traceable to its source, supporting regulatory compliance and audit requirements

Research Documentation

Automatically cited research summaries provide verifiable references to source materials

Data Analysis

Financial analysis and reporting with automatic citations to specific spreadsheet cells

Meeting Intelligence

Meeting summaries with timestamps linking back to specific discussion points

Knowledge Management

Build knowledge bases where every fact links directly to its source document

Legal Review

Contract analysis with citations to specific clauses and page numbers

Benefits of Automatic Citations

Traceability

Every piece of extracted information can be traced back to its exact source location, ensuring transparency and accountability.

Verification

Users can instantly verify extracted information by clicking citations to view the original source content.

Compliance

Automatic citations support regulatory compliance by maintaining a clear audit trail of information sources.

Trust

Citations build user confidence by showing exactly where information comes from, rather than treating extraction as a black box. Citations serve as deep links, enabling quick navigation to specific content within large documents or datasets.

Technical Details

Citation Structure

Citations use a structured format that includes all necessary information for locating source content:
[excerpt](asset_id?type=ASSET_TYPE&param1=value1&param2=value2)

Supported Parameters

Different asset types support different location parameters:
| Asset Type | Key Parameters | Example |
| --- | --- | --- |
| PDF | page, pages | page=5 or pages=3-7 |
| DOCX | node | node=abc123 |
| PPTX | page | page=3 |
| XLSX | sheet, sheetIndex, range | sheet=Data&range=A1:D10 |
| VIDEO/AUDIO | time, startTime, endTime | time=125 or startTime=30&endTime=90 |
| Code Files | line, startLine, endLine | line=42 or startLine=10&endLine=25 |
| Jupyter | cellId | cellId=abc-123-def |
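The parameter list above lends itself to a simple lookup that flags location parameters not expected for a given asset type. This is a sketch under assumptions: the `type` strings for code files and Jupyter notebooks are not given in this document, so those rows are left out rather than guessed, `AUDIO` is assumed to be a valid `type` value alongside `VIDEO`, and the function name `unexpected_params` is hypothetical.

```python
# Parameters per asset type, copied from the list above.
# Assumption: "AUDIO" is accepted as a type value; code and notebook
# type strings are omitted because the document does not state them.
SUPPORTED_PARAMS = {
    "PDF": {"page", "pages"},
    "DOCX": {"node"},
    "PPTX": {"page"},
    "XLSX": {"sheet", "sheetIndex", "range"},
    "VIDEO": {"time", "startTime", "endTime"},
    "AUDIO": {"time", "startTime", "endTime"},
}


def unexpected_params(asset_type, params):
    """Return the sorted parameter names not listed for this asset type."""
    allowed = SUPPORTED_PARAMS.get(asset_type)
    if allowed is None:
        raise ValueError(f"no parameter list for asset type {asset_type!r}")
    return sorted(set(params) - allowed)


print(unexpected_params("XLSX", {"sheet": "Data", "range": "A1:D10"}))  # []
print(unexpected_params("PDF", {"page": "5", "time": "125"}))           # ['time']
```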