Overview
When Athena performs information extraction (summarizing documents, analyzing spreadsheets, transcribing meetings, or processing any other asset), it automatically embeds citations that link directly to the specific location where the information originated. This citation system works across all supported file types, so every extracted insight is traceable to its source.
How Athena Uses Citations
Automatic Citation Generation
Citations are automatically generated during all information extraction workflows:
- Document Analysis: When summarizing or extracting data from PDFs, Word documents, or PowerPoint presentations
- Spreadsheet Processing: When analyzing data from Excel or CSV files
- Meeting Transcription: When transcribing and summarizing video or audio recordings
- Code Analysis: When analyzing source code files or Jupyter notebooks
- Multi-Asset Research: When synthesizing information across multiple assets
Citation Format
All citations follow a consistent markdown URL format made up of the following components:
- Asset ID: Unique identifier for the source asset
- Asset Type: The type of asset (PDF, DOCX, XLSX, VIDEO, etc.)
- Location Parameters: Specific location within the asset (page, cell range, timestamp, etc.)
- Excerpt Text: The actual text or content being cited
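The exact URL layout of a citation is not reproduced on this page, so the following is only a sketch of how the four components could combine into a markdown link. The base URL, the path layout, and the make_citation helper are hypothetical; only the structure (excerpt as link text, plus asset ID, asset type, and location parameters in the URL) comes from the format described above.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the real host and path scheme Athena emits are not documented here.
BASE_URL = "https://athena.example.com/assets"


def make_citation(asset_id: str, asset_type: str, location: dict[str, str], excerpt: str) -> str:
    """Build a markdown link: excerpt as the link text, source location in the URL."""
    # Escape square brackets so the excerpt cannot break the markdown link syntax.
    text = excerpt.replace("[", r"\[").replace("]", r"\]")
    # Asset type first, then location parameters in the order given; keep ':' readable (A1:D10).
    query = urlencode({"type": asset_type, **location}, safe=":")
    return f"[{text}]({BASE_URL}/{asset_id}?{query})"


print(make_citation("asset-001", "XLSX", {"sheet": "Sheet1", "range": "B5:E20"}, "Q3 revenue"))
```

Keeping the location as ordinary query parameters means the same helper covers every asset type; only the keys in location change.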
Citation Types by Asset
PDF and PowerPoint Documents
Location Format: Page-based citations
Citations reference specific pages or page ranges within PDF files and PowerPoint presentations:
- Single Page: page=5 links to page 5
- Page Range: pages=3-7 links to pages 3 through 7
- With Excerpt: The citation text contains the relevant excerpt from that page
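As a minimal sketch, assuming a citation's location parameters arrive as a URL query string and that page ranges are inclusive ("pages 3 through 7"), a client could expand them into the list of cited pages like this. The function name cited_pages is hypothetical, not part of any Athena API.

```python
from urllib.parse import parse_qs


def cited_pages(query: str) -> list[int]:
    """Return the page numbers referenced by a page= or pages= parameter."""
    params = parse_qs(query)
    if "page" in params:
        return [int(params["page"][0])]
    if "pages" in params:
        start, end = (int(p) for p in params["pages"][0].split("-", 1))
        if start > end:
            raise ValueError(f"invalid page range: {start}-{end}")
        return list(range(start, end + 1))  # both ends inclusive
    raise ValueError("query has neither page nor pages")


print(cited_pages("pages=3-7"))
```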
Word Documents and HTML
Location Format: Node-based citations
Citations use data-id attributes to reference specific sections or nodes within structured documents:
- Node ID: node=abc123 links to a specific section identified by its node ID
- Precise Sections: References exact paragraphs, headers, or document sections
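The sketch below shows how a node citation could be resolved against rendered HTML using Python's standard html.parser module. It assumes the document is available as well-formed HTML in which each addressable section carries a data-id attribute; NodeTextExtractor and node_text are illustrative names, not Athena APIs.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NodeTextExtractor(HTMLParser):
    """Collect the text inside the element whose data-id matches target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # nesting depth inside the matched element (0 = outside it)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a child element inside the matched node
        elif dict(attrs).get("data-id") == self.target_id:
            self.depth = 1  # entering the matched node

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def node_text(html: str, node_id: str) -> str:
    parser = NodeTextExtractor(node_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


doc = '<div><h2 data-id="intro">Overview</h2><p data-id="abc123">Revenue grew <b>12%</b> in Q3.</p></div>'
print(node_text(doc, "abc123"))
```

Tracking depth rather than stopping at the first closing tag keeps nested markup (such as the bold text above) inside the extracted section.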
Excel Spreadsheets
Location Format: Cell and range citations
Citations reference specific cells, cell ranges, or sheets within spreadsheet files:
- Single Cell: range=A1 links to cell A1
- Cell Range: range=A1:D10 links to a range of cells
- Specific Sheet: sheet=Sheet1&range=B5:E20 links to a range on a named sheet
- Sheet Index: sheetIndex=0&range=C3:F15 links to a range on the first sheet (zero-indexed)
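Spreadsheet ranges use A1 notation, so resolving a citation means converting column letters and 1-based row numbers into positions. This is a minimal sketch, assuming query-string input and a known list of sheet names; falling back to the first sheet when neither sheet nor sheetIndex is present is an assumption, and locate is a hypothetical helper.

```python
import re
from urllib.parse import parse_qsl

# A1-style reference: column letters followed by a 1-based row number.
CELL_RE = re.compile(r"^([A-Z]+)([1-9][0-9]*)$")


def column_index(letters: str) -> int:
    """Convert column letters to a zero-based index (A -> 0, Z -> 25, AA -> 26)."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_cell(ref: str) -> tuple[int, int]:
    """Return the zero-based (row, col) for a reference such as 'B5'."""
    match = CELL_RE.match(ref.upper())
    if not match:
        raise ValueError(f"not an A1-style cell reference: {ref!r}")
    letters, digits = match.groups()
    return int(digits) - 1, column_index(letters)


def locate(query: str, sheet_names: list[str]) -> tuple[str, tuple[int, int], tuple[int, int]]:
    """Resolve sheet/sheetIndex and range into (sheet name, top-left, bottom-right)."""
    params = dict(parse_qsl(query))
    if "sheet" in params:
        sheet = params["sheet"]
    else:
        # sheetIndex is zero-based; defaulting to the first sheet is an assumption.
        sheet = sheet_names[int(params.get("sheetIndex", "0"))]
    first, _, last = params["range"].partition(":")
    start = parse_cell(first)
    end = parse_cell(last) if last else start  # a single cell is a 1x1 range
    return sheet, start, end


print(locate("sheetIndex=0&range=C3:F15", ["Summary", "Data"]))
```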
Video and Audio Files
Location Format: Time-based citations
Citations reference specific timestamps or time ranges within video and audio recordings:
- Single Timestamp: time=125 links to 125 seconds (2:05) from the start
- Time Range: startTime=135&endTime=345 links to the segment from 2:15 to 5:45
- Format: Time is specified in seconds from the beginning of the recording
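Because time parameters are plain second offsets, a client can render them as the clock positions quoted above (125 becomes 2:05). The following sketch assumes query-string input; describe_time and format_seconds are illustrative names, and the H:MM:SS form for offsets of an hour or more is a display choice, not something the citation format specifies.

```python
from urllib.parse import parse_qsl


def format_seconds(total: int) -> str:
    """Render a second offset as M:SS, or H:MM:SS for offsets of an hour or more."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


def describe_time(query: str) -> str:
    """Turn time= or startTime=&endTime= into a readable position or segment."""
    params = dict(parse_qsl(query))
    if "time" in params:
        return format_seconds(int(params["time"]))
    start, end = int(params["startTime"]), int(params["endTime"])
    if end < start:
        raise ValueError("endTime precedes startTime")
    return f"{format_seconds(start)}-{format_seconds(end)}"


print(describe_time("startTime=135&endTime=345"))
```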
Code Files
Location Format: Line-based citations
Citations reference specific lines or line ranges within source code files:
- Single Line: line=42 links to line 42
- Line Range: startLine=10&endLine=25 links to lines 10 through 25
- File Types: Works with .py, .js, .sql, .json, and other text-based code files
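A sketch of resolving a line citation against a file's text, assuming line numbers are 1-based and both ends of a range are inclusive (consistent with "lines 10 through 25"). cited_lines is a hypothetical helper, not an Athena function.

```python
from urllib.parse import parse_qsl


def cited_lines(source: str, query: str) -> list[str]:
    """Return the lines selected by line= or startLine=&endLine= (1-based, inclusive)."""
    params = dict(parse_qsl(query))
    if "line" in params:
        start = end = int(params["line"])
    else:
        start, end = int(params["startLine"]), int(params["endLine"])
    lines = source.splitlines()
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"line range {start}-{end} is outside 1-{len(lines)}")
    return lines[start - 1:end]  # shift to zero-based slicing, end stays inclusive


src = "\n".join(f"line {i}" for i in range(1, 31))
print(cited_lines(src, "startLine=10&endLine=12"))
```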
Jupyter Notebooks
Location Format: Cell-based citations
Citations reference specific cells within Jupyter notebook files:
- Cell ID: cellId=abc-123-def links to a specific notebook cell
- Cell Index: Cells can also be referenced by their position in the notebook
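A minimal sketch of resolving a Jupyter citation, assuming the notebook is stored as standard nbformat JSON. Cell id fields exist from nbformat 4.5 onward; the zero-based index fallback and the helper names find_cell and cell_source are assumptions, since this page does not specify how cell positions are numbered.

```python
import json
from typing import Optional


def find_cell(notebook_json: str, cell_id: Optional[str] = None, index: Optional[int] = None) -> dict:
    """Return a notebook cell by its id, or by zero-based position if no id is given."""
    cells = json.loads(notebook_json)["cells"]
    if cell_id is not None:
        for cell in cells:
            if cell.get("id") == cell_id:  # cell ids are present in nbformat 4.5+
                return cell
        raise KeyError(cell_id)
    if index is not None:
        return cells[index]
    raise ValueError("pass cell_id or index")


def cell_source(cell: dict) -> str:
    """nbformat stores source either as one string or as a list of line strings."""
    src = cell["source"]
    return src if isinstance(src, str) else "".join(src)
```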
Extraction Workflows with Citations
Document Summarization
When Athena summarizes a document, every key point includes citations to the specific pages or sections.
Spreadsheet Analysis
When analyzing spreadsheet data, citations link to the exact cells containing the data.
Meeting Transcription
When transcribing meetings, citations reference the exact timestamps.
Multi-Asset Research
When synthesizing information across multiple assets, each fact includes its source citation.
Use Cases
Compliance & Audit
Every extracted data point is traceable to its source, supporting regulatory compliance and audit requirements
Research Documentation
Automatically cited research summaries provide verifiable references to source materials
Data Analysis
Financial analysis and reporting with automatic citations to specific spreadsheet cells
Meeting Intelligence
Meeting summaries with timestamps linking back to specific discussion points
Knowledge Management
Build knowledge bases where every fact links directly to its source document
Legal Review
Contract analysis with citations to specific clauses and page numbers
Benefits of Automatic Citations
Traceability
Every piece of extracted information can be traced back to its exact source location, ensuring transparency and accountability.
Verification
Users can instantly verify extracted information by clicking citations to view the original source content.
Compliance
Automatic citations support regulatory compliance by maintaining a clear audit trail of information sources.
Trust
Citations build user confidence by showing exactly where information comes from, rather than treating extraction as a black box.
Navigation
Citations serve as deep links, enabling quick navigation to specific content within large documents or datasets.
Technical Details
Citation Structure
Citations use a structured format that includes all the information needed to locate the source content: the asset ID, asset type, location parameters, and excerpt text.
Supported Parameters
Different asset types support different location parameters:

| Asset Type | Key Parameters | Example |
|---|---|---|
| PDF | page, pages | page=5 or pages=3-7 |
| DOCX | node | node=abc123 |
| PPTX | page | page=3 |
| XLSX | sheet, sheetIndex, range | sheet=Data&range=A1:D10 |
| VIDEO/AUDIO | time, startTime, endTime | time=125 or startTime=30&endTime=90 |
| Code Files | line, startLine, endLine | line=42 or startLine=10&endLine=25 |
| Jupyter | cellId | cellId=abc-123-def |
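Because each asset type accepts only certain parameters, a client could check a citation's query string against the parameter table before following it. This sketch encodes that table as a dictionary; the CODE and JUPYTER keys are hypothetical labels for the "Code Files" and "Jupyter" rows, AUDIO is listed separately from VIDEO purely for lookup convenience, and unsupported_params is an illustrative helper.

```python
from urllib.parse import parse_qsl

# Location parameters per asset type, transcribed from the Supported Parameters table.
SUPPORTED_PARAMS = {
    "PDF": {"page", "pages"},
    "DOCX": {"node"},
    "PPTX": {"page"},
    "XLSX": {"sheet", "sheetIndex", "range"},
    "VIDEO": {"time", "startTime", "endTime"},
    "AUDIO": {"time", "startTime", "endTime"},
    "CODE": {"line", "startLine", "endLine"},   # hypothetical key for "Code Files"
    "JUPYTER": {"cellId"},                      # hypothetical key for "Jupyter"
}


def unsupported_params(asset_type: str, query: str) -> set[str]:
    """Return parameter names in query that the table does not list for asset_type."""
    allowed = SUPPORTED_PARAMS[asset_type.upper()]
    return {name for name, _ in parse_qsl(query)} - allowed


print(unsupported_params("PDF", "page=5&range=A1"))
```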
Related Documentation
Asset Organization
Learn how to organize assets for optimal extraction and citation
Notebooks
Understand how Athena extracts and cites information from data analysis notebooks
Sheets
Explore spreadsheet analysis with automatic cell-level citations
Meetings
Learn about meeting transcription with timestamp citations