URL2MDA

A fast tool that converts any website into markdown ready for LLMs and AI agents (MAGI markdown), with enhanced extraction for sites such as Reddit, Twitter, and GitHub.

Usage Examples

Using curl:

$ curl 'https://url2mda.sno.ai/?url=https://example.com'

Using TypeScript (fetch):

import fetch from 'node-fetch'; // Only needed on Node.js < 18; newer Node.js and browsers provide a global fetch

const apiUrl = 'https://url2mda.sno.ai';
const targetUrl = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'; // Example YouTube URL

async function getMarkdown() {
  try {
    const fetchUrl = apiUrl + '?url=' + encodeURIComponent(targetUrl);
    const response = await fetch(fetchUrl);
    if (!response.ok) {
      throw new Error('HTTP error! status: ' + response.status);
    }
    const markdown = await response.text();
    console.log(markdown);
  } catch (error) {
    console.error('Error fetching markdown:', error);
  }
}

getMarkdown();

Using Python (requests):

import requests

api_url = 'https://url2mda.sno.ai'
target_url = 'https://github.com/openai/gpt-3' # Example GitHub URL

params = {'url': target_url}

try:
    # requests URL-encodes the query parameters; the timeout avoids hanging forever
    response = requests.get(api_url, params=params, timeout=30)
    response.raise_for_status() # Raise an exception for 4xx/5xx status codes
    markdown = response.text
    print(markdown)
except requests.exceptions.RequestException as e:
    print(f"Error fetching markdown: {e}")

Parameters

Required:

  • url (string): The website URL to convert.

Optional:

  • subpages (boolean, default: false): Attempts to crawl up to 10 subpages linked from the provided URL and return their markdown as well. In the plain-text response format, all pages are combined into a single document with one section per page.
    # Example (curl)
    $ curl 'https://url2mda.sno.ai/?url=https://example.com/blog&subpages=true'
  • llmFilter (boolean, default: false): Processes the extracted markdown through an LLM to filter out boilerplate, ads, and other non-essential content.
    # Example (curl)
    $ curl 'https://url2mda.sno.ai/?url=https://example.com&llmFilter=true'
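
The parameters above can be combined in one request. The following is a minimal Python sketch of building such a request URL, assuming the endpoint and parameter names shown in the curl examples. The helper name build_request_url and its keyword arguments are hypothetical and not part of the service. It also percent-encodes the target URL, which matters when the target itself contains a query string.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl examples above.
API_URL = "https://url2mda.sno.ai/"


def build_request_url(target_url: str, subpages: bool = False,
                      llm_filter: bool = False) -> str:
    """Build a request URL (hypothetical helper, for illustration only)."""
    query = {"url": target_url}
    # Both options default to false, so they are only sent when enabled.
    if subpages:
        query["subpages"] = "true"
    if llm_filter:
        query["llmFilter"] = "true"
    # urlencode percent-encodes the target URL so that its own '?' and '&'
    # characters are not mistaken for parameters of the API request.
    return API_URL + "?" + urlencode(query)


print(build_request_url("https://example.com/blog", subpages=True))
```

The resulting string can be passed to curl, fetch, or requests.get as in the usage examples.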

Response Types

  • Default: Returns plain text markdown (Content-Type: text/plain).
    • With subpages=true: Returns a single combined document with sections for each subpage.
  • JSON: Send the request header Content-Type: application/json to receive a JSON response instead.
    • For single page: Returns a JSON object with the URL and markdown content.
    • With subpages=true: Returns a JSON array where each object represents a page.
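
Because the JSON response is a single object for one page and an array when subpages=true, client code may want to treat both shapes uniformly. The sketch below shows one way to do that in Python. The function name normalize_pages is hypothetical, and the sample payloads are placeholders: the exact field names of the returned objects are not specified here, so the code does not rely on them.

```python
import json
from typing import Any, List


def normalize_pages(payload: Any) -> List[Any]:
    """Return a list of page objects for either JSON response shape."""
    if isinstance(payload, list):
        # subpages=true: the response is already an array of page objects.
        return payload
    # Single page: wrap the lone object so callers can always iterate.
    return [payload]


# Placeholder payloads standing in for real responses; field names are made up.
single = json.loads('{"placeholder": "single page"}')
multiple = json.loads('[{"placeholder": "page 1"}, {"placeholder": "page 2"}]')

print(len(normalize_pages(single)))
print(len(normalize_pages(multiple)))
```

With the requests library, the parsed payload would come from something like requests.get(api_url, params=params, headers={'Content-Type': 'application/json'}, timeout=30).json(), assuming the header behaves as described above.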