
I mean, how do they work, generally? How do they receive the link to a video stream itself (not just the page containing a Flash player)?

I searched the web but couldn't find anything useful. All the links point to such services, but none explain how they are actually implemented.

Ramhound
PaulD

2 Answers


There is a very popular open source command-line downloader called youtube-dl, which does exactly that. It grabs the actual video and audio file links from a given YouTube link – or any other popular web video site like Vimeo, Yahoo! Video, uStream, etc.

To see how that's done, look into the YouTube extractor. That's just too much to show here. Other extractors exist for simpler sites.

In order to find the video stream, you'd have to pretend to be the actual browser client, trying to load the video. This means you first have to parse the HTML code, load the relevant Javascript code, and initialize a player object, which plays video through an HTML <video> element.

This means that somewhere in the Javascript execution, there is initialization code for the player, containing important parameters like where to actually find the video.

In the simplest case, the video might be present as a URL to some MP4 file, directly in some configuration object. This is very easy to parse by looking at the src attribute of the <video> element. But it could also be generated on the fly with some specific download tokens negotiated between client and some authentication server. The video might also play through a blob URL, so you cannot see it directly, because it's generated via MediaSource APIs.

Often, the JavaScript code itself is obfuscated to make it harder to reverse-engineer, using variable names like xyz rather than player.

Most video websites these days use MPEG-DASH or Apple's HTTP Live Streaming (HLS) behind the scenes. These do not use direct URLs to a video file, but instead work with a so-called "manifest" file. The manifest provides meta-information to get the actual video stream. The manifest file (.mpd for example in DASH, and .m3u8 for HLS) will contain links to segments of video and audio, which you'd later have to combine to get a playable file.
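To make the manifest idea concrete, here is a minimal sketch that lists the segment URLs in an HLS media playlist. The playlist text and the example.com URL are made up for illustration. Real playlists carry more tags, and a master playlist points to further playlists rather than to segments.

```javascript
// Hypothetical HLS media playlist; a real one is fetched from the site.
const playlist = [
  "#EXTM3U",
  "#EXT-X-TARGETDURATION:10",
  "#EXTINF:10.0,",
  "segment0.ts",
  "#EXTINF:10.0,",
  "segment1.ts",
  "#EXT-X-ENDLIST",
].join("\n");

// In a media playlist, every non-empty line that does not start with "#"
// is a segment URI, resolved relative to the playlist's own URL.
function listSegments(playlistText, playlistUrl) {
  return playlistText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((uri) => new URL(uri, playlistUrl).href);
}

const segments = listSegments(playlist, "https://example.com/video/index.m3u8");
console.log(segments);
```

Downloading the listed segments in order and concatenating or remuxing them is what produces a playable file.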

Many websites transmit these manifests from the server to the client player, so if you can inspect the network requests made by the client, you might find a .mpd file, which you can then use to download the video segments from your own client.

However, the manifest could also be transmitted via other side channels, embedded in some JavaScript code, generated on the fly, etc. For youtube-dl, you can see how the code tries to extract the DASH manifest URL from the transmitted configuration information.

There's no general solution for this. It requires careful inspection and debugging of the target site.

slhck
  • 3
    One question, what is Youtube/Google's policy on this? Are they ok with this, or not so much? – JMK Jun 26 '14 at 15:53
  • 33
    The YouTube [Terms of Service](https://www.youtube.com/static?gl=GB&template=terms) in §5.1.L disallow consumption of their content through any other means than streaming, so theoretically it's not allowed. In practice, they won't be able to enforce that though. Any downloader can more or less simulate that it's just streaming. – slhck Jun 26 '14 at 16:23
  • Well, the URL need not come from an external source, a Flash application could generate it itself. Still, because (for whatever reason) all actual HTTP requests are deferred to the hosting browser, the URLs are easily found out. – Daniel B Jun 26 '14 at 17:41
  • 1
    @DanielB You're right. But usually, static Flash players are embedded, which can be configured through JS. The HTTP requests of course need to be made by the browser (who else?). That's also why you'll never be able to block people from downloading, unless you meticulously check user agents, use access tokens, and scramble the actual content. But even then the browser source could be modified to dump raw media content, I suppose. – slhck Jun 26 '14 at 17:43
  • 2
    @StevenPenny do you have any non minified version of that? – TankorSmash Jun 26 '14 at 18:08
  • 5
    @slhck Flash could also make HTTP requests by itself. Instead, it uses the browser’s HTTP engine. If Flash itself made the requests, they wouldn’t be “visible” to the browser. Sure would be great for advertisers. ;) – Daniel B Jun 26 '14 at 19:49
  • 3
    @slhck they can't enforce it *programmatically*, but if they got their lawyer team out could they enforce it *legally*? – Cruncher Jun 27 '14 at 15:45
  • @Cruncher Only if it is illegal in your country to violate ToS, and I suppose only when it's obvious that you're doing it in a way that's hurting them. They might be able to just ban your account if it was easily possible to find out. But they don't need a lawyer for that. – slhck Jun 27 '14 at 15:47
  • Most video downloaders (be it a web-based service or something running on your own computer to download the video directly to your web browser i.e. not via another server) do not link to your account with the respective video providers. – micheal65536 Sep 11 '15 at 10:55
  • If you are into Linux, you can easily download without third-party websites, which have restrictions, waiting times and spam. You will be able to download a single video, multiple videos or even a playlist. You can also select the download media file type. Check this video for the tutorial - https://www.youtube.com/watch?v=rEDNcs23YAQ&t=11s – PeakGen Jun 02 '17 at 14:49
  • @DanielB What do you mean can make request by itself?? – Suraj Jain Feb 18 '18 at 17:25
  • @SurajJain Any program can make HTTP requests. As such, Flash can do so, too. Even when running as a plugin. It doesn’t have to use the host browser mechanics to make requests. I don’t quite get what’s unclear about that. – Daniel B Feb 18 '18 at 19:55

Start with a typical video:

https://www.youtube.com/watch?v=XeojXq6ySs4

Using the same ID, construct a URL like this:

https://www.youtube.com/get_video_info?eurl=https://www.youtube.com&video_id=XeojXq6ySs4
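As a rough sketch, the URL can be built from the watch-page URL like this. The function name is made up for illustration, and the code only builds the string without fetching it (a later comment on this answer reports the endpoint returning 404).

```javascript
// Build the get_video_info URL described above from a watch-page URL.
// videoInfoUrl is a hypothetical helper name, not part of any library.
function videoInfoUrl(watchUrl) {
  const id = new URL(watchUrl).searchParams.get("v");
  if (!id) throw new Error("no video ID in " + watchUrl);
  const info = new URL("https://www.youtube.com/get_video_info");
  info.searchParams.set("eurl", "https://www.youtube.com");
  info.searchParams.set("video_id", id);
  return info.href;
}

const infoUrl = videoInfoUrl("https://www.youtube.com/watch?v=XeojXq6ySs4");
console.log(infoUrl);
```

Note that searchParams percent-encodes the eurl value, which is equivalent to the unencoded form shown above.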

The response will be a query string, like this (edited for readability):

innertube_api_version=v1&
innertube_context_client_version=2.20210504.09.00&
player_response=%7B%22responseContext%22%3A%7B%22serviceTrackingParams%22%3A...
ps=desktop-polymer&
root_ve_type=27240&

Extract the player_response value. This will be a JSON object, like this:

{
  "streamingData": {
    "adaptiveFormats": [
      {
        "itag": 137,
        "mimeType": "video/mp4; codecs=\"avc1.640020\"",
        "bitrate": 570464,
        "height": 1080,
        "signatureCipher": "s=VZVZOq0QJ8wRgIhANWm3sPF-2hbzQQGrErjQFMNmxTfALco..."
      }
    ]
  }
}
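A minimal sketch of that extraction, assuming the response body has the shape shown above. The body here is a shortened stand-in built in the code itself, and the second format entry is made up for illustration.

```javascript
// Shortened stand-in for the response body (hypothetical values).
const playerResponse = {
  streamingData: {
    adaptiveFormats: [
      { itag: 137, mimeType: 'video/mp4; codecs="avc1.640020"', height: 1080 },
      { itag: 140, mimeType: 'audio/mp4; codecs="mp4a.40.2"' },
    ],
  },
};
const body =
  "ps=desktop-polymer&player_response=" +
  encodeURIComponent(JSON.stringify(playerResponse));

// URLSearchParams percent-decodes the value; JSON.parse then yields the object.
function adaptiveFormats(responseBody) {
  const raw = new URLSearchParams(responseBody).get("player_response");
  if (raw === null) throw new Error("player_response missing");
  const parsed = JSON.parse(raw);
  return (parsed.streamingData && parsed.streamingData.adaptiveFormats) || [];
}

const formats = adaptiveFormats(body);
console.log(formats.map((f) => f.itag));
```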

Then extract the signatureCipher value. This is itself a query string, like this:

sp=sig&
s=VZVZOq0QJ8wRgIhANWm3sPF-2hbzQQGrErjQFMNmxTfALcoZkZ4IVR1djIpAiEA8HFKix6d4B3T...&
url=https://r3---sn-q4flrnek.googlevideo.com/videoplayback%3Fexpire%3D16201927...
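Splitting that value into its three fields could look like the sketch below. The cipher string is a short made-up stand-in with the same keys, and parseCipher is a hypothetical helper name.

```javascript
// Hypothetical, shortened signatureCipher value with the same three keys.
const signatureCipher =
  "sp=sig&s=ABCDEF&url=" +
  encodeURIComponent("https://example.com/videoplayback?expire=1");

function parseCipher(cipherText) {
  const params = new URLSearchParams(cipherText);
  return {
    sp: params.get("sp"),   // name of the query key to add later
    s: params.get("s"),     // scrambled signature, still to be decoded
    url: params.get("url"), // media URL, already percent-decoded
  };
}

const cipher = parseCipher(signatureCipher);
console.log(cipher);
```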

The url value is the URL of the audio or video stream. Before you can access it, you must add an entry to its query string. The new key is the value of sp above (sig in this case). The new value is derived from the value of s above (VZVZOq0QJ8wRgIhANWm3sPF-2hbzQQGrErjQFMNmxTfALcoZkZ4IVR1djIpA... in this case), which you must decode before adding it. To decode the value, take the following steps. First, visit the original page:

https://www.youtube.com/watch?v=XeojXq6ySs4

The page source will contain some text like this:

/s/player/3e7e4b43/player_ias.vflset/en_US/base.js

which you can turn into:

https://www.youtube.com/s/player/3e7e4b43/player_ias.vflset/en_US/base.js
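One way to pull that path out of the page source is a regular expression, sketched below. The pattern is a guess based on the single example above, and the HTML fragment is made up, so treat both as assumptions.

```javascript
// Hypothetical fragment of the watch page's HTML source.
const html =
  '<script src="/s/player/3e7e4b43/player_ias.vflset/en_US/base.js"></script>';

// Find the first path shaped like /s/player/<id>/.../base.js and resolve
// it against the site origin. Returns null when nothing matches.
function playerScriptUrl(pageSource) {
  const match = pageSource.match(/\/s\/player\/[\w-]+\/[\w./-]+?\/base\.js/);
  return match ? new URL(match[0], "https://www.youtube.com").href : null;
}

const scriptUrl = playerScriptUrl(html);
console.log(scriptUrl);
```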

This file will contain some code like this:

var uy={an:function(a){a.reverse()},
gN:function(a,b){a.splice(0,b)},
J4:function(a,b){var c=a[0];a[0]=a[b%a.length];a[b%a.length]=c}};
vy=function(a){a=a.split("");uy.gN(a,2);uy.J4(a,47);uy.gN(a,1);uy.an(a,49);
uy.gN(a,2);uy.J4(a,4);uy.an(a,71);uy.J4(a,15);uy.J4(a,40);return a.join("")};

Take the original s value, and run it through this function:

vy('_l_lOq0QJ8wRAIgc-yNc9Z4lSO2CozG4B-W9uC5zeuTATDvqHlnQaHGNmkCICsZJGbEjKDmD...')

The result will look about the same, but scrambled:

AOq0QJ8wRAIgc-ylc9Z4lSO2CozG4B-W9uC5zeuTNTDvqH_nQaHGNmkCICsZJGbEjKDmDSnKg_atTR...
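For readability, the minified sample above boils down to three array operations applied in a fixed order. The sketch below is my own rewrite with descriptive names. The step list is copied from that one sample and will differ for other player versions, so it has to be re-read from base.js each time.

```javascript
// Readable equivalents of the three helpers in the sample above.
const ops = {
  // uy.an: reverse the array (the second argument is ignored)
  reverse: (a) => a.reverse(),
  // uy.gN: remove the first n items
  drop: (a, n) => a.splice(0, n),
  // uy.J4: swap item 0 with item n % length
  swap: (a, n) => {
    const i = n % a.length;
    [a[0], a[i]] = [a[i], a[0]];
  },
};

// Same sequence of calls as the vy function in the sample.
const steps = [
  ["drop", 2], ["swap", 47], ["drop", 1], ["reverse", 49],
  ["drop", 2], ["swap", 4], ["reverse", 71], ["swap", 15], ["swap", 40],
];

function decipher(s) {
  const chars = s.split("");
  for (const [name, arg] of steps) ops[name](chars, arg);
  return chars.join("");
}

// Made-up input, just to show the call.
console.log(decipher("0123456789abcdefghijklmnopqrstuvwxyz"));
```

With this step list the output is always five characters shorter than the input, because the three drop steps remove 2 + 1 + 2 characters.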

Finally you can construct the resulting URL:

https://r3---sn-q4fl6nz7.googlevideo.com/videoplayback?vprv=1&
id=o-AHThxQXyxJ3jfw5EBUJeT0IJLrdQeYpMdCsCImMfbuac&
sig=AOq0QJ8wRAIgc-ylc9Z4lSO2CozG4B-W9uC5zeuTNTDvqH_nQaHGNmkCICsZJGbEjKDmDSnKg_...
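Putting the last step into code, a sketch might look like this. The inputs are short made-up stand-ins. In practice url and sp come from signatureCipher, and the signature comes from running s through the player's function.

```javascript
// Append the decoded signature under the key named by sp.
// signedUrl is a hypothetical helper name.
function signedUrl(url, sp, signature) {
  const u = new URL(url);
  u.searchParams.set(sp, signature);
  return u.href;
}

const finalUrl = signedUrl(
  "https://example.com/videoplayback?vprv=1", // hypothetical media URL
  "sig",
  "AOq0QJ8w" // hypothetical decoded signature
);
console.log(finalUrl);
```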

I have a library and program that does these steps:

https://pkg.go.dev/github.com/89z/mech/youtube

Zombo
  • Does this still work? I checked out https://www.youtube.com/get_video_info?eurl=https%3A%2F%2Fwww.youtube.com&video_id=QVEBO6Zuppk in my browser and it got me a 404. – Hermanboxcar Oct 23 '22 at 04:58