Daniel Paulus, Director of Engineering at Checkly

In my last job..

thank you for the image, sauce labs 😉

I reverse engineered a few things on iOS & Android :-)

Mac OS X allows you to mirror iOS video and audio

BUT How to solve? 😱

1. let's google it ☝️😌

2. found video on youtube 🥳

3. found discussions online without solutions 😑

4. no source code 😭

5. how hard can it be? 🤨

Usually you have two ways of finding out how these features work:

1. Try and understand from the code

Disassemble the binary and try to understand it using static analysis and a debugger

/System/Library/Frameworks/CoreMediaIO.framework/Versions/A/Resources/iOSScreenCapture.plugin/Contents/Resources/iOSScreenCaptureAssistant

What is a disassembler?

It tries to turn a compiled binary back into plain assembly or C-like code

2. Eavesdrop on the communication (USB in this particular case)

Luckily, this is easy and cheap nowadays on Mac OS X: bring up the USB controller as a capture interface, then capture on it

sudo ifconfig XHC20 up

Similar Problem, different Perspective

Let's assume you want to reverse engineer a REST API. What would you rather do?
Read complicated JavaScript code...
var _0x3589=['ARTIST','childNodes','nodeValue','','TITLE','','demo','innerHTML','onreadystatechange','readyState','status','open','GET','cd_catalog.xml','send','responseXML','getElementsByTagName','length',''];(function(_0x307c39,_0x57eca5){var _0x4fb85c=function(_0xd89e6d){while(--_0xd89e6d){_0x307c39['push'](_0x307c39['shift']());}};_0x4fb85c(++_0x57eca5);}(_0x3589,0xec));var _0x50c1=function(_0x319890,_0x3940ec){_0x319890=_0x319890-0x0;var _0x4f50ff=_0x3589[_0x319890];return _0x4f50ff;};function loadXMLDoc(){var _0x1ec44a=new XMLHttpRequest();_0x1ec44a[_0x50c1('0x0')]=function(){if(this[_0x50c1('0x1')]==0x4&&this[_0x50c1('0x2')]==0xc8){myFunction(this);}};_0x1ec44a[_0x50c1('0x3')](_0x50c1('0x4'),_0x50c1('0x5'),!![]);_0x1ec44a[_0x50c1('0x6')]();}function myFunction(_0x18fd9c){var _0x4bbd7b;var _0x77fb02=_0x18fd9c[_0x50c1('0x7')];var _0x50a728='ArtistTitle';var _0x301231=_0x77fb02[_0x50c1('0x8')]('CD');for(_0x4bbd7b=0x0;_0x4bbd7b<_0x301231[_0x50c1('0x9')];_0x4bbd7b++){_0x50a728+=_0x50c1('0xa')+_0x301231[_0x4bbd7b][_0x50c1('0x8')](_0x50c1('0xb'))[0x0][_0x50c1('0xc')][0x0][_0x50c1('0xd')]+_0x50c1('0xe')+_0x301231[_0x4bbd7b][_0x50c1('0x8')](_0x50c1('0xf'))[0x0][_0x50c1('0xc')][0x0][_0x50c1('0xd')]+_0x50c1('0x10');}document['getElementById'](_0x50c1('0x11'))[_0x50c1('0x12')]=_0x50a728;}
Or look at the HTTP requests and replay them?

So I open Wireshark, start playing a video, stop it again, and save the capture. USB may work with packets, but I assume the data is sent as a stream of some kind of media samples, so I write a little Python script that extracts all bytes received and sent and saves them in a separate file.

AND THEN...

I get this 🤔


							00000000  10 00 00 00 67 6e 69 70  00 00 00 00 01 00 00 00  |....gnip........|
							00000010  10 00 00 00 67 6e 69 70  00 00 00 00 01 00 00 00  |....gnip........|
							00000020  24 00 00 00 63 6e 79 73  01 00 00 00 00 00 00 00  |$...cnys........|
							00000030  61 70 77 63 40 ae d5 18  01 00 00 00 b0 57 8d 19  |apwc@........W..|
							00000040  01 00 00 00 44 00 00 00  63 6e 79 73 40 15 e1 5c  |....D...cnys@..\|
							00000050  fb 7f 00 00 74 6d 66 61  90 4d af 18 01 00 00 00  |....tmfa.M......|
							00000060  00 00 00 00 00 70 e7 40  6d 63 70 6c 4c 00 00 00  |.....p.@mcplL...|
							00000070  04 00 00 00 01 00 00 00  04 00 00 00 02 00 00 00  |................|
							00000080
					

When a network connection sends you an endless stream of bytes, how do you know when a message is complete?

  1. Fixed length
  2. Delimiter-based messages
  3. Length-prefixed messages: a 4-byte int containing the length, then a payload of that length

So let's try fixed length..


							00000000  10 00 00 00 67 6e 69 70  00 00 00 00 01 00 00 00  |....gnip........|
							00000010  10 00 00 00 67 6e 69 70  00 00 00 00 01 00 00 00  |....gnip........|
							00000020  24 00 00 00 63 6e 79 73  01 00 00 00 00 00 00 00  |$...cnys........|
							00000030  61 70 77 63 40 ae d5 18  01 00 00 00 b0 57 8d 19  |apwc@........W..|
							00000040  01 00 00 00 44 00 00 00  63 6e 79 73 40 15 e1 5c  |....D...cnys@..\|
							00000050  fb 7f 00 00 74 6d 66 61  90 4d af 18 01 00 00 00  |....tmfa.M......|
							00000060  00 00 00 00 00 70 e7 40  6d 63 70 6c 4c 00 00 00  |.....p.@mcplL...|
							00000070  04 00 00 00 01 00 00 00  04 00 00 00 02 00 00 00  |................|
							00000080
					

So let's try delimiter-based..


								00000000  10 00 00 00 67 6e 69 70  00 00 00 00 01 00 00 00  |....gnip........|
								00000010  10 00 00 00 67 6e 69 70  00 00 00 00 01 00 00 00  |....gnip........|
								00000020  24 00 00 00 63 6e 79 73  01 00 00 00 00 00 00 00  |$...cnys........|
								00000030  61 70 77 63 40 ae d5 18  01 00 00 00 b0 57 8d 19  |apwc@........W..|
								00000040  01 00 00 00 44 00 00 00  63 6e 79 73 40 15 e1 5c  |....D...cnys@..\|
								00000050  fb 7f 00 00 74 6d 66 61  90 4d af 18 01 00 00 00  |....tmfa.M......|
								00000060  00 00 00 00 00 70 e7 40  6d 63 70 6c 4c 00 00 00  |.....p.@mcplL...|
								00000070  04 00 00 00 01 00 00 00  04 00 00 00 02 00 00 00  |................|
								00000080
						

So let's try 4-byte length-based..


									00000000  10 00 00 00 67 6e 69 70  00 00 00 00 01 00 00 00  |....gnip........|
									00000010  10 00 00 00 67 6e 69 70  00 00 00 00 01 00 00 00  |....gnip........|
									00000020  24 00 00 00 63 6e 79 73  01 00 00 00 00 00 00 00  |$...cnys........|
									00000030  61 70 77 63 40 ae d5 18  01 00 00 00 b0 57 8d 19  |apwc@........W..|
									00000040  01 00 00 00 44 00 00 00  63 6e 79 73 40 15 e1 5c  |....D...cnys@..\|
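
A minimal sketch of the length-based theory in Go, using the first 72 bytes of the hexdump as input. It assumes the 4-byte length is little-endian and counts the whole frame, header included; that assumption is what makes the dump line up (0x10, 0x10, 0x24, then 0x44). Fixed length fails here because the third frame is not 16 bytes long. The names `splitFrames` and `sample` are made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sample holds the first 72 bytes of the capture shown in the hexdump.
var sample = []byte{
	0x10, 0x00, 0x00, 0x00, 0x67, 0x6e, 0x69, 0x70, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
	0x10, 0x00, 0x00, 0x00, 0x67, 0x6e, 0x69, 0x70, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
	0x24, 0x00, 0x00, 0x00, 0x63, 0x6e, 0x79, 0x73, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x61, 0x70, 0x77, 0x63, 0x40, 0xae, 0xd5, 0x18, 0x01, 0x00, 0x00, 0x00, 0xb0, 0x57, 0x8d, 0x19,
	0x01, 0x00, 0x00, 0x00, 0x44, 0x00, 0x00, 0x00,
}

// splitFrames cuts a byte stream into frames, assuming every frame starts
// with a 4-byte little-endian length that includes the 8-byte header
// (length + tag). It returns the complete frames plus the leftover bytes
// of a frame that has not fully arrived yet.
func splitFrames(stream []byte) (frames [][]byte, rest []byte) {
	for len(stream) >= 4 {
		n := int(binary.LittleEndian.Uint32(stream))
		if n < 8 || n > len(stream) {
			break // implausible length or incomplete frame
		}
		frames = append(frames, stream[:n])
		stream = stream[n:]
	}
	return frames, stream
}

func main() {
	frames, rest := splitFrames(sample)
	for _, f := range frames {
		fmt.Printf("%d bytes, tag %q\n", len(f), f[4:8])
	}
	fmt.Println(len(rest), "bytes waiting for more data")
}
```

The sample yields three complete frames (16, 16 and 36 bytes); the last 4 bytes announce a 68-byte frame that is cut off in the sample.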
							

We have the frames, but the strings still don't make sense


							00000000  10 00 00 00 67 6e 69 70  00 00 00 00 01 00 00 00  |....gnip........|
					
🤯 Endianness, suddenly it all makes sense!

								const (
									ASYN              uint32 = 0x6173796E // "asyn", shows up as "nysa" in the hexdump
									FEED              uint32 = 0x66656564 // "feed", shows up as "deef"
									RELS              uint32 = 0x72656C73 // "rels", shows up as "sler"
									HPD1              uint32 = 0x68706431 // "hpd1", shows up as "1dph" | For specifying/requesting the video format
									HPA1              uint32 = 0x68706131 // "hpa1", shows up as "1aph" | For specifying/requesting the audio format
									NEED              uint32 = 0x6E656564 // "need", shows up as "deen"
									EAT               uint32 = 0x65617421 // "eat!", shows up as "!tae" | contains audio sbufs
									KeyValuePairMagic uint32 = 0x6B657976 // "keyv", shows up as "vyek"
									StringKey         uint32 = 0x7374726B // "strk", shows up as "krts"
									IntKey            uint32 = 0x6964786B // "idxk", shows up as "kxdi"
									BooleanValueMagic uint32 = 0x62756C76 // "bulv", shows up as "vlub"
									DictionaryMagic   uint32 = 0x64696374 // "dict", shows up as "tcid"
									DataValueMagic    uint32 = 0x64617476 // "datv", shows up as "vtad"
									StringValueMagic  uint32 = 0x73747276 // "strv", shows up as "vrts"
								)
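
A small sketch of why the strings look reversed: the 4-byte tags are numbers stored in little-endian byte order, so reading them as a `uint32` and spelling that number out big-endian gives the readable name. `tagName` is a made-up helper for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tagName reads a 4-byte tag as a little-endian uint32 and also returns
// the number written out in big-endian byte order, which is the
// human-readable spelling of the tag.
func tagName(wire []byte) (uint32, string) {
	v := binary.LittleEndian.Uint32(wire)
	name := make([]byte, 4)
	binary.BigEndian.PutUint32(name, v)
	return v, string(name)
}

func main() {
	// "gnip" is what the hexdump shows at offset 4.
	v, name := tagName([]byte("gnip"))
	fmt.Printf("0x%08X %s\n", v, name) // prints "0x70696E67 ping"
}
```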

Going through the hexdump you'll find

Ping, Asyn, Sync and Rply

Ping

Boring, just for saying hello

Asyn

Here you will find sbuf, which is a serialized CMSampleBuffer instance from Apple's CoreMedia framework

							00000000  d7 65 01 00 6e 79 73 61  60 2f c3 5c fb 7f 00 00  |.e..nysa`/.\....|
							00000010  64 65 65 66 c3 65 01 00  66 75 62 73 20 00 00 00  |deef.e..fubs ...|
							00000020  73 74 70 6f 68 54 8d 40  3b 57 00 00 00 ca 9a 3b  |stpohT.@;W.....;|
							00000030  01 00 00 00 00 00 00 00  00 00 00 00 50 00 00 00  |............P...|
							00000040  61 69 74 73 01 00 00 00  00 00 00 00 3c 00 00 00  |aits........<...|
							00000050  01 00 00 00 00 00 00 00  00 00 00 00 68 54 8d 40  |............hT.@|
							00000060  3b 57 00 00 00 ca 9a 3b  01 00 00 00 00 00 00 00  |;W.....;........|
							00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
							00000080  00 00 00 00 00 00 00 00  00 00 00 00 86 62 01 00  |.............b..|
							00000090  74 61 64 73 00 00 00 1e  06 05 1a 47 56 4a dc 5c  |tads.......GVJ.\|
							000000a0  4c 43 3f 94 ef c5 11 3c  d1 43 a8 01 ff cc cc ff  |LC?....<.C......|
							000000b0  02 01 31 2d 00 80 00 01  62 58 25 b8 20 07 ff f0  |..1-....bX%. ...|
							000000c0  84 35 2d 02 55 ff 03 5b  4d cd b6 1a 16 09 f1 5d  |.5-.U..[M......]|
							000000d0  46 bf ea 3e 4d 0e 0c e4  6d 84 5a 00 00 03 00 00  |F..>M...m.Z.....|
							000000e0  03 00 00 03 00 01 42 bc  53 80 bf d1 d8 3b 47 e0  |......B.S....;G.|
							000000f0  21 61 98 83 c3 d1 62 92  4f 0f 67 7f 4a 6b 10 6a  |!a....b.O.g.Jk.j|
					

Let's try the same trick as before...


						00000090  74 61 64 73 00 00 00 1e  06 05 1a 47 56 4a dc 5c  |tads.......GVJ.\|
						000000a0  4c 43 3f 94 ef c5 11 3c  d1 43 a8 01 ff cc cc ff  |LC?....<.C......|
						000000b0  02 01 31 2d 00 80 00 01  62 58 25 b8 20 07 ff f0  |..1-....bX%. ...|
						000000c0  84 35 2d 02 55 ff 03 5b  4d cd b6 1a 16 09 f1 5d  |.5-.U..[M......]|
						000000d0  46 bf ea 3e 4d 0e 0c e4  6d 84 5a 00 00 03 00 00  |F..>M...m.Z.....|
						000000e0  03 00 00 03 00 01 42 bc  53 80 bf d1 d8 3b 47 e0  |......B.S....;G.|
						000000f0  21 61 98 83 c3 d1 62 92  4f 0f 67 7f 4a 6b 10 6a  |!a....b.O.g.Jk.j|
						

So what format is the payload?


					06051A47 564ADC5C 4C433F94 EFC5113C D143A801 FFCCCCFF 0201312D 0080
				
After reading a lot of documentation, I realized the payload byte format matches that of an H.264 network abstraction layer unit (NALU)
I'll spare you the details, but essentially this will create a playable video file:

							// start code that separates NALUs in an H.264 byte stream
							var delimiter = []byte{0x00, 0x00, 0x00, 0x01}

							func (avfw AVFileWriter) writeNalu(naluBytes []byte) error {
								_, err := avfw.h264FileWriter.Write(delimiter)
								if err != nil {
									return err
								}
								_, err = avfw.h264FileWriter.Write(naluBytes)
								if err != nil {
									return err
								}
								return nil
							}
					

BTW: Good example of a delimiter-based format 😌☝️

Reminder: If nobody is sleeping yet, talk about reference documentation

The first prototype was written in Java. Why use Go?

USB programming is super simple

(I ❤️ gousb package)


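					ctx := gousb.NewContext()
					defer ctx.Close()
					devices, err := ctx.OpenDevices(func(desc *gousb.DeviceDesc) bool {
						// this function is called for every device present.
						// Returning true means the device should be opened.
						return validDeviceChecker(desc)
					})
					if len(devices) == 0 {
						log.Fatalf("no matching device could be opened: %v", err)
					}
					device := devices[0]
					// error handling below is omitted to keep the slide short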
					conf, _ := device.Config(configIndex)
					iface, _ := conf.Interface(confNum, altSettingIndex)
					inEndpoint, _ := iface.InEndpoint(grabInboundBulkEndpoint(iface.Setting))
					outEndpoint, _ := iface.OutEndpoint(grabOutboundBulkEndpoint(iface.Setting))
					stream, _ := inEndpoint.NewStream(4096, 5)
					buffer := make([]byte, 65536)
					stream.Read(buffer)
					//do things with the buffer contents...
					outEndpoint.Write([]byte{1,2,3,4})
				

You have signed and unsigned primitives


						//you have unsigned ints for every byte size,
						//makes the java developer in me wanna cry out of happiness
						var unsigned_one_byte_integer uint8 = 3
						var unsigned_two_byte_integer uint16 = 6
						var unsigned_four_byte_integer uint32 = 12
						var unsigned_eight_byte_integer uint64 = 24

						//also, converting primitives back and forth
						//is elegant and simple
						var some_float float64 = 0.5
						var float_as_uint64 uint64
						float_as_uint64 = math.Float64bits(some_float)
						binary.LittleEndian.PutUint64(someByteArray, float_as_uint64)
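
A runnable version of the float conversion, as a sketch. The 8 bytes `00 00 00 00 00 70 e7 40` sit right before `lpcm` in the first hexdump; read as a little-endian float64 they give 48000, which looks like an audio sample rate (that interpretation is my assumption). The helper names are made up.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeFloat64 writes v as 8 little-endian bytes.
func encodeFloat64(v float64) []byte {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, math.Float64bits(v))
	return buf
}

// decodeFloat64 reads 8 little-endian bytes back into a float64.
func decodeFloat64(buf []byte) float64 {
	return math.Float64frombits(binary.LittleEndian.Uint64(buf))
}

func main() {
	wire := []byte{0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0xe7, 0x40}
	fmt.Println(decodeFloat64(wire))          // prints 48000
	fmt.Printf("% x\n", encodeFloat64(48000)) // prints 00 00 00 00 00 70 e7 40
}
```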
						

Working with byte slices is soooo cool


					responseBytes := make([]byte, 24)
					binary.LittleEndian.PutUint32(responseBytes, 24)
					binary.LittleEndian.PutUint32(responseBytes[4:], ReplyPacketMagic)
					binary.LittleEndian.PutUint64(responseBytes[8:], sp.CorrelationID)
					binary.LittleEndian.PutUint32(responseBytes[16:], 0)
					binary.LittleEndian.PutUint32(responseBytes[20:], 0) 
					//or
					responseBytes := make([]byte, 60)
					length := writePayload(responseBytes[24:])
					writeHeader(responseBytes[:24], length)
				

You can even write and read structs directly from byte streams (like in C)


					type CMTime struct {
						CMTimeValue uint64 
						CMTimeScale uint32 
						CMTimeFlags uint32 
						CMTimeEpoch uint64
					}
					func NewCMTimeFromBytes(data []byte) (CMTime, error) {
						r := bytes.NewReader(data)
						var cmTime CMTime
						err := binary.Read(r, binary.LittleEndian, &cmTime)
						if err != nil {
							return cmTime, err
						}
						return cmTime, nil
					}
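
A runnable sketch of the struct trick. The input is the 24 bytes that follow `stpo` in the Asyn hexdump; that these bytes are a CMTime is my reading of the dump, not something the capture states. The struct is the one from the slide, and `parseCMTime` is a made-up name.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// CMTime mirrors the struct from the slide: 24 bytes, no padding.
type CMTime struct {
	CMTimeValue uint64
	CMTimeScale uint32
	CMTimeFlags uint32
	CMTimeEpoch uint64
}

// parseCMTime fills a CMTime straight from little-endian bytes.
func parseCMTime(data []byte) (CMTime, error) {
	var t CMTime
	err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &t)
	return t, err
}

func main() {
	// Bytes 0x24 to 0x3b of the Asyn hexdump.
	raw := []byte{
		0x68, 0x54, 0x8d, 0x40, 0x3b, 0x57, 0x00, 0x00, // value
		0x00, 0xca, 0x9a, 0x3b, // scale
		0x01, 0x00, 0x00, 0x00, // flags
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // epoch
	}
	t, err := parseCMTime(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("value=%d scale=%d flags=%d epoch=%d\n",
		t.CMTimeValue, t.CMTimeScale, t.CMTimeFlags, t.CMTimeEpoch)
}
```

The scale decodes to 1000000000, so the value would be a timestamp in nanoseconds.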
				

How do you unit test this?

I recommend using golden files for testing the codec (they were a great help last week when I had to fix #38)

						func TestDecoder(t *testing.T) {
							//read example message from golden file
							//I created a separate file for every message there is
							dat, err := os.ReadFile("fixtures/asyn-msg")
							if err != nil {
								// fail this test instead of killing the whole test binary
								t.Fatal(err)
							}
							
							//call code under test
							asynPacket, err := packet.NewAsynTbasPacketFromBytes(dat)
							
							//check that values are parsed correctly
							//I got the correct values by reading the hexdump manually, only if
							//my decoder is correct, they will be the same
							if assert.NoError(t, err) {
								assert.Equal(t, uint64(0x11123bc18), asynPacket.ClockRef)
								assert.Equal(t, uint64(0x1024490c0), asynPacket.SomeOtherRef)
								assert.Equal(t, "ASYN_TBAS{ClockRef:11123bc18, UnknownRef:1024490c0}", asynPacket.String())
							}
						}
						

Reverse Engineering makes you a better engineer

  • Form theories on how "they" built it, and test them one by one
  • Write clean, unit-tested application code without knowing the end result
  • Learn many cool new things like networking basics, H.264, USB coding
  • Low level: there is no magic

+ a lot of people shared their story 🥰

Demo 🤗 🤩

Bonus Content: GStreamer

GStreamer - an extensible, open source multimedia framework with a nice pipeline approach, available for almost all platforms and languages

Build/Debug your pipeline using the shell..


					gst-launch-1.0 filesrc location=music.mp3 ! \
					mpegaudioparse ! mpg123audiodec ! audioconvert ! \
					audioresample ! pulsesink

					gst-launch-1.0 v4l2src ! \
					video/x-raw,width=128,height=96,format=UYVY ! videoconvert ! \
					avenc_h263 ! video/x-h263 ! rtph263ppay pt=96 ! \
					udpsink host=192.168.1.1 port=5000
				

.. or build it using code


const gstreamer = require('gstreamer-superficial');
const pipeline = new gstreamer.Pipeline(`videotestsrc ! 
textoverlay name=text ! autovideosink`);
	
pipeline.play();
				

You can also grab binary data from an existing pipeline or feed data into one :-D


const appsink = pipeline.findChild('sink');
function onData(buf, caps) {
	if (caps) {
		console.log('CAPS', caps);
	}
	if (buf) {
		console.log('BUFFER size', buf.length);
		appsink.pull(onData);
	}

	// !buf probably means EOS
}
appsink.pull(onData);
				
				

Golang Reference Implementation 🤓

  • Works on Mac and Linux
  • Supports custom GStreamer pipelines for transcoding, RTP streaming, WebRTC etc.
  • No Windows support (yet? ever?)

github.com/danielpaulus/quicktime_video_hack

ask me

  • all the questions about iOS
  • 🤩 about contentful 🤩
  • about everything else

& thx for listening