Saturday, 14 October

Two years ago I made some predictions about the upcoming RealMedia HD and little did I know that it was finished in 2015! So finally I’ve had a look at it and here are some details.

First of all, it seems to be China-oriented since the only version with RMHD support I was able to find (even from US site) was Chinese version of RealMedia player (stream in peace…). And since it’s China it bundles CEmpeg libraries with RMHD support built-in. Good luck obtaining the modified source though.

The actual decoder was in a separate library though with the usual interface for RealVideo decoders and obviously I could not resist looking inside.

It turns out that RMHD corresponds to RealVideo 11 or RV60. Either they thought it’s too advanced to be merely RV50 or NGV was intended to be RV50 but they’ve buried it in the same grave with MPEG-3 and such.

Anyway, it’s time for juicy technical details.

RV60 is based on ITU H.EVC or its draft. It is oriented towards multi-threaded decoding and they have a lot of crap cut out and thrown away and I fully approve that. It’s the problem with many standardised codecs: you have so much flexibility in configuring coding parameters that you have to invent special objects to signal coding parameters for the following group of frames unless you want to waste 10% of bitrate on them in every slice header; and then you invent profiles because not all of the features can be supported by existing hardware (for example, because they’ve not been added to the standard yet). RV60 has a rather simple frame header and coding units are always size 64 and they seem to comprise all three planes instead of coding planes separately.

The biggest disappointment is motion compensation of course. RV2 had 1/2-pel MC, RV3 had 1/3-pel MC, RV4 had 1/4-pel MC. I obviously expected 1/5-pel MC for RV5 but instead they’ve stopped on 1/4-pel MC (with the bog standard 1 -5 20 20 -5 1 filter for luma and bilinear interpolation for chroma too).
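For reference, here is a minimal sketch of what applying the 1 -5 20 20 -5 1 filter looks like for a half-pel position in a single row. The function names and the rounding (add 16, shift by 5, since the taps sum to 32) are my assumptions for illustration; I have not checked how RV60 actually derives its quarter-pel positions, so treat this as the generic six-tap filter and not as the decoder's real code.

```rust
/// Six-tap filter taps; they sum to 32.
const TAPS: [i32; 6] = [1, -5, 20, 20, -5, 1];

/// Interpolates the half-pel sample between src[pos] and src[pos + 1].
/// The caller must guarantee pos >= 2 and pos + 3 < src.len().
fn halfpel_sample(src: &[u8], pos: usize) -> u8 {
    let mut sum = 0i32;
    for (i, &tap) in TAPS.iter().enumerate() {
        sum += tap * i32::from(src[pos + i - 2]);
    }
    // Assumed normalisation: round to nearest, divide by 32, clip to 8 bits.
    ((sum + 16) >> 5).clamp(0, 255) as u8
}

/// Filters one row horizontally. `src` must hold at least dst.len() + 5
/// samples: two of left context and three of right context.
fn halfpel_row(src: &[u8], dst: &mut [u8]) {
    for (x, out) in dst.iter_mut().enumerate() {
        *out = halfpel_sample(src, x + 2);
    }
}
```

A flat area stays flat after filtering and a sharp edge lands halfway between the two levels, which is a quick sanity check for any such filter.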

Spatial (aka intra) prediction is very close to H.EVC as well.

Transforms are present in 4×4, 8×8 and 16×16 sizes that are some poor integer approximations of DCT.

And now for the juiciest part: coefficients coding! Coefficients are coded with lots of context-adaptive codebooks, for intra/inter, luma/chroma and various quantisers. And since it’s RealVideo and not Thor (and its Norwegian developer does not seem to work any more on it) all codebooks are static (total over 32k entries) and stored in compact and obfuscated form. Compact means that the description has only code lengths packed into nibbles and obfuscated means they were XORed with a string containing name and mail of the guy who probably generated those codebooks (this reminds me a bit of Cineform HD and Sierra AGI resources) and it makes me shout “RealLy!? What were you trying to achieve with that?”. I’ll look closer at actual coefficient coding a bit later (or significantly later—when I have a desire to do so) but so far it looks like the coefficients are coded in 4×4 subblocks but in general following H.EVC coding scheme.
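To illustrate the "compact and obfuscated" storage, here is a rough sketch of undoing it. Everything specific is an assumption on my side: the key is a placeholder (not the real string), the high nibble is taken first, and the codes are rebuilt from the lengths in the usual canonical way, which may or may not match what the decoder really does.

```rust
/// Undoes XOR obfuscation with a repeating key and splits every byte
/// into two 4-bit code lengths (assumed order: high nibble first).
fn unpack_code_lengths(packed: &[u8], key: &[u8]) -> Vec<u8> {
    assert!(!key.is_empty());
    let mut lens = Vec::with_capacity(packed.len() * 2);
    for (i, &b) in packed.iter().enumerate() {
        let plain = b ^ key[i % key.len()];
        lens.push(plain >> 4);
        lens.push(plain & 0xF);
    }
    lens
}

/// Assigns canonical prefix codes from code lengths: shorter codes first,
/// ties broken by symbol index. Length 0 marks an unused symbol.
fn canonical_codes(lens: &[u8]) -> Vec<u32> {
    let mut codes = vec![0u32; lens.len()];
    let mut code = 0u32;
    // A nibble cannot hold a length above 15.
    for len in 1..=15u8 {
        for (sym, &l) in lens.iter().enumerate() {
            if l == len {
                codes[sym] = code;
                code += 1;
            }
        }
        code <<= 1;
    }
    codes
}
```

Storing only the lengths is enough because both sides can regenerate the same codes deterministically, so a codebook costs half a byte per entry.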

Deblocking seems to be present and depends on block size. SAO is not present it seems (and I don’t miss it either).

There seem to be only three frame types (no RADL-, WTFL- or AltGr-frames) but the frames may have multiple references.

Overall, my predictions turned out to be mostly true. I should be surprised, I guess.

I’m yet to see any real samples too but this makes it one of the best H.EVC-based codecs (better than actual H.EVC or VPix—because nobody cares about it) so there’s nothing much to complain about.

P.S. I have a working RealVideo 1 decoder in NihAV already so maybe the first opensource decoder for RV60 will be in NihAV too.

Tuesday, 03 October

This category can be alternatively titled wild animal adventures and it contains probably the most famous Dingo Pictures cartoons.

Lord of the Jungle

It’s the retelling of the famous story of Tarzan from his childhood to his struggle with the former lord of the jungle to the point when he met Jane and got his own children. Definitely not the worst adaptation. And from a technical point this cartoon features something that’s hard to find in other Dingo Pictures works: full motion. Usually actors just enter the screen and stay there (or leave, but in the next scene) but here there are swings across the whole screen so this alone makes this cartoon worth checking out.

Baby Tarzan

BTW the ape who found him plays significant role in other films in this category too: she’s the mother of young black panther in King of the Animals Part II and she’s the one who pleads mercy on dinosaurs in the eponymous adventure.

A rare scene: dingo laughing.

Crawling lion.

You cannot see it but in this scene he’s crawling backwards (because of Tarzan). And at the end the professor (pictured below) will crawl watching after his grandson (who is obviously crawling too).


It is remarkable how he has managed to find an oak branch in the jungle.

Me Tarzan, you Jane.

It reminded me of the quote from Trollkarlens Hatt that everybody speaks English when in Jungle.

Animal Football

This is the Dingo All-Stars feature film. Almost all animals from other cartoons can be seen here and more! The premise is simple: animals had nothing to do so they decided to organise an event and play football. But the little animals are left behind so they form their own team and join…

Newsletter being distributed around.

As you can see, it’s Wild Dogs against Jungle Kings (and as expected, one team is dogs and another one is various jungle animals).

Wabuu! His expression says it all.

Some of the fans.

Again, knowing that Dingo Pictures is from Taunus in Hessen, I’m pretty sure those fans are from Darmstadt (you need to be Hessisch to get the reference).

Jungle kings fans.

Wild dogs fans.


They remind me of Willi from KSC for some reason.

More cheerleaders.

It’s hard to describe the game itself and I’m not a football fan at all (despite living in Germany and such) but let’s say it goes spectacularly and the team of small animals win. And they had a lot of fun playing too.

King of the Jungle Parts I and II

I have described the plot of these before (and the second part is well-known anyway). So here’s a picture of mole miners digging diamonds for black panther (the company is named after him BTW).

In the Land of Dinosaurs / The Little Dinosaur / Dinos

This story centres around young dinosaur Tio, who wants to have fun, learn to fly and such but he’s sent to school instead.

Tio is the larger green dinosaur

But during the lesson his teacher shows his new invention—a seismograph—that predicts volcano eruptions (the device as it’s pictured above might actually work BTW) and it gives an alarm.
So dinosaurs have to flee from the volcano. During their flight Tio with some other kids and their teacher are separated from the rest and have to find them…

Told ya!

Finally Tio manages to meet his younger brother Tio II who’s happy to admit he was named after a dead elder brother.

Tio I and II

Anyway, everybody is fine and they’ve reunited at last. Happy end!


Here they’ve resorted to CGI to animate waves properly.

This is yet another adaptation of the famous story about some dogs delivering diphtheria medicine to some small town in Alaska.

Then again, these hunters don’t look very American. Or Russian.

This is also complicated by rivalry between Balto and Komo.

I hope you can tell which is which.

And they have the same love interest: Judy.

And if you complain they look similar, I should point it out that all dogs of the same breed look very similar, so this is true in real life as well.

Here’s a sledful of them.

So Balto and Komo are selected to run to another town to deliver the message about outbreak of diphtheria and bring the medicine back. They manage to do it but on the way home they’re so exhausted that Komo could not jump over the gap and fell down and Balto dropped fully exhausted in the forest a bit further.

Luckily for everybody, his friends (including a seal and white bear) managed to locate him and fend off wolves before one of the hunters came and brought Balto back to town (you could see his sled earlier).

And he used his third arm to shoot!

So everybody is saved, happy end!

The last two categories have seven entries each so I’ll split them into two posts. Also I’ve not watched most of them yet so it’ll take some time.

Friday, 22 September

There are only three stories in this category and six-seven in the remaining ones so I don’t have to split this post into two parts.

The Musicians of Bremen

This is a retelling of the classical story in modern times. A countryside has turned into an amusement park so a donkey goes searching for a better life elsewhere. The donkey plays bass:

The founder of The Bremen Four

In his journeys, donkey meets the other members of the band: a dog, a rooster and a cat.

Not quite medieval scenery, eh?

And later they find a house full of wanted criminals.

Guess how I know they’re wanted robbers?

But as expected, the animals are clever enough to deal with the criminals and then they can pursue their dream and get the happy ending they deserve!

Hmm, jazz or disco? Watch it yourself to find that out.

This cartoon features a lot of music and singing too as one would expect from such title.

And there’s another thing worth noting: this Dingo Pictures presentation is narrated by a human:

The Narrator

This is one of only two Dingo Pictures works I know that used live acting (the other one is The Sword of Camelot but I’ll talk about it probably in the last or penultimate post of the series).

Puss In Boots

The classical tale by Charles Perrault. And to acknowledge that, the main hero is played by a cat named Charlie in the main Dingo Pictures cast.

There’s a scene where his “owner” orders a pair of boots to be made for his cat too.

You should know the story well: it’s about a miller’s son who has inherited only a cat and how that cat provided him with the standard “happily ever after” set (a princess, own castle and such).

This adaptation is quite faithful to the original, it even has quite French atmosphere:

Royal palace

Le Roi

And the king speaks with a French accent and slips in French words from time to time. And if you say his accent is false—there was even a French guy working on this cartoon.

The only difference is that ogre’s castle is not so menacing here:
It makes me think of Bavaria for some reason.

And the ogre was replaced with an evil wizard (on the other hand, in the original ogre knew some magic too):

But otherwise it’s the same story with the same happy ending.

I bet you didn’t expect this bit though.

Also it’s worth noting that there was a 3D version of the same story made later by the French and it looks surrealistic, ugly and baffling. Even the voice of William Shatner in the English dub does not save it. So I’d recommend watching German version instead.


This is the longest cartoon from Dingo Pictures (or Media Concept as it was known back then), no other cartoon of theirs has a runtime of one hour. It was quite ambitious for its time (1993), full of music and such.

Dancing animals…
…scenes of judging the disputes of the commons…
… CGI …
…and even visual effects!

Again, the story is quite faithful to the original (which is not a part of the original Arabian 1001 Nights by the way) with some details altered for better enjoyment of course, unlike the butchered Disney version (honestly, Robin Williams is the only good thing in that version plus some songs and princess design).

It starts with a sorcerer searching for an appropriate boy to retrieve the magic lamp:
The speciist parrot cries that they all look the same though.

Then, with money and lies, he gets to know Aladdin and his mother and sends the boy to retrieve the lamp. Obviously he fails to achieve his goal and leaves Aladdin to rot in the cave.

Luckily for the protagonist, he had a ring with a lesser genie that could bring him out of the cave (exactly like in the original):
Genie of the ring

And later Aladdin discovers that the lamp that old sorcerer wanted hosts an even more powerful genie:

After that it’s the usual love story: boy meets girl, boy falls in love with girl, boy finally gets a chance to marry the girl. But unfortunately the sorcerer hears about it and steals the lamp, the castle and the princess (well, at least Mario can envy Aladdin). But our hero has a ring and its genie gives a flying carpet which can bring him to the castle:

The carpet ride is actually accompanied not just by a special theme but by a special song praising the advantages of flying with carpet (including the words “100% ecological” even though nobody back then cared about carbon footprint). And I’d rather fly on a flying carpet too than go to the Frankfurt airport at inconvenient time, pass security control, fly in a plane in economy class near small children… Ahem, back to the story.

So the hero gets to the castle, convinces princess to put sleeping powder into the wine, she drugs the sorcerer and the lamp is retrieved.

Have a nice defeat

Overall, it was an interesting cartoon with unexpectedly good music (the main theme sounds like Eastern version of that song by Boney M that would fit better in Anastasia but it still sets the proper atmosphere and it’s nice and hard to forget; oh, and the carpet ride sequence, the carpet ride sequence! you need to see it to believe it) and faithful telling of the story. Approved!

Your reaction when somebody talks bad about Dingo Pictures

Monday, 11 September

Today I’m covering the great works from Dingo Pictures. I intend to split the review into roughly the same categories as they are put on the official website and today we start with the first section. Its name is “Krimis” in German which I think is more appropriately translated into “thriller” than “mystery story” or “detective story”.

The Case for Mouse Police (aka simply Mouse Police)

Self-describing title card

The story is rather simple: Max and Sophie Emmental help the detective and police to investigate the theft of cheese.

Max and Sophie. Mickey and Minnie, eat your heart out!

Since it’s a detective story, I should not disclose plot much. But I assure you, it has some twists and thrills.

Have fun unseeing this.

But in the end the band is caught!

Band photo part 1…

… and part 2

So justice is served and cheese is recovered.

The heroes reading about themselves in a newspaper

BTW beside the advertisement for French cheese-resembling product, the curious thing is the newspaper name. It’s Lauterbacher Käseblatt or Cheesepaper of Lauterbach and I’m pretty sure this is a shout out to Lauterbach in Hessen, not that far from where Dingo Pictures headquarters are located.

Lucy and Lionel (aka Nice Cats)

This is a simple story: there was a lady with a cat and two kittens (Lucy and Lionel). The mother cat is a refined cat lady, Lionel is a calm kitten who likes to read and Lucy is a playful kitten who does not regard the rules much.

Lucy and Lionel. The book title is “Animals in laboratory” by Dr. Jekyl, not sure about the genre though.

So once they went on vacation and Lucy was kidnapped there. With the help of another cat named Charlie, Lucy managed to escape but it was too late: her family had left back home already! So two friends travel back to San Francisco.

Lucy and Charlie on the way home.

And the story ends well.

The happy ending.

Curious fact worth mentioning:
Luca B. had a cameo here.

Janis, the Little Piggy (aka Jamie, aka Babe, aka Piglet)

This story is a social commentary in a form of story about young piglet that could not fit well into existing environment. Like right after being born she’s asked how many siblings she got and after her guessing several numbers she only gets a remark “what to expect from a pig, she can’t count at all”. And that’s instead of being impressed by newborn pig knowing numbers! I’ve seen human children four-five years older that count about equally good (even worse actually, since they can’t guess number like “seventy six”) and yet are not considered dumb.

Janis tries to learn various things from other farm animals, like French from local rooster or jumping like a goat. But mostly she’s told not to do such nonsense because pigs are supposed just to eat, get fat and be sold to other farms to have their own piglets there. Even if a runaway pig Mr. Müller tells otherwise, nobody believes him (IIRC there was a movie ripping off the same plot just a couple years ago).

Eventually Janis is sold but she ends up at a slaughterhouse and not a farm.

Slaughterhouse is not the best place to leave good impressions on piglets IMO

Luckily, she manages to escape and even visit the farm once again before moving to live in a forest as a wild pig with Mr. Müller, who has managed to get less domestic since they last met.

Herr Müller, something tells me he might’ve been from Bavaria

So, beside looking like a simple cartoon for young children, this cartoon has some unexpectedly deep thoughts too. And there were some other surreal moments not exactly for children to understand where an owl in a forest sounded exactly like an alarm or this duck producing her own siren sound (actually now I think about her when I hear sirens of passing ambulances or police cars).

da-DI-dada da-DI-dada

In the Search of Dalmatians

This action/thriller cartoon tells the story of dalmatians who went out to play once but were caught by two big mean twins Castor and Pollux (black dogs) who force puppies to work by gluing labels in their own company (yes, even dogs can own and run a company).

The happy dalmatians (about a hundred of them, I suppose)

Less happy dalmatians

The other dogs tried to help parents to find any trace of the missing puppies until Butcher (he has English name in the original) finds something. So later the dogs manage to rescue the puppies and trap Castor and Pollux (I shan’t tell how—watch it!).

There are some artwork details worth mentioning.

First, this feature has many CG backgrounds instead of traditional hand-painted ones.

Here’s an example.

And another one, if you look at the shots from the sequence below, you can spot one thing:




That’s right, Dingo Pictures were so artistic that their cartoon still pays homage to the classical theatre with actors waiting to enter the stage. Those touches are exactly the thing that makes their cartoons so remarkable.

…Even More Dalmatians (aka Dalmatians 3)

I think the best description for this cartoon would be calling it In the Search of Dalmatians reboot. You have about the same story (dalmatian puppies going to the city, being caught, forced to work for Castor and Pollux, and being rescued after that) but there are twists and significantly different details even if the cast is mostly the same.

First, puppies live with some old lady (never pictured here) instead of living with their parents.

Second, they go to the city not just for fun but to buy a present to her—gingerbread.

Third, they are mistaken for thieves even if they wanted to barter and caught by the police and put into the animal shelter (where Castor and Pollux force them to work).

Fourth, now it’s a cat helping in searches.

I bet you did not expect cat to carry a laptop back in early 2000s.

Fifth, Castor and Pollux are disposed of completely differently.

But one thing does not change: the puppies get a happy ending!

There’s one more thing to note: look at the picture with narrator flying.

It makes me wonder whether it was painted after the actual place, probably somewhere near Taunus.

Okay, that’s enough for now. I hope to continue it in a few weeks (there’s still NihAV left to implement after all).

Friday, 01 September

The prolific German animation studio has made 28 animated films during the 1990s and early 2000s and obviously they’ve managed to make their own unique style. In this post I’ll try to describe it.

First of all, Dingo Pictures is known for virtuoso combination of watercolour backgrounds and 2D computer animation of the characters (at least in one case you can spot a paint cursor copied along with the sprite, nice Easter egg!), the only exceptions I can name are Balto where they used computer animation for waves in the background and Bremen Musicians and The Sword of Camelot where they’ve inserted live narrator (in appropriate costumes) but I’ll talk about it in the post dedicated to every Dingo Pictures cartoon (I’ve managed to get all but two of them on DVDs, another one I’ve watched on BaidUTube and hopefully can still buy but for the last one, Little Witch Arisha, I can only rely on the trailer and description provided at their site).

Second, they value the work of their artists and reuse it as much as possible. The same backgrounds appear in various pieces, like the same shopping street can be seen in Lucy and Lionel, …Even More Dalmatians and Anastasia or the same cottage appears in Anastasia and Teddy The Little Bear. And this background is so famous that it doesn’t need an introduction:

The same can be said about the fascinating tunes composed by Ludwig Ickert. After watching three or four Dingo Pictures cartoons you’ll recognize them immediately.

Third, they have quite canonical portrayal of their characters, here are few examples:
Indian kid from Pocahontas

European kid from Pocahontas


Not-exactly-captain Smith

Pocahontas and not-exactly-captain Smith

Esmeralda from The Ringer of Notre Dame

Pierre from The Ringer of Notre Dame

Clone troopers Soldiers from Pocahontas

As I said in my previous post, Dingo Pictures followed the steps of Osamu Tezuka and created a cast of characters that appear in almost every feature of theirs (sometimes using different dresses and wigs): there’s Wabuu and stupid animals (the bear and dogs change their colour, the others usually don’t change anything), Generic Kid (featured above, mostly known as Aladdin or junior Aaron, brother of Moses from Prince of Egypt), Generic Young Adult (Pocahontas/Smith, Esmeralda/Pierre, Hercules, Anastasia, Jane from Tarzan Jungle King and many more), Generic Thick and Thin Duo, Cookie (the cook and lute player, sometimes doing both in the same film), Generic Old Lady and many more.

Fourth, they try to follow the strict canons in every picture: they try to even pacing by inserting some ambient scene after or inside a dialogue (i.e. the scene switches from talking persons to some unrelated action, usually involving animals) and they have their own style in depicting reactions—bobbing heads for laughs and zooming in and out heads for being shocked or scared. Another canonical thing is not using margins of the scene in full (you can spot it on the left edge of the shot with soldiers above). Usually it’s left margin that does not have animation (or even a bit of other background) but it can be right or top margin as well. And of course walking—it’s done by flipping between two-four frames of the character sprite in the middle of the picture while moving background behind them.

Fifth, they allow their heroes to have a day in the limelight: the bear is not used in scene transitions, he has minor roles in King of the Animals part 2 (aka Son of the Lion King) and Teddy, the Little Bear and plays very important role in Goldie (along with the title character who’s usually seen just running around or sniffing a mushroom in other cartoons). Small birds are either cast as scene filler or as narrators (plus small roles in Pocahontas and King of the Animals), the red cat is the main hero of Puss in Boots and Charlie in Lucy and Lionel (aka Nice Cats—the main hero there plays Wabuu’s girlfriend in Pocahontas) etc etc.

And last but not least the story quite often has an unexpected twist. Who knew about Wabuu playing a significant role in relations between Indians and Europeans; or that Moses and Aaron had a pet crocodile Beniamin (who actually saved Moses); or that Puss in Boots tricked out a wizard and not an ogre; or that puppies in animal shelter had to work by gluing labels to tins; or small details in most of their cartoons that cater to more adult audience (like a little pig visiting a slaughterhouse in Janis or mouse floozy in Mouse Police). Really, Dingo Pictures has created a fascinating world and you should watch their works on the first occasion, especially after hacking some code like libswscale.

P.S. And here’s a bonus: talking plants!
The Elder Bush from Pocahontas

Talking tree from Goldie

(little witch Arischa and the talking tree—taken from the trailer)

P.P.S. And here’s probably the only product placement I’ve seen in Dingo Pictures works (beside some shops having real world names):
I hope that chain of UK supermarkets has paid them a lot for this promotion

Thursday, 31 August

As you probably don’t care, I’m working on RealMedia demuxer for NihAV. And it’s very straightforward: chunks without nesting, version field to guard against surprises, clean layout. The only peculiarities it has are audio data interleavers and so-called logical stream (the special entry that describes how to select streams for streaming depending on bitrate available). And yet the implementation for this format in libavformat is quite complex and baffling. This observation led me to playing Captain Obvious and stating these three problems:
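To show how straightforward the flat chunk layout is, here is a small sketch of walking such a file. The header layout is my assumption for this example: a four-byte tag, a 32-bit big-endian size that covers the header itself, and a 16-bit version. The type and function names are made up for illustration and are not taken from NihAV.

```rust
/// Assumed 10-byte chunk header: tag, size (including the header), version.
#[derive(Debug, PartialEq)]
struct ChunkHeader {
    id: [u8; 4],
    size: u32,
    version: u16,
}

/// Parses one header from the start of `buf`; None if it is truncated
/// or the declared size is smaller than the header itself.
fn read_chunk_header(buf: &[u8]) -> Option<ChunkHeader> {
    if buf.len() < 10 {
        return None;
    }
    let id = [buf[0], buf[1], buf[2], buf[3]];
    let size = u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let version = u16::from_be_bytes([buf[8], buf[9]]);
    if size < 10 {
        return None;
    }
    Some(ChunkHeader { id, size, version })
}

/// Walks a flat (non-nested) sequence of chunks and collects the headers.
/// Stops at the first truncated or malformed chunk.
fn list_chunks(mut buf: &[u8]) -> Vec<ChunkHeader> {
    let mut out = Vec::new();
    while let Some(hdr) = read_chunk_header(buf) {
        let size = hdr.size as usize;
        if size > buf.len() {
            break;
        }
        out.push(hdr);
        buf = &buf[size..];
    }
    out
}
```

A real demuxer would then dispatch on `id` and refuse versions it does not know, which is exactly the guard against surprises that the version field is there for.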

  1. Following specification. Unless it’s an ISO/IEC or ITU codec you usually have a quite lacking specification with details omitted (or as DT$ representatives put it, “but we have it in the SDK!”, which helps a lot when you can download only the ETSI paper). In some cases the original implementation is the only specification you have. I’m no stranger to working with binary specifications but they still quite often don’t say what to expect in some cases (and then fuzzing happens…);
  2. Supporting hacks and abuses of specification. Two examples: MP3 and AVI. Or MP3 in AVI. For instance, MP3 has an optional CRC field (so if you don’t want CRC you simply don’t put it there) but I’ve seen samples that put zero checksum instead. Or in AVI you’re supposed to have 42db chunks for uncompressed video frames, 42dc for compressed video frames and 42wb for audio data. In reality you can have dc and db identifiers mixed in the same stream or even 0041 chunks put inside LIST rec chunk (that’s Indeo 4.1 in CivII clips). And of course there are many many more examples that everybody who has encountered them tries to forget.
  3. Seeking. You might wonder how seeking gets here and I’ll tell you how: most formats are not designed for random seeking and even if they are, users would still want to ignore indices, jump to a random position and find a start of the next chunk and timestamp as well. And in libavformat that is performed by a binary search that invokes special read_timestamp function of the demuxer (if present) which is supposed to do exactly that—searching for packet start and reporting a timestamp.
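The binary-search seeking from the last point can be sketched roughly as follows. This is a toy model, not libavformat code: the "file" is just a sorted list of (offset, timestamp) pairs, and the `read_timestamp` helper here is a hypothetical stand-in for the demuxer callback that resynchronises to the next packet start and reports its timestamp.

```rust
/// Hypothetical stand-in for the demuxer callback: finds the first packet
/// starting at or after `pos`. `packets` holds (offset, timestamp) pairs
/// sorted by offset with non-decreasing timestamps.
fn read_timestamp(packets: &[(u64, u64)], pos: u64) -> Option<(u64, u64)> {
    packets.iter().copied().find(|&(off, _)| off >= pos)
}

/// Bisects byte positions to find the offset of the last packet whose
/// timestamp does not exceed `target`. Falls back to the first packet
/// when the target precedes everything; None for an empty stream.
fn seek_by_bisection(packets: &[(u64, u64)], file_size: u64, target: u64) -> Option<u64> {
    let (mut best, _) = read_timestamp(packets, 0)?;
    let (mut lo, mut hi) = (0u64, file_size);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        match read_timestamp(packets, mid) {
            // Packet is not past the target: remember it, search to the right.
            Some((off, ts)) if ts <= target => {
                best = off;
                lo = off + 1;
            }
            // Packet is too late or there is none: search to the left.
            _ => hi = mid,
        }
    }
    Some(best)
}
```

Note that every probe costs a resynchronisation in the real thing, and that resynchronisation is exactly the format-specific part that makes demuxers complex.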

The moral of the story: if you can afford to ignore stupid user requests, do so and cherish the fact that you can. In NihAV I’m going to implement seeking only for formats that allow that (with reading index) or by more generic linear seeking that skips frames. This should be enough for my needs and it’ll keep code simple too.
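The two simple strategies could look roughly like this. It is only a sketch under my own assumptions: `IndexEntry`, `seek_with_index` and `seek_linear` are names invented for the example, the index is sorted by timestamp, and frames are represented by their timestamps alone.

```rust
/// One entry of a (hypothetical) seek index, sorted by timestamp.
struct IndexEntry {
    ts: u64,
    offset: u64,
}

/// Index-based seeking: offset of the last entry with ts <= target,
/// or None if the target precedes the first entry.
fn seek_with_index(index: &[IndexEntry], target: u64) -> Option<u64> {
    let n = index.partition_point(|e| e.ts <= target);
    if n == 0 {
        None
    } else {
        Some(index[n - 1].offset)
    }
}

/// Linear seeking: drops frames until the next one has ts >= target.
/// Returns the number of skipped frames.
fn seek_linear<I: Iterator<Item = u64>>(
    frames: &mut std::iter::Peekable<I>,
    target: u64,
) -> usize {
    let mut skipped = 0;
    while let Some(&ts) = frames.peek() {
        if ts >= target {
            break;
        }
        frames.next();
        skipped += 1;
    }
    skipped
}
```

Neither variant needs to guess where a packet starts in the middle of the file, which is the whole point.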

Now that it’s become a bit colder I might actually resume my work on NihAV and even more important thing—describing Dingo Pictures art style and works.

Saturday, 05 August

If you talk about German films as a foreigner you might know some good ones like influential Fritz Lang movies (for Frau im Mond they’ve hired Hermann Oberth himself and as the result their depiction of space travel looks much more realistic than modern Hollywood flicks) or Gojko Mitić adventure features (the Eastern Westerns). But if you’re not from Germany, what German cartoons do you know? Looks like the only German cartoons that got some widespread action are those from Dingo Pictures.

Dingo Pictures is a company located in Taunus that has produced about thirty Hess(l)ich cartoons in the second half of 1990s. Some of those were completely unique and some were ripped off by D*sney and Don Bluth earlier.

Dingo Pictures has their own unique and easily recognizable style. But before I explain it, here are the eponymous animals in one of their cartoons:

So, Dingo Pictures was a pioneer, combining computer drawn animation with 2D drawn background (watercolours no less!). Also like the anime father, Osamu Tezuka, the company had a cast of actors always appearing in every cartoon.

For example:
The Bear (he often changes scenes and complains about everything)

Goldie (named so after the Austrian book Bambi) and Wuschel (the squirrel), Ringo the Hare is not pictured here

And here’s the star of all Dingo Pictures cartoons, the one and only Wabuu:

In case you didn’t know, Wabuu was so popular that he had his own original cartoon, title song (that can be heard in several other cartoons too) and even audio books! Even now the DVD with his own cartoon costs at least twenty Euro on Amazon and about ten Euro used (for comparison, you can buy almost any other used DVD with Dingo Pictures cartoon for one eurocent).

Anyway, we were talking about the style. It’s hard to express in words what makes Dingo Pictures cartoons so charming but I think phrases “record-mending animation quality”, “copy-pasting everything”, “reusing the same scenes in other cartoons”, “more padding than Star Trek The Motionless Picture” and “dull voice acting” would do.

Here are some shots from one of their longer films, King of the Animals (or Lion King for short), don’t mind the quality, I tried to be lazy:

Title card. One of the best ones, honestly.

Every animal is uniquely redrawn.

The titular king.

Nice backgrounds.

One of moles (as you can guess from its look this mole is Italian).

The story is very typical: the lion rules the jungle full of monkeys, hippos, crocodiles and vultures (and with hares, squirrels and bears—the bear picture above is from this cartoon). One day a monkey finds diamonds but they decide not to mine them in fear of humans coming. With the birth of his son, the lion king loses interest in ruling and the black panther seizes the power with cheating and false promises and exiles the king. Later, with the help of snake, vultures and bear the panther is defeated. If you think you’ve heard this story elsewhere, don’t worry—it has unexpected twists in it.

And in the end we have scenes like this:
The vultures asked for a computer with phone and modem for their help.

The Black Panther is defeated!

BTW the snake enjoys reading books and quotes Shakespeare, Goethe, Karl Marx and Gorbachov. If you find this strange how could you get past the fact the panther’s name is Bocassa?

How can one not enjoy masterpieces like this! Oh, and every time I hear sirens I remember the duck from Animal Football, that’s how much their art has touched me.

There’s a sequel to it, simply called King of the Animals The Second Part but it’s of lesser quality IMO.

The other noteworthy cartoons are Aladdin (the genie there is a famous actor who is not Robin Williams), Animals Football (there they’ve copy-pasted all their animal characters and then some), Bremen Musicians (it has live narrator filmed, not just animation), The Case for Mouse Police (it simply needs to be seen to be believed) and of course Wabuu.

P.S. For some reason DVDs are distributed by P*wer Channel GmbH and don’t mention the original creator anywhere except in the video. They are that modest.

P.P.S. Honestly, I don’t think I’ve heard about any other German cartoons. But these cartoons have reviews in BaidUTube channels of people from countries like Canada and Sweden (the latter is in Swedish of course, actually Wabuu song sounds even better in Swedish than in German).

Let me start with a bit of history.

Normal people don’t care much how to eat their pasta—they simply cook it, add whatever they have (even mayo probably or nothing at all) and eat it. Italians are different, they select pasta sauce first and then decide what pasta will go fine with it. In case of meat sauce (or ragù as locals call it) Italians considered that wide plain pasta would go best with it for some reason. So they competed who can use the wider noodles and the guy who simply took the whole plates won. But it was a bit inconvenient to cook them and then mix with sauce so they’ve switched to oven baking the whole thing in sauce instead. And that’s how lasagne was born (probably; Italians have a completely different story to tell but they always do).

Since I’d better avoid meat entirely, I decided to cook my own version with various components (in several tries too) and here’s my short summary:

  • it’s better to use thick sauces;
  • tomato sauce is a definite must, it adds flavour;
  • cheese sauce is good mostly for the lowest layer (to lay lasagne plates on it) and for pouring on top;
  • ricotta and Quark make fine layers too, you can even mix them with some vegetables;
  • sliced boiled eggs would make a nice addition to a layer with tomato sauce;
  • mozzarella is better avoided since it will result in hard chewy chunks contrasting with the texture of the rest of the dish.

Overall, it’s a nice dish, would bake again.

Thursday, 03 August

Okay, I’ve made some changes so hopefully the server will withstand the curiosity of more than two people if it will go like the last time.

So, after implementing Indeo 4/5 decoders for NihAV I nano-benchmarked it and my decoder was about twice as slow compared to libavcodec. And since neither has SIMD optimisations they should be good enough to compare.

The tested file was 00186002.avi, an Indeo 4 sample with scalability feature (i.e. luma is split into four bands and uses Haar wavelet to compose the output plane) and duration over ten minutes. The results I got will be given in Linux perf sample counts as those should be representative enough.

avconv — 13.4 seconds, 10K cycles. About 24% spent in luma plane recombination (with Haar wavelet), about 40% of time is taken by bitstream decoding and the rest is mostly transforms and motion compensation.

nihav-tool: 31.6 seconds, 20K cycles. 30% spent in luma plane recombination, 48% of time is taken by bitstream decoding, 11% is for motion compensation and the rest is mostly transforms. Or in samples: recombination takes 9900 (against 3300 in libavcodec), bitstream decoding (dirty estimate, it includes some DSP functions inlined) takes 15800 against 5600, motion compensation takes 3500 against 1700 and transforms take 1300 against 1500 (they are not equivalent though, my code only transforms the block and output costs are hidden in bitstream decoding). Overall, my code is consistently worse. Is there any way to optimise it a bit?

Step 1. Committing explicit murder.

My DSP function for motion compensation for case 0 (no interpolation) uses this code:

            for _ in 0..h {
                for x in 0..w {
                    dst[didx + x] = src[sidx + x];
                }
                sidx += sstride;
                didx += dstride;
            }

Replacing it with

            for _ in 0..h {
                let dest = &mut dst[didx..didx+w];
                dest.copy_from_slice(&src[sidx..sidx+w]);
                sidx += sstride;
                didx += dstride;
            }

gives 2800 samples (new function plus memcpy time) instead of 3500 in the old function.

Conclusion: rustc is too stupid to recognize copies.
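For reference, here is a minimal standalone sketch of the row-copy idea from this step. The function name copy_block and its argument order are made up for illustration and are not the actual NihAV code; it only shows the per-row copy_from_slice() pattern.

```rust
/// Copies a `w`x`h` block between two planes, one row at a time.
/// `sidx`/`didx` are starting offsets, `sstride`/`dstride` are row strides.
/// (Hypothetical helper, not taken from NihAV.)
pub fn copy_block(dst: &mut [u8], mut didx: usize, dstride: usize,
                  src: &[u8], mut sidx: usize, sstride: usize,
                  w: usize, h: usize) {
    for _ in 0..h {
        // copy_from_slice() panics if the lengths differ;
        // both slices are exactly `w` elements long here.
        dst[didx..didx + w].copy_from_slice(&src[sidx..sidx + w]);
        sidx += sstride;
        didx += dstride;
    }
}
```

The slicing still does one bounds check per row, but that is a lot cheaper than one per pixel.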

Step 2. Going after the biggest function.

The plane recombination function is very simple:

        for _ in 0..(h/2) {
            for x in 0..(w/2) {
                let p0 = src[idx0 + x];
                let p1 = src[idx1 + x]; // idx0 + w/2
                let p2 = src[idx2 + x]; // idx0 + (h/2) * stride
                let p3 = src[idx3 + x]; // idx0 + w/2 + (h/2) * stride
                // oidx1 = oidx0 + stride
                dst[oidx0 + x * 2 + 0] = clip8(((p0 + p1 + p2 + p3 + 2) >> 2) + 128);
                dst[oidx0 + x * 2 + 1] = clip8(((p0 + p1 - p2 - p3 + 2) >> 2) + 128);
                dst[oidx1 + x * 2 + 0] = clip8(((p0 - p1 + p2 - p3 + 2) >> 2) + 128);
                dst[oidx1 + x * 2 + 1] = clip8(((p0 - p1 - p2 + p3 + 2) >> 2) + 128);
            }
            idx0 += sstride;
            idx1 += sstride;
            idx2 += sstride;
            idx3 += sstride;
            oidx0 += dstride * 2;
            oidx1 += dstride * 2;
        }

Using intermediates (p0±p2, p1±p3) cuts it from 9900 samples to 9200.
But I like to live dangerously so I rewrite it into an unsafe mess with raw pointers like *d0_ptr.offset(1) = clip8(((d0 + d1 + 2) >> 2) + 128); d0_ptr = d0_ptr.offset(2);. Now it’s only 6700 samples. I try wrapping_add() instead of sum: 6600, no significant change. I roll back and cast p0-p3 to i32. 4700 samples! I try wrapping (i.e. without checks) arithmetic now. 4600 samples. Maybe a statistical deviation, maybe a real small improvement.
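To make the "intermediates plus i32" variant concrete, here is a safe sketch of the same recombination. The band layout follows the comments in the listing above; the function name recombine_haar, the clamp-based clip8 and the loop structure are my assumptions for illustration, not the unsafe pointer version that was actually benchmarked.

```rust
/// Clips to 0..=255 (simple clamp-based stand-in for the post's clip8()).
fn clip8(a: i32) -> u8 {
    a.clamp(0, 255) as u8
}

/// Recombines four Haar bands stored in the quadrants of `src`
/// into a `w`x`h` output plane. (Illustrative sketch only.)
pub fn recombine_haar(src: &[i16], sstride: usize,
                      dst: &mut [u8], dstride: usize, w: usize, h: usize) {
    let (hw, hh) = (w / 2, h / 2);
    for y in 0..hh {
        let row0 = y * sstride;        // bands 0 and 1 (top half)
        let row2 = (y + hh) * sstride; // bands 2 and 3 (bottom half)
        let out0 = 2 * y * dstride;
        let out1 = out0 + dstride;
        for x in 0..hw {
            // widen to i32 once, then do all arithmetic in i32
            let p0 = i32::from(src[row0 + x]);
            let p1 = i32::from(src[row0 + hw + x]);
            let p2 = i32::from(src[row2 + x]);
            let p3 = i32::from(src[row2 + hw + x]);
            // shared intermediates: p0±p2 and p1±p3
            let (s02, d02) = (p0 + p2, p0 - p2);
            let (s13, d13) = (p1 + p3, p1 - p3);
            dst[out0 + 2 * x]     = clip8(((s02 + s13 + 2) >> 2) + 128);
            dst[out0 + 2 * x + 1] = clip8(((d02 + d13 + 2) >> 2) + 128);
            dst[out1 + 2 * x]     = clip8(((s02 - s13 + 2) >> 2) + 128);
            dst[out1 + 2 * x + 1] = clip8(((d02 - d13 + 2) >> 2) + 128);
        }
    }
}
```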

Then I try a bit smarter clipping function:

fn mclip8(a: i32) -> u8 {
    if (a as u16) > 255 { !(a >> 16) as u8 }
    else { a as u8 }
}

4300 samples. Not libavcodec‘s 3300 but also not the original 9900.
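A quick sanity check of that trick against a plain clamp, as a sketch. The name clip8_clamp is mine; the important assumption is that inputs stay roughly within 16-bit range, which holds for sums of a few i16 band values divided by four. For arbitrary i32 inputs the low-16-bit test can be fooled.

```rust
/// The clipping trick from the post: anything outside 0..=255 either
/// has a non-zero high byte (overflow) or is negative.
fn mclip8(a: i32) -> u8 {
    if (a as u16) > 255 { !(a >> 16) as u8 } else { a as u8 }
}

/// Straightforward reference version (hypothetical name).
fn clip8_clamp(a: i32) -> u8 {
    a.clamp(0, 255) as u8
}

/// Returns true if both functions agree on every value in `lo..=hi`.
pub fn clips_agree(lo: i32, hi: i32) -> bool {
    (lo..=hi).all(|a| mclip8(a) == clip8_clamp(a))
}
```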

Conclusions:

  1. the compiler is not smart enough yet;
  2. maybe there is/there should be a way to tell compiler “hey, this slice is guaranteed to be this size, drop the unneeded checks”;
  3. non-checking arithmetic is a bit too inconvenient to write (by design, I suppose) but it may help to squeeze some performance;
  4. having optimised routines for basic type downconversion with saturation would make life a bit easier;
  5. the compiler sucks at doing basic calculations with smaller types.
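On the second point, one idiom that may help (no guarantees, it depends on the compiler version and on how the loop is written) is to re-slice everything to the known length once before the loop, or to avoid indexing altogether with iterators. The function below is a made-up example of that pattern, not code from the decoder.

```rust
/// Adds the 128 bias to one row of coefficients and clips to bytes.
/// (Hypothetical helper used only to show the re-slicing idiom.)
pub fn add_bias_row(dst: &mut [u8], src: &[i16], w: usize) {
    // One bounds check per slice up front; after this the optimiser
    // can see that both slices are exactly `w` elements long.
    let src = &src[..w];
    let dst = &mut dst[..w];
    // zip() over iterators needs no per-element index checks at all.
    for (d, &s) in dst.iter_mut().zip(src.iter()) {
        *d = (i32::from(s) + 128).clamp(0, 255) as u8;
    }
}
```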

Step 3. Strategically placed inlines.

I strategically put some #[inline(always)] on certain bitstream reading functions. Overall bitstream reading time reduces to 14600 samples (from 15800).

Let’s see how well it works with constant arguments—I force motion compensation function to be called with explicit block size (it’s either 4 or 8). 2300 samples instead of 2800, not bad. In total I’ve managed to shave off about 8 seconds.

Conclusion: compiler does inlining okayish but some hints would improve the performance significantly.
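As a sketch of the "explicit block size" idea: newer Rust releases have const generics, which let the block size be a compile-time constant so the inner loops can be fully unrolled. This is not how the code in this post was written (it predates const generics), and the names copy_block_n and copy_block are invented for the example.

```rust
/// Copies an N x N block; N is known at compile time.
#[inline(always)]
fn copy_block_n<const N: usize>(dst: &mut [u8], dstride: usize,
                                src: &[u8], sstride: usize) {
    for y in 0..N {
        let d = y * dstride;
        let s = y * sstride;
        dst[d..d + N].copy_from_slice(&src[s..s + N]);
    }
}

/// Dispatches to the specialised copy for the two block sizes
/// mentioned in the post (4 and 8).
pub fn copy_block(dst: &mut [u8], dstride: usize,
                  src: &[u8], sstride: usize, size: usize) {
    match size {
        4 => copy_block_n::<4>(dst, dstride, src, sstride),
        8 => copy_block_n::<8>(dst, dstride, src, sstride),
        _ => panic!("unsupported block size {}", size),
    }
}
```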

Step 4. Improving bitstream reading.

Indeo 4/5 codebooks are represented as unary prefix and optional bits to read that may vary for each prefix (e.g. first predefined codebook for macroblock data starts with codes 0, 10xxxx, 110xxxxx and 1110xxxx). So after caching offsets for each possible prefix and using quick routine for determining prefix ((!bitread.peek(16)).trailing_zeros() since the actual codes come LSB-first) instead of a loop I got bitstream handling time reduced to 11200 samples (from 14600). It’s nice that Rust has those count leading/trailing zeroes functions. This part is done differently in my decoder (it doesn’t build LUT for the codebook) so it’s hard to compare it directly to libavcodec but there’s still a lot of room for improvement.
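Roughly, the prefix trick looks like the sketch below. The two tables are assumptions built from the four example codes above (0, 10xxxx, 110xxxxx, 1110xxxx) with symbols numbered consecutively; the real codebooks, their length and the bit order of the extra bits may differ.

```rust
/// Length of the run of 1-bits at the LSB end, i.e. the unary prefix
/// of an LSB-first code.
pub fn unary_prefix_len(peeked: u32) -> u32 {
    (!peeked).trailing_zeros()
}

// Hypothetical tables for a codebook with codes 0, 10xxxx, 110xxxxx, 1110xxxx.
const EXTRA_BITS: [u8; 4] = [0, 4, 5, 4];
const OFFSETS: [u32; 4] = [0, 1, 17, 49];

/// Decodes one symbol from already-peeked bits.
/// Returns (symbol, number of bits consumed), or None for an unknown prefix.
pub fn decode_symbol(bits: u32) -> Option<(u32, u32)> {
    let prefix = unary_prefix_len(bits) as usize;
    let nbits = u32::from(*EXTRA_BITS.get(prefix)?);
    let skip = prefix as u32 + 1; // the ones plus the terminating zero
    let extra = (bits >> skip) & ((1u32 << nbits) - 1);
    Some((OFFSETS[prefix] + extra, skip + nbits))
}
```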

Step 5. More pointers!

I rewrite some other DSP functions using pointers. Overall decoding time drops to 17.3 seconds (from 31.6 seconds originally and versus 13.4 seconds for avconv) so I call it a day.

Overall conclusion

It is possible to write fast code in Rust but the compiler is not smart enough to do all possible optimisations and you have to make sacrifices if you want your code to be safer by default (and I’m fine with it). Also it’s obvious how Rust makes the code that sacrifices checks for speed uglier (compare a + b and a.wrapping_add(b) or arr[i] and *ptr.offset(i as isize)). But in most cases it’s better to have a function in pure assembly language to perform the task instead. Maybe I’ll get to that stage eventually, maybe not. I still have NihAV to design and implement and piglets can shave themselves.

Monday, 31 July

Disclaimer: obviously it’s my opinion, feel free to prove me wrong or just ignore.

Now that I should qualify as a zoidberg (slang name for lowly programmer in Rust who lives somewhere in a dumpster and who is also completely ignored, a perfect definition for me), I want to express my thoughts about programming experience with Rust. After all, NihAV was restarted to find out how modern languages fare for my favourite task and there was about one language that was promising enough. So here’s a short rant about the aspects of this programming language that I found good and not so good.

Good things

  • Modern language features: standard library containers, generics, units and their visibility etc etc. And at least looks like Rust won’t degrade into metaprogramming language any time soon (that’s left for upcoming Rust+=1 programming language);
  • Reasonable encapsulation: I mean both (sub)modules organisation and the fact that functions can be implemented just for some structure;
  • Powerful enums that can act both as plain C set of values and also as tagged objects, e.g. the standard Result enum has two values—Ok(result) and Err(error) where both result and error are two different user-defined types, so returned value can contain either while being the same type (Result);
  • More helpful error messages (e.g. it tries to suggest a correction for mistyped variable name or explains an error a bit more detailed). Sure, Real Programmers™ don’t need that but it’s still nice;
  • No need for dependency resolving: you can have stuff in one module referencing stuff in another module and vice versa at the same time, same for no need
  • Traits (standard interfaces for objects) and the fact that operations are implemented as specific traits (i.e. if you need to have a + b with your custom object you can implement std::ops::Add for it and it will work). Also it’s nice to extend functionality of some object by making an implementation for some trait: e.g. my bitstream reader is defined in one place but in another module I made another trait for it for reading codebooks so I can invoke let val = bitread.read_codebook(&cb)?; later.
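Both trait uses from the last point can be illustrated with a toy example. Everything here (the tiny BitReader, the UnaryReader trait, the MV fields) is made up for the sketch and is not the NihAV code; the real reader and read_codebook() are more involved.

```rust
use std::ops::Add;

/// Motion vector with operator support via std::ops::Add.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MV { pub x: i16, pub y: i16 }

impl Add for MV {
    type Output = MV;
    fn add(self, other: MV) -> MV {
        MV { x: self.x + other.x, y: self.y + other.y }
    }
}

/// Toy LSB-first bit reader, defined "in one place".
pub struct BitReader<'a> { src: &'a [u8], pos: usize }

impl<'a> BitReader<'a> {
    pub fn new(src: &'a [u8]) -> Self { BitReader { src, pos: 0 } }
    /// Reads one bit or returns None at the end of data.
    pub fn read_bit(&mut self) -> Option<bool> {
        let byte = *self.src.get(self.pos >> 3)?;
        let bit = (byte >> (self.pos & 7)) & 1 != 0;
        self.pos += 1;
        Some(bit)
    }
}

/// Extension trait that could live in another module.
pub trait UnaryReader {
    fn read_unary(&mut self) -> Option<u32>;
}

impl<'a> UnaryReader for BitReader<'a> {
    /// Counts 1-bits until the terminating 0-bit.
    fn read_unary(&mut self) -> Option<u32> {
        let mut n = 0;
        while self.read_bit()? { n += 1; }
        Some(n)
    }
}
```

With the trait in scope the new method is called like any built-in one, e.g. br.read_unary().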

Unfortunately, it’s not all rosy and peachy, Rust has some things that irritate me. Some of them are following from the advantages (i.e. you pay for many features with compilation time) and other are coming from language design or implementation complexity.

Irritating things that can probably be fixed

  • Compilation time is too large IMO. While the similar code in Libav is recompiled in less than a second, NihAV (test configuration) is built in about ten seconds. And any time above five seconds is irritating to wait. I understand why it is so and I hope it will be improved in the future but for now it’s irritating;
  • And, on the similar note, benchmarks. While overall built-in testing capabilities in Rust are good (file it under good things too), the fact that benchmarking is available only for limbo nightly Rust is annoying;
  • No control over allocation. On one hoof I like that I can not worry about it, on the other hoof I’d like to have an ability to handle it.
  • Poor primitive types functionality. If you claim that Rust is systems programming language then you should care more about primitive types than just relying on as keyword. If you care about systems programming and safety you’d have at least one or two functions to convert type into a smaller one (e.g. i16/u16 -> u8) and/or check whether the result fits. That’s one of the main annoyances when writing codecs: you often have to convert result into byte with range clipping;
  • Macros system is lacking. It’s great for code but if you want to use macros to have more compact data representation, tough luck. For example, in Indeo3 codebooks have sequences like (a,b), (-a,-b), (b,a), (-b,-a) which would be nice to shorten with a macro. But the best solution I saw in Rust was to declare whole array in a macro using token tree manipulation for proper submacro expansion. And I fear it might be the similar story with implementing motion compensation functions where macros are used to generate required functions for specific block sizes and operations (simple put or average). I’ve managed to work it around a bit in one case with lambdas but it might not work so well for more complex motion compensation functions;
  • Also the tuple assignments. I’d like to be able to assign multiple variables from a tuple but it’s not possible now. And maybe it would be nice to be able to declare several variables with one let;
  • There are many cases where compiler could do the stuff automatically. For example, I can’t take a pointer to const but if I declare another const as a pointer to the first one it works fine. In my opinion compiler should be able to generate an intermediate second constant (if needed) by itself. Same for function calling—why does - 42); fail borrow check while let pos = bitread.tell() - 42;; doesn’t?
  • Borrow checker and arrays. Oh, borrow checker and arrays.

This is probably the main showstopper for implementing complex video codecs in Rust effectively. Rust is anti-FORTRAN in a sense that FORTRAN was all about arrays and could operate arrays safely while Rust safely prevents you from operating arrays.

Video codecs usually operate on planes and there you’d like to operate with different chunks of the frame buffer (or plane) at the same time. Rust does not allow you to mutably borrow parts of the same array even when it should be completely safe like let mut a = &mut arr[0..pivot]; let mut b = &mut arr[pivot..];. Don’t tell me about ChunksMut, it does not allow you to work with them both simultaneously. And don’t tell me about Bytes crate: it should not be a separate crate, it should be a core language functionality. As a result I have to resort to using indices inside frame buffer and Rc<RefCell<...>> for frames themselves. And only dream about being able to invoke mem::swap(&mut arr[idx1], &mut arr[idx2]);.

Update: so there’s slice::split_at_mut() which does some of the things I want, thanks Tomas for pointing it out.
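A small sketch of what split_at_mut() gives you: two mutable, non-overlapping views of the same buffer that can be used at the same time. The averaging function is an invented example; for the element-swapping case there is also the standard slice::swap() method.

```rust
/// Replaces matching elements of the two halves around `pivot`
/// with their rounded average. (Made-up operation, only here to
/// show simultaneous mutable access to both halves.)
pub fn average_halves(arr: &mut [u8], pivot: usize) {
    let (a, b) = arr.split_at_mut(pivot);
    for (x, y) in a.iter_mut().zip(b.iter_mut()) {
        let avg = ((u16::from(*x) + u16::from(*y) + 1) >> 1) as u8;
        *x = avg;
        *y = avg;
    }
}

/// Swapping two elements of one slice without mem::swap().
pub fn swap_elems(arr: &mut [u8], idx1: usize, idx2: usize) {
    arr.swap(idx1, idx2);
}
```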

And it gets even more annoying when I try to initialise an array of codebooks for further use. The codebook structure does not implement Clone because there’s no good reason for it to be cloned or copied around, but when I initialise an array of them I cannot simply declare it and fill the contents in a loop, I have to resort to unsafe { arr = mem::uninitialized(); for i in 0..arr.len() { ptr::write(&mut arr[i], Codebook::new(...)); } }. I know that if there’s an error creating new element compiler won’t be able to ensure that it drops only already initialised elements but it’s still a problem of the compiler not being smart enough yet. Certain somebody had an idea of using generator to initialise arrays but I’m not sure even that will be implemented any time soon.
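For readers on newer toolchains: later Rust releases added std::array::from_fn(), which builds an array element by element without requiring Clone or Default. It did not exist when this was written, so treat the sketch below as a note on a possible alternative; the Codebook type here is a stand-in with a made-up constructor.

```rust
/// Stand-in for a codebook type that deliberately has no Clone.
pub struct Codebook {
    pub id: usize,
}

impl Codebook {
    /// Hypothetical constructor; the real one takes a codebook description.
    pub fn new(id: usize) -> Self {
        Codebook { id }
    }
}

/// Builds a fixed-size array of codebooks without unsafe code:
/// the closure is called once per index, in order.
pub fn make_codebooks() -> [Codebook; 8] {
    std::array::from_fn(|i| Codebook::new(i))
}
```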

And speaking about cloning, why does compiler refuse to generate Clone trait for a structure that has a pointer to function?

And that’s why C is still the best language for systems programming: it still lets you do what you mean (the problem is that most programmers don’t really know what they mean) without many magical incantations. Sure, it’s very good to have many common errors eliminated by design but when you can’t do basic things in a simple way then what is it good for?

Annoying things that cannot be fixed

  • type keyword. Since it’s a keyword it can’t be used as a variable name and many objects have type, you know. And it’s not always reasonable to give a longer name or rewrite using enum. Similar story with ref but I hardly ever need it for a variable name and ref_<something> works even better. Still, it would be better if language designers picked typedef instead of type;
  • Not being able to combine if let with some other condition (those nested conditions tend to accumulate rather fast);
  • Sometimes I fear that compilation time belongs to this category too.

Overall, Rust is not that bad and I’ll keep developing NihAV using it but keep in mind it’s still far from being perfect (maybe about as far as C but in a different direction).

P.S. I also find the phrase “rewrite in Rust” quite stupid. Rust seems to be significantly different from other languages, especially C, so while “Real Programmers can write FORTRAN program in any language” it’s better to use new language features to redesign interfaces and make new overall design instead of translating the same mistakes from the old code. That’s why NihAV will lurch where somebody might have stepped before but not necessarily using the existing roads.

Sunday, 30 July

So, despite work, heat, travels, and overall laziness, I’ve managed to complete more or less full-featured Indeo 4 and 5 decoder. That means that my own decoder decodes Indeo 4 and 5 files with all known frame types (including B-frames) and transforms (except DCT because there are no known samples using it) and even transparency!

Here are two random samples from Civilization II and Starship Titanic decoded and dumped as PGM (click for full size):

I’m not going to share the code for transparency plane decoding, it’s very simple (just RLE) and the binary specification is easy to read. The only gotchas are that it’s decoded as contiguous tile aligned to width of 32 (e.g. the first sample has width 332 pixels but the transparency tile is 352 pixels) and the dirty rectangles provided in the band header are just a hint for the end user, not a thing used in decoding.
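The width alignment itself is the usual round-up-to-multiple-of-32, sketched here with a made-up helper name:

```rust
/// Rounds a width up to the next multiple of 32
/// (32 is a power of two, so masking works).
pub fn align32(w: usize) -> usize {
    (w + 31) & !31
}
```

For the first sample this gives align32(332) == 352, matching the tile width mentioned above.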

This decoder was written mostly so that I can understand Indeo better and what can I say about it: Indeo 4/5 is about the same codec with some features fit for more advanced codecs of the later era. While the only things it reuses from the previous frames are pixels and band transform mode, it can reuse decoded quantisers and motion vectors from the first band for chroma bands and luma bands 1-3 in scalability mode too. It has variable block sizes (4×4, 8×8 and 8×8 in 16×16 macroblock) with various selectable transforms and scans (i.e. you can have 2D, row or column Slant, Haar or (theoretically) DCT and scans can be diagonal, horizontal or vertical too). And there were several frame types too: normal I-, P- and B-frames, droppable I- and P-frames, and droppable P-frame sequence (i.e. P-frames that reference the previous frame of such type or normal I/P-frame). Had it had proper stereo support, it’d be still as hot as ITU H.EVC.

The internal design of Indeo 4 and 5 differs in small details: Indeo 4 has more frame types (like B-frames and droppable I-frames), but Indeo 5 introduced the droppable P-frame sequence; picture and band headers differ between versions but (macro)block information and actual content decoding is the same (Indeo 5 does a bit trickier stuff with macroblock quantisers but that’s all). Also Indeo 4 had transparency information and different plane reconstruction (using Haar wavelet instead of 5/7 used in Indeo 5). So, as a result my decoder was split into several modules reflecting the changes: one per codec for codec-specific functions, one for common structures and types (e.g. picture header, frame type and such), one for transforms and motion compensation and one for the actual decoding functions.

As with Intel H.263 decoder, Indeo 4/5 decoders provide implementations for IndeoXParser that parse picture header, band header and macroblock information and also recombine back plane in case it was coded as scalable. In result they store not so much information, just the codebooks used in decoding and for Indeo5 the common picture information that is stored only for I-frames (in other words, GOP info).

As a result, here’s what the Indeo 4 main decoding function looks like:

    fn decode(&mut self, pkt: &NAPacket) -> DecoderResult<NAFrameRef> {
        let src = pkt.get_buffer();
        let mut br = BitReader::new(src.as_slice(), src.len(), BitReaderMode::LE);

        let mut ip = Indeo4Parser::new();
        let bufinfo = self.dec.decode_frame(&mut ip, &mut br)?;
        let mut frm = NAFrame::new_from_pkt(pkt,, bufinfo);

with the actual interface for parser being

pub trait IndeoXParser {
    fn decode_picture_header(&mut self, br: &mut BitReader) -> DecoderResult<PictureHeader>;
    fn decode_band_header(&mut self, br: &mut BitReader, pic_hdr: &PictureHeader, plane: usize, band: usize) -> DecoderResult<BandHeader>;
    fn decode_mb_info(&mut self, br: &mut BitReader, pic_hdr: &PictureHeader, band_hdr: &BandHeader, tile: &mut IVITile, ref_tile: Option<Ref<IVITile>>, mv_scale: u8) -> DecoderResult<()>;
    fn recombine_plane(&mut self, src: &[i16], sstride: usize, dst: &mut [u8], dstride: usize, w: usize, h: usize);
}

And the nano-benchmarks:
the longest Indeo4 file I have around (00186002.avi) — nihav-tool 20sec, avconv 9sec plus lots of error messages;
Mask of Eternity opening (Indeo 5) — nihav-tool 8.1sec, avconv 4.1sec.
Return to Krondor intro (Indeo 5) — nihav-tool 5.8sec, avconv 2.9sec.
For other files it’s also consistently about two times slower but whatever, I was not trying to make it fast, I tried to make it work.

The next post should be either about the things that irritate me in Rust and make it not so good for codec implementing or about cooking.

Tuesday, 25 July

I did not want to have personal rants in my restarted blog but sometimes material just comes and presents itself.

As some of you might know, I prefer travelling by rail; yet sometimes I travel by plane because it’s faster. Most of those flights are semiannual flights to Sweden and an occasional flight to elenril-city. And here’s the list of unpleasant things I had with flights:

  • Planes being late for more than an hour (because of technical reasons) — two Lufthansa flights from Arlanda;
  • Plane being late just because — Aerosvit, once;
  • Baggage not loaded on plane — Cimber Sterling (aka Danish Aerosvit);
  • Flights cancelled because of strike — once SAS and once Lufthansa;
  • Flight being cancelled because of plane malfunction — Lufthansa once;
  • Flight being cancelled because they didn’t want to wait for the passengers — Lufthansa once (yup, people were waiting at the gate but they decided to skip boarding entirely and send the plane away without passengers);
  • Flight where I could not check in — Lufthansa once.

To repeat myself, most flights I make during the year are with SAS to/from Sweden though sometimes segment is operated by LH. So far return trips to Frankfurt with LH were mostly okay except for some delays but the last “flight” was something different.

I booked a flight FRA-PRG-FRA. The flight to Praha was cancelled because plane arrived to Frankfurt at least half an hour later than expected and after another hour it was decided it’s not good enough to fly again. Okay. So they could not find a replacement plane and rebooked me to flight at 22:15. Fine, but it turned out that I could reach Praha by train faster and cheaper (twice as cheap actually) so I decided not to wait.

Then the time for return flight came and I could not check in at all because they have modified something (that’s the message: “Cannot check in to your flight because of modifications, refer to Lufthansa counter.” And there’s no LH representative there. And if you can withstand their call centre, you’re a much better person than I am). So it was another train back to Germany (which also broke down in the middle of nowhere but at least it was resolved in an hour and a half). Maybe it’s because of the selected Cattle Lowcostish fare (Economy but without check-in baggage or seat selection) instead of the usual one but at least with SAS when I wasn’t able to take flight to Arlanda (because of Frankfurt Airport staff strike) I still had no problems flying back from there.

Call me picky (and I shan’t argue, I am picky) but I expect better statistics because the most irritating cases were happening with the certain company that I don’t fly with often and that’s comparable with SAS in quantity (but, sadly, not quality).

And that means I’ll avoid using it in the future even if that means not being able to get to some places by plane in reasonable time. There are still trains for me.

P.S. This rant is just to vent off my anger and frustration from the recent experience. And it should make me remember not to take Air Allemagne flights ever again.
P.P.S. Hopefully the next post will be more technical.

Sunday, 25 June

Obviously it moves very slowly: I spend most of my time on work, sleep, cooking and travelling around. Plus it was too hot to think or do anything productive.

Anyway, I’ve completed IMC/IAC decoder for NihAV. In case you’ve forgotten or didn’t care to find out at all, the names stand for Intel Music Coder and Indeo Audio software with IAC being slightly upgraded version of IMC that allows stereo and has tables calculated for every supported sample rate instead of the set of them precalculated for 22kHz. And despite what you might think it is rather complex audio codec that took a route of D*lby AC-3, G.722.1/RealAudio Cooker and CELT—parametric bit allocation codecs. It’s the kind of audio codecs that I dislike less than speech codecs but more than the rest because they have large and complex function that calculates how many bits/values should be spent on each individual coefficient or subband. In IMC/IAC case it gets even worse since the codec uses floating point numbers so the results are somewhat unstable between implementations and platforms (a bit more on that later). Oh, and this codec has I- and P-frames since some blocks are coded as independent and others are coded using information from the previous block.

Rust does not have much to do with C so you cannot simply copy-paste code and expect it to work and it’s against the principles of the project anyway. Side note: the only annoying Rust feature so far is array initialisation, I’d like to be able to fill array in a loop before using it without initialising array contents to some default value (which I can’t do for some types) or resorting to mem::uninitialized() and ptr::write(). Anyway, I had to implement my own version of the code so it’s structured a bit differently, has different names, uses bitstream reader in MSB16LE mode instead of block swapping and decodes most files I could test without errors unlike libavcodec—so it’s NIH all the way!

I wasted time mostly on validating my code against the binary specifications so this version actually decodes most files as intended while libavcodec fails to do that. To describe the problem briefly, it all comes from the same place: the codec first produces bit allocation for all bits still available then determines how to read flags for skipping coefficients in some bands, reads those flags and adjusts bit allocation for the number of bits freed by this operation; the problem is that bit allocation may go wrong and in result skip flags take more bits than the coefficients that would be coded otherwise and decoder would fail to adjust bit allocation for that case (it’s not supposed to do that in the specification) and will read more bits than the block contains. For the thirty-something IMC and IAC in AVI samples only one fails now for me because in bit allocation the wrong band gets selected for coefficient length decreasing. And the reason is the difference in the fourth or fifth digit after the decimal point in one array of constants that makes the wrong value minimum (and thus selected for coefficients length decreasing). Since it takes several minutes with gdb+mplayer2 to get information at this point (about at 10-second position in 14-second audio) I decided not to dig further.

Also I had to write other pieces of code like split-radix FFT, byte writer and WAV dumper that accepts audio packets and writes them with the provided ByteWriter.

P.S. Nanobenchmarks ahoy: decoding the longest IMC stream that I had (a bit more than two minutes) takes 0.124s with avconv and 0.09s with nihav-tool. Actual decoding functions take about the same time though Rust implementation is still faster by couple percents and my FFT implementation is slower (but on the other hoof it’s called for every frame since it decodes that file without errors).

P.P.S. So next is Indeo 4/5 with all wonderful features like scalable decoding, B-frames and transparency (that reminds me that Libav and ScummVM had a competition who would be the last to implement proper transparency support for Indeo 4, now they both might win). And then I’d probably go back to implementing the features I wanted: being able to tell the demuxer to discard and don’t demux certain streams, better streams reporting from the demuxer, seeking and decoder reset, frame reordering functionality, maybe WAV support too. And then maybe back to decoders. I want to have several codec families fully implemented, like RAD (Smacker, Bink and Bink2), Duck/On2 (TM1, TM-RT, TM2, TM2X, TM VP3, VP4, VP5, AVC, VP6 and VP7) and RealMedia (again). But I’m not in a hurry.

P.P.P.S. I’m not going to publish all source code but bits of it may be either posted when relevant or leaked to rust-av, its developer(s) has(have) shown some interest, so enquire there.

Saturday, 10 June

After a lot of procrastination I’ve finally more or less completed decoder for I.263 (Intel version of H.263) in NihAV.

It can decode I-, P- and PB-frames quite fine (though B-frames have some artefacts) and deblock them too (except B-frames because I’m too lazy for that). Let’s have a look at the overall structure of the decoder.

Obviously I’ve tried to make it modular but not exceeding the needs of H.263 decoder (i.e. I’m not going to extend the code to make it work with MPEG-2 part 2 and similar though some code might be reused), so it’s been split into several modules. Here’s a walk on all modules and their functionality review.


This module has three public functions:

pub fn put_blocks(buf: &mut NAVideoBuffer<u8>, xpos: usize, ypos: usize, blk: &[[i16;64]; 6]) 
pub fn add_blocks(buf: &mut NAVideoBuffer<u8>, xpos: usize, ypos: usize, blk: &[[i16;64]; 6])
pub fn copy_blocks(dst: &mut NAVideoBuffer<u8>, src: &NAVideoBuffer<u8>,
                   dx: usize, dy: usize, sx: isize, sy: isize, bw: usize, bh: usize,
                   preborder: usize, postborder: usize,
                   mode: usize, interp: &[fn(&mut [u8], usize, &[u8], usize, usize, usize)])

One puts blocks on the framebuffer, another one adds them to the already present pixels and the third one does motion compensation by checking the source block position, filling missing data if block is (partly) located past the edge and then invoking the actual interpolation function.
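To show the difference between "put" and "add" on a single 8×8 block, here is a simplified sketch working on a bare u8 plane instead of NAVideoBuffer. The names put_block/add_block and the signatures are invented for the example; the real functions handle all six blocks of a macroblock at once.

```rust
fn clip8(v: i32) -> u8 {
    v.clamp(0, 255) as u8
}

/// Overwrites an 8x8 area at (x, y) with clipped block values (intra case).
pub fn put_block(dst: &mut [u8], stride: usize, x: usize, y: usize, blk: &[i16; 64]) {
    for (row, coefs) in blk.chunks_exact(8).enumerate() {
        let off = (y + row) * stride + x;
        for (d, &c) in dst[off..off + 8].iter_mut().zip(coefs) {
            *d = clip8(i32::from(c));
        }
    }
}

/// Adds block values to the pixels already present (inter residue case).
pub fn add_block(dst: &mut [u8], stride: usize, x: usize, y: usize, blk: &[i16; 64]) {
    for (row, coefs) in blk.chunks_exact(8).enumerate() {
        let off = (y + row) * stride + x;
        for (d, &c) in dst[off..off + 8].iter_mut().zip(coefs) {
            *d = clip8(i32::from(*d) + i32::from(c));
        }
    }
}
```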


This is the core module for H.263-related decoding, the root module contains some declarations that are used by the decoder:

pub trait BlockDecoder {
    fn decode_pichdr(&mut self) -> DecoderResult<PicInfo>;
    fn decode_slice_header(&mut self, pinfo: &PicInfo) -> DecoderResult<Slice>;
    fn decode_block_header(&mut self, pinfo: &PicInfo, sinfo: &Slice) -> DecoderResult<BlockInfo>;
    fn decode_block_intra(&mut self, info: &BlockInfo, quant: u8, no: usize, coded: bool, blk: &mut [i16; 64]) -> DecoderResult<()>;
    fn decode_block_inter(&mut self, info: &BlockInfo, quant: u8, no: usize, coded: bool, blk: &mut [i16; 64]) -> DecoderResult<()>;
    fn is_slice_end(&mut self) -> bool;

    fn filter_row(&mut self, buf: &mut NAVideoBuffer<u8>, mb_y: usize, mb_w: usize, cbpi: &CBPInfo);
}

This is the interface for codec-specific functions. The main decoder core is provided with an instance of it and calls it for the bitstream parsing while the main logic is contained in codecs::h263::decoder. I wanted to make it as stateless as possible (i.e. in the best case it simply contains bitstream reader and all other information is passed through *Info structures).

pub enum Type { I, P, Skip, Special }
pub struct PicInfo { ... }
pub struct PBInfo { ... }
pub struct Slice { ... }
pub struct BlockInfo { ... }
pub struct BBlockInfo { ... }
pub struct MV { x: i16, y: i16, }
pub struct CBPInfo { ... }

The structures used to pass information between bitstream reading part and the main decoder: picture information, current slice information, current block information plus the optional B-part of such. Also there’s CBPInfo which stores information about which blocks were coded and with what quantiser for the last two rows; this information is used during the deblocking.


pub fn h263_idct(blk: &mut [i16; 64])
pub const H263_INTERP_FUNCS: &[fn(...
pub const H263_INTERP_AVG_FUNCS: &[fn(...

H.263 specific functions: IDCT and halfpel motion compensation (put and average).
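For illustration, a horizontal half-pel "put" and its "average" counterpart could look like the sketch below. The argument order follows the fn(&mut [u8], usize, &[u8], usize, usize, usize) type from the copy_blocks() signature (my guess is dst, dstride, src, sstride, w, h); the function names are invented and the rounding is the plain (a + b + 1) >> 1, which is an assumption rather than a statement about every H.263 mode.

```rust
/// Horizontal half-pel interpolation: each output pixel is the rounded
/// average of two neighbouring source pixels.
/// `src` must have at least `w + 1` pixels per row.
pub fn halfpel_h_put(dst: &mut [u8], dstride: usize,
                     src: &[u8], sstride: usize, w: usize, h: usize) {
    for y in 0..h {
        let s = &src[y * sstride..y * sstride + w + 1];
        let d = &mut dst[y * dstride..y * dstride + w];
        for x in 0..w {
            d[x] = ((u16::from(s[x]) + u16::from(s[x + 1]) + 1) >> 1) as u8;
        }
    }
}

/// Same interpolation, but the result is averaged with what is
/// already in `dst` (as needed for bidirectional prediction).
pub fn halfpel_h_avg(dst: &mut [u8], dstride: usize,
                     src: &[u8], sstride: usize, w: usize, h: usize) {
    for y in 0..h {
        let s = &src[y * sstride..y * sstride + w + 1];
        let d = &mut dst[y * dstride..y * dstride + w];
        for x in 0..w {
            let pred = (u16::from(s[x]) + u16::from(s[x + 1]) + 1) >> 1;
            d[x] = ((u16::from(d[x]) + pred + 1) >> 1) as u8;
        }
    }
}
```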


This module contains various tables used by H.263-based decoders, mostly codebook description for e.g. CBP, MV or coefficient codes.


The core for H.263-based decoders implemented in pub struct H263BaseDecoder { } with the following functions:

impl H263BaseDecoder {
    pub fn new() -> Self { ... }
    pub fn is_intra(&self) -> bool { self.ftype == Type::I }
    pub fn get_dimensions(&self) -> (usize, usize) { (self.w, self.h) }
    pub fn parse_frame(&mut self, bd: &mut BlockDecoder) -> DecoderResult<NABufferType> { ... }
    pub fn get_bframe(&mut self) -> DecoderResult<NABufferType> { ... }
}

Yes, that’s it. parse_frame() decodes a frame when possible plus saves B-frame data in an array of B-frame macroblock descriptors (motion vectors plus restored coefficients) so that later it can render B-frame in get_bframe(). Plus it has struct MVInfo {...} that holds motion vectors for the current macroblock row plus bottom part of the previous row. Luckily there are no standalone B-frames in H.263 so I don’t need to keep more information than that.

Internally, the decoder core simply calls codec bitstream parser via provided BlockDecoder interface, reconstructs motion vectors, performs motion compensation and outputs blocks (plus in-loop filtering, plus preparing B-frame information for later reconstruction).


The module that actually implements I.263-specific bits. Here you have Intel263Decoder implementing NADecoder, and this is what its decoding function looks like:

    fn decode(&mut self, pkt: &NAPacket) -> DecoderResult<NAFrameRef> {
        let src = pkt.get_buffer();

        if src.len() == 8 {
            let bret = self.dec.get_bframe();
            let buftype;
            let is_skip;
            if let Ok(btype) = bret {
                buftype = btype;
                is_skip = false;
            } else {
                buftype = NABufferType::None;
                is_skip = true;
            }
            let mut frm = NAFrame::new_from_pkt(pkt, self.info.clone(), buftype);
            frm.set_frame_type(if is_skip { FrameType::Skip } else { FrameType::B });
            return Ok(Rc::new(RefCell::new(frm)));
        }
        let mut ibr = Intel263BR::new(&src, &self.tables);

        let bufinfo = self.dec.parse_frame(&mut ibr)?;

        let mut frm = NAFrame::new_from_pkt(pkt, self.info.clone(), bufinfo);
        frm.set_frame_type(if self.dec.is_intra() { FrameType::I } else { FrameType::P });
        Ok(Rc::new(RefCell::new(frm)))
    }

and here’s the decoder structure definition:

struct Intel263Decoder {
    info:    Rc<NACodecInfo>,
    dec:     H263BaseDecoder,
    tables:  Tables,
}

This decoder simply initialises all tables needed for the decoding (that are later passed to an instance of Intel263BR that implements BlockDecoder and parses the actual stream) plus the instance of H.263 base decoder that actually does all the heavy work. I could’ve implemented e.g. Sorenson Spark (aka FLV1) decoder in the same way (by adding the new file very similar to src/codecs/h263/ only with FLV1-specific bitstream parsing) but I’ll leave that to rust-av, they have FLV demuxer after all.

Actually I don’t plan writing any other H.263-based decoder though I might do WMV3 later that will support beta streams properly. Indeo 4 and 5 are probably next and IAC/IMC. Or something completely different, it’s not like I have a roadmap to follow.

P.S. On the longest sample I have (320×240, 123 seconds) it decodes all frames in 2 seconds while avconv does it in 0.94 seconds (1.5 seconds with -cpuflags 0), so I’m quite fine with the result.

Sunday, 04 June

So I’ve decided to implement container format detection for NihAV. This is a work in progress and I’m pretty sure I’ll change it later but it should do for now.

The main principles are quite simple: formats are detected by extension and by the contents, so there’s a score for it:

pub enum DetectionScore { ... }

I don’t see why some format should not be detected properly if demuxer for it is disabled or not implemented at all. So in NihAV there’s a specific detect module that offers just one function:

pub fn detect_format(name: &str, src: &mut ByteReader) -> Option<(&'static str, DetectionScore)>;

It takes input filename and source stream reader and then tries to determine whether some format matches and returns format name and detection score on success (or nothing otherwise). I might add probing individual format later if I feel like it.

Before I explain how detection works let me quote the source of the detection array (in hope that it will explain a lot by itself):

const DETECTORS: &[DetectConditions] = &[
    DetectConditions {
        demux_name: "avi",
        extensions: ".avi",
        conditions: &[CheckItem{offs: 0,
                                cond: &CC::Or(&CC::Str(b"RIFF"),
                                              &CC::Str(b"ON2 ")) },
                      CheckItem{offs: 8,
                                cond: &CC::Or(&CC::Str(b"AVI LIST"),
                                              &CC::Str(b"ON2fLIST")) }],
    },
    DetectConditions {
        demux_name: "gdv",
        extensions: ".gdv",
        conditions: &[CheckItem{offs: 0,
                                cond: &CC::Eq(Arg::U32LE(0x29111994))}],
    },
];

So what is the way to detect format? First the name is matched to see whether one of the listed extensions fits, then the file contents are checked for markers inside. These checks are descriptions like “check that at offset X there’s data of type <type> that (equals/less than/greater than) Y”. Also you can specify several alternative checks for the same offset and there’s range check condition too.

This way I can describe most sane formats, like “if at offset 1080 you have tag M.K. then it’s ProTracker module” or “if it starts with BM and 16-bit LE value here is less than this and here it’s in range 1-16 then this must be BMP”.
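To make the idea of offset-plus-condition checks more concrete, here is a tiny stand-alone sketch. The Cond enum and the check() and detect() functions are invented for this example and only mimic the shape of the table above; only the AVI and GDV markers come from it, and extension matching and scoring are left out.

```rust
// Hypothetical miniature of the offset/condition checks; not NihAV's types.
enum Cond<'a> {
    /// Bytes at the offset must equal this string.
    Str(&'a [u8]),
    /// 32-bit little-endian value at the offset must equal this number.
    EqU32Le(u32),
    /// Either of the two alternatives may match.
    Or(&'a Cond<'a>, &'a Cond<'a>),
}

/// Evaluates one condition at the given offset; short input never matches.
fn check(data: &[u8], offs: usize, cond: &Cond) -> bool {
    match *cond {
        Cond::Str(s) => data.get(offs..offs + s.len()) == Some(s),
        Cond::EqU32Le(val) => match data.get(offs..offs + 4) {
            Some(b) => u32::from_le_bytes([b[0], b[1], b[2], b[3]]) == val,
            None => false,
        },
        Cond::Or(a, b) => check(data, offs, a) || check(data, offs, b),
    }
}

/// Content-only detection for the two formats from the table above.
fn detect(data: &[u8]) -> Option<&'static str> {
    let riff = Cond::Str(b"RIFF");
    let on2 = Cond::Str(b"ON2 ");
    let avi_start = Cond::Or(&riff, &on2);
    let avi_list = Cond::Str(b"AVI LIST");
    let gdv_magic = Cond::EqU32Le(0x29111994);
    if check(data, 0, &avi_start) && check(data, 8, &avi_list) {
        Some("avi")
    } else if check(data, 0, &gdv_magic) {
        Some("gdv")
    } else {
        None
    }
}
```

Since the conditions are plain data, adding a format means adding a table entry and no new code.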

One might wonder how well it would work on MP3s renamed to “.acm” (IIRC one game did that). I’ll reveal the secret: it won’t work at all. Dealing with raw streams is actually outside the format detector’s job because it is a raw stream and not a container format. You can create a raw stream demuxer, then try all possible chunkers to see which one fits but that is stuff for the upper layer (maybe it will be implemented there inside the input stream handling function eventually). NihAV is not a place for automagic things.

Saturday, 03 June

This is a rather controversial topic because different countries recognize different kinds of cheese let alone what can be made out of it so what bears the name “cheese cake/pie” in one country might not be recognized as such in another.

So, cheese. Depending on country you have either one or two categories of cheese recognized: so called cottage cheese (or Quark/kvarg in Germanic language countries) and the rest of hard or semi-hard products made of milk. There’s also Italy where some cheeses (like mozzarella, provolone or scamorza) are considered to be pasta but that’s Italy and it doesn’t deserve second mention in this post.

Cottage cheese can be also divided into two categories: grainy and homogeneous mass. The first kind is more common in Eastern Europe (I’ve seen it in Ukraine, Czechia and Hungary for example; it can be also found in Germany but only in rather small packaging and runny), the second kind is more common in Germany.

The conventional hard or semi-hard cheese can be made into a pie usually by grating it, mixing with cream and eggs or sour cream and baking.

And of course there’s USA where what they call cheesecake is made (if you believe Wickedpedia) from either cream cheese (i.e. product where cheese-making process was terminated halfway) or ricotta (made from whey instead of milk, so not a cheese either).

Now, let’s look at real cheese cakes/pies I’ve encountered so far or even made myself:

  • Ukraine: there’s a traditional Ukrainian dish ??????? (if Cyrillic letters are not displayed correctly then ask Mike when he fixes it), patties made from grainy cottage cheese mixed with semolina or millet and flour and fried. Those I like and approve;
  • Germany: there are two similar variations of what is called käsekuchen (literally cheese cake). In both cases it’s mostly Quark (homogeneous cottage cheese) mixed with semolina and baked, in one case they’re also made more cake-like by mixing milk and starch and adding pieces of tangerine. This variation I bake myself from time to time, it goes even better with a bit of sour cream (Schmand) or gräddfil on top;
  • Switzerland: there they have Chäschueche (essentially käsekuchen pronounced in Swiss German) which is obviously nothing like its German counterpart. Instead we have a small tart made from local chäs (semi-hard semi-sticky Swiss cheese with stinky rind) that’s rather savoury instead of sweet. I’ve tried them once, found them edible but not something spectacular;
  • Sweden: this country has ostkaka (literally cheese cake) which can be described as an interesting cheese that was too good to wait for it so it was baked instead of ripening all the way. Obviously I buy it when possible and eat with lingon jam, there’s an especially good version available in Jul season;
  • Sweden: there’s not enough of it! Sweden also offers västerbottensostpaj (or simply västerbottenpaj) which is a quiche-like pie with filling made from the best cheese in the world (from Burträsk obviously) combined with eggs and cream (I should try gräddfil instead) and baked. I enjoy them both in Sweden and sometimes bake it myself (when I have The Cheese) because it’s worth it.

And at the end several fun facts:

  • German name for cottage cheese (Quark) is most likely the one that got into Finnegans Wake, from which it was borrowed later for certain physical term (though physicists playing stringed models refuse to acknowledge that concept);
  • in Czechia grainy cottage cheese (tvaroh) is sold in pressed triangles, if you wrap a cabbage leaf around it you can troll Japanophiles that it’s local onigiri (like I did once);
  • in Sweden they actually have different names for grainy cottage cheese (called “cottage cheese”) and homogeneous one (called “kvarg”);
  • and in Ukraine it’s all called simply “cheese” (maybe because hard cheese was not common in Ukraine, only hard cheese-like product sold in Soviet times).

Okay, back to doing nothing.

Thursday, 01 June

Looks like I’m going to repeat the same things over and over in every NihAV-related post so I’d better sum them up and when (or if) people ask why some decision was made like that I can point them here.

So, let’s start with what NihAV IS. NihAV is the project started by me and me alone with the following goals:

  • design multimedia framework from the ground in the way I see fit (hence the NIH in the name);
  • do that without any burden of legacy (should be obvious why);
  • implement real working code to both test the concepts and to keep me interested in continuing the project (it gets boring pretty quickly when you design, write code and it still does not do anything visible at all);
  • ignore bullshit cases like interlaced H.264 (the project is written by me and for myself and I’ll do fine without it, thank you very much);
  • let me understand Rust better (it’s not that important but a nice bonus nevertheless).

Now what NihAV is NOT and is NOT going to be:

  • a full-stack multimedia framework (i.e. which lacks only handling user input and audio/video output to become a media player too, more about it below);
  • transcoder for all your needs (first, I hardly care about my own needs; second, transcoder belongs elsewhere);
  • supporting things just because they’re standard (you can leave your broadcasting shit to yourself, including but not limited to MXF, interlacing and private streams in MPEG-TS);
  • designed with the most convenient way of usage for the end user (e.g. in frame management I already output dummy frames that merely signal there was no change from the previous frame; also frame reordering will be implemented outside decoders);
  • have other FFeatures just because some other project has them;
  • depend on many other crates (that’s the way of NIH!);
  • have hacks to support some very special cases (I’m not going to be paid for e.g. fixing AVI demuxer to support some file produced by a broken AVI writer anyway).

What it might become is a foundation for higher level multimedia data management which in turn can be either a library for building transcoder/player or just used directly in such tools. IMO libav* has suffered exactly from the features that should be kept in transcoder creeping into the libraries, the whole libavdevice is an example of that. Obviously it takes some burden off library users (including transcoding tool developers) but IMO library should be rather finished piece with clearly defined functionality, not a collection of code snippets developers decided to reuse or share with the world. Just build another layer (not wrapper, functional layer!) on top of it.

For similar reasons I’m not going to hide serious functionality in utility code or duplicate it in codecs. In NihAV frames will be output in the same order as received and reordering for the display will be done in specific frame reorderer (if needed), same for filling missing timestamps; dummy frame that tells just to repeat the previous frame is used there in GDV decoder already:

    let mut frm = NAFrame::new_from_pkt(pkt, self.info.clone(), NABufferType::None);

Some things do not belong to NihAV because they are either too low-level (like protocols) or too high-level (subtitles rendering, stream handling for e.g. transcoding or playback, playlist support). Some of them deserve to be made into separate library(ies) later, others should be implemented by the end user. Again, IMO libav* suffers from exactly this mix of low- and medium-level stuff that feels too low-level and not low-level enough at the same time (just look how much code those ffmpeg or avconv tools have). Same goes for hardware-accelerated decoding where the library should just demux frame data and parse its headers, the rest is up to hwaccel chain in the end application, but instead lazy users prefer libavcodec to try all possible hwaccels on the frame and fall back to multithreaded software decoding automatically if required. And preferably all other processing in e.g. libavfilter should be done using custom hwaccel format too. Since I’m all for this approach (…NOT), NihAV will recognize that the frame uses some hwaccel format and that’s all. It’s up to the upper layer to build custom processing chain.

I hope the domain for NihAV is clear: it will take ByteIO input, demux data using it (packets or elementary stream chunks—if you want them in packet format then use a parser), optionally fill timestamp information, decode frames, reorder them in display order if requested, similar approach for writing data. Anything else will belong to other crates (and they might appear in the future too). But for now this is enough for me.

P.S. If I wanted to have multimedia player I’d write one that can take MP4/FLAC/WV for input and decode AAC/FLAC/WavPack plus feed H.264 to VAAPI. I know my hardware and my content, others can write their own players.

P.P.S. If you want multimedia framework written in Rust for wide userbase just wait until rust-av is ready.

Wednesday, 31 May

For testing how well NihAV handles palettised formats I’ve decided to add support for Gremlin Digital Video format (8-bit only). So now I can decode various cutscenes from Normality, one of very few 3D first person adventure games for DOS. I’ve tested my implementation and it works fine.

The funny thing is that this demuxer and decoder for GDV (actually there’s also GDV DPCM but the samples I have seem to use raw PCM) are missing from CEmpeg. Wiki description also has some parts missing.

The first frame I was decoding started with a code for copying 8 bytes from offset -56. The first frame. At the very first pixel. So I’ve consulted the VAG’s code and the original binary specification (even by dumping executed instructions in DosBox and analysing them—it helped me in debugging later) to see where it went wrong. And it turns out the decoder is really supposed to do that because it has specially initialised buffer before the actual frame data (kinda like the original LZHUF did, also there’s no need to check if we copy before the buffer start since it’s not possible) plus some other small issues. I’ll try to correct the Wiki article on GDV in the following days.
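The trick with the pre-initialised area can be sketched like this. The prefix size and its contents below are placeholders (the real values belong to the format and are not reproduced here), and the function names are hypothetical; the point is only that a back-reference at the very first pixel lands in valid data.

```rust
/// Hypothetical size of the pre-initialised area; the real value and its
/// contents depend on the format and are not reproduced here.
const PREFIX: usize = 64;

/// Copies `len` bytes from `back` bytes behind the write position,
/// byte by byte so that overlapping copies repeat data as LZ77 expects.
fn copy_back(buf: &mut Vec<u8>, back: usize, len: usize) {
    let start = buf.len() - back;
    for i in 0..len {
        let b = buf[start + i];
        buf.push(b);
    }
}

/// Decodes a "frame" consisting of a single copy opcode.
fn decode_first_copy() -> Vec<u8> {
    // Window = pre-initialised prefix (placeholder values 0..63) + frame data.
    let mut buf: Vec<u8> = (0..PREFIX as u8).collect();
    // The very first opcode: copy 8 bytes from offset -56.
    copy_back(&mut buf, 56, 8);
    // Only the part after the prefix is the actual frame.
    buf[PREFIX..].to_vec()
}
```

As long as the largest possible back-reference is smaller than the prefix, the copy can never reach before the start of the buffer.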

And I don’t really plan to add any other old game codecs beside VMD and Smacker (I have soft spot for them after all). Next decoders should be either for audio or more modern ones, like H.26x or Indeo 4/5 since I still have some ideas to test out.

Update to this update: my decoder code is here.

Saturday, 27 May

It might be hard to believe but the number of decoders in NihAV has tripled! So now there are three codecs supported in NihAV: Intel Indeo 2, Intel Indeo 3 and PCM.

Before I talk about the design I’d like to say some things about Indeo 3 implementation. Essentially it’s an improvement over Indeo 2 that had simple delta compression: now deltas are coming from one of 21 codebooks and can be applied to both pairs and quads of pixels, there is motion compensation and planes are split into cells that use blocks for coding data in them (4×4, 4×8 or 8×8 blocks). libavcodec had two versions of the decoder: the first version was submitted anonymously and looks like it’s a direct translation of disassembly for XAnim; the second version is still based on some binary specifications but also with some information coming from the Intel patent. The problem is that those two implementations are both rather horrible to translate directly into Rust because of all the optimisations like working with a quad of pixels as 32-bit integer plus lots of macros and overall control flow like a maze of twisty little passages. As a result I’ve ended up with three main structures: Indeo3Decoder for main things, Buffers for managing the internal frame buffers and doing pixel operations like block copy, and CellDecParams for storing current cell decoding parameters like block dimensions, indices to the codebooks used and pointers to the functions that actually apply deltas or copy the lines for the current block (for example there are two different ways to do that for 4×8 block).

Anyway, back to overall NihAV design changes. Now there’s a dedicated structure NATimeInfo for keeping DTS, PTS, frame duration and timebase information; this structure is used in both NAFrame and NAPacket for storing timestamp information. And NAFrame now is essentially the wrapper for NATimeInfo, NABufferType plus some metadata.

So what is NABufferType? It is the type-specific frame buffer that stores actual data:

pub enum NABufferType {
    Video      (NAVideoBuffer<u8>),
    Video16    (NAVideoBuffer<u16>),
    AudioU8    (NAAudioBuffer<u8>),
    AudioI16   (NAAudioBuffer<i16>),
    AudioI32   (NAAudioBuffer<i32>),
    AudioF32   (NAAudioBuffer<f32>),
    Data       (NABufferRefT<u8>),
}

As you can see it declares several types of audio and video buffers. That’s because you don’t want to mess with bytes in many cases: if you decode 10-bit video you’d better output pixels directly into 16-bit elements, same with audio; for the other cases there’s AudioPacked/VideoPacked. To reiterate: the idea is that you allocate buffer of specific type and output native elements into it (floats for AudioF32, 16-bit for packed RGB565/RGB555 formats etc. etc.) and the conversion interface or the sink will take care of converting data into designated format.

And here’s what the audio buffer looks like (video buffer is about the same but doesn’t have channel map):

pub struct NAAudioBuffer<T> {
    info:   NAAudioInfo,
    data:   NABufferRefT<T>,
    offs:   Vec<usize>,
    chmap:  NAChannelMap,
}

impl<T: Clone> NAAudioBuffer<T> {
    pub fn get_offset(&self, idx: usize) -> usize { ... }
    pub fn get_info(&self) -> NAAudioInfo { ... }
    pub fn get_chmap(&self) -> NAChannelMap { self.chmap.clone() }
    pub fn get_data(&self) -> Ref<Vec<T>> { ... }
    pub fn get_data_mut(&mut self) -> RefMut<Vec<T>> { ... }
    pub fn copy_buffer(&mut self) -> Self { ... }
}

For planar audio (or video) get_offset() allows caller to obtain the offset in the buffer to the requested component (because it’s all stored in the single buffer).
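A minimal sketch of the single-buffer-plus-offsets layout, assuming equal-sized channels stored back to back. PlanarBuf and its methods are hypothetical stand-ins and not the actual NAAudioBuffer code.

```rust
/// Hypothetical planar buffer: all channels in one Vec, one offset each.
struct PlanarBuf {
    data: Vec<i16>,
    offs: Vec<usize>,
    len: usize, // samples per channel
}

impl PlanarBuf {
    fn new(channels: usize, len: usize) -> Self {
        // Channel `ch` starts right after the previous one.
        let offs = (0..channels).map(|ch| ch * len).collect();
        PlanarBuf { data: vec![0; channels * len], offs, len }
    }
    /// Start of the requested channel inside the shared buffer.
    fn get_offset(&self, idx: usize) -> usize {
        self.offs[idx]
    }
    /// Mutable view of one channel, built from its offset.
    fn channel_mut(&mut self, idx: usize) -> &mut [i16] {
        let start = self.get_offset(idx);
        &mut self.data[start..start + self.len]
    }
}

/// Writes three samples into the second channel of a stereo buffer.
fn planar_demo() -> (usize, Vec<i16>) {
    let mut buf = PlanarBuf::new(2, 3);
    buf.channel_mut(1).copy_from_slice(&[7, 8, 9]);
    (buf.get_offset(1), buf.data)
}
```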

There are two functions for allocating buffers:

pub fn alloc_video_buffer(vinfo: NAVideoInfo, align: u8) -> Result<NABufferType, AllocatorError>;
pub fn alloc_audio_buffer(ainfo: NAAudioInfo, nsamples: usize, chmap: NAChannelMap) -> Result<NABufferType, AllocatorError>;

The video buffer allocator allocates a buffer in the requested format with the provided block alignment (it’s for the codecs that actually code data in e.g. 16×16 macroblocks but still want to report frame having e.g. width=1366 or height=1080 and if you think that it’s better to constantly confuse avctx->width with avctx->coded_width then you’ve forgotten this project name). Audio buffer allocator needs to know the length of the frame in samples instead.
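The arithmetic behind the alignment is simple rounding up. The sketch below assumes the alignment is given as a block size in pixels (how the align: u8 parameter is actually interpreted is not stated here), and both function names are hypothetical.

```rust
/// Rounds `dim` up to a multiple of `align` (align must be non-zero).
fn align_dim(dim: usize, align: usize) -> usize {
    (dim + align - 1) / align * align
}

/// Allocated plane size for the reported width and height.
fn coded_size(width: usize, height: usize, align: usize) -> (usize, usize) {
    (align_dim(width, align), align_dim(height, align))
}
```

With 16×16 macroblocks a frame reported as 1366×1080 would get a 1376×1088 buffer, and the reported dimensions stay what the container said.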

As for subtitles, they will not be implemented in NihAV beside demuxing the stream with subtitle data. I believe subtitles are the dependent kind of stream and because of that they should be rendered by the consumer (video player program or whatever). Otherwise you need to take, say, RGB-encoded subtitles, convert them into proper YUV flavour and draw in the specific region of the frame which might be not the original size if you use e.g. DVD rip encoded into different size with DVD subtitles preserved as is. And for textual subtitles you have even more rendering problems since you need to render them with proper font (stored as the attachment in the container), apply using the proper effect, adjust positions if needed and such. Plus the user may want to adjust them during playback in some way so IMO it belongs to the rendering pipeline and not NihAV (it’s okay though, you’re not going to use NihAV anyway).

Oh, and PCM “decoder” just rewraps buffer provided by NAPacket as NABufferType::AudioPacked, it’s good enough to dump as is and the future resampler will take care of format conversion.

No idea what comes next: maybe it’s Indeo audio decoders, maybe it’s Indeo 4/5 video decoder or maybe it’s deflate unpacker. Or something completely different. Or nothing at all. Only time will tell.

Saturday, 20 May

I don’t like to write the code that does nothing, it’s the excitement of my code doing at least something that keeps me writing code. So instead of designing a lot of new interfaces and such that can describe all theoretically feasible stuff plus utility code to handle the things passed through aforementioned interfaces, I’ve just added some barely working stuff, wrote a somewhat working demuxer and made a decoder.

And here it is:

use io::bitreader::*;
use io::codebook::*;
use formats;
use super::*;

static INDEO2_DELTA_TABLE: [[u8; 256]; 4] = [
      0x80, 0x80, [...the rest is skipped for clarity...]
];

static INDEO2_CODE_CODES: &[u16] = &[
    0x0000, 0x0004, [...the rest is skipped for clarity...]
];

static INDEO2_CODE_LENGTHS: &[u8] = &[
     3,  3,  [...the rest is skipped for clarity...]
];

struct IR2CodeReader { }

impl CodebookDescReader<u8> for IR2CodeReader {
    fn bits(&mut self, idx: usize) -> u8  { INDEO2_CODE_LENGTHS[idx] }
    fn code(&mut self, idx: usize) -> u32 { INDEO2_CODE_CODES[idx] as u32 }
    fn sym (&mut self, idx: usize) -> u8 {
        if idx < 0x7F { (idx + 1) as u8 } else { (idx + 2) as u8 }
    }
    fn len(&mut self) -> usize { INDEO2_CODE_LENGTHS.len() }
}

struct Indeo2Decoder {
    info:    Rc<NACodecInfo>,
    cb:      Codebook<u8>,
    lastfrm: Option<Rc<NAFrame>>,
}

impl Indeo2Decoder {
    fn new() -> Self {
        let dummy_info = Rc::new(DUMMY_CODEC_INFO);
        let mut coderead = IR2CodeReader{};
        let cb = Codebook::new(&mut coderead, CodebookMode::LSB).unwrap();
        Indeo2Decoder { info: dummy_info, cb: cb, lastfrm: None }
    }

    fn decode_plane_intra(&self, br: &mut BitReader,
                          frm: &mut NAFrame, planeno: usize,
                          tableno: usize) -> DecoderResult<()> {
        let offs = frm.get_offset(planeno);
        let (w, h) = frm.get_dimensions(planeno);
        let stride = frm.get_stride(planeno);
        let cb = &self.cb;

        let mut buffer = frm.get_buffer_mut().unwrap();
        let mut data = buffer.get_data_mut().unwrap();
        let mut framebuf: &mut [u8] = data.as_mut_slice();

        let table = &INDEO2_DELTA_TABLE[tableno];

        let mut base = offs;
        let mut x: usize = 0;
        while x < w {
            let idx = br.read_cb(cb)? as usize;
            if idx >= 0x80 {
                let run = (idx - 0x80) * 2;
                if x + run > w { return Err(DecoderError::InvalidData); }
                for i in 0..run {
                    framebuf[base + x + i] = 0x80;
                }
                x += run;
            } else {
                framebuf[base + x + 0] = table[(idx * 2 + 0) as usize];
                framebuf[base + x + 1] = table[(idx * 2 + 1) as usize];
                x += 2;
            }
        }
        base += stride;
        for _ in 1..h {
            let mut x: usize = 0;
            while x < w {
                let idx = br.read_cb(cb)? as usize;
                if idx >= 0x80 {
                    let run = (idx - 0x80) * 2;
                    if x + run > w { return Err(DecoderError::InvalidData); }
                    for i in 0..run {
                        framebuf[base + x + i] = framebuf[base + x + i - stride];
                    }
                    x += run;
                } else {
                    let delta0 = (table[idx * 2 + 0] as i16) - 0x80;
                    let delta1 = (table[idx * 2 + 1] as i16) - 0x80;
                    let mut pix0 = framebuf[base + x + 0 - stride] as i16;
                    let mut pix1 = framebuf[base + x + 1 - stride] as i16;
                    pix0 += delta0;
                    pix1 += delta1;
                    if pix0 < 0 { pix0 = 0; }
                    if pix1 < 0 { pix1 = 0; }
                    if pix0 > 255 { pix0 = 255; }
                    if pix1 > 255 { pix1 = 255; }
                    framebuf[base + x + 0] = pix0 as u8;
                    framebuf[base + x + 1] = pix1 as u8;
                    x += 2;
                }
            }
            base += stride;
        }
        Ok(())
    }

    fn decode_plane_inter(&self, br: &mut BitReader,
                          frm: &mut NAFrame, planeno: usize,
                          tableno: usize) -> DecoderResult<()> {
        let offs = frm.get_offset(planeno);
        let (w, h) = frm.get_dimensions(planeno);
        let stride = frm.get_stride(planeno);
        let cb = &self.cb;

        let mut buffer = frm.get_buffer_mut().unwrap();
        let mut data = buffer.get_data_mut().unwrap();
        let mut framebuf: &mut [u8] = data.as_mut_slice();

        let table = &INDEO2_DELTA_TABLE[tableno];

        let mut base = offs;
        for _ in 0..h {
            let mut x: usize = 0;
            while x < w {
                let idx = br.read_cb(cb)? as usize;
                if idx >= 0x80 {
                    let run = (idx - 0x80) * 2;
                    if x + run > w { return Err(DecoderError::InvalidData); }
                    x += run;
                } else {
                    let delta0 = (table[idx * 2 + 0] as i16) - 0x80;
                    let delta1 = (table[idx * 2 + 1] as i16) - 0x80;
                    let mut pix0 = framebuf[base + x + 0] as i16;
                    let mut pix1 = framebuf[base + x + 1] as i16;
                    pix0 += delta0 * 3 >> 2;
                    pix1 += delta1 * 3 >> 2;
                    if pix0 < 0 { pix0 = 0; }
                    if pix1 < 0 { pix1 = 0; }
                    if pix0 > 255 { pix0 = 255; }
                    if pix1 > 255 { pix1 = 255; }
                    framebuf[base + x + 0] = pix0 as u8;
                    framebuf[base + x + 1] = pix1 as u8;
                    x += 2;
                }
            }
            base += stride;
        }
        Ok(())
    }
}

const IR2_START: usize = 48;

impl NADecoder for Indeo2Decoder {
    fn init(&mut self, info: Rc<NACodecInfo>) -> DecoderResult<()> {
        if let NACodecTypeInfo::Video(vinfo) = info.get_properties() {
            let w = vinfo.get_width();
            let h = vinfo.get_height();
            let f = vinfo.is_flipped();
            let fmt = formats::YUV410_FORMAT;
            let myinfo = NACodecTypeInfo::Video(NAVideoInfo::new(w, h, f, fmt));
            self.info = Rc::new(NACodecInfo::new_ref(info.get_name(), myinfo, info.get_extradata()));
        } else {
    fn decode(&mut self, pkt: &NAPacket) -> DecoderResult<Rc<NAFrame>> {
        let src = pkt.get_buffer();
        if src.len() <= IR2_START { return Err(DecoderError::ShortData); }
        let interframe = src[18];
        let tabs = src[34];
        let mut br = BitReader::new(&src[IR2_START..], src.len() - IR2_START, BitReaderMode::LE);
        let luma_tab = tabs & 3;
        let chroma_tab = (tabs >> 2) & 3;
        if interframe != 0 {
            let mut frm = NAFrame::new_from_pkt(pkt, self.info.clone());
            for plane in 0..3 {
                let tabidx = (if plane == 0 { luma_tab } else { chroma_tab }) as usize;
                self.decode_plane_intra(&mut br, &mut frm, plane, tabidx)?;
            let rcf = Rc::new(frm);
            self.lastfrm = Some(rcf.clone());
        } else {
            let lf = self.lastfrm.clone();
            if let None = lf { return Err(DecoderError::MissingReference); }
            let lastfr = lf.unwrap();
            let mut frm = NAFrame::from_copy(lastfr.as_ref());
            for plane in 0..3 {
                let tabidx = (if plane == 0 { luma_tab } else { chroma_tab }) as usize;
                self.decode_plane_inter(&mut br, &mut frm, plane, tabidx)?;
            let rcf = Rc::new(frm);
            self.lastfrm = Some(rcf.clone());

pub fn get_decoder() -> Box<NADecoder> {

mod test {
    use codecs;
    use demuxers::*;
    use frame::NAFrame;
    use io::byteio::*;
    use std::fs::File;
    use std::io::prelude::*;

    fn test_indeo2() {
        let avi_dmx = demuxers::find_demuxer("avi").unwrap();
        let mut file = File::open("assets/laser05.avi").unwrap();
        let mut fr = FileReader::new_read(&mut file);
        let mut br = ByteReader::new(&mut fr);
        let mut dmx = avi_dmx.new_demuxer(&mut br);
        let mut dec = (codecs::find_decoder("indeo2").unwrap())();

        let mut str: u32 = 42;
        for i in 0..dmx.get_num_streams() {
            let s = dmx.get_stream(i).unwrap();
            let info = s.get_info();
            if info.is_video() && info.get_name() == "indeo2" {
                str = s.get_id();

        loop {
            let pktres = dmx.get_frame();
            if let Err(e) = pktres {
                if (e as i32) == (DemuxerError::EOF as i32) { break; }
            let pkt = pktres.unwrap();
            if pkt.get_stream().get_id() == str {
                let frm = dec.decode(&pkt).unwrap();
                write_pgmyuv(pkt.get_pts().unwrap(), &frm);

    fn write_pgmyuv(num: u64, frm: &NAFrame) {
        [...skipped for clarity...]

(In case you wonder what are all those .unwrap() for, Rust doesn’t have NULL pointers and uses other means like enum Option which can be either None or Some(x), so in order to access contents you have to unwrap it. Same for results where you can either get requested output or some error.)
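A small stand-alone illustration of the same thing, using only the standard library. ToyError and the three helper functions are made up for this example.

```rust
#[derive(Debug, PartialEq)]
enum ToyError {
    ShortData,
}

/// Returns the first byte, or None when the slice is empty.
fn first_byte(src: &[u8]) -> Option<u8> {
    src.first().copied()
}

/// Same lookup, but reported as a Result so that `?` can propagate it.
fn first_byte_checked(src: &[u8]) -> Result<u8, ToyError> {
    first_byte(src).ok_or(ToyError::ShortData)
}

/// `?` returns early with the error instead of panicking like unwrap().
fn sum_first_bytes(a: &[u8], b: &[u8]) -> Result<u16, ToyError> {
    let x = first_byte_checked(a)?;
    let y = first_byte_checked(b)?;
    Ok(x as u16 + y as u16)
}
```

Calling first_byte(&[5, 6]).unwrap() gives 5, while unwrapping the result for an empty slice would panic; that is acceptable in a test function and less so in a decoder, which is why the decoder code above uses ? instead.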

Anyway, if you look at the end of the code (at the test function) you can see how it should work in principle:

  1. you request demuxer by name (in the future it will be possible to get demuxer for MIME type or file extension plus probing);
  2. you create a new demuxer instance for certain ByteReader input (in the future it should be easy to add chained demuxers);
  3. you try opening input (demuxer reads header then);
  4. you scan the streams declared by demuxer and decide how to handle them;
  5. you request decoder(s) for the provided stream(s) in the same fashion as demuxers;
  6. you loop until demuxer gives an error or ends demuxing and feed packets from proper stream to the decoder and do whatever you like with the output.

Decoder submodule exports just get_decoder() function which is used in the main module to create decoder instances on request (the example of usage is in the code above):

pub struct DecoderInfo {
    name: &'static str,
    get_decoder: fn () -> Box<NADecoder>,
}

const DECODERS: &[DecoderInfo] = &[
    DecoderInfo { name: "indeo2", get_decoder: indeo2::get_decoder },
];

pub fn find_decoder(name: &str) -> Option<fn () -> Box<NADecoder>> {
    for &dec in DECODERS {
        if dec.name == name {
            return Some(dec.get_decoder);
        }
    }
    None
}

The data structures are quite nested: NAFrame and NAPacket have a pointer to NACodecInfo, which contains codec name, possible extradata and codec type information. That codec type information has type-specific information tied to the type itself:

pub enum NACodecTypeInfo {
    None,
    Audio(NAAudioInfo),
    Video(NAVideoInfo),
}

Where NAVideoInfo includes (currently) frame dimensions and NAPixelFormaton (salvaged from the NAScale which I described long time ago). NAAudioInfo has number of channels, sample rate, block size length and NASoniton following the same model but for audio sample:

pub struct NASoniton {
    bits:       u8,
    be:         bool,
    packed:     bool,
    planar:     bool,
    float:      bool,
    signed:     bool,
}

Here you have the sample size (in bits), its type (signed/unsigned integer or float, in BE/LE format) and two confusing flags. packed signals whether individual samples are stored in packed form; that matters only if you have e.g. 20-bit samples, which can be stored in 24 bits individually or as two samples crammed into 5 bytes. planar signals whether channel data is stored in separate buffers or interleaved in a single buffer. Since I don’t care much about audio at this stage, the finer details about obtaining proper buffers and managing proper channel maps are left for later.
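
To make the packed case more concrete, here is a minimal sketch in C (not NihAV code) of unpacking two 20-bit signed samples crammed into 5 bytes. The function names and the MSB-first bit order are assumptions made purely for illustration; a real format may order the bits differently.

```c
#include <stdint.h>

/* Sign-extend a 20-bit two's complement value to 32 bits. */
static int32_t sign_extend20(uint32_t v)
{
    return (int32_t)(v ^ 0x80000u) - 0x80000;
}

/* Hypothetical helper: unpack two 20-bit samples stored back to back
 * in 5 bytes, assuming big-endian (MSB-first) bit order. */
void unpack_2x20(const uint8_t src[5], int32_t dst[2])
{
    /* first sample: all of byte 0 and 1, top nibble of byte 2 */
    uint32_t a = ((uint32_t)src[0] << 12) | ((uint32_t)src[1] << 4) | (src[2] >> 4);
    /* second sample: low nibble of byte 2, all of byte 3 and 4 */
    uint32_t b = ((uint32_t)(src[2] & 0x0F) << 16) | ((uint32_t)src[3] << 8) | src[4];

    dst[0] = sign_extend20(a);
    dst[1] = sign_extend20(b);
}
```

With packed set to false the same two samples would simply occupy 3 bytes each, and no nibble juggling would be needed.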

Also as I said in the original NihAV manifest, decoders and demuxers are distinguished by text names because I strongly dislike enumerations spanning several screens. So AVI demuxer calls register::find_codec_from_avi_fourcc() or register::find_codec_from_wav_twocc() and it will return the codec name as a string; you can use that string to search for an appropriate decoder or retrieve known codec information from the same registry.
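
The name-based lookup can be sketched as a plain table search. The following is a C rendition written only for illustration, not the actual NihAV registry: the struct, the table and its three entries are assumptions, and the function merely borrows the name mentioned above.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char        fourcc[5]; /* four characters plus the terminating NUL */
    const char *name;      /* codec name later used to find a decoder */
} FourccEntry;

/* Illustrative entries only; a real registry would be much longer. */
static const FourccEntry avi_video_codecs[] = {
    { "RT21", "indeo2" },
    { "IV31", "indeo3" },
    { "IV32", "indeo3" },
};

/* Returns the codec name for an AVI FourCC, or NULL if it is unknown. */
const char *find_codec_from_avi_fourcc(const char fourcc[4])
{
    size_t i;

    for (i = 0; i < sizeof(avi_video_codecs) / sizeof(avi_video_codecs[0]); i++) {
        if (!memcmp(avi_video_codecs[i].fourcc, fourcc, 4))
            return avi_video_codecs[i].name;
    }
    return NULL;
}
```

The returned string is then the only key needed to ask for a decoder, so no shared enumeration has to be kept in sync between demuxers and decoders.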

That’s all for now, the next things that are likely to happen (in no particular order):

  • refactoring data structures, moving them between modules and adding more utility code;
  • work on proper audio support;
  • work on proper video frame management (especially ownership);
  • some Indeo video or audio decoder;
  • utility code for more automated demuxer output handling (automatic stream skipping, better demuxer assignment and such);
  • anything else.

Until next time.

Sunday, 30 April

I decided to write SIMD optimizations for the HEVC decoder inverse transform (which is an IDCT approximation) for ARMv7. (Here is an interesting post about DCT.) The inverse transform for HEVC operates on 4x4, 8x8, 16x16 and 32x32 blocks and I have finished them recently. For each block size there are 2 functions, one for 8-bit depth and the other for 10-bit depth:

  • 4x4 block: ff_hevc_idct_4x4_8/10_neon, the speed up vs C code on A53 core ~3x
  • 8x8 block: ff_hevc_idct_8x8_8/10_neon, speed up (A53) ~4x (github)
  • 16x16 block: ff_hevc_idct_16x16_8/10_neon, speed up (A53) ~8x (github)
  • 32x32 block: ff_hevc_idct_32x32_8/10_neon, speed up (A53) ~13x (github)
Here are some things I learned about NEON ARMv7.
  • The values of q4 - q7 have to be preserved (with vpush/vpop {q4-q7} ) when one wants to use them. VPUSH/VPOP pushes and pops to/from stack.
  • Try to do things in parallel because many of the smaller ARM cores do not have out-of-order execution like x86 does.
  • Do not forget to preserve LR (link register) value when calling a function. When LR value is preserved it can be used as any other GPR.
  • If LR value was pushed to stack, one does not have to do pop lr and then bx lr to return but it's better to return with simply pop {pc} .
  • Use VSHL (Vector Shift Left) instead of VMUL (Vector multiply by scalar) when possible, it's much faster. (The same is valid in general, for example for x86.)
  • Align loads/stores when possible, it's faster.
  • To align the stack and allocate a temporary buffer there (rx is some GPR):
        mov rx, sp
        and rx, sp, #15
        add rx, rx, #buffer_size
        sub sp, sp, rx
    sp now points to the buffer. After using the buffer, the stack pointer has to be restored with add sp, sp, rx.
  • Try to keep functions small. If it is needed to call some big macro several times, make a function of such a macro. Functions that are too big may fail to build and may hurt performance.

  • Always try to play with the instruction order when it is possible and benchmark the results. But what improves the performance on one core (mine is A53) may cause (or may not) a slowdown on some other core (A7, A8, A9).
Many of the things I learned can be found in the ARM Architecture Reference Manual ARMv7-A or other ARM documentation, so it is important to read such documents. Many thanks to Kostya Shishkov, who introduced me to ARM, and to Martin Storsjö, an ARM expert who reviewed my patches and helped me a lot with optimizing them.

Friday, 28 April

I tried my skills at optimising HEVC. My SIMD IDCT (Inverse Discrete Cosine Transform) for HEVC decoder was merged lately. What I did was 4x4, 8x8, 16x16 and 32x32 IDCTs for 8 and 10 bitdepths. Both 4x4 and 8x8 are supported on 32-bit CPUs but 16x16 and 32x32 are 64-bit only.

The larger transforms call the smaller ones: 32x32 calls 16x16, 16x16 calls 8x8 and so on, so 4x4 is used by all the other transforms. Here is how the actual assembly looks:
; void ff_hevc_idct_4x4_{8,10}_<opt>(int16_t *coeffs, int col_limit)
; %1 = bitdepth
%macro IDCT_4x4 1
cglobal hevc_idct_4x4_%1, 1, 1, 5, coeffs
    mova m0, [coeffsq]
    mova m1, [coeffsq + 16]

    TR_4x4 7, 1, 1
    TR_4x4 20 - %1, 1, 1

    mova [coeffsq], m0
    mova [coeffsq + 16], m1
    RET
%endmacro
*coeffs is a pointer to the coefficients I want to transform. They are loaded into XMM registers and then the TR_4x4 macro is called. This macro transforms the coefficients according to the following equations:
res00 = 64 * src00 + 64 * src20 + 83 * src10 + 36 * src30
res10 = 64 * src01 + 64 * src21 + 83 * src11 + 36 * src31
res20 = 64 * src02 + 64 * src22 + 83 * src12 + 36 * src32
res30 = 64 * src03 + 64 * src23 + 83 * src13 + 36 * src33
Because the transformed coefficients are written back to the same place, "res" (as residual) is used for the results and "src" for the initial coefficients. The results of the calculations are then scaled, res = (res + add_const) >> shift, and the (4x4) block of results is transposed. The macro is then called again to perform the same transform, this time on rows.
; %1 - shift
; %2 - 1/0 - SCALE and Transpose or not
; %3 - 1/0 add constant or not
%macro TR_4x4 3
    ; interleaves src0 with src2 to m0
    ;         and src1 with src3 to m1
    ; src0: 00 01 02 03     m0: 00 20 01 21 02 22 03 23
    ; src1: 10 11 12 13 -->
    ; src2: 20 21 22 23     m1: 10 30 11 31 12 32 13 33
    ; src3: 30 31 32 33

    SBUTTERFLY wd, 0, 1, 2

    pmaddwd m2, m0, [pw_64]    ; e0
    pmaddwd m3, m1, [pw_83_36] ; o0
    pmaddwd m0, [pw_64_m64]    ; e1
    pmaddwd m1, [pw_36_m83]    ; o1

%if %3 == 1
    %assign %%add 1 << (%1 - 1)
    mova  m4, [pd_ %+ %%add]
    paddd m2, m4
    paddd m0, m4
%endif

    SUMSUB_BADC d, 3, 2, 1, 0, 4

%if %2 == 1
    psrad m3, %1 ; e0 + o0
    psrad m1, %1 ; e1 + o1
    psrad m2, %1 ; e0 - o0
    psrad m0, %1 ; e1 - o1
    packssdw m3, m1
    packssdw m0, m2
    ; Transpose
    SBUTTERFLY wd, 3, 0, 1
    SBUTTERFLY wd, 3, 0, 1
    SWAP 3, 1, 0
%else
    SWAP 3, 2, 0
%endif
%endmacro
The larger transforms are a bit more complicated but they work in a similar way.
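
For comparison, the same 4x4 transform can be written as a scalar C model. This is only a sketch under my own assumptions, not the Libav C code: the names tr_4 and idct_4x4_ref are made up, and the e0/e1/o0/o1 terms simply mirror the constants used by the pmaddwd instructions in the macro (pw_64, pw_64_m64, pw_83_36, pw_36_m83).

```c
#include <stdint.h>

static int16_t clip_int16(int v)
{
    if (v < -32768) return -32768;
    if (v >  32767) return  32767;
    return (int16_t)v;
}

/* One 1-D pass over four coefficients that are `step` elements apart.
 * Note: >> on negative values is assumed to be an arithmetic shift. */
static void tr_4(int16_t *c, int step, int shift)
{
    const int add = 1 << (shift - 1);
    const int e0  = 64 * c[0 * step] + 64 * c[2 * step];
    const int e1  = 64 * c[0 * step] - 64 * c[2 * step];
    const int o0  = 83 * c[1 * step] + 36 * c[3 * step];
    const int o1  = 36 * c[1 * step] - 83 * c[3 * step];

    c[0 * step] = clip_int16((e0 + o0 + add) >> shift);
    c[1 * step] = clip_int16((e1 + o1 + add) >> shift);
    c[2 * step] = clip_int16((e1 - o1 + add) >> shift);
    c[3 * step] = clip_int16((e0 - o0 + add) >> shift);
}

/* In-place 4x4 inverse transform: columns first (shift 7),
 * then rows (shift 20 - bit_depth). */
void idct_4x4_ref(int16_t coeffs[16], int bit_depth)
{
    int i;

    for (i = 0; i < 4; i++)
        tr_4(coeffs + i, 4, 7);
    for (i = 0; i < 4; i++)
        tr_4(coeffs + 4 * i, 1, 20 - bit_depth);
}
```

A model like this is handy as a reference when checking SIMD output: a block with only a DC coefficient must come out flat.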

The results were benchmarked with the checkasm bench_new() function for bitdepth 8 (the results are similar for bitdepth 10). Checkasm can benchmark SIMD functions with the --bench option; in my case the full command was:

    ./tests/checkasm/checkasm --bench=hevc_idct

    The overall HEVC performance was benchmarked with perf:

    perf stat -r5 ./avconv -threads 1 -i sample.mkv -an -f null -

    The sample details: duration 0:12:14, bitrate 200kb/s, yuv420p, 1920x1080, Divx encode of Tears of Steel. The result is a 10% speed-up after my SIMD optimisations.

    Many thanks to Kostya Shishkov and Henrik Gramner for their advice during the development process.

    Sunday, 22 January

    I asked Kostya Shishkov, an experienced ARM developer, to check my basic NEON knowledge. So here are his questions and my answers to them:

    • Where do you find information about instruction details?
    • ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition.
    • How many SIMD registers are there and how can you address them?
    • SIMD registers can be viewed as 32 64-bit registers named D0 - D31 or as 16 128-bit registers named Q0 - Q15. The VFP view as 32 32-bit registers S0 - S31 (mapped to D0 - D15) can also be used.
    • In what ways can you load/store data to/from NEON registers and why use one over another?
      • use an immediate constant to fill a SIMD register:
      • vmov.i32 q0, #42 - move the immediate constant 42 to the q0 SIMD register; the suffix i32 specifies the data type (32-bit integer in this case) and, as q0 is a 128-bit register, it will hold four 32-bit constants 42
      • use a GPR to store the offset for a load/store instruction:
      • mov r1, #16 - move the number 16 to the r1 GPR
        add r0, r1 - add 16 bytes to the address stored in r0
        vst1.s16 {q0}, [r0] - store the content of q0 to the address stored in r0
      • update the address after loading (storing):
      • add r1, r0, \offset - add the offset to r0 and store the result in r1
        mov r2, \step - move the constant step to r2
        vld1.s16 {d0}, [r1], r2 - load d0 from the address in r1, then update r1 = r1 + r2
      • move data between GPR and SIMD registers:
      • vmov d0[0], r1
    • Why is it better to use different registers for all the arguments of a NEON instruction?
    • (For example: why is it better to use vadd.s16 q1, q2, q3 instead of vadd.s16 q1, q1, q2?) Because it is faster in some cases.
    • What are the differences between vld1.16 q1, [r0] and vld1.16 q1, [r0,:128]?
    • :128 means the data is loaded/stored at a 128-bit aligned address, that is, the address in r0 is a multiple of 128 bits (16 bytes).
    • Why do some instructions use e.g. the vxxx.i8 form and others the vxxx.s8 form?
    • i stands for integer and s for signed integer; i is used when signedness does not matter and only the element size needs to be known.
    • Where would you use VZIP, VUZP and VTRN?
    • Those are useful for matrix transposition and various data shuffling.
    • When can one NEON instruction replace several different operations in other SIMDs?
      • VMLAL - multiply corresponding elements in 2 vectors and add the products to destination vector
      • VQRSHRN - Vector Saturating Shift Right Narrow - Shifts right vector elements and clips them into narrower element.
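
To show how much VQRSHRN does in one go, here is a scalar C model of a single lane of vqrshrn.s32 (32-bit input narrowed to 16 bits). It is only a sketch: the function name is made up, and it assumes that >> on a negative value is an arithmetic shift, which holds on the usual compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of vqrshrn.s32 with an immediate shift n
 * (1 <= n <= 16): add the rounding constant, shift right, then saturate
 * the result to the int16_t range. */
int16_t vqrshrn_s32_lane(int32_t x, unsigned n)
{
    int64_t v;

    assert(n >= 1 && n <= 16);
    /* 64-bit arithmetic so that adding the rounding constant cannot overflow */
    v = ((int64_t)x + ((int64_t)1 << (n - 1))) >> n;

    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}
```

This is exactly the rounding, shifting and clipping sequence that a transform needs when scaling its results, which other SIMD instruction sets typically spell as an add, a shift and a saturating pack.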

    Monday, 31 October

    Using hwaccel

    It has been a while since I mentioned the topic and we have made huge progress in this field.

    Currently, with Libav 12, we already have nice support for multiple kinds of hardware for decoding, scaling, deinterlacing and encoding.

    The whole thing works nicely but it isn’t foolproof yet, so I’ll start describing how to set up and use it for some common tasks.

    This post will be about Intel MediaSDK, the next post will be about NVIDIA Video Codec SDK.



    • A machine with QSV hardware, Haswell, Skylake or better.
    • The ability to compile your own kernel and modules
    • The MediaSDK mfx_dispatch

    It works nicely both on Linux and Windows. If you happen to have other platforms feel free to contact Intel and let them know, they’ll be delighted.


    The MediaSDK comes with either the usual Windows setup binary or a Linux bash script that tries its best to install the prerequisites.

    # tar -xvf MediaServerStudioEssentials2017.tar.gz

    Focus on SDK2017Production16.5.tar.gz.

    tar -xvf SDK2017Production16.5.tar.gz


    The MediaSDK leverages libva to access the hardware together with a highly extended DRI kernel module.
    They support CentOS with rpms and all the other distros with a tarball.

    BEWARE: if you use the installer script the custom libva would override your system one, you might not want that.

    I’m using Gentoo so it is intel-linux-media_generic_16.5-55964_64bit.tar.gz for me.

    The part of this tarball you really want to install in the system, no matter what, is the firmware:


    If you are afraid of adding custom stuff on your system I advise to offset the whole installation and then override the LD paths to use that only for Libav.

    BEWARE: you must use the custom iHD libva driver with the custom i915 kernel module.

    If you want to install using the provided script on Gentoo you should first emerge lsb-release.

    emerge lsb-release
    source /etc/profile.d/*.sh
    echo /opt/intel/mediasdk/lib64/ >> /etc/

    Kernel Modules

    The patchset resides in:


    The current set is 143 patches against linux 4.4, trying to apply on a more recent kernel requires patience and care.

    The 4.4.27 works almost fine (even btrfs does not seem to have many horrible bugs).


    In order to use the Media SDK with Libav you should use the mfx_dispatch from yours truly, since it provides a default for Linux so it behaves in a uniform way compared to Windows.

    Building the dispatcher

    It is a standard autotools package.

    git clone git://
    cd mfx_dispatch
    autoreconf -ifv
    ./configure --prefix=/some/where
    make -j 8
    make install

    Building Libav

    If you want to use the advanced hwcontext features on Linux you must enable both the vaapi and the mfx support.

    git clone git://
    cd libav
    export PKG_CONFIG_PATH=/some/where/lib/pkgconfig
    ./configure --enable-libmfx --enable-vaapi --prefix=/that/you/like
    make -j 8
    make install


    Media SDK is sort of temperamental and the setup process requires manual tweaking, so the odds of having to debug and investigate are high.

    If something misbehaves, here is a checklist:
    • Make sure you are using the right kernel and you are loading the module.

    uname -a
    • Make sure libva is the correct one and it is loading the right thing.
    strace -e open ./avconv -c:v h264_qsv -i test.h264 -f null -
    • Make sure you aren’t using the wrong rate control or not passing all the parameters required.
    ./avconv -v verbose -filter_complex testsrc -c:v h264_qsv {ratecontrol params omitted} out.mkv

    See below for some examples of working rate-control settings.
    • Use the MediaSDK examples provided with the distribution to confirm that everything works in case the SDK is more recent than the updates.


    The Media SDK support in Libav covers decoding, encoding, scaling and deinterlacing.

    Decoding is straightforward, the rest has still quite a bit of rough edges and this blog post had been written mainly to explain them.

    Currently the most interesting formats supported are h264 and hevc, but other formats such as vp8 and vc1 are supported as well.

    ./avconv -codecs | grep qsv


    The decoders can output directly to system memory and can be used as normal decoders and feed a software implementation just fine.

    ./avconv -c:v h264_qsv -i input.h264 -c:v av1 output.mkv

    Or they can decode to opaque (gpu backed) buffers so further processing can happen

    ./avconv -hwaccel qsv -c:v h264_qsv -i input.mkv -vf deinterlace_qsv,hwdownload,format=nv12 -c:v libx265 output.mkv

    NOTICE: you have to explicitly pass the filterchain hwdownload,format=nv12 to avoid mysterious failures.


    The encoders are almost as straightforward, apart from the fact that the MediaSDK provides multiple rate-control systems and they do require explicit parameters to work.

    ./avconv -i input.mkv -c:v h264_qsv -q 20 output.mkv

    Failing to set the nominal framerate or the bitrate would make the look-ahead rate control not happy at all.

    Rate controls

    The rate control is one of the roughest edges of the current MediaSDK support: most of the rate-control systems require a nominal frame rate, and that requires an explicit -r to be passed.

    There isn’t a default bitrate, so -b:v should also be passed if you want to use a rate control that has a bitrate target.

    It is possible to use a look-ahead rate control aiming at a quality metric by passing -global_quality and -la_depth.

    The full list is documented.


    It is possible to have a full hardware transcoding pipeline with Media SDK.


    Deinterlacing

    ./avconv -hwaccel qsv -c:v h264_qsv -i input.mkv -vf deinterlace_qsv -c:v h264_qsv -r 25 -b:v 2M output.mkv

    Scaling

    ./avconv -hwaccel qsv -c:v h264_qsv -i input.mkv -vf scale_qsv=640:480 -c:v h264_qsv -r 25 -b:v 2M -la_depth 10 output.mkv

    Both at the same time

    ./avconv -hwaccel qsv -c:v h264_qsv -i input.mkv -vf deinterlace_qsv,scale_qsv=640:480 -c:v h264_qsv -r 25 -b:v 2M -la_depth 10 output.mkv

    Hardware filtering caveats

    The hardware filtering system is quite new and introducing it has shown a number of shortcomings in the Libavfilter architecture regarding format autonegotiation, so for hybrid pipelines (those that do not keep using hardware frames all the way through) it is necessary to call hwupload and hwdownload explicitly, like this:

    ./avconv -hwaccel qsv -c:v h264_qsv -i in.mkv -vf deinterlace_qsv,hwdownload,format=nv12 -c:v libvpx-vp9 out.mkv

    Future for MediaSDK in Libav

    The Media SDK already supports a good number of interesting codecs (h264, hevc, vp8/vp9) and Intel seems to be quite receptive regarding which codecs to support.
    The Libav support for it will improve over time as we improve the hardware acceleration support in the filtering layer and make the libmfx interface richer.

    We’d need more people testing and helping us to figure out use-cases and corner-cases that haven’t been thought of yet; your feedback is important!

    Tuesday, 18 October

    I decided to organise the Libav sprint again, this time in a small village near Pelhřimov. The participants:

    • Luca Barbato - came with a lot of Venchi chocolate
    • Anton Khirnov
    • Kostya Shishkov - came with a lot of Läderach chocolate
    • Mark Thompson
    • Alexandra Hájková (me) 
    All the chocolate was sooo tasty, we ate all of it of course.
    • Luca - Altivec
    • Anton and Mark - coworking on QSV and VP9
    • Alexandra - x86 SIMD HEVC IDCT
    • Kostya - consultations for the rest of us

    I rented a cosy cottage for the sprint. It was surprisingly warm for the end of September and we enjoyed a nice garden sitting not only with the table and chairs but even with a couch. The sitting was under the roof and it was possible to work outside which was really pleasant for me. There was also a grill so we grilled some sausages for one of our dinners. It started to rain the last day and making a fire in the fireplace made a nice feeling.

    Because the weather was really nice we decided to explore the countryside a bit; we finally found the path to the forest and spent the mid-afternoon in its fresh air.

    Both Luca and I like to cook and to try new foods and dishes. We cooked a lot during the sprint: Luca prepared some delicious Italian meals, and Kostya cooked us a traditional Ukrainian dish made from millet called куліш, which was very tasty and which I want to try at home sometime. At least for me the most interesting part of the cooking was making another traditional Ukrainian meal, вареники, which is a kind of filled pasta. We filled one half of them with a potato and salami filling with fried onion and the other half with cottage cheese; both were very good and I can't decide which one was better. The вареники were eaten with sour cream; there are also some dried cranberries in the picture.

    I almost finished my x86 SIMD optimisation for HEVC IDCT there, Luca introduced altivec to me and I wrote a PPC optimised 4x4 IDCT (github).

    A lot of work was done during the sprint, many patches sent (ML), many patches pushed, all of this in a friendly atmosphere in a comfortable cottage, fresh air and with a good cuisine. Personally I enjoyed the sprint very much, I'm glad I organised it and I hope the other people liked it as well.

    Thank you everyone for coming!

    Monday, 02 May

    Some time ago Niels Möller proposed a new method of bitreading that should be faster than the current one (here). It is an interesting idea and I decided to try it. Luca Barbato considered it to be a good idea and had his company sponsor this work. The new bitstream reader (bitstream.h) is faster in many cases and is never slower than the existing one (get_bits.h).

    All the new bitreading functions and their old equivalents were benchmarked using TIME macros in a simple test program. Because the results were good, I converted all the decoders to use the new bitstream reader. The performance of the most important decoders with the new and old bitreaders was benchmarked with perf stat on x86_64 (Intel Core i3-2120, 3.30GHz): the results are pretty good, and even on arm32 I could not see speed regressions.

    The old bitstream reader is quite inconsistent, with its core API made of macros and with at least 3 or 4 higher-level functions that read a hard-to-guess number of bits.
    static inline unsigned int get_bits(GetBitContext *s, int n)
    {
        register int tmp;
        OPEN_READER(re, s);
        UPDATE_CACHE(re, s);
        tmp = SHOW_UBITS(re, s, n);
        LAST_SKIP_BITS(re, s, n);
        CLOSE_READER(re, s);
        return tmp;
    }
    The new bitstream reader is written to be easier to use, more consistent and easier to follow. It is better documented, and the functions are named according to the current naming conventions and to be consistent with the bytestream reader naming.
    Some of the bitstream.h functions replace several ones from get_bits.h at once:
    • bitstream_read_32() reads bits from the 0-32 range and replaces
      • get_bits()
      • get_bits_long()
      • get_bitsz()
    • bitstream_peek_32() replaces
      • show_bits()
      • show_bits_long()
      • show_bits1()
    • bitstream_skip() replaces
      • skip_bits1()
      • skip_bits()
      • skip_bits_long()
    The get_bits.h bitreading macros sometimes have to be used directly to achieve better decoding performance. Reading or writing code that uses these macros requires good knowledge of how this bitreader works, and the macros can be surprising at times since they create local variables.
    The new bitreader does not require such deep knowledge: all the needed operations use a smaller set of functions that happen to be faster in many usage patterns.
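
To illustrate why three functions can be enough, here is a minimal cache-based bit reader in C. It is a sketch written for this post's argument only and is not the bitstream.h code: the BitReader type and the br_* names are made up, and it reads MSB first with a 64-bit cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *ptr;       /* next byte to load into the cache */
    const uint8_t *end;       /* one past the last input byte */
    uint64_t       cache;     /* unread bits, aligned to the top (MSB first) */
    unsigned       bits_left; /* number of valid bits in the cache */
} BitReader;

void br_init(BitReader *br, const uint8_t *buf, size_t size)
{
    br->ptr       = buf;
    br->end       = buf + size;
    br->cache     = 0;
    br->bits_left = 0;
}

/* Top up the cache one byte at a time while there is room and input. */
static void br_refill(BitReader *br)
{
    while (br->bits_left <= 56 && br->ptr < br->end) {
        br->cache     |= (uint64_t)*br->ptr++ << (56 - br->bits_left);
        br->bits_left += 8;
    }
}

/* Return the next n bits (0 <= n <= 32) without consuming them. */
uint32_t br_peek(BitReader *br, unsigned n)
{
    assert(n <= 32);
    if (n == 0)
        return 0;
    if (br->bits_left < n)
        br_refill(br);
    return (uint32_t)(br->cache >> (64 - n));
}

/* Consume n bits (0 <= n <= 32); stops at the end of the input. */
void br_skip(BitReader *br, unsigned n)
{
    assert(n <= 32);
    if (br->bits_left < n)
        br_refill(br);
    if (n > br->bits_left)
        n = br->bits_left;
    br->cache    <<= n;
    br->bits_left -= n;
}

/* Read n bits: a peek followed by a skip. */
uint32_t br_read(BitReader *br, unsigned n)
{
    uint32_t v = br_peek(br, n);
    br_skip(br, n);
    return v;
}
```

Because the cache holds up to 64 bits, one read function can cover the whole 0-32 range, which is where the separate "short", "long" and "zero-safe" variants of the old API become unnecessary.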

    Many thanks to Luca Barbato for his advice and consultations during the development process.

    I hope this new bitreader could become a useful piece of the Libav code. Opinions and suggestions are welcome.

    Friday, 01 April

    swscale is one of the most annoying parts of Libav; a couple of years after the initial blueprint we have something almost functional you can play with.

    Colorspace conversion and Scaling

    Before delving into the library architecture and the outer API it is probably good to give an extra quick summary of what this library is about.

    Most multimedia concepts are more or less intuitive:
    encoding is taking some data (e.g. video frames, audio samples) and compress it by leaving out unimportant details
    muxing is the act of storing such compressed data and timestamps so that audio and video can play back in sync
    demuxing is getting back the compressed data with the timing information stored in the container format
    decoding inflates somehow the data so that video frames can be rendered on screen and the audio played on the speakers

    After the decoding step it would seem that all the hard work is done, but since there isn’t a single way to store video pixels or audio samples you need to process them so they work with your output devices.

    That process is usually called resampling for audio and for video we have colorspace conversion to change the pixel information and scaling to change the amount of pixels in the image.

    Today I’ll introduce you to the new library for colorspace conversion and scaling we are working on.


    The library aims to be as simple as possible and hide all the gory details from the user, you won’t need to figure the heads and tails of functions with a quite large amount of arguments nor special-purpose functions.

    The API itself is modelled after avresample and approaches the problem of conversion and scaling in a way quite different from swscale, following the same design of NAScale.

    Everything is a Kernel

    One of the key concepts of AVScale is that the conversion chain is assembled out of different components, separating the concerns.

    Those components are called kernels.

    The kernels can be conceptually divided in two kinds:
    Conversion kernels, taking an input in a certain format and providing an output in another (e.g. rgb2yuv) without changing any other property.
    Process kernels, modifying the data while keeping the format itself unchanged (e.g. scale)

    This pipeline approach gets great flexibility and helps code reuse.

    The most common use-cases (such as scaling without conversion or conversion without scaling) can be faster than solutions trying to merge scaling and conversion together in a single step.
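
The way such a chain could be assembled can be sketched in a few lines of C. This is only an illustration of the idea and not AVScale code: the types, the enum values and build_chain() are all hypothetical.

```c
#include <stddef.h>

typedef enum { FMT_RGB, FMT_YUV } PixFmt;                 /* hypothetical formats */
typedef enum { KERNEL_CONVERT, KERNEL_SCALE } KernelKind; /* the two kinds of kernels */
typedef struct { PixFmt fmt; int w, h; } FrameDesc;       /* hypothetical frame description */

/* Fill `chain` with the kernels needed to go from `in` to `out`.
 * Returns the number of stages; 0 means that nothing has to be done. */
size_t build_chain(const FrameDesc *in, const FrameDesc *out, KernelKind chain[2])
{
    size_t n = 0;

    /* conversion kernel: changes the format and nothing else */
    if (in->fmt != out->fmt)
        chain[n++] = KERNEL_CONVERT;
    /* process kernel: changes the dimensions and keeps the format */
    if (in->w != out->w || in->h != out->h)
        chain[n++] = KERNEL_SCALE;

    return n;
}
```

With this split, a request that only changes the size gets a one-stage chain and never pays for a format conversion, and the other way round.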


    AVScale works with two kinds of structures:
    AVPixelFormaton: A full description of the pixel format
    AVFrame: The frame data, its dimension and a reference to its format details (aka AVPixelFormaton)

    The library will have an AVOption-based system to tune specific options (e.g. selecting the scaling algorithm).

    For now only avscale_config and avscale_convert_frame are implemented.

    So if the input and output are pre-determined the context can be configured like this:

    AVScaleContext *ctx = avscale_alloc_context();
    if (!ctx)
        return AVERROR(ENOMEM);
    ret = avscale_config(ctx, out, in);
    if (ret < 0)
        return ret; // error handling is up to the caller

    But you can skip it and scale and/or convert from an input to an output like this:

    AVScaleContext *ctx = avscale_alloc_context();
    if (!ctx)
        return AVERROR(ENOMEM);
    ret = avscale_convert_frame(ctx, out, in);
    if (ret < 0)
        return ret; // error handling is up to the caller

    The context gets lazily configured on the first call.

    Notice that avscale_free() takes a pointer to a pointer, to make sure the context pointer does not stay dangling.
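
The pointer-to-pointer convention is easy to show in isolation. The following C sketch uses a made-up ToyContext instead of the real AVScaleContext, so every name in it is an assumption:

```c
#include <stdlib.h>

typedef struct ToyContext { int configured; } ToyContext; /* hypothetical context */

ToyContext *toy_alloc_context(void)
{
    return calloc(1, sizeof(ToyContext));
}

/* Takes the address of the caller's pointer so that it can be reset to NULL. */
void toy_free(ToyContext **pctx)
{
    if (!pctx || !*pctx)
        return;
    free(*pctx);
    *pctx = NULL;
}
```

After toy_free(&ctx) the caller's ctx is NULL, so an accidental second call or a later check does not touch freed memory.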

    As said the API is really simple and essential.

    Help welcome!

    Kostya kindly provided an initial proof of concept, and Vittorio, Anton and I prepared this preview in our spare time. There is plenty left to do; if you like the idea (since many kept telling us they would love a swscale replacement), we even have a fundraiser.

    Tuesday, 29 March

    Another week, another API landed in the tree, and since I spent some time drafting it, I guess I should describe how to use it now that it is implemented. This is part I.

    What is here now

    Between theory and practice there is a bit of discussion and obviously the lack of time to implement, so here is what is different from what I drafted originally:

    • Function names: push got renamed to send and pull got renamed to receive.
    • No separate function to probe the process state: need_data and have_data are not here.
    • No codecs ported to use the new API, so no actual asynchronicity for now.
    • Subtitles aren’t supported yet.

    New API

    There are just 4 new functions replacing both audio-specific and video-specific ones:

    // Decode
    int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);
    int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame);
    // Encode
    int avcodec_send_frame(AVCodecContext *avctx, const AVFrame *frame);
    int avcodec_receive_packet(AVCodecContext *avctx, AVPacket *avpkt);

    The workflow is sort of simple:
    – You set up the decoder or the encoder as usual.
    – You feed data using the avcodec_send_* functions until you get an AVERROR(EAGAIN), which signals that the internal input buffer is full.
    – You get the data back using the matching avcodec_receive_* function until you get an AVERROR(EAGAIN), signalling that the internal output buffer is empty.
    – Once you are done feeding data you have to pass a NULL to signal the end of stream.
    – You can keep calling the avcodec_receive_* function until you get AVERROR_EOF.
    – You free the contexts as usual.
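
The steps above can be tried out without any real codec. The following C sketch is a toy model and not Libav code: ToyCodec, toy_send(), toy_receive() and toy_run() are all made up, TOY_EAGAIN plays the role of AVERROR(EAGAIN) and TOY_EOF that of AVERROR_EOF, and "decoding" is just a multiplication by ten.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_EAGAIN (-EAGAIN)  /* plays the role of AVERROR(EAGAIN) */
#define TOY_EOF    (-1000000) /* hypothetical stand-in for AVERROR_EOF */

typedef struct {
    int queue[2]; /* tiny internal buffer */
    int count;    /* number of queued packets */
    int draining; /* set once NULL has been sent */
} ToyCodec;

int toy_send(ToyCodec *c, const int *pkt)
{
    if (!pkt) {
        c->draining = 1; /* end of stream */
        return 0;
    }
    if (c->draining)
        return TOY_EOF;
    if (c->count == 2)
        return TOY_EAGAIN; /* input buffer full: go and receive */
    c->queue[c->count++] = *pkt;
    return 0;
}

int toy_receive(ToyCodec *c, int *frame)
{
    if (c->count == 0)
        return c->draining ? TOY_EOF : TOY_EAGAIN;
    *frame      = c->queue[0] * 10; /* the "decoding" */
    c->queue[0] = c->queue[1];
    c->count--;
    return 0;
}

/* Push n packets through the codec; returns the number of frames
 * stored in out, or a negative error code. */
int toy_run(const int *pkts, int n, int *out, int max_out)
{
    ToyCodec c = { { 0, 0 }, 0, 0 };
    int sent = 0, got = 0, ret;

    for (;;) {
        /* feed until the codec refuses more input */
        while (sent < n) {
            ret = toy_send(&c, &pkts[sent]);
            if (ret == TOY_EAGAIN)
                break;
            if (ret < 0)
                return ret;
            sent++;
        }
        if (sent == n && !c.draining)
            toy_send(&c, NULL); /* signal the end of stream */

        /* drain until the codec has nothing more to give */
        for (;;) {
            int frame;

            ret = toy_receive(&c, &frame);
            if (ret == TOY_EAGAIN)
                break;
            if (ret == TOY_EOF)
                return got;
            if (got < max_out)
                out[got++] = frame;
        }
    }
}
```

The important detail is that a packet refused with EAGAIN is not counted as sent, so it is offered again after the drain.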

    Decoding examples


    The setup uses the usual avcodec_open2.

        c = avcodec_alloc_context3(codec);
        ret = avcodec_open2(c, codec, &opts);
        if (ret < 0)
            return ret;

    Simple decoding loop

    People using the old API usually have some kind of simple loop like

    while (get_packet(pkt)) {
        ret = avcodec_decode_video2(c, picture, &got_picture, pkt);
        if (ret < 0) {
            // handle the error
        }
        if (got_picture) {
            // use the decoded picture
        }
    }

    The old functions can be replaced by calling something like the following.

    // The flush packet is a non-NULL packet with size 0 and data NULL
    int decode(AVCodecContext *avctx, AVFrame *frame, int *got_frame, AVPacket *pkt)
    {
        int ret;

        *got_frame = 0;

        if (pkt) {
            ret = avcodec_send_packet(avctx, pkt);
            // In particular, we don't expect AVERROR(EAGAIN), because we read all
            // decoded frames with avcodec_receive_frame() until done.
            if (ret < 0)
                return ret == AVERROR_EOF ? 0 : ret;
        }

        ret = avcodec_receive_frame(avctx, frame);
        if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
            return ret;
        if (ret >= 0)
            *got_frame = 1;

        return 0;
    }

    Callback approach

    Since the new API will output multiple frames in certain situations, it would be better to process them as they are produced.

    // return 0 on success, negative on error
    typedef int (*process_frame_cb)(void *ctx, AVFrame *frame);

    int decode(AVCodecContext *avctx, AVPacket *pkt,
               process_frame_cb cb, void *priv)
    {
        AVFrame *frame = av_frame_alloc();
        int ret;

        ret = avcodec_send_packet(avctx, pkt);
        // Again EAGAIN is not expected
        if (ret < 0)
            goto out;

        while (!ret) {
            ret = avcodec_receive_frame(avctx, frame);
            if (!ret)
                ret = cb(priv, frame);
        }

    out:
        av_frame_free(&frame);
        if (ret == AVERROR(EAGAIN))
            return 0;
        return ret;
    }

    Separated threads

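    The new API makes it sort of easy to split the workload in two separate threads.

    // Assume we have context with a mutex, a condition variable and the AVCodecContext
    // Feeding loop
    {
        AVPacket *pkt = NULL;
        while ((ret = get_packet(ctx, pkt)) >= 0) {
            pthread_mutex_lock(&ctx->mutex);
            ret = avcodec_send_packet(avctx, pkt);
            if (!ret) {
                pthread_cond_signal(&ctx->cond);
            } else if (ret == AVERROR(EAGAIN)) {
                // Signal the draining loop
                pthread_cond_signal(&ctx->cond);
                // Wait here
                pthread_cond_wait(&ctx->cond, &ctx->mutex);
            } else if (ret < 0)
                goto out;
            pthread_mutex_unlock(&ctx->mutex);
        }
        pthread_mutex_lock(&ctx->mutex);
        ret = avcodec_send_packet(avctx, NULL);
        pthread_cond_signal(&ctx->cond);
    out:
        pthread_mutex_unlock(&ctx->mutex);
        return ret;
    }
    // Draining loop
    {
        AVFrame *frame = av_frame_alloc();
        while (!done) {
            pthread_mutex_lock(&ctx->mutex);
            ret = avcodec_receive_frame(avctx, frame);
            if (!ret) {
                pthread_cond_signal(&ctx->cond);
            } else if (ret == AVERROR(EAGAIN)) {
                // Signal the feeding loop
                pthread_cond_signal(&ctx->cond);
                // Wait
                pthread_cond_wait(&ctx->cond, &ctx->mutex);
            } else if (ret < 0)
                goto out;
            pthread_mutex_unlock(&ctx->mutex);
            if (!ret) {
                // use the frame here
            }
        }
        pthread_mutex_lock(&ctx->mutex);
    out:
        pthread_mutex_unlock(&ctx->mutex);
        av_frame_free(&frame);
        return ret;
    }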

    It isn’t as neat as having all this abstracted away, but is mostly workable.

    Encoding Examples

    Simple encoding loop

    Some compatibility with the old API can be achieved using something along the lines of:

    int encode(AVCodecContext *avctx, AVPacket *pkt, int *got_packet, AVFrame *frame)
    {
        int ret;

        *got_packet = 0;

        ret = avcodec_send_frame(avctx, frame);
        if (ret < 0)
            return ret;

        ret = avcodec_receive_packet(avctx, pkt);
        if (!ret)
            *got_packet = 1;
        if (ret == AVERROR(EAGAIN))
            return 0;

        return ret;
    }

    Callback approach

    Since each input may produce multiple outputs, it is better to loop over the output as soon as possible.

    // return 0 on success, negative on error
    typedef int (*process_packet_cb)(void *ctx, AVPacket *pkt);
    int encode(AVCodecContext *avctx, AVFrame *frame,
               process_packet_cb cb, void *priv)
    {
        AVPacket *pkt = av_packet_alloc();
        int ret;
        ret = avcodec_send_frame(avctx, frame);
        if (ret < 0)
            goto out;
        while (!ret) {
            ret = avcodec_receive_packet(avctx, pkt);
            if (!ret)
                ret = cb(priv, pkt);
        }
    out:
        av_packet_free(&pkt);
        if (ret == AVERROR(EAGAIN))
            return 0;
        return ret;
    }

    The I/O should happen in a different thread when possible so the callback should just enqueue the packets.

    Coming Next

    This post is long enough so the next one might involve converting a codec to the new API.

    Monday, 21 March

    Last weekend, after a few months of work, the new bitstream filter API finally landed.

    Bitstream filters

    In Libav it is possible to manipulate raw and encoded data in many ways, the most common being:

    • Demuxing: extracting single data packets and their timing information
    • Decoding: converting the compressed data packets into raw video or audio frames
    • Encoding: converting the raw multimedia information into a compressed form
    • Muxing: storing the compressed information along with timing information and additional information.

    Bitstream filtering is somewhat less well known, even though bitstream filters are widely used under the hood to demux and mux many widely used formats.

    It can be considered an optional final demuxing or muxing step: it works on encoded data, and its main purpose is to reformat the data so that it can be accepted by decoders that consume only one specific serialization of the many supported (e.g. the HEVC QSV decoder), or so that it can be correctly muxed in a container format that stores only one specific kind.

    In Libav this kind of reformatting normally happens automatically, with the annoying exception of MPEGTS muxing.

    New API

    The new API is modeled on the pull/push paradigm I described for AVCodec before. It works on AVPackets and has the following concrete implementation:

    // Query
    const AVBitStreamFilter *av_bsf_next(void **opaque);
    const AVBitStreamFilter *av_bsf_get_by_name(const char *name);
    // Setup
    int av_bsf_alloc(const AVBitStreamFilter *filter, AVBSFContext **ctx);
    int av_bsf_init(AVBSFContext *ctx);
    // Usage
    int av_bsf_send_packet(AVBSFContext *ctx, AVPacket *pkt);
    int av_bsf_receive_packet(AVBSFContext *ctx, AVPacket *pkt);
    // Cleanup
    void av_bsf_free(AVBSFContext **ctx);

    In order to use a bsf you need to:

    • Look up its definition AVBitStreamFilter using a query function.
    • Set up a specific context AVBSFContext, by allocating, configuring and then initializing it.
    • Feed the input using av_bsf_send_packet function and get the processed output once it is ready using av_bsf_receive_packet.
    • Once you are done av_bsf_free cleans up the memory used for the context and the internal buffers.


    You can enumerate the available filters

    void *state = NULL;
    const AVBitStreamFilter *bsf;
    while ((bsf = av_bsf_next(&state))) {
        av_log(NULL, AV_LOG_INFO, "%s\n", bsf->name);
    }

    or directly pick the one you need by name:

    const AVBitStreamFilter *bsf = av_bsf_get_by_name("hevc_mp4toannexb");


    A bsf may use some codec parameters and time_base and provide updated ones.

    AVBSFContext *ctx;
    ret = av_bsf_alloc(bsf, &ctx);
    if (ret < 0)
        return ret;
    ret = avcodec_parameters_copy(ctx->par_in, in->codecpar);
    if (ret < 0)
        goto fail;
    ctx->time_base_in = in->time_base;
    ret = av_bsf_init(ctx);
    if (ret < 0)
        goto fail;
    ret = avcodec_parameters_copy(out->codecpar, ctx->par_out);
    if (ret < 0)
        goto fail;
    out->time_base = ctx->time_base_out;


    Multiple AVPackets may be consumed before an AVPacket is emitted or multiple AVPackets may be produced out of a single input one.

    AVPacket *pkt;
    while (got_new_packet(&pkt)) {
        ret = av_bsf_send_packet(ctx, pkt);
        if (ret < 0)
            goto fail;
        while ((ret = av_bsf_receive_packet(ctx, pkt)) == 0) {
            // Consume the filtered packet here
        }
        // EAGAIN: the filter needs more input before it can output again
        if (ret == AVERROR(EAGAIN))
            continue;
        if (ret == AVERROR_EOF)
            goto end;
        if (ret < 0)
            goto fail;
    }
    // Flush
    ret = av_bsf_send_packet(ctx, NULL);
    if (ret < 0)
        goto fail;
    while ((ret = av_bsf_receive_packet(ctx, pkt)) == 0) {
        // Consume the remaining filtered packets here
    }
    if (ret != AVERROR_EOF)
        goto fail;

    In order to signal the end of stream a NULL pkt should be fed to send_packet.
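
    To make the send/receive contract easier to see without linking against Libav, here is a minimal toy sketch of the same state machine. Everything in it is hypothetical: toy_send, toy_receive, toy_run, TOY_EAGAIN and TOY_EOF are stand-ins invented for illustration and are not Libav API. The toy "filter" sums pairs of integers, so it consumes several inputs before emitting one output and emits a leftover on flush.

```c
#include <stddef.h>

/* Hypothetical codes standing in for AVERROR(EAGAIN) and AVERROR_EOF. */
enum { TOY_EAGAIN = -1, TOY_EOF = -2 };

/* Hypothetical filter state: it sums pairs of input values. */
typedef struct ToyFilter {
    int acc;    /* running sum of the values consumed so far */
    int count;  /* number of values in the running sum */
    int ready;  /* non-zero when an output is waiting to be received */
    int out;    /* the pending output value */
    int eof;    /* non-zero once NULL was sent */
} ToyFilter;

/* Push one value, or NULL to signal the end of the stream. */
static int toy_send(ToyFilter *f, const int *in)
{
    if (f->ready)
        return TOY_EAGAIN;          /* pending output must be drained first */
    if (f->eof)
        return TOY_EOF;             /* nothing may be sent after the flush */
    if (!in) {
        f->eof = 1;
        if (f->count) {             /* emit the incomplete pair on flush */
            f->out   = f->acc;
            f->ready = 1;
            f->acc   = f->count = 0;
        }
        return 0;
    }
    f->acc += *in;
    if (++f->count == 2) {          /* a full pair produces one output */
        f->out   = f->acc;
        f->ready = 1;
        f->acc   = f->count = 0;
    }
    return 0;
}

/* Pull one output; TOY_EAGAIN means "feed me more", TOY_EOF means "done". */
static int toy_receive(ToyFilter *f, int *out)
{
    if (f->ready) {
        *out     = f->out;
        f->ready = 0;
        return 0;
    }
    return f->eof ? TOY_EOF : TOY_EAGAIN;
}

/* Driver with the same shape as the loop above: send, drain, then flush. */
static int toy_run(const int *in, size_t n, int *out, size_t *n_out)
{
    ToyFilter f = { 0 };
    int ret, v;

    *n_out = 0;
    for (size_t i = 0; i < n; i++) {
        ret = toy_send(&f, &in[i]);
        if (ret < 0)
            return ret;
        while ((ret = toy_receive(&f, &v)) == 0)
            out[(*n_out)++] = v;
        if (ret != TOY_EAGAIN)
            return ret;
    }
    /* Flush: a NULL input signals the end of the stream. */
    ret = toy_send(&f, NULL);
    if (ret < 0)
        return ret;
    while ((ret = toy_receive(&f, &v)) == 0)
        out[(*n_out)++] = v;
    return ret == TOY_EOF ? 0 : ret;
}
```

    The caller must size out for the worst case; for this toy that is one output per input.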


    The cleanup function matches the av_freep signature, so it takes the address of the AVBSFContext pointer.

    av_bsf_free(&ctx);


    All the memory is freed and the ctx pointer is set to NULL.

    Coming Soon

    Hopefully next I’ll document the new HWAccel layer that already landed and some other API that I discussed with Kostya before.
    Sadly my blog-time (and spare time in general) shrunk a lot in the past months so he rightfully blamed me a lot.

    Saturday, 05 March

    Sometimes it's very useful to print out how some parameters change during the program execution.

    When writing a new version of some piece of code, one usually needs to compare it with the old one to be sure it behaves the same in every case. The corner cases in particular can be tricky, and I spent a lot of time on them while my code worked fine in general.

    For example, when I was working on my ASF demuxer, I was happy there was an old demuxer whose behaviour I could compare mine with. When debugging the ASF demuxer, I wanted to know the state of the I/O context. At that time lu_zero (who was mentoring me) created a set of macros which printed logs for every I/O function (here). For example, this is the macro for the avio_seek() function (which is the equivalent of fseek()):

    #define avio_seek(s, o, w) ({ \
        int64_t _ret = avio_seek(s, o, w); \
        int64_t _pos = avio_tell(s); \
        av_log(NULL, AV_LOG_VERBOSE|AV_LOG_C(154), "0x%08"PRIx64" - %s:%d seek %p %"PRId64" %d -> %"PRId64"\n", \
               _pos, __FUNCTION__, __LINE__, s, o, w, _ret); \
        _ret; \
    })

    When such a macro was present in my demuxer, the following information was printed for every call of avio_seek:
    • _pos = avio_tell(s): the offset in the demuxed file
    • __FUNCTION__ : predefined identifier that contains the name of the function being compiled, to know which function called avio_seek
    • __LINE__ : preprocessor define that contains the line number in the original source file being compiled, to know which line avio_seek was called from
    • s, o, w : the values of the parameters avio_seek was called with
    • _ret: the avio_seek return value
    • __FILE__: preprocessor define that contains the name of the file being compiled (this one was not used in the example but might be useful when one needs a more complex log).
    The braces are wrapped in parentheses because GNU C allows such a compound statement to appear as an expression (a statement expression). _ret; is the last statement in the macro because its value serves as the value of the entire construct. If the last _ret; were omitted in my example, the value of the macro expression would be that of the av_log call, which returns nothing, and the avio_seek result would be lost. The leading underscores in _ret and _pos make it unlikely that they shadow other variables with the same names.
    Working with a log created by a set of macros similar to this example can be more effective than debugging with gdb in some cases.
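
    The same trick can be tried outside Libav. The sketch below is only an illustration under a few assumptions: clamp_offset is a hypothetical function standing in for avio_seek, trace_calls is a counter added just to make the effect observable, and the code relies on GNU C statement expressions, so it needs GCC or Clang.

```c
#include <stdio.h>

/* Hypothetical function to be traced; it stands in for avio_seek(). */
static long clamp_offset(long offset, long size)
{
    if (offset < 0)
        return 0;
    return offset > size ? size : offset;
}

/* Number of traced calls, kept only so the effect can be observed. */
static int trace_calls;

/*
 * GNU C statement expression: the value of the last statement (_ret)
 * becomes the value of the whole macro. A macro name is not expanded
 * again inside its own body, so the inner call reaches the function.
 */
#define clamp_offset(o, s) ({                                   \
    long _ret = clamp_offset(o, s);                             \
    trace_calls++;                                              \
    fprintf(stderr, "%s:%d clamp_offset %ld %ld -> %ld\n",      \
            __func__, __LINE__, (long)(o), (long)(s), _ret);    \
    _ret;                                                       \
})

/* Every call below goes through the macro and is logged to stderr. */
static long traced_example(void)
{
    return clamp_offset(150, 100) + clamp_offset(-5, 100);
}
```

    The macro has to be defined after the function it wraps, otherwise the function definition itself would be rewritten by the preprocessor.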

    Many thanks to lu_zero for teaching me about it. The support from the more experienced developers is the thing I really love about Libav.

    Wednesday, 09 December

    I split my complex dcadec bit-exact patch into several parts. The first part, which changes the dcadec core to work with integer coefficients instead of converting the coefficients to floats just after reading them, was sent to the mailing list. Such a change was expected to slow down the decoding process, so I made some measurements to examine how much slower decoding is after my patch.
    I decoded a sample 10 times and measured the user time between invocation and termination with the "time" command:

     time ./avconv -f dts -i dtswavsample16.wav -f null -c pcm_f32le null

    I counted the average real time of the avconv runs and repeated everything for the master branch. The duration of dtswavsample16.wav is ~4 mins and I wanted to look at the slowdown for longer files. Hence I used the relatively new loop option of avconv to create a ~24 mins long file from the initial file by looping it 6x with

     ./avconv -loop 6 -i dtswavsample16.wav -c copy dts_long.wav

    I decoded this longer dts file 10x again for both the new integer and the old float coefficients core and counted the averages.
    According to my results the integer patch causes a ~20% slowdown. The question is whether this is still acceptable. I see 2 options here:
    • To consider the slowdown acceptable and to try to make some speedups like SIMDifying VQ decoding and using inline asm for 64-bit math.
    • Or alternatively both int and float modes can be kept for different decoding modes, but this might make the code too hairy.

    Opinions and suggestions are welcome.

    Tuesday, 08 December

    When playing a multimedia file, one usually wants to seek to reach different parts of the file. Most containers allow this, but it might be a problem for streamed files.
    Within libavformat, seeking is performed by a demuxer function called read_seek. This function tries to find the file offset matching the requested timestamp.
    There are 2 ways to seek through the file. One of them applies when the file contains some kind of index, which matches positions with the appropriate timestamps. In this case index entries are created by calling av_add_index_entry. If index entries are present, av_index_search_timestamp, which is called inside read_seek, looks for the index entry closest to the requested timestamp. When the file does not provide such entries, one can look for the requested timestamp with ff_seek_frame_binary. To do so, a read_timestamp function has to be implemented in the demuxer.
    read_timestamp takes a required position and a stream index, and tries to find the offset of the beginning of the closest packet which is a key frame with a matching stream index. While doing this, read_timestamp reads the timestamps of all the packets after the given position and creates index entries. When a key frame with a matching stream index is found, read_timestamp updates the required position and returns the timestamp matching it.
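
    The index-based lookup can be pictured as a binary search over entries sorted by timestamp. The following is a simplified sketch, not the libavformat implementation: IndexEntry, index_search and seek_offset are hypothetical names, and the search always picks the last entry at or before the wanted timestamp, whereas the real function also takes flags that change the direction.

```c
#include <stdint.h>

/* Hypothetical index entry: a timestamp paired with a file offset. */
typedef struct IndexEntry {
    int64_t timestamp;
    int64_t pos;
} IndexEntry;

/*
 * Return the index of the last entry whose timestamp is <= wanted,
 * or -1 when every entry is later than wanted (or the index is empty).
 * The entries must be sorted by increasing timestamp.
 */
static int index_search(const IndexEntry *entries, int nb, int64_t wanted)
{
    int lo = 0, hi = nb - 1, best = -1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (entries[mid].timestamp <= wanted) {
            best = mid;      /* candidate found, look for a later one */
            lo   = mid + 1;
        } else {
            hi   = mid - 1;
        }
    }
    return best;
}

/* Map a wanted timestamp to the file offset to seek to, or -1. */
static int64_t seek_offset(const IndexEntry *entries, int nb, int64_t wanted)
{
    int i = index_search(entries, nb, wanted);
    return i < 0 ? -1 : entries[i].pos;
}
```

    Seeking to the entry at or before the target is the conservative choice for playback, since decoding can then run forward from that packet to the exact frame.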

    I was told to test my ASF demuxer with the zzuf utility. Zzuf is a fuzzer: it changes random bits in the program's input, which simulates a damaged file or unexpected data.
    To test the ASF demuxer's behaviour I want to feed avconv with some corrupted wmv files and see what happens. Because I want to fuzz in several different ways, I want to vary the seed (the initial value of zzuf’s random number generator). I'll do this with the command:

    while true; do SEED=$RANDOM; for file in *wmv; do zzuf -M -l -r 0.00001 -q -U 60 -s $SEED ./avconv -i "$file" -f null -c copy - || echo $SEED $file >> fuzz; done; done

    I got the file fuzz, which is a list of seeds paired with filenames. Now I need to use zzuf to create the damaged files so that I can check the problem with valgrind. I'll use the list to find the seed which caused a crash and create my damaged file:

    zzuf -M -l -r 0.00001 -q -U 60 -s myseed < somefile.wmv > out.asf

    Now I'll just use valgrind to find out what happened:

    valgrind ./avconv -i out.asf -f null -c copy -

    I tested the ASF demuxer with different tricky samples and with FATE and the demuxer behaved well, but testing with zzuf detected several new crashes. They were mainly caused by insane size values and were easy to fix by adding some more checks. Zzuf is a great thing for testing.

    Pelhřimov is a small but very nice town in the Czech Republic, approximately 120 km from the capital, Prague, and I decided to organize a small but nice Libav sprint there.
    The participants and the topics were:

    • Luca Barbato -
      • AVScale, especially documenting it
      • HW accelerated encoders
      • async API
    • Anton Khirnov - The Evil Plan II and fixing H.264 decoder for it
    • Kostya Shishkov - trolling motivating the others
    • Alexandra Hájková (me) - dcadec decoder, mainly testing the new patch
    • everyone - eating chocolate

    I finished and sent my dcadec "Integer core decoder" patch (which transforms the dcadec core decoder to work with integers) during the sprint. After the discussion and some hints from the others I tested my patch better and found out some interesting things:
    • It seems XLL output really is lossless when using my patch.
    • But LFE channel was broken - this was fixed during the sprint.
    • While the lossless or force_fixed output looks fine, my patch breaks the float (lossy) output a little bit - it sounds the same to my ears, but looking at the output in audacity and comparing it with the "before the integer patch" output shows something is wrong there.
    • I discovered for myself an avconv option called channelsplit that splits the file into per-channel files, which was very useful for comparing the output with some reference decoder that uses a different channel order (for example with
    My post sprint dcadec plans are:
    • Fix the lossy output issue.
    • Fix detection of the extensions and add the options for disabling them.
    • Rewrite dca_decode_frame to handle all the extensions and work with the new options for them more systematically.

    I decided to improve the Libav DTS decoder - dcadec. Here I want to explain what its problems are now and what I would like to do about them.
    A DTS encoded audio stream consists of core audio and may contain extended audio. Dcadec supports the XCH and XLL extensions, but the X96, XXCH and XBR extensions are waiting to be implemented - I'd like to implement them later.
    For the DTS lossless extension - XLL - the decoded output audio should be a bit-for-bit accurate reproduction of the encoded input. However, there are some problems:

    • The main problem is that the core decoder converts the integer coefficients read from the bitstream to floats right after reading them (along with dequantization). All the other steps of the audio reconstruction are done with floats, so the output cannot be a bit-exact reproduction of the input and thus is not lossless.
      When the coefficients are read from the bitstream, the core decoder does the following:
      • dequantization (with int -> float conversion)
      • inverse ADPCM (when needed)
      • VQ decoding (when needed)
      • filtering: QMF, LFE, downmixing (when needed)
      • float output.
      I'm now working on modifying the core to work with integer coefficients, converting them to floats before QMF filtering for lossy output but using a bit-exact QMF for lossless output (the intermediate LFE coefficients should always be integers, and I think this is not handled correctly in the current version). I also added an option called -force_fixed to force fixed-point reconstruction for any kind of input.
    • Another problem is the detection of the XLL extension. During testing I found out that the XLL extension is sometimes not detected, and only the core audio is decoded in this case. I want to fix this issue as well.
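
    One way to see why a float pipeline cannot be bit-exact in general: a single-precision float has a 24-bit significand, so integer intermediates that need more bits than that get rounded. The snippet below is a generic illustration, not dcadec code; float_round_trip and survives_float are hypothetical helpers, and it assumes IEEE 754 single precision with the default round-to-nearest mode.

```c
#include <stdint.h>

/*
 * Convert an integer to float and back. The volatile store forces the
 * value to be rounded to single precision even on targets that keep
 * intermediates in wider registers.
 */
static int32_t float_round_trip(int32_t v)
{
    volatile float f = (float)v;
    return (int32_t)f;
}

/* Non-zero when the integer survives the trip through float unchanged. */
static int survives_float(int32_t v)
{
    return float_round_trip(v) == v;
}
```

    With these assumptions 16777216 (2^24) survives the round trip while 16777217 does not, which is the kind of one-bit difference a lossless decoder cannot tolerate.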

    Saturday, 21 November

    This is a short list of checklists and a few ramblings in the wake of FOSDEM’s Code of Conduct discussions and of the not exactly welcoming statements about how to perceive a Code of Conduct, such as this one.

    Code of Conduct and OpenSource projects

    A Code of Conduct is generally considered a means to get rid of problematic people (and thus avoid toxic situations). I prefer to consider it a means to welcome people and provide good guidelines to newcomers.

    Communities without a code of conduct tend to reject the idea of having one, thinking that it is only needed to solve the above-mentioned issue and that adding more bureaucracy would just give more leeway to Machiavellian ploys.

    That is usually a problem since, no matter how good things are now, it takes just a few poisonous people to end up in an unbearable situation, and in a few selected cases you need just one.

    If you consider the CoC a shackle or a stick to beat “bad guys”, so that you do not need it until you see a bad guy, that is naive and utterly wrong: you will end up writing something that excludes people due to a quite understandable, but wrong, knee-jerk reaction.

    A Code of Conduct should do exactly the opposite: it should embrace people and make it easier to join and fit in. It should be the social equivalent of the developer handbook or the coding style guidelines.

    Just as everybody can make a little effort and send code with spaces between operators, everybody can make an effort and not use colorful language. Likewise, just as people are happier to contribute if the codebase they are hacking on is readable, they are more confident in joining the community if the environment is pleasant.

    Making a useful Code of Conduct

    The Code of Conduct should be a guideline for people that have no idea what the expected behavior is. It should be written thinking about how to help people get along, not about how to punish those you do not like.

    • It should be short. It is pointless to enumerate ALL the possible ways to make people uncomfortable, you are bound to miss some.
    • It should be understanding and inclusive. Always assume cultural bias and not ill will.
    • It should be enforced. It gets quite depressing when you have a 100+ lines code of conduct but then nobody cares about it and nobody really enforces it. And I’m not talking about having specifically designated people to enforce it. Your WHOLE community should agree on what is an acceptable behavior and act accordingly on breaches.

    People joining the community should consider the Code of Conduct first as a request (and not a demand) to make an effort to get along with the others.


    Since I saw some quite long and convoluted walls of text being suggested as THE CODE OF CONDUCT everybody MUST ABIDE BY, here are some suggestions on what NOT to do.

    • It should not be a political statement: this is a strong cultural bias that would make potential contributors just stay away. No matter how good and great you think your ideas are, those unrelated to a project that should gather people who enjoy writing code in their spare time should stay out of it. Open Source is already an ideology, overloading it with more is just a recipe for disaster.
    • Do not try to make a long list of definitions, you just dilute the content and give even more ammo to lawyer-type arguers.
    • Do not think much about making draconian punishments: this is a community on the internet, even nowadays nobody really knows if you are actually a dog or not, and you cannot really enforce anything if the other party really wants to be a pest.

    Good examples

    Some CoC I consider good are obviously the ones used in the communities I belong to, Gentoo and Libav, they are really short and to the point.


    As I said before, no matter how well written a code of conduct is, the only way to really make it useful is if the community as a whole helps new (and not so new) people to get along.

    The rule of thumb “if somebody feels uncomfortable in a non-technical discussion, once he says so, drop it immediately” is OK as long as:
    • The uncomfortable person speaks up. If you are shy you might ask somebody else to speak up for you, but do not stay quiet when it happens and then file a complaint much later, that is NOT OK.
    • The rule is not bent to derail technical discussions. See my post about reviews to at least avoid this pitfall.
    • People agree to drop at least some of their cultural biases, otherwise it would end up like walking on eggshells every moment.

    Letting situations go unchecked is probably the main issue: newcomers can think it is OK to behave in a certain way if people are behaving that way and nobody stops them. Again, this is not just for specific enforcers of some kind: everybody should behave and clearly tell those not behaving that they are being problematic.

    Gentoo is a big community, so once somebody oversteps the boundaries it gets difficult to have a swift reaction: lots of people prefer not to speak up when something happens, so the people unwittingly causing the problem are not made aware immediately.

    The people then in charge of dishing out bans have to try to figure out what exactly was wrong, and there the cultural biases everybody has might or might not trigger and make the problem harder to address.

    Libav is a much smaller community and in general nobody has qualms about saying “please stop” (that is also partially due to how the community evolved).

    Hopefully this post will help avoid some mistakes and help people get along better.

    Sunday, 08 November

    This mini-post was spurred by this bug.

    AVFrame and AVCodecContext

    In Libav there are a number of patterns shared across most of the components.
    It does not matter whether it models a codec, a demuxer or a resampler: you interact with it using a Context, and you get data in or out of the module using some kind of Abstraction that wraps the data and useful information such as the timestamp. Today’s post is about AVFrames and AVCodecContext.


    The most used abstraction in Libav by far is the AVFrame. It wraps some kind of raw data that can be produced by decoders and fed to encoders, passed through filters, scalers and resamplers.

    It is quite flexible and contains the data and all the information to understand it e.g.:

    • format: Used to describe either the pixel format for video or the sample format for audio.
    • width and height: The dimension of a video frame.
    • channel_layout, nb_samples and sample_rate for audio frames.


    This context contains all the information useful to describe a codec and to configure an encoder or a decoder (the generic, common features, there are private options for specific features).

    Being shared between encoders, decoders and (until Anton’s plan to avoid it is deployed) container streams, this context is fairly large, and a good deal of its fields are a little confusing, either because they seem to replicate what is present in the AVFrame or because they aren’t marked as write-only since they might be read in a few situations.

    In the bug mentioned, channel_layout was the confusing one, but width and height also caused problems to people thinking the value of those fields in the AVCodecContext would represent what is in the AVFrame (then you’d wonder why you should have them in two different places…).

    As a rule of thumb, everything that is set in a context is only the starting configuration and is bound to change in the future.

    Video decoders can reconfigure themselves and output video frames with completely different geometries, audio decoders can report a completely different number of channels or variations in their layout and so on.

    Some encoders are able to reconfigure on the fly as well, but usually with more strict constraints.

    Why their information is not the same

    The fields in the AVCodecContext are used internally and updated as needed by the decoder. The decoder can be multithreaded so the AVFrame you are getting from one of the avcodec_decode_something() functions is not the last frame decoded.

    Do not expect any of the fields with names similar to the ones provided by AVFrame to stay immutable or to match the values provided by the AVFrame.

    Common pitfalls

    Allocating video surfaces

    A quite common mistake is to use the AVCodecContext coded_width and coded_height to allocate the surfaces that present the decoded frames.

    As said, the frame geometry can change mid-stream, so if you do that, in the best case you get some lovely green surrounding your picture and in the worst case you get a bad crash.

    I suggest always checking that the AVFrame dimensions fit and being ready to reconfigure your video output when they do not.
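
    A minimal sketch of such a check could look like the following. It is deliberately independent from Libav: FrameInfo and VideoOut are hypothetical structures that only mirror the width, height and format fields discussed above, and the actual surface reallocation is left as a comment.

```c
/* Hypothetical subset of the information carried by a decoded frame. */
typedef struct FrameInfo {
    int width, height, format;
} FrameInfo;

/* Hypothetical video output state, remembering its last configuration. */
typedef struct VideoOut {
    int width, height, format;
    int configured;        /* zero until the first frame is seen */
    int reconfigurations;  /* how many times the output was (re)built */
} VideoOut;

/* Non-zero when the frame does not match the current configuration. */
static int video_out_needs_reconfigure(const VideoOut *vo, const FrameInfo *f)
{
    return !vo->configured       ||
           vo->width  != f->width  ||
           vo->height != f->height ||
           vo->format != f->format;
}

/*
 * Check every frame before presenting it and reconfigure on mismatch.
 * Returns 1 when a reconfiguration happened, 0 otherwise.
 */
static int video_out_present(VideoOut *vo, const FrameInfo *f)
{
    if (video_out_needs_reconfigure(vo, f)) {
        /* A real application would reallocate its surfaces here. */
        vo->width      = f->width;
        vo->height     = f->height;
        vo->format     = f->format;
        vo->configured = 1;
        vo->reconfigurations++;
        return 1;
    }
    return 0;
}
```

    The point of the design is that the geometry is taken from each frame as it arrives, never from the codec context.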

    Resampling audio

    If you are using a current version of Libav, avresample_convert_frame() does most of the work for you. If you are not, you need to check that format, channel_layout and sample_rate do not change, and manually reconfigure when they do.

    Rescaling video

    Similarly, you can misconfigure swscale: you should manually check that format, width and height do not change, and reconfigure as well when they do. The AVScale draft API provides an avscale_process_frame().

    In closing

    Be extra careful, think twice and beware of the examples you might find on the internet: they might work until they won’t.

    Friday, 06 November

    This was spurred by some events happening in Gentoo: with the move to git we eventually have more reviews, and obviously comments on patches can be acceptable (and accepted) depending on a number of factors.

    This short post is about communicating effectively.

    When reviewing patches

    No point in pepper coating

    Do not disparage code or, even worse, people. There is no point in being insulting, you add noise to the signal:

    You are a moron! This is shit has no place here, do not do again something this stupid.

    This is not OK: most people will focus on the insult and the technical argument will be totally lost.

    Keep in mind that you want people doing stuff for the project, not running away crying.

    No point in sugar coating

    Do not downplay stupid mistakes that would crash your application (or wipe an operating system) because you think it would hurt the feelings of the contributor.

        rm -fR /usr /local/foo

    Is as silly as you like but the impact is HUGE.

    This is a tiny mistake, you should not do that again.

    No, it isn’t tiny, it is quite a problem.

    Mistakes happen and the review is there to avoid them hitting people, but a modicum of care is needed: wasting other people’s time is still bad.

    Point the mistake directly by quoting the line

    And use at most 2-3 lines to explain why it is a problem.
    If you can’t, it is better to fix that part yourself or to move the discussion to a more direct medium, e.g. IRC.

    Be specific

    This kind of change is not portable, obscures the code and does not fix the overflow issue at hand:
    The expression as whole could still overflow.

    Hopefully even the most busy person juggling over 5 different tasks will get it.

    Be direct

    Do not suggest the use of those non-portable functions again anyway.

    No room for interpretation, do not do that.

    Avoid clashes

    If you and another reviewer disagree, move the discussion to another medium: there is NO point in spamming the review system with countless comments.

    When receiving reviews (or waiting for them)

    Everybody makes mistakes

    YOU included, if the reviewer (or more than one) tells you that your changes are not right, there are good odds you are wrong.

    Conversely, the reviewer can make mistakes. It is usually better to move away from the review system and discuss over email or IRC.

    Be nice

    There is no point in being confrontational. If you think the reviewer is making a mistake, politely point it out.

    If the reviewer is not nice, do not use the same tone to fit in. Even more if you do not like that kind of tone to begin with.

    Wait before answering

    Do not update your patch or write a reply as soon as you get a notification of a review, more changes might be needed and maybe other reviewers have additional opinions.

    Be patient

    If a patch is unanswered, ping it maybe once a week, possibly rebasing it if the world changed meanwhile.

    Keep in mind that most of your interaction is with other people volunteering their free time and not getting anything out of it as well, sometimes the real-life takes priority =)

    Wednesday, 21 October

    You might be subtle like this or just work on your stuff like that but then nobody will know that you are the one that did something (and praise somebody else completely unrelated for your stuff, e.g. Anton not being praised much for the HEVC threaded decoding, the huge work on ref-counted AVFrame and many other things).

    Blogging is boring

    Once you have written something in code, talking about it gets sort of boring: the code is there, it works, and maybe you spent enough time on the mailing list and IRC discussing it that once it is done you wouldn’t want to think about it for at least a week.

    The people at xiph got it right and they wrote awesome articles about what they are doing.

    Blogging is important

    JB got it right by writing posts about what happened every week. Now journalists can pick from there what’s cool and coming from VLC and do not have to try to extract useful information from the git log, scattered mailing lists and conversations on IRC.
    I’m not sure I’ll have the time to do the same, but surely I’ll prod at least Alexandra and the others to write more.

    Thursday, 15 October

    In Libav we try to clean up the API and make it more regular. This is one of the possibly many articles I write about APIs, this time about deprecating a relic from the past and why we are doing it.


    This struct used to store image data using data pointers and linesizes. It comes from the far past and it looks like this:

    typedef struct AVPicture {
        uint8_t *data[AV_NUM_DATA_POINTERS];
        int linesize[AV_NUM_DATA_POINTERS];
    } AVPicture;

    Once the AVFrame was introduced, it was made so that it would alias to it, and for some time the two structures were actually defined sharing the common initial fields through a macro.

    The AVFrame then evolved to store both audio and image data, to use AVBuffer to not have to do needless copies and to provide more useful information (e.g. the actual data format), now it looks like:

    typedef struct AVFrame {
        uint8_t *data[AV_NUM_DATA_POINTERS];
        int linesize[AV_NUM_DATA_POINTERS];
        uint8_t **extended_data;
        int width, height;
        int nb_samples;
        int format;
        int key_frame;
        enum AVPictureType pict_type;
        AVRational sample_aspect_ratio;
        int64_t pts;
    } AVFrame;

    The image-data manipulation functions moved to the av_image namespace and use directly data and linesize pointers, while the equivalent avpicture became a wrapper over them.

    int avpicture_fill(AVPicture *picture, uint8_t *ptr,
                       enum AVPixelFormat pix_fmt, int width, int height)
    {
        return av_image_fill_arrays(picture->data, picture->linesize,
                                    ptr, pix_fmt, width, height, 1);
    }

    int avpicture_layout(const AVPicture* src, enum AVPixelFormat pix_fmt,
                         int width, int height,
                         unsigned char *dest, int dest_size)
    {
        return av_image_copy_to_buffer(dest, dest_size,
                                       src->data, src->linesize,
                                       pix_fmt, width, height, 1);
    }
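
    To illustrate what working directly on data and linesize means, here is a simplified stand-in for the kind of computation av_image_fill_arrays() performs, restricted to one layout. It is a sketch under stated assumptions, not the Libav implementation: fill_yuv420p is a hypothetical helper that handles only tightly packed (alignment 1) planar YUV 4:2:0, expects a non-NULL buffer and does no overflow checking.

```c
#include <stdint.h>

/*
 * Fill the plane pointers and line sizes for a tightly packed planar
 * YUV 4:2:0 image stored in one flat buffer: a full-size luma plane
 * followed by two chroma planes subsampled by 2 in both directions.
 * Returns the number of bytes the image needs, or -1 on bad input.
 */
static int fill_yuv420p(uint8_t *data[3], int linesize[3],
                        uint8_t *buf, int width, int height)
{
    int chroma_w, chroma_h;

    if (!buf || width <= 0 || height <= 0)
        return -1;

    /* Odd dimensions round the chroma size up. */
    chroma_w = (width  + 1) / 2;
    chroma_h = (height + 1) / 2;

    linesize[0] = width;
    linesize[1] = chroma_w;
    linesize[2] = chroma_w;

    /* Each plane starts right where the previous one ends. */
    data[0] = buf;
    data[1] = data[0] + linesize[0] * height;
    data[2] = data[1] + linesize[1] * chroma_h;

    return linesize[0] * height + 2 * linesize[1] * chroma_h;
}
```

    Nothing here needs a dedicated picture structure: two plain arrays are enough, which is why the av_image functions take them directly.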

    It is also used in the subtitle abstraction:

    typedef struct AVSubtitleRect {
        int x, y, w, h;
        int nb_colors;
        AVPicture pict;
        enum AVSubtitleType type;
        char *text;
        char *ass;
        int flags;
    } AVSubtitleRect;

    And to crudely pass an AVFrame from the decoder level to the muxer level, for certain rawvideo muxers, by doing something such as:

        pkt.data   = (uint8_t *)frame;
        pkt.size   =  sizeof(AVPicture);

    AVPicture problems

    In the codebase its remaining usage is dubious at best:

    AVFrame as AVPicture

    In some codecs the AVFrames produced or consumed are cast to AVPicture and passed to avpicture functions instead of using the av_image functions directly.


    For the subtitle codecs, accessing the Rect data requires a pointless indirection, usually something like:

        wrap3 = rect->pict.linesize[0];
        p = rect->pict.data[0];
        pal = (const uint32_t *)rect->pict.data[1];  /* Now in YCrCb! */


    Copying memory from one buffer to another when it can be avoided is considered a major sin (“memcpy is murder”), since it is a costly operation in itself and, for large buffers, it usually invalidates the cache.

    Certain rawvideo muxers try to spare a memcpy, and thus avoid a “murder”, by not copying the AVFrame data to the AVPacket.

    The idea in itself is simple enough: store the AVFrame pointer as if it pointed to a flat array, take the AVPicture size as the data size and hope that the data pointed to by the AVFrame remains valid while the AVPacket is consumed.

    Simple and faulty: with the AVFrame ref-counted API codecs may use a Pool of AVFrames and reuse them.
    It can lead to surprising results because the buffer gets updated before the AVPacket is actually written.
    If the frame referenced changes dimensions or gets deallocated it could even lead to crashes.

    Definitely not a great idea.


    Vittorio, wm4 and I worked together to fix the problems. Radically.

    AVFrame as AVPicture

    The av_image functions are now used when needed.
    Some pointless copies got replaced by av_frame_ref, leading to less memory usage and simpler code.

    No AVPictures are left in the video codecs.


    The AVSubtitleRect is updated to have simple data and linesize fields and each codec is updated to keep the AVPicture and the new fields in sync during the deprecation window.

    The code is already a little easier to follow now.


    Just dropping the “feature” would be a problem since those muxers are widely used in FATE and the time the additional copy takes adds up to quite a lot. Your regression test must be as quick as possible.

    I wrote a safer wrapper pseudo-codec that leverages the fact that both AVPacket and AVFrame use a ref-counted system:

    • The AVPacket takes the AVFrame and increases its ref-count by 1.
    • The AVFrame is then stored in the data field and wrapped in a custom AVBuffer.
    • That AVBuffer destructor callback unrefs the frame.

    This way the AVFrame data won’t change until the AVPacket gets destroyed.
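
    The three steps above can be sketched in plain C with stand-in types. This is only an illustration of the ref-counting idea: Frame, Buffer, Packet and every function name below are hypothetical and are not the Libav structures or API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for AVFrame, AVBuffer and AVPacket. */
typedef struct Frame {
    int refcount;
    int value;                 /* stands in for the image data */
} Frame;

typedef struct Buffer {
    void *data;
    void (*free_cb)(void *opaque, void *data);
    void *opaque;
} Buffer;

typedef struct Packet {
    Buffer *buf;
    void   *data;
} Packet;

static Frame *frame_ref(Frame *f)
{
    f->refcount++;
    return f;
}

static void frame_unref(Frame *f)
{
    assert(f->refcount > 0);
    f->refcount--;
}

/* Buffer destructor callback: drops the reference held by the packet. */
static void release_wrapped_frame(void *opaque, void *data)
{
    (void)opaque;
    frame_unref(data);
}

/* Wrap a frame into a packet without copying its data. */
static int packet_wrap_frame(Packet *pkt, Frame *frame)
{
    Buffer *buf = malloc(sizeof(*buf));

    if (!buf)
        return -1;
    buf->data    = frame_ref(frame);   /* the packet now owns one reference */
    buf->free_cb = release_wrapped_frame;
    buf->opaque  = NULL;
    pkt->buf     = buf;
    pkt->data    = buf->data;
    return 0;
}

/* Destroying the packet runs the destructor, which unrefs the frame. */
static void packet_unref(Packet *pkt)
{
    if (!pkt->buf)
        return;
    pkt->buf->free_cb(pkt->buf->opaque, pkt->buf->data);
    free(pkt->buf);
    pkt->buf  = NULL;
    pkt->data = NULL;
}
```

    As long as the packet is alive the frame keeps a non-zero reference count, so a frame pool cannot hand it out again behind the muxer's back.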

    Goodbye AVPicture

    With release 14 the AVPicture struct will be removed completely from Libav. People using it outside Libav should consider moving to the full AVFrame (and leveraging the additional features it provides) or to the av_image functions directly.

    Friday, 02 October

    During the VDD we had lots of discussions and I enjoyed reviewing the initial NihAV implementation. Kostya already wrote some more about the decoupled API that I described at high level here.

    This article is about some possible implementation details; at least one more will follow.

    The new API requires some additional data structures, mainly something to keep the data that is being consumed/produced, additional implementation callbacks in AVCodec and possibly a means to skip the queuing up completely.

    Data Structures

    AVPacketQueue and AVFrameQueue

    In the previous post I considered as given some kind of Queue.

    Ideally the API for it could be really simple:

    typedef struct AVPacketQueue AVPacketQueue;
    AVPacketQueue *av_packet_queue_alloc(int size);
    int av_packet_queue_put(AVPacketQueue *q, AVPacket *pkt);
    int av_packet_queue_get(AVPacketQueue *q, AVPacket *pkt);
    int av_packet_queue_size(AVPacketQueue *q);
    void av_packet_queue_free(AVPacketQueue **q);

    typedef struct AVFrameQueue AVFrameQueue;
    AVFrameQueue *av_frame_queue_alloc(int size);
    int av_frame_queue_put(AVFrameQueue *q, AVFrame *frame);
    int av_frame_queue_get(AVFrameQueue *q, AVFrame *frame);
    int av_frame_queue_size(AVFrameQueue *q);
    void av_frame_queue_free(AVFrameQueue **q);

    Internally it leverages the ref-counted API (av_packet_move_ref and av_frame_move_ref) and any data structure that could fit the queue usage. It will be used in a multi-thread scenario, so some form of lock has to be fitted into it.

    We already have something specific for AVPlay, using a simple linked list, and a FIFO for some other components that have a near-constant maximum number of items (e.g. avconv, NVENC, QSV).

    Possibly a tree could also be used to implement something such as av_packet_queue_insert_by_pts and have some form of reordering happen on the fly. I’m not a fan of it, but I’m sure someone will come up with the idea.
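
    As a rough idea of how little is needed, here is a sketch of a fixed-size queue guarded by a mutex. It assumes a ring buffer and POSIX threads, and uses a made-up Item type in place of AVPacket or AVFrame; the item_queue_* names are hypothetical and only mirror the shape of the API above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical item standing in for AVPacket/AVFrame. */
typedef struct Item { void *payload; } Item;

typedef struct ItemQueue {
    Item *items;
    int size, count, head;
    pthread_mutex_t lock;
} ItemQueue;

static ItemQueue *item_queue_alloc(int size)
{
    ItemQueue *q;

    assert(size > 0);
    q = calloc(1, sizeof(*q));
    if (!q)
        return NULL;
    q->items = calloc(size, sizeof(*q->items));
    if (!q->items) {
        free(q);
        return NULL;
    }
    q->size = size;
    pthread_mutex_init(&q->lock, NULL);
    return q;
}

/* Move *it into the queue and reset the caller's copy; -1 if full. */
static int item_queue_put(ItemQueue *q, Item *it)
{
    int ret = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < q->size) {
        q->items[(q->head + q->count) % q->size] = *it;
        it->payload = NULL;            /* ownership moved, like a move_ref */
        q->count++;
        ret = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}

/* Move the oldest item out of the queue into *it; -1 if empty. */
static int item_queue_get(ItemQueue *q, Item *it)
{
    int ret = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *it = q->items[q->head];
        q->items[q->head].payload = NULL;
        q->head = (q->head + 1) % q->size;
        q->count--;
        ret = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}

static int item_queue_size(ItemQueue *q)
{
    int count;

    pthread_mutex_lock(&q->lock);
    count = q->count;
    pthread_mutex_unlock(&q->lock);
    return count;
}

static void item_queue_free(ItemQueue **q)
{
    if (!q || !*q)
        return;
    pthread_mutex_destroy(&(*q)->lock);
    free((*q)->items);
    free(*q);
    *q = NULL;
}
```

    This version never blocks: put and get just report a full or empty queue, and it is up to the caller to decide whether to wait or to do something else in the meantime.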

    The Queues are part of AVCodecContext.

    typedef struct AVCodecContext {
        // ...
        AVPacketQueue *packet_queue;
        AVFrameQueue *frame_queue;
        // ...
    } AVCodecContext;

    Implementation Callbacks

    In Libav the AVCodec struct describes some specific codec features (such as the supported framerates) and holds the actual codec implementation through callbacks such as init, decode/encode2, flush and close.
    The new model obviously requires additional callbacks.

    Once the data is in a queue it is ready to be processed. The actual decoding or encoding can happen in multiple places, for example:

    • In avcodec_*_push or avcodec_*_pull, once there is enough data. In that case the remaining functions are glorified proxies for the matching queue function.
    • Somewhere else, such as a separate thread that is started on avcodec_open or the first avcodec_decode_push and is eventually stopped once the context related to it is freed by avcodec_close. This is what happens under the hood when you have certain hardware acceleration.


    typedef struct AVCodec {
        // ... previous fields
        int (*need_data)(AVCodecContext *avctx);
        int (*has_data)(AVCodecContext *avctx);
        // ...
    } AVCodec;

    Those are used by both the encoder and the decoder; some implementations such as QSV have functions that can be used to probe the internal state in this regard.


    typedef struct AVCodec {
        // ... previous fields
        int (*decode_push)(AVCodecContext *avctx, AVPacket *packet);
        int (*decode_pull)(AVCodecContext *avctx, AVFrame *frame);
        // ...
    } AVCodec;

    Those two functions can take a portion of the work the current decode function does, for example:
    – the initial parsing and dispatch to a worker thread can happen in the _push.
    – reordering and blocking until there is data to output can happen on _pull.

    This assumes the reordering does not happen outside the pull callback in some generic code.


    typedef struct AVCodec {
        // ... previous fields
        int (*encode_push)(AVCodecContext *avctx, AVFrame *frame);
        int (*encode_pull)(AVCodecContext *avctx, AVPacket *packet);
    } AVCodec;

    As with the decoding callbacks, the encode2 workload is split: the _push function might just keep queuing up until there are enough frames to complete the initial analysis, while, for single-thread encoding, the rest of the work happens in the _pull.

    Yielding data directly

    So far the API mainly keeps some queue filled and lets some magic happen under the hood. Let’s see some usage examples first:

    Simple Usage

    Let’s expand the last example from the previous post: register callbacks to pull/push the data and have some simple loops.


    typedef struct DecodeCallback {
        int (*pull_packet)(void *priv, AVPacket *pkt);
        int (*push_frame)(void *priv, AVFrame *frame);
        void *priv_data_pull, *priv_data_push;
    } DecodeCallback;

    Two pointers since you pull from a demuxer+parser and you push to a splitter+muxer.

    int decode_loop(AVCodecContext *avctx, DecodeCallback *cb)
    {
        AVPacket *pkt  = av_packet_alloc();
        AVFrame *frame = av_frame_alloc();
        int ret;

        /* Feed the decoder for as long as it asks for data. */
        while ((ret = avcodec_decode_need_data(avctx)) > 0) {
            ret = cb->pull_packet(cb->priv_data_pull, pkt);
            if (ret < 0)
                goto end;
            ret = avcodec_decode_push(avctx, pkt);
            if (ret < 0)
                goto end;
        }

        /* Drain the decoder for as long as it has frames ready. */
        while ((ret = avcodec_decode_have_data(avctx)) > 0) {
            ret = avcodec_decode_pull(avctx, frame);
            if (ret < 0)
                goto end;
            ret = cb->push_frame(cb->priv_data_push, frame);
            if (ret < 0)
                goto end;
        }

    end:
        av_frame_free(&frame);
        av_packet_free(&pkt);
        return ret;
    }


    For encoding something quite similar can be done:

    typedef struct EncodeCallback {
        int (*pull_frame)(void *priv, AVFrame *frame);
        int (*push_packet)(void *priv, AVPacket *packet);
        void *priv_data_push, *priv_data_pull;
    } EncodeCallback;

    The loop is exactly the same, apart from the swapped data types.

    int encode_loop(AVCodecContext *avctx, EncodeCallback *cb)
    {
        AVPacket *pkt  = av_packet_alloc();
        AVFrame *frame = av_frame_alloc();
        int ret;

        /* Feed the encoder for as long as it asks for data. */
        while ((ret = avcodec_encode_need_data(avctx)) > 0) {
            ret = cb->pull_frame(cb->priv_data_pull, frame);
            if (ret < 0)
                goto end;
            ret = avcodec_encode_push(avctx, frame);
            if (ret < 0)
                goto end;
        }

        /* Drain the encoder for as long as it has packets ready. */
        while ((ret = avcodec_encode_have_data(avctx)) > 0) {
            ret = avcodec_encode_pull(avctx, pkt);
            if (ret < 0)
                goto end;
            ret = cb->push_packet(cb->priv_data_push, pkt);
            if (ret < 0)
                goto end;
        }

    end:
        av_frame_free(&frame);
        av_packet_free(&pkt);
        return ret;
    }


    Transcoding, the naive way, could be something such as

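    int transcode(AVFormatContext *mux,
                  AVFormatContext *dem,
                  AVCodecContext *enc,
                  AVCodecContext *dec)
    {
        /* The queue functions are assumed to be adapted to the callback signature. */
        DecodeCallback dcb = {
            .pull_packet    = get_packet,
            .push_frame     = av_frame_queue_put,
            .priv_data_pull = dem,
            .priv_data_push = enc->frame_queue };
        EncodeCallback ecb = {
            .pull_frame     = av_frame_queue_get,
            .push_packet    = push_packet,
            .priv_data_pull = enc->frame_queue,
            .priv_data_push = mux };
        int ret;

        do {
            if ((ret = decode_loop(dec, &dcb)) > 0)
                ret = encode_loop(enc, &ecb);
        } while (ret > 0);
        return ret;
    }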

    One loop feeds the other through the queue. get_packet and push_packet are demuxing and muxing functions; they might end up being two more Queue functions once the AVFormat layer gets a similar overhaul.

    Advanced usage

    From the examples above you might notice that in some situations you could do better,
    since all the loops pull data from a queue and push it immediately to another:

    • why not feed the right queue immediately once you have the data ready?
    • why not do some processing before feeding the decoded data to the encoder, such as converting the pixel format?

    Here are some additional structures and functions to enable advanced users:

    typedef struct AVFrameCallback {
        int (*yield)(void *priv, AVFrame *frame);
        void *priv_data;
    } AVFrameCallback;
    typedef struct AVPacketCallback {
        int (*yield)(void *priv, AVPacket *pkt);
        void *priv_data;
    } AVPacketCallback;
    typedef struct AVCodecContext {
        // ...
        AVFrameCallback *frame_cb;
        AVPacketCallback *packet_cb;
        // ...
    } AVCodecContext;

    int av_frame_yield(AVFrameCallback *cb, AVFrame *frame)
    {
        return cb->yield(cb->priv_data, frame);
    }

    int av_packet_yield(AVPacketCallback *cb, AVPacket *packet)
    {
        return cb->yield(cb->priv_data, packet);
    }

    Instead of using the Queue API directly, it would be possible to use yield functions and give the user a means to override them.

    Some API sugar could be something along the lines of this:

    int avcodec_decode_yield(AVCodecContext *avctx, AVFrame *frame)
    {
        int ret;

        if (avctx->frame_cb) {
            ret = av_frame_yield(avctx->frame_cb, frame);
        } else {
            ret = av_frame_queue_put(avctx->frame_queue, frame);
        }

        return ret;
    }

    Whenever a frame (or a packet) is ready it could be passed immediately to another function. Depending on your threading model and CPU, it might be much more efficient to skip some enqueuing+dequeuing steps, for example by feeding directly some user queue that uses different datatypes.

    This approach might work well even internally to insert bitstream reformatters after the encoding or before the decoding.

    Open problems

    The callback system is quite powerful but you have at least a couple of issues to take care of:
    – Error reporting: when something goes wrong, how do we notify what broke?
    – Error recovery: how much does the user have to undo to fall back properly?

    Probably this part should be kept for later, since there is already a huge amount of work.

    What’s next

    Muxing and demuxing

    Ideally the container format layer should receive the same kind of overhaul. I’m not even halfway through documenting what should change, but from this blog post you might guess the kind of changes. Spoiler: the I/O layer gets spun off into a separate library.

    Proof of Concept

    Soon^WNot so late I’ll complete a POC out of this and possibly hack avplay so that either it uses QSV or videotoolbox as test-case (depending on which operating system I’m playing with when I start), probably I’ll see which are the limitations in this approach soon.

    If you like the ideas posted above or you want to discuss them more, you can join the Libav irc channel or mailing list to discuss and help.

    Thursday, 10 September

    This is a tiny introduction to Libav, the organization.


    The project aims to provide useful tools, written in portable code that is readable, trustworthy and performant.

    Libav is an opensource organization focused on developing libraries and tools to decode, manipulate and encode multimedia content.


    The project tries to be as non-hierarchical as possible. Every contributor must abide by a well defined set of rules, no matter which role.

    For decisions we strive to reach near-unanimous consensus. Discussions may happen on irc, mailing-list or in real life meetings.

    If possible, conflicts should be avoided and otherwise resolved.

    Join us!

    We are always looking for enthusiastic new contributors and will help you get started. Below you can find a number of possible ways to contribute. Please contact us.


    Even if the project is non-hierarchical, it is possible to define specific roles within it. Roles do not really give additional power but additional responsibilities.


    Contributing to Libav makes you a Contributor!
    Anybody who reviews patches, writes patches, helps triaging bugs, writes documentation, helps people solve their problems, or keeps our infrastructure running is considered a contributor.

    It does not matter how little you contribute. Any help is welcome.

    On top of the standard great feats of contributing to an opensource project, special chocolate is always available during the events.


    Many eyes might not make every bug shallow, but probably a second and a third pair might prevent some silly mistakes.

    A reviewer is supposed to read the new patches and prevent mistakes (silly, tiny or huge) from landing in the master.

    Because of our workflow, spending time reading other people’s patches is quite common.

    People with specific expertise might get nagged to give their opinion more often than others, but everybody might spot something that looks wrong and probably is.


    Checking that the bugs are fixed and asking for better reports is important.

    Bug wrangling involves making sure reported issues have all the needed information to start fixing the problem and checking if old issues are still valid or have been fixed already.


    Nobody can push a patch to the master until it is reviewed, but somebody has to push it once it is.

    Committers are the people who push code to the main repository after it has been reviewed.

    Being a committer requires you to take newly submitted patches, make sure they work as expected either locally or pushing them through our continuous integration system and possibly fix minor issues like typos.

    Patches from a committer go through the normal review process as well.

    Infrastructure Administrator

    The regression test system, the git repository, the samples collection, the website, the patch trackers, the wiki and the issue tracker are all managed on dedicated hardware.

    This infrastructure needs constant maintaining and improving.

    Most of it comes from people devoting their time and (besides a few exceptions) their own hardware; this role definitely requires a huge amount of dedication.


    The project strives to provide a pleasant environment for everybody.

    Every contributor is considered a member of the team, regardless if they are a newcomer or a founder. Nobody has special rights or prerogatives.

    Well defined rules have been adopted since the founding of the project to ensure fairness.

    Code of Conduct

    A quite simple code of conduct is in place in our project.

    It boils down to respecting the other people and being pleasant to deal with.

    It is commonly enforced with a friendly warning, followed by the request to leave if the person is unable to behave and, then, eventual removal if anything else fails.

    Contribution workflow

    The project has a simple contribution workflow:

    • Every patch must be sent to the mailing-list
    • Every patch must get a review and an Ok before it lands in the master branch

    Code Quality

    We have plenty of documentation to make it easy for you to prepare patches.

    The reviewers usually help newcomers by reformatting the first patches and by pointing out and fixing common pitfalls.

    If some mistakes are not caught during the review, there are a few additional means to prevent them from hitting a release.

    Post Scriptum

    This post tried to summarize the project and its structure as if the legends surrounding it do not exist and the project is just a clean slate. Shame on me for not having written this blog post 5 years ago.

    Past and Present

    I already wrote about the past and the current situation of Libav, if you are curious please do read the previous posts. I will probably blog again about the social issues soon.


    Release 12 is in the ABI break window now and soon the release branch will be spun off! After that some of my plans to improve the API will see some initial implementations and hopefully will be available as part of release 13 (and nihav).

    I will discuss avframe_yield first since Kostya already posted about a better way to handle container formats.

    Friday, 14 August

    I'd like to add the loop option to avconv. This option allows repeating an input file a given number of times, so the output contains the specified number of copies of the input. The command is ./avconv -loop n -i infile outfile, where n specifies how many times the input file should be looped in the output.

    How does this work?
    After processing the input file for the first time, avconv calls the new seek_to_start function to seek back to the beginning of the file. av_seek_frame is called to perform the seek itself, but there are other things needed for the loop option to work.

    1) flush
    Flush decoder buffers to take out delayed frames. In avconv this is done by calling process_input_file with NULL as frame; process_input_packet had to be modified a little so that it does not signal EOF on the filters when seeking.

    2) timestamps (ts)
    To have correct timestamps in the "after seeking" part of the output stream, they have to be corrected with ts = ts_{from the demuxer} + n * (duration of the input stream), where n is the number of times the input stream was processed so far. This duration is the duration of the longest stream in the file, because all the streams have to be processed (or played) before starting the next loop. The duration of a stream is the last timestamp - the first timestamp + the duration of the last frame. For audio streams one "frame" is usually a constant number of samples and its duration is number of samples/sample rate. Video frames, on the other hand, are displayed unevenly, so the average framerate can be used for the last frame duration if available; if the average framerate is not known, the last frame duration is just 1 (in the current time base).
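
    The arithmetic above can be written down as a few small helpers. This is only a sketch of the formulas, not the avconv patch: the function names are made up, the time base is passed as a plain numerator/denominator pair, all durations are assumed to be already expressed in the same time base, and the integer divisions simply truncate.

```c
#include <assert.h>
#include <stdint.h>

/* Audio frame duration in time-base units (time base = tb_num/tb_den seconds). */
static int64_t audio_frame_duration(int nb_samples, int sample_rate,
                                    int tb_num, int tb_den)
{
    assert(sample_rate > 0 && tb_num > 0 && tb_den > 0);
    return (int64_t)nb_samples * tb_den / ((int64_t)sample_rate * tb_num);
}

/*
 * Last video frame duration in time-base units: one frame period of the
 * average framerate fps_num/fps_den, or 1 if the framerate is unknown.
 */
static int64_t video_last_frame_duration(int fps_num, int fps_den,
                                         int tb_num, int tb_den)
{
    assert(tb_num > 0 && tb_den > 0);
    if (fps_num <= 0 || fps_den <= 0)
        return 1;
    return (int64_t)fps_den * tb_den / ((int64_t)fps_num * tb_num);
}

/* Stream duration: last timestamp - first timestamp + last frame duration. */
static int64_t stream_duration(int64_t first_ts, int64_t last_ts,
                               int64_t last_frame_duration)
{
    assert(last_ts >= first_ts);
    return last_ts - first_ts + last_frame_duration;
}

/* Timestamp after n completed passes over the input file. */
static int64_t loop_corrected_ts(int64_t demuxer_ts, int n,
                                 int64_t file_duration)
{
    assert(n >= 0);
    return demuxer_ts + (int64_t)n * file_duration;
}
```

    For example, with a 1/90000 time base and 25 fps video, the last frame lasts 3600 ticks; a stream running from timestamp 0 to 86400 then has a duration of 90000 ticks, and a packet with timestamp 100 read during the third pass (n = 2) ends up at 180100.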

    Thursday, 30 July

    We are getting closer to a new release and you can see it is an even release by the amount of old and crufty code we are dropping. This usually is welcomed by some people and hated by others. This post is trying to explain what we do and why we are doing it.

    New API and old API

    Since the start of Libav we tried to address the painful shortcomings of the previous management. Here is the short list:

    • No leaders or dictators, there are rules agreed by consensus and nobody bends them.
    • No territoriality, nobody “owns” a specific area of the codebase nor has special rights on it.
    • No unreviewed changes in the tree, all the patches must receive an Ok by somebody else before they can be pushed in the tree.
    • No “cvs is the release”, major releases at least twice per year, bugfix-only point releases as often as needed.
    • No flames and trollfests, some basic code of conduct is enforced.

    One of the effects of this is that the APIs are discussed, proposals are documented and little by little we are migrating to a hopefully more rational and less surprising API.

    What’s so bad regarding the old API?

    Many of the old APIs were not designed at all, but just randomly added because mplayer or ffmpeg.c happened to need some
    feature at the time. The result was usually un(der)documented, hard to use correctly and often not well defined in some cases. Most users of the old API that I’ve seen actually used it wrong and would at best occasionally fail to work, at worst crash randomly.
    – Anton

    To expand a bit on that you can break down the issues with the old API in three groups:

    • Unnamespaced common names (e.g. CODEC_ID_NONE); those may or may not clash with other libraries.
    • Now-internal-only fields previously exposed that were expected to be something they really are not (e.g. AVCodecContext.width).
    • Functionality not really working well (e.g. the old audio resampler) for which a replacement got provided eventually (AVResample).

    The worst result of API misuse could be a crash in specific situations (e.g. if you use the AVCodecContext dimensions when you should use the AVFrame dimensions to allocate your screen surface, you get quite an ugly crash, since the former represent the decoding-time dimensions while the latter are the dimensions of the frame you are going to present, and they can vary a LOT).

    But Compatibility ?!

    In Libav we try our best to give migration paths and in the past years we even went the extra mile by providing patches for quite a bit of software Debian was distributing at the time. (Since nobody even thanked us for the effort, I doubt the people involved would do that again…)

    Keeping backwards compatibility forever is not really feasible:

    • You do want to remove a clashing symbol from your API
    • You do want to not have applications crashing because of wrong assumptions
    • You do want people to use the new API and not keep compatibility wrappers that might not work in certain
      corner cases.

    The current consensus is to try to keep an API deprecated for about 2 major releases; with release 12 we are dropping code that was deprecated 2-3 years ago.


    I have been busy with my day job deadlines, so I couldn’t make progress on the new API for avformat and avcodec I described before. The next blogpost will probably be longer and a bit more technical again.

    Thursday, 09 July

    Debian decided to move to the new FFmpeg, what does it mean to me? Why should I care? This post won’t be technical for once, if you think “Libav is evil” start reading from here.

    Relationship between Libav and Debian

    After the split of what was FFmpeg into two projects, Michael Niedermayer kept the name, due to his ties with the legal owner of the trademark, and kept “merging” everything the group of 18 people was doing under the new Libav name.

    For Gentoo I, maybe naively, decided to just have both and let whoever wants to maintain the other package. Gentoo is about choice, and whoever wants to shoot himself in the foot has to be free to do that in the safest possible way.

    For Debian, being binary packaged, the package maintainer decided to stay with Libav. It wasn’t surprising, given that “lack of releases” was one of the sore points of the former FFmpeg and he started to get involved with upstream to try to fix it.

    Perceived Leverage and Real Shackles

    Libav started with the idea to fix everything that went wrong with the Former FFmpeg:
    – Consensus instead of idolatry for THE Leader
    – Paced releases instead of “cvs is always a release”
    – Maintained release branches for years
    – git instead of svn
    – Cleaner code instead of quick hacks to solve the problem of the second
    – Helping downstreams instead of giving them the finger.

    Being in Debian was, according to some people, undeserved because “Libav is evil”, and since we wrongly thought that people would look at actions and not at random blogposts by people with more bias than anything, we just kept writing code. It was a huge mistake; this blogpost and the previous one are my attempt to address it.

    Being in Debian to me meant that I had to help fix stale versions of software, often even without upstream.

    Instead of helping, the people at Debian kept piling up work on us: the number of patches coming from them over the years amounts to 1, according to git.

    Fun requests such as “Do remove a standard test image because its origin according to them is unclear” or “Do maintain the ancient release branch that is 3 major releases behind” had been quite common.

    For me Debian had been no help and an additional burden.

    The leverage that being in a distribution theoretically gives, according to those crying because the evil Libav was in Debian, amounts to none for me: their users complain because the version provided is stale; their developers do not help, not even to keep the point releases up to date or to update the software using Libav, because they are scared of being tainted; and downstreams such as Kubi (that are so naive as to praise FFmpeg for what happened in Libav, such as the HEVC multi-thread support Anton wrote) would keep picking the implementation they prefer and use ffmpeg-only API whenever they could (Debian will ask us to fix that for them anyway).

    Is being in Debian important?

    Last time they were discussing moving to FFmpeg I had the unpleasant experience of reading lots of lovely email with passive-aggressive snide remarks such as “libav has just developers not users” or seeing the fruits of the smear campaign such as “is it true you stole the FFmpeg hardware” in their mailing list (btw during the past VDD the FFmpeg people there said at least that would be addressed, well, it had not been yet, thank you).

    At that time I got asked to present Libav. This time, after reading in the Debian wiki the “case” presented with skewed git statistics (maybe purge the merge commits when you count them to compare project activity?) and other number dressing, I just got sick of it.

    Personally I do not care. There are better ways to spend your own free time than doing the distro maintenance work for people that do not even thank you (because you are evil).

    The smear campaign pays

    I’m sure that, now that the new FFmpeg gets to replace Libav, it will get more contributions from people, and maybe those that were crying about the “oh so unjust” treatment will be happy to do the maintenance churn.

    Anyway that’s not my problem anymore and I guess I can spend more time writing about the “social issues” around the project, trying to defuse at least a little the so effective “Libav is evil” narrative, one post at a time.

    Friday, 03 July

    Last weekend some libav developers met in the South Pole offices with additional sponsorship from Inteno Broadband Technology. (And the people at Borgodoro that gave us more chocolate to share with everybody).


    Since last year Libav started to have sprints to meet up, discuss in person topics that require a more direct medium than IRC or the mailing list, and usually write some code while asking for direct opinions and help.

    Who attended

    Benjamin was our host for the event. Andreas joined us for the first day only, while Anton, Vittorio, Kostya, Janne, Jan and Rémi stayed both days.

    What we did

    The focus was split across a number of areas of interest:

    • API: with some interesting discussion between Rémi and Anton on how to clarify a tricky detail regarding AVCodecContext and AVFrame and who to trust when.
    • Reverse Engineering: With Vittorio and Kostya having fun unraveling codecs one after the other (I think they got 3 working)
    • Release 12 API and ABI break
      • What to remove and what to keep further
      • What to change so it is simpler to use
      • If there is enough time to add the decoupled API for avcodec
    • Release 12 wishlist:
      • HEVC speed improvements, since even the C code can be sped up.
      • HEVC extended range support, since there is YUV 422 content out now.
      • More optimizations for the newer architectures (aarch64 and power64le)
      • More hardware accelerator support (e.g. HEVC encoding and decoding support for Intel MediaSDK).
      • Some more filters, since enough people asked for them.
      • Merge some of the pending work (e.g. go2meeting3, the new asf demuxer).
      • Get more security fixes in (with ago kindly helping me on this).
      • … and more …
    • New website with markdown support to make it easier for people to update.

    During the sprint we managed to write a lot of code and even to push some of it.
    Maybe a little too early in the case of asf, but it is better to have it in and get to fix it for the release.

    Special mention to Jan for getting a quite exotic container almost ready, I’m looking forward to seeing it on the ml; and to Andreas for reminding me that AVScale is sorely needed, by sending me a patch that fixes a problem his PowerPC users are experiencing while uncovering some strange problem in swscale… I’ll need to figure out a good way to get a big-endian PowerPC running to look at it in detail.

    Thank you

    I want to especially thank all the people at South Pole that welcomed me when I arrived 1 day in advance, and all the people that participated and made the event possible. It has been fun!

    Post Scriptum

    • This post has been delayed 1 week since I have been horribly busy, sorry for the delay =)
    • During the sprint legends such as kropping the sourdough monster and the burning teapot were created; some references to them will probably appear in commits and code.
    • Anybody with experience with qemu-user for PowerPC is welcome to share their knowledge with me.

    Wednesday, 25 March

    I am hearing a lot of persons interested in open-source and giving back to the community. I think it can be an exciting experience and it can be positive in many different ways: first of all more contributors mean better open-source software being produced and that is great, but it also means that the persons involved can improve their skills and they can learn more about how successful projects get created.

    So I wondered why many developers do not do the first step: what is stopping them to send the first patch or the first pull-request? I think that often they do not know where to start or they think that contributing to the big projects out there is intimidating, something to be left to an alien form of life, some breed of extra-good programmers totally separated by the common fellows writing code in the world we experience daily.

    I think that hearing the stories of a few developers who have made major contributions to top-level projects could help to get over these misconceptions. So I asked a few questions to a dear friend of mine, Luca Barbato, who has contributed to Gentoo and VLC, among others.

    Let’s start from the beginning: when did you start programming?

    I started dabbling stuff during high school, but I started doing something more consistent at the time I started university.

    What was your first contribution to an open-source project?

    I think either patching the ati-drivers to work with the 2.6 series or hacking cloop (an early kernel module for compressed loops) to use lzo instead of gzip.

    What are the main projects you have been involved in?

    Gentoo, MPlayer, Libav, VLC, cairo/pixman

    How did you start being involved in Gentoo? Can you explain the roles you have covered?

    Daniel Robbins invited me to join, I thought “why not?”

    During the early times I took care of PowerPC and Altivec, then I focused on the toolchain due to the fact that gcc and binutils tended to break software in funny ways, then on multimedia since Altivec was mainly used there. I have been part of the Council a few times, I used to be a recruiter (if you want to join Gentoo feel free to contact me anyway, we love to have more people involved) and lately I’m involved with community relations.

    Note: Daniel Robbins is the creator of Gentoo, a Linux distribution. 

    Are there other less famous projects you have contributed to?

    I have minor contributions in quite a bit of software, due to the fact that in Gentoo we try our best to upstream our changes and I like to give back fixes to what I like to use.

    What are your motivations to contribute to open-source?

    Mainly because I can =)

    Who helped you to start contributing? From whom have you learnt the most?

    Daniel Robbins surely was one of the first to ask me directly to help.

    You learn from everybody so I can’t name a single person among all the great people I met.

    How did you get to know Daniel Robbins? How did he help you?

    I was a Gentoo user; I happened to do stuff he deemed interesting and he asked me to join.

    He involved me in quite a number of interesting projects, some worked (e.g. Gentoo PowerPC), some (e.g. Gentoo Games) not so much.

    Do your contributions to open-source help your professional life?

    In some ways they do. Contrary to the common assumption, I’m only seldom paid to improve the projects I care about the most, but at the same time having them working helps me when I need them in my professional work.

    How do you face disagreement on technical solutions?

    I’m a fan of informed consensus, otherwise prototypes (as in “do, test and then tell me back”) work the best.

    To contribute to OSS, which are more important: technical skills or diplomatic/relational skills?

    Both are needed at different times; open source is not just software, you MUST get along with people.

    Have you found different ways to organize projects? What works best in your opinion? What works worst?

    Usually the main problem is dealing with poisonous people, and it doesn’t matter if it is a 10-people project or a 300+-people project. You can have a dictator, you can have a council, you can have global consensus: poisonous people are what makes your community suffer a lot. Bonus points if the poisonous people get clueless fans giving them additional voice.

    Did you ever send a patch for the Linux kernel?

    Not really, I’m not fond of that coding style so usually other people correct the small bugs I stumble upon before I decide to polish my fix so it is acceptable =)

    Do you have any suggestions for people looking to get started contributing to open-source?

    Pick something you use, scratch your own itch first, do not assume other people are infallible or heroes.

    ME: I certainly agree with that, it is some of the best advice. However, if you cannot find anything suitable, at the end of this post I wrote a short list of projects that could use some help.

    Can you tell us about your best and your worst moments with contribution to OSS?

    The best moment is recurring and it is when some user thanks you since you improved his or her life.

    The worst moment for me is when some rabid fan claims I’m evil because I’m contributing to Libav and even praises FFmpeg for something originally written in Libav in the same statement, happened more than once.

    What are you working on right now and what plans do you have for the future?

    Libav, plaid, bmdtools, commonmark. In the future I might play a little more with Rust.

    Thanks Luca! I would be extremely happy if this short post could give to someone the last push they need to contribute to an existing open-source project or start their own: I think we could all use more, better, open-source software. So let’s write it.

    One thing I admire in Luca is that he is always curious and ready to jump on the next challenge. I think this is the perfect attitude to become an OSS contributor: just start playing around with the things you like and talk to people, you could find more possibilities to contribute than you could imagine.

    …and one final thing: Luca is also the author of open-source recipes: he created the recipes of two types of chocolate bars dedicated to Libav and VLC. You can find them on the borgodoro website.


    I suggest taking a look at his blog.

    A few open-source projects you could consider contributing to

    Well, just in case you are eager to start writing some code and you are looking for some projects to contribute to, here are a few, written with different technologies. If you want to start contributing to any of those and you need directions just drop me a line (federico at tomassetti dot me) and I would be glad to help!

    • If you are interested in contributing to Libav, you can take a look at this post: there I explained how I submitted my first patch (approved in the meantime!). It is written in C.

    • You could be also interested in plaid: it is a Python web application to manage git patches sent by e-mail (there are a few projects using this model like libav or the linux kernel)

    • WorldEngine, it is a world generator written in Python

    • Plate-tectonics, it is a library for plate tectonics simulation. It is written in C++

    • JavaParser a Java parser, written in Java

    • Incremental Java parser, an incremental Java parser, written in Scala

    The post How people get started contributing to open-source? A few questions to Luca Barbato, contributor to Gentoo, MPlayer, Libav, VLC, cairo/pixman appeared first on Federico Tomassetti - Consultant Software Engineer.

    Wednesday, 18 February

    I happened to have a few hours free and I was looking for some coding to do. I thought about VLC, the media player which I have enjoyed so much using over the years and I decided that I wanted to contribute in some way.

    To start helping in such a complex project there are a few steps involved. Here I describe how I got my first patch accepted. In particular I wrote a patch for libav, the library behind VLC.

    The general picture

    I started by reading the wiki. It is a very helpful starting point, but the process to set up the environment and send a first patch was not yet 100% clear to me, so I got in touch with some of the developers of libav to understand how they work and how I could start lending a hand with something simple. They explained to me that the easiest way to start is by solving issues reported by static analysis tools and style checkers. They use uncrustify to verify that the code adheres to their style guidelines and they run coverity to check for potential issues like memory leaks or null dereferences. So I:

    • started looking at some coverity issues
    • found something easy to address (a very simple null dereference)
    • prepared the patch
    • submitted the patch

    After a few minutes the patch was approved by a committer, ready to be merged. The day after it made its way to the master branch. Yeah!

    Download source code, build libav and run the tests

    First of all, let’s clone the git repository:

    git clone git://

    Alternatively you could use the GitHub mirror, if you want to.

    At this point you may want to install all the dependencies. The instructions are platform specific, you can find them here. If you are on Mac OS X be sure to have yasm installed, because nasm does not work. If you have installed both, configure will pick up yasm (correctly). Just be sure to run configure after installing yasm.

    If everything goes well you can now build libav by running:


    Note that it is fine to build in-tree (no need to build in a separate directory).

    Now it is time to run the tests. You will have to specify one directory where to download some samples, later used by tests. Let’s assume you wanted to put your samples under ~/libav-samples:

    mkdir ~/libav-samples
    # This downloads the samples
    make fate-rsync SAMPLES=~/libav-samples
    # This runs the tests
    make fate

    Did everything run fine? Good! Let’s start to patch then!

    Write the patch

    First of all we need to find an open issue. Visit the Coverity page for libav. You will have to ask for access and wait for someone to grant it to you. Once you are able to log in you will see a screen like this:

    Screenshot from 2015-02-14 19:39:06

    Here, this seems an easy one! The variable oggstream has been allocated by av_mallocz (basically a wrapper for malloc) but the returned value has not been checked. If the allocation fails a NULL pointer is returned, and when we try to access it on the next line things are going to end unpleasantly. What we need to do is to check the return value of av_mallocz and, if it is NULL, return an error. The appropriate error to return in this case is AVERROR(ENOMEM). To get this information… you have to start reading code and get familiar with the way this codebase does business.

    Libav follows strict rules about the comments in git commits: use git log to look at previous commits and try to use the same style.

    Submitting the patch

    I think many of you are familiar with GitHub and the whole process of submitting a patch for review. GitHub is great because it made that process so easy. However there are some projects (notably including the Linux kernel) which adopt another approach: they receive patches by e-mail.

    Git has a feature that lets you submit a patch by e-mail with a simple command. The patch will be sent to the mailing list, discussed, and if approved the e-mail will be downloaded, processed through git and committed to the official repository. Does it sound cumbersome? Well, it does to me, spoiled as I am by GitHub and similar tools but, you know, when in Rome, do as the Romans do, so…

    # This installs the git extension for sending patches through e-mail
    sudo apt install git-email
    # This submits a patch built using the last commit
    git send-email -1 --to

    Sending patches using gmail with 2-factor authentication enabled

    Now, many of you are using gmail and many of you have enabled 2-factor authentication (right? If not, you should). If this is your case you will get an error along these lines:

    Password for 'smtp://': 5.7.9 Application-specific password required. Learn more at 5.7.9 cj12sm14743233wjb.35 - gsmtp

    Here you can find how to create a password for this goal: The name of the application that I had to create was smtp:// Note that I used the same name specified in the previous error message.

    What if I need to correct my patch?

    If things go well an e-mail with your patch will be sent to the mailing list, someone will look at it and accept it. Most of the time you will receive suggestions about possible adjustments to improve your patch. When that happens you want to submit a new version of your patch in the same thread that contains the first version and the e-mails commenting on it.

    To do that you want to update your patch (typically using git commit --amend) and then run something like:

    git send-email -1 --to --in-reply-to Message-ID:

    Of course you need to find out the message-id of the e-mail to which you want to reply. To do that in gmail select the “Show original” item from the contextual menu for the message and in the screen opened look for the Message-Id header.

    Tools to manage patches sent by e-mail

    There are also web applications which are used to manage the patches sent by e-mail. Libav is currently using Patchwork for managing patches. You can see it deployed at: Currently another tool has been developed to replace patchwork. It is named Plaid and I tried to help a little bit with that also 🙂


    Mine has been a very small contribution, and in the future I hope to be able to do more. But being a maintainer of other open-source projects I have learned that even small help is useful and appreciated, so for today I feel good.

    Screenshot from 2015-02-14 22:29:48

    Please, if I am missing something help me correct this post

    The post How to contribute to Libav (VLC): just got my first patch approved appeared first on Federico Tomassetti - Consultant Software Engineer.

    Monday, 12 January

    I'm interested in history, so I like visiting castles (or their ruins) and historical towns. There are many of them here in the Czech Republic. Some Czech sights, like those in Prague, Kutná Hora or Český Krumlov, are well known and, especially during the summer, overcrowded with tourists. But there are also very nice, less popular places which are much calmer, and it is really a pleasure to visit them. One such place is the town of Jindřichův Hradec, where the third biggest castle in the Czech Republic is located. The town centre with this castle is really amazing: it is full of romantic little streets, churches, museums and ancient buildings. This small town has a really big historical centre compared to its size, so one can spend the whole day exploring the castle and its surroundings. Recently I decided to visit the town again. It was a windy day, but it was relatively warm for the winter. There weren't any tourists around, and I really enjoyed my visit. The only disadvantage of this trip was that the castle is closed to visitors during the winter and the museums have short opening hours on weekends.

    Thursday, 20 November

    I participated in the last Libav sprint in Torino. I had made a new ASF demuxer for Libav, but during testing, problems with the rtsp and mms protocols appeared. Therefore, my main task during the sprint was to fix these issues.
    It was the second time I was at such a sprint and also my second visit to Torino, and the sprint was even better than I expected. It's really nice to see in person the people I communicate with through the IRC channel; the thing I like a lot about Libav is its friendly community. But the most important thing for me, as the most inexperienced person among skilled developers, was naturally their help. My mentors from OPW participated in the sprint and as a result all the issues were fixed and a patch was sent to the ML. Also, these personal consultations can be very productive for learning new things, and because I'm not a native English speaker I realized that the few days when I have to speak or even think in English are really helpful for getting better at it.
    On the last day of the sprint we had a trip to a really magical place called Sacra di San Michele.

    I like to visit places like this; in the Czech Republic, where I live, I visit ancient castles. But I think this may be the oldest place I've ever been to: the oldest parts of it were built in the 10th century. I had a feeling that history was breathing on us from the walls. We were lucky with the weather: it was sunny during our visit and the view from the terrace on top of the building was really breathtaking. We saw the snow-covered peaks of the Alps that divide this part of Italy from France.

    Saturday, 15 November

    After my challenge with the fused multiply-add instructions I managed to find some time to write a new test utility. It’s written ad hoc for unpaper but it can probably be used for other things too. It’s trivial and stupid but it got the job done.

    What it does is simple: it loads both a golden and a result image file, compares the size and format, and then goes through all the bytes to identify how many differences there are between them. If less than 0.1% of the image surface changed, it considers the test a pass.
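    The comparison described above can be sketched in a few lines of Python. This is not the actual unpaper test utility, only an illustration of the idea: the function name images_match is made up, the images are assumed to be already decoded into raw byte buffers of the same pixel format, and "image surface" is approximated by the fraction of differing bytes.

```python
def images_match(golden, result, tolerance=0.001):
    """Return True when fewer than `tolerance` (0.1% by default) of the bytes differ.

    Assumption: `golden` and `result` are raw pixel buffers (bytes-like)
    that were decoded to the same pixel format beforehand.
    """
    # Buffers of different size can never be considered the same image.
    if len(golden) != len(result):
        return False
    # Two empty buffers are trivially identical.
    if not golden:
        return True
    # Walk both buffers in lockstep and count the positions that differ.
    differing = sum(1 for g, r in zip(golden, result) if g != r)
    return differing / len(golden) < tolerance
```

    With a 2000-byte buffer, a single changed byte (0.05%) still passes, while three changed bytes (0.15%) fail.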

    It’s not a particularly nice system, especially as it requires me to bundle some 180MB of golden files (they compress to just about 10 MB so it’s not a big deal), but it’s a strict improvement compared to what I had before, which is good.

    This change actually allowed me to explore one change that I abandoned before because it resulted in non-pixel-perfect results. In particular, unpaper now uses single-precision floating points all over, rather than doubles. This is because the slight imperfections caused by this change are not relevant enough to warrant the ever-so-slight loss in performance due to the bigger variables.

    But even up to here, there is very little gain in performance. Sure some calculation can be faster this way, but we’re still using the same set of AVX/FMA instructions. This is unfortunate, unless you start rewriting the algorithms used for searching for edges or rotations, there is no gain to be made by changing the size of the code. When I converted unpaper to use libavcodec, I decided to make the code simple and as stupid as I could make it, as that meant I could have a baseline to improve from, but I’m not sure what the best way to improve it is, now.

    I still have a branch that uses OpenMP for the processing, but since most of the filters applied are dependent on each other it does not work very well. Per-row processing gets slightly better results but they are really minimal as well. I think the most interesting parallel processing low-hanging fruit would be to execute processing in parallel on the two pages after splitting them from a single sheet of paper. Unfortunately, the loops used to do that processing right now are so complicated that I’m not looking forward to touching them for a long while.

    I tried some basic profile-guided optimization execution, just to figure out what needs to be improved, and compared with codiff a proper release and a PGO version trained after the tests. Unfortunately the results are a bit vague and it means I’ll probably have to profile it properly if I want to get data out of it. If you’re curious here is the output when using rbelf-size -D on the unpaper binary when built normally, with profile-guided optimisation, with link-time optimisation, and with both profile-guided and link-time optimisation:

    % rbelf-size -D ../release/unpaper ../release-pgo/unpaper ../release-lto/unpaper ../release-lto-pgo/unpaper
        exec         data       rodata        relro          bss     overhead    allocated   filename
       34951         1396        22284            0        11072         3196        72899   ../release/unpaper
       +5648         +312         -192           +0         +160           -6        +5922   ../release-pgo/unpaper
        -272           +0        -1364           +0         +144          -55        -1547   ../release-lto/unpaper
       +7424         +448        -1596           +0         +304          -61        +6519   ../release-lto-pgo/unpaper

    It’s unfortunate that GCC does not give you any diagnostic on what it’s trying to achieve when doing LTO; it would be interesting to see if you could steer the compiler to produce better code without it as well.

    Anyway, enough with the micro-optimisations for now. If you want to make unpaper faster, feel free to send me pull requests for it, I’ll be glad to take a look at them!

    Tuesday, 09 September

    In one of my previous posts I have noted I’m an avid audiobook consumer. I started when I was at the hospital, because I didn’t have the energy to read — and most likely, because of the blood sugar being out of control after coming back from the ICU: it turns out that blood sugar changes can make your eyesight go crazy; at some point I had to buy a pair of €20 glasses simply because my doctor prescribed me a new treatment and my eyesight ricocheted out of control for a week or so.

    Nowadays, I have trouble sleeping if I’m not listening to something, and I end up with the Audible app installed on all my phones and tablets, with at least a few books preloaded whenever I travel. Of course as I said, I keep the majority of my audiobooks on the iPod, and the reason is that while most of my library is on Audible, not all of it is. There are a few books that I bought on iTunes before finding out about Audible, and then there are a few I received in CD form, including The Hitchhiker’s Guide To The Galaxy Complete Radio Series which is among my favourite playlists.

    Unfortunately, to be able to convert these from CD to a format that the iPod could digest, I ended up having to buy a software called Audiobook Builder for Mac, which allows you to rip CDs and build M4B files out of them. What’s M4B? It’s the usual mp4 format container, just with an extension that makes iTunes consider it an audiobook, and with chapter markings in the stream. At the time I first ripped my audiobooks, ffmpeg/libav had no support for chapter markings, so that was not an option. I’ve been told that said support is there now, but I have not tried getting it to work.

    Indeed, what I need to find out is how to build an audiobook file out of a string of mp3 files, and I have no idea how to do that now that I no longer have access to my personal iTunes account on a Mac to re-download Audiobook Builder and process them. In particular, the mp3s that I’m looking forward to merging together are the years 2013 and 2014 of BBC’s The News Quiz, to which I’m addicted and listen continuously. Being able to join them all together so I can listen to them with a multi-day-running playlist is one of the very few things that still let me sleep relatively calmly; I say relatively because I really don’t remember when was the last time I slept soundly in about a year by now.

    Essentially, what I’d like is for Audible to let me sideload some content (the few books I did not buy from them, and the News Quiz series that I stitch together from the podcast), and create a playlist; then as far as I’m concerned I don’t have to use an iPod at all. Well, besides the fact that I’d have to find a way to shut up notifications while playing audiobooks. Having Dragons of Autumn Twilight interrupted by the Facebook pop notification is not something that I’m looking forward to most of the time. And in some cases I have even had some background update disrupting my playback, so there is definitely room for improvement.

    Friday, 15 August

    RealAudio files have several possible interleavers. The simplest is “Int0”, which means that the packets are in order. Today, I was contrasting “Int4” and “genr”. They both require rearranging data, in highly similar but not identical ways. “genr” is slightly more complex than “Int4”.

    A typical Int4 pattern, writing to subpacket 0, 1, 2, 3, etc, would read data from subpacket 0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 1, 7, 13, etc, in that order – assuming subpacket_h is 12, as it was in one sample file. It is effectively subpacket_h rows of subpacket_h / 2 columns, counting up by subpacket_h / 2 and wrapping every two rows.

    A typical genr pattern is a little trickier. For subpacket_h = 14, and the same 6 columns per row as above, the pattern to read from looks like 0, 12, 24, 36, 48, 60, 72, 6, 18, 30, 42, 54, 66, 78, 1, etc.

    I spent most of today implementing genr, carefully working with a paper notebook, pencil, Python, and a terse formula from the old implementation:

    case DEINT_ID_GENR:
    for (x = 0; x < w/sps; x++) avio_read(pb, ast->*(h*x+((h+1)/2)*(y&1)+(y>>1)), sps);
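
    To check the pattern without GDB, the index arithmetic of that formula can be replayed in Python. This is only a sketch: the function names are made up, columns stands in for w/sps, and the outer loop over y from 0 to h - 1 is an assumption, since the snippet above shows only the inner loop.

```python
def genr_destinations(h, columns):
    """Destination subpacket slot for each subpacket, in file (read) order.

    Uses the index expression from the C snippet:
    h*x + ((h+1)/2)*(y&1) + (y>>1), with integer division.
    Assumption: y runs over 0..h-1 in an outer loop not shown in the snippet.
    """
    dest = []
    for y in range(h):
        for x in range(columns):  # columns plays the role of w/sps
            dest.append(h * x + ((h + 1) // 2) * (y & 1) + (y >> 1))
    return dest


def genr_read_order(h, columns):
    """Invert the mapping: which file-order subpacket fills output slot 0, 1, 2, ..."""
    dest = genr_destinations(h, columns)
    order = [0] * len(dest)
    for read_index, slot in enumerate(dest):
        order[slot] = read_index
    return order
```

    For h = 14 and 6 columns, genr_read_order(14, 6) starts with 0, 12, 24, 36, 48, 60, 72, 6, 18, 30, 42, 54, 66, 78, 1, the same sequence quoted in the text.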

    After various debug printfs, a lot of quality time in GDB running commands like x /94x (pkt->data + 14 * 94), a few interestingly garbled bits of audio playback, and a mentor pointing out I have some improvements to make on header parsing, I can play (some) genr files.

    I have also recently implemented SIPR support, and it works in both RA and RM files. RV10 video also largely works.

    Sunday, 10 August

    I had participated in the Libav project with a few small contributions before, but implementing the ASF demuxer was my first complex enterprise. I was a little scared of it and I wasn't sure I would be able to handle such a task, because I'm not very experienced as a programmer yet. Fortunately, the specifications and patient and wise mentors were there for me (and for other OPW participants). When I started to implement just the ASF parser, asfinfo, with the specifications, I realized that the specifications are accurate and have a logical structure, which helped me a lot. Sometimes I had a feeling that some things were not explained clearly, but maybe it's just my inexperience with understanding specs, and my mentors always patiently helped me with things I didn't understand. I was also unsure about proper packet handling and other Libav internals I wasn't familiar with. I searched for inspiration in other demuxers (I was forbidden to look at the old ASF demuxer for fear of copying its code without understanding it) and my mentors answered my questions of course, so I think writing a demuxer (not too complex or overcomplicated) with the specs and with patient people who help and encourage you is not that hard after all. It's not that easy either: I've made many bugs, I felt hopeless a few times and there is still much work left, but I think it is doable. The point is, people (girls), don't be afraid, you can do it.

    Saturday, 09 August

    I've solved the lost packets problems and my ASF demuxer finally started to work right on "ideal samples in a vacuum". So the time for fixing memory leaks had come, and valgrind helped me a lot with this. After the memory leaks were solved I had to start testing my demuxer on various samples of ASF multimedia files. As expected, I found many samples my demuxer failed on. The reasons varied: mostly they were my mistakes, misunderstood or overlooked parts of the specs, but I think I also found a case that needed unusual handling the specs didn't mention.
    Some of the problems were caused, for example, by:
    * improper subpayload handling: one should be really careful while reading the specs to avoid problems with less common cases, like when one subpayload is inside a single payload and there is padding inside the payload itself (while the padding after the payload is 0), but there were other problems too
    * padding handling, which I had to revise for all possible cases
    * the ASF packet size, which an ASF file states in 3 places (twice in the header objects and once in the packet itself); the specs do not specify what one should do when they differ, or at least I didn't find it
    * some stupid mistakes, like when I just forgot to do something after adding a new block to my code, which were really annoying
    The funny thing was that when I fixed my demuxer for one group of samples, another group that had worked before started to fail; I fixed this new group and a third group failed. I was very annoyed by this, but many of the mistakes I made were caused by my inexperience and I think one (at least me) just has to make all of these mistakes to get better.

    The other day I wrote about unpaper and the fact that I was working on making it use libav for file input. I have now finished converting unpaper (in a branch) so that it does not use its own image structure, but rather the same AVFrame structure that libav uses internally and externally. This meant not only supporting stripes, but using the libav allocation functions and pixel formats.

    This also enabled me to use libav for file output as well as input. While for the input I decided to add support for formats that unpaper did not read before, for output at the moment I’m sticking with the same formats as before. Mostly because the one type of output file I’d like to support is not currently supported by libav properly, so it’ll take me quite a bit longer to be able to use it. For the curious, the format I’m referring to is multipage TIFF. Right now libav only supports single-page TIFF and it does not support JPEG-compressed TIFF images, so there.

    Originally, I planned to drop compatibility with previous unpaper versions, mostly because to drop the internal structure I was going to lose the input format information for 1-bit black and white images. In the end I was actually able to reimplement the same feature in a different way, and so I restored that support. The only compatibility issue right now is that the -depth parameter is no longer present, mostly because it and -type constrained the same value (the output format).

    To reintroduce the -depth parameter, I want to support 16-bit gray. Unfortunately to do so I need to make more fundamental changes to the code, as right now it expects to be able to get the full value at most at 24 bit — and I’m not sure how to scale a 16-bit grayscale to 24-bit RGB and maintain proper values.
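
    For illustration only, one common way to map a 16-bit gray sample onto 8-bit-per-channel RGB is to rescale with rounding and replicate the result on all three channels. This is a generic sketch in Python, not what unpaper does; the function name gray16_to_rgb24 is made up, and the conversion necessarily discards the low-order precision, which is exactly the concern raised above.

```python
def gray16_to_rgb24(value):
    """Map one 16-bit gray sample (0..65535) to an (r, g, b) tuple of 8-bit levels.

    Rescales 0..65535 onto 0..255 with rounding to nearest, so that black
    stays 0 and full white stays 255. Precision below 1/255 is lost.
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit sample")
    # Integer form of round(value * 255 / 65535).
    level = (value * 255 + 32767) // 65535
    return (level, level, level)
```

    The round trip is lossy by construction: many distinct 16-bit inputs collapse onto the same 8-bit level, so a pipeline that needs the full 16 bits has to carry wider samples internally.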

    While I had to add almost as much code to support the libav formats and their conversion as there was there to load the files, I think this is still a net win. The first point is that there is no format parsing code in unpaper, which means that as long as the pixel format is something that I can process, any file that libav supports now or will support in the future will do. Then there is the fact that I ended up making the code “less smart” by removing codepath optimizations such as “input and output sizes match, so I won’t be touching it, instead I’ll copy one structure on top of the other”, which means that yes, I probably lost some performance, but I also gained some sanity. The code was horribly complicated before.

    Unfortunately, as I said in the previous post, there are a couple of features that I would have preferred if they were implemented in libav, as that would mean they’d be kept optimized without me having to bother with assembly or intrinsics. Namely pixel format conversion (which should be part of the proposed libavscale, still not reified), and drawing primitives, including bitblitting. I think part of this is actually implemented within libavfilter but as far as I know it’s not exposed for other software to use. Having optimized blitting, especially “copy this area of the image over to that other image” would be definitely useful, but it’s not a necessary condition for me to release the current state of the code.

    So the current work in progress is to support grayscale TIFF files (PAL8 pixel format), and then I’ll probably turn to libav and try to implement JPEG-encoded TIFF files, if I can find the time and motivation to do so. What I’m afraid of is having to write conversion functions between YUV and RGB, I really don’t look forward to that. In the meantime, I’ll keep playing Tales of Graces f because I love that kind of game.

    Also, for those who’re curious, the development of this version of unpaper is done fully on my ZenBook — I note this because it’s the first time I use a low-power device to work on a project that actually requires some processing power to build, but the results are not bad at all. I only had to make sure I had swap enabled: 4GB of RAM are no longer enough to have Chrome open with a dozen tabs, and a compiler in the background.

    Saturday, 02 August

    I’ve resumed working on unpaper since I have been using it more than a couple of times lately and there have been a few things that I wanted to fix.

    What I’ve been working on now is a way to read input files in more formats; I was really aggravated by the fact that unpaper implemented its own loading of a single set of file formats (the PPM “rawbits”). I went on to look into libraries that abstract access to image formats, but I couldn’t find one that would work for me. In the end I settled for libav even though it’s not exactly known for being an image processing library.

    My reason for choosing libav is mostly that, while it does not support all the formats I’d like to have supported in unpaper (PS and PDF come to mind), it does support the formats that unpaper supports now (PNM and company), and I know the developers well enough that I can get bugs and features fixed or implemented as needed.

    I now have a branch that can read files by using libav. It’s a very naïve implementation though: it reads the image into an AVFrame structure and then converts that into unpaper’s own image structure. It does not even free up the AVFrame, mostly because I’d actually like to be able to use AVFrame instead of unpaper’s structure. Not only to avoid copying memory when it’s not required (libav has functions to do shallow-copy of frames and mark them as readable when needed), but also because the frames themselves already contain all the needed information. Furthermore, libav 12 is likely going to include libavscale (or so Luca promised!) so that the on-load conversion can also be offloaded to the library.

    Even with the naïve implementation that I wrote in half an afternoon, unpaper not only supports the same input files as before, but also PNG (24-bit non-alpha colour files are loaded the same way as PPM, 1-bit black and white is inverted compared to PBM, while 8-bit grayscale is actually 16-bit with half of it defining the alpha channel) and very limited TIFF support (1-bit is the same as PNG; 8-bit is paletted so I have not implemented it yet, and as for colour, I found out that libav does not currently support JPEG-compressed TIFF – I’ll work on that if I can – but otherwise it is supported as it’s simply 24bpp RGB).
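
    The 1-bit inversion mentioned above can be sketched in a few lines of plain C. This is only an illustration, not unpaper's actual code: the function name invert_mono_row is made up, and it assumes pixels are packed eight per byte, most significant bit first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: flip every pixel of one packed 1-bit-per-pixel row
 * in place. Assumes MSB-first packing, eight pixels per byte. Padding bits
 * in the last byte are cleared so they stay at a known value. */
static void invert_mono_row(uint8_t *row, size_t width_px)
{
    size_t nbytes = (width_px + 7) / 8; /* bytes needed for the row */
    size_t tail   = width_px % 8;       /* pixels used in the last byte */

    assert(row != NULL || width_px == 0);

    for (size_t i = 0; i < nbytes; i++)
        row[i] = (uint8_t)~row[i];

    if (tail != 0)
        row[nbytes - 1] &= (uint8_t)(0xFF << (8 - tail));
}
```

    Calling it once per row while copying out of the decoded frame would be enough to turn one 1-bit convention into the other.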

    What also needs to be done is to write out the file using libav too. While I don’t plan to allow writing files in any random format with unpaper, I wouldn’t mind being able to output through libav. Right now the way this is implemented, the code does explicit conversion back or forth between black/white, grayscale and colour at save time, and this is nothing different than the same conversion that happens at load time, and should rather be part of libavscale when that exists.

    Anyway, if you feel like helping with this project, the code is on GitHub and I’ll try to keep it updated.

    Sunday, 27 July

    Finally, all basic parts of the ASF demuxer seem to work somehow.

    Over the last two weeks I fixed various bugs in my code and I hope packet handling is correct now. The only remaining problem is that a few packets at the end of the Data Object are still lost. Because I wanted a small break from this problem, my mentors allowed me to implement basic seeking first. The ASF demuxer can now read index entries from the Simple Index Object and add them to the AVStream with av_add_index_entry. So when a Simple Index Object is present in an ASF file, my demuxer can seek to the requested time.
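
    To show what seeking through such an index amounts to, here is a small standalone sketch in plain C. It does not use libavformat; the IndexEntry type and index_search_backward function are hypothetical names, and the entries are assumed to be sorted by timestamp. It returns the last entry at or before the requested time, which is the position a demuxer would jump to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index entry: a timestamp and the file position to seek to. */
typedef struct IndexEntry {
    int64_t timestamp;
    int64_t pos;
} IndexEntry;

/* Binary search for the last entry whose timestamp is <= wanted.
 * Returns its array index, or -1 if wanted precedes every entry. */
static int index_search_backward(const IndexEntry *entries, int n, int64_t wanted)
{
    int lo = 0, hi = n; /* the answer lies in [lo - 1, hi - 1] */

    assert(entries != NULL || n == 0);

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (entries[mid].timestamp <= wanted)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo - 1;
}
```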

    Sunday, 13 July

    The skeleton of the new ASF demuxer is written, but so far only audio is demuxed properly. The problem is the complicated handling of video frames in the ASF format. I hope I have finally found out how to process packets properly. An ASF packet can contain a single payload, a single payload with subpayloads, multiple payloads, or multiple payloads with subpayloads inside some of them. Every subpayload is always one frame, but a single payload can be a whole frame or just part of one. When an ASF packet contains multiple payloads, each of them can be one frame or just a fragment of one. When one of the multiple payloads contains subpayloads, each subpayload is one frame and can be processed as an AVPacket.
    For the case of a fragmented frame in an ASF packet, I have to store unfinished frames in ASFPacket structures that I created for this purpose. There should not be more than one unfinished frame per stream, so I have one ASFPacket in each ASFStream (ASFStream is the structure that stores ASF stream properties). ASFPacket contains a pointer to the AVBufferRef where the unfinished frame is stored. When the frame is finished I can hand the buffer pointer over to an AVPacket, set its properties such as size and timestamps, and finally return the AVPacket.
    I introduced many bugs into code that was working (at least ASF packets were parsed correctly and audio worked) and now I'm working on fixing all of them.
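
    The "one unfinished frame per stream" idea described above can be sketched without any libav types. This is a simplified illustration, not the demuxer's real code: PendingFrame and pending_add_fragment are hypothetical names, a plain malloc'd buffer stands in for AVBufferRef, and fragments are assumed to arrive in order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-stream state for one frame that is still being assembled. */
typedef struct PendingFrame {
    uint8_t *data;   /* buffer for the whole frame, NULL when idle */
    size_t   size;   /* total frame size announced by the payload */
    size_t   filled; /* bytes received so far */
} PendingFrame;

/* Add one fragment. Returns 1 when the frame is complete, 0 when more
 * fragments are needed, and -1 on allocation failure or inconsistent input. */
static int pending_add_fragment(PendingFrame *pf, size_t frame_size,
                                size_t offset, const uint8_t *frag,
                                size_t frag_len)
{
    assert(pf != NULL && frag != NULL);

    if (offset == 0) { /* first fragment: start a new frame */
        free(pf->data);
        pf->data = malloc(frame_size ? frame_size : 1);
        if (!pf->data)
            return -1;
        pf->size   = frame_size;
        pf->filled = 0;
    }

    /* Reject fragments that do not continue the frame being assembled. */
    if (!pf->data || frame_size != pf->size || offset != pf->filled ||
        frag_len > pf->size - offset)
        return -1;

    memcpy(pf->data + offset, frag, frag_len);
    pf->filled += frag_len;
    return pf->filled == pf->size;
}
```

    Once the function returns 1, the caller owns pf->data and can wrap it in whatever packet structure it returns to the user.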

    I was accepted for the May - August 2014 round of OPW with the project "Rewrite the ASF demuxer". The first task from my mentors was to create a wiki page about ASF (Advanced Streaming Format), it was created at
    Interesting notes about other containers:

    The next task from my mentors was to write a simple program that reads an ASF file and prints its structure, i.e. the list of ASF objects, metadata and codec information. An ASF file consists of so-called ASF Objects. There are 3 top-level objects: the Header Object, the Data Object and the Index Object. The Header Object in particular can contain many other objects that provide different ASF features, for example the Codec List Object for codec information or the Metadata Object for metadata. An object is recognised by its GUID, a 16-byte array that identifies the object type. I was confused by the fact that the GUID read from the file does not match the GUID in the specs. For historical reasons one has to reorder the bytes of a GUID from the specs to match the GUID read from the file.
    My program is working now and can list objects, codecs and metadata info, but for now it ignores Index Objects. I hope I'll add support for them soon. I also want to print offsets for each object and read the Data Object in more depth.

    I hoped to finish my code for parsing ASF files this week, but that seems unlikely now. I spent the whole week trying to parse ASF data packets.
    One packet consists of Error Correction Data -> Payload Parsing Information -> Payload Data -> Padding Data. Error Correction Data and Payload Data are optional. Payload Data itself can be either "Single Payload", "Single Payload with Subpayloads", "Multiple Payloads" or "Multiple Payloads with Subpayloads". All these objects contain several flags, so I got to practice using the & operator. While trying to implement reading packets properly, I made a lot of stupid mistakes which slowed down my work.
    At this point, it seems I can detect a packet and print out its offset as well as some information about it. However, deeper packet examination and reading subpayloads don't work correctly yet.
    Implementing from specs can be fun; I enjoyed it.
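
    As an illustration of the flag handling mentioned above, here is a sketch that unpacks one flags byte with shifts and the & operator. The bit layout is my reading of the Length Type Flags byte in the Payload Parsing Information and should be checked against the ASF specification before relying on it; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of the Length Type Flags byte. Assumed layout:
 * bit 0 multiple payloads present, bits 1-2 sequence type,
 * bits 3-4 padding length type, bits 5-6 packet length type,
 * bit 7 error correction present. */
typedef struct LengthTypeFlags {
    unsigned multiple_payloads;
    unsigned sequence_type;
    unsigned padding_length_type;
    unsigned packet_length_type;
    unsigned error_correction_present;
} LengthTypeFlags;

static LengthTypeFlags parse_length_type_flags(uint8_t flags)
{
    LengthTypeFlags f;

    f.multiple_payloads        =  flags       & 0x01;
    f.sequence_type            = (flags >> 1) & 0x03;
    f.padding_length_type      = (flags >> 3) & 0x03;
    f.packet_length_type       = (flags >> 5) & 0x03;
    f.error_correction_present = (flags >> 7) & 0x01;
    return f;
}

/* Size in bytes of a field described by a 2-bit length type, assuming
 * 0 = field absent, 1 = BYTE, 2 = WORD, 3 = DWORD. */
static int length_type_size(unsigned type)
{
    static const int size[4] = {0, 1, 2, 4};

    assert(type < 4);
    return size[type];
}
```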

    I've finally finished my asfinfo tool, which prints the structure of an ASF file. It takes an ASF file and tells you which objects are inside and the offset of each object; for some objects it also gives more detailed information. The most interesting information asfinfo can tell you is the offset of every packet your file contains.
    asfinfo was submitted to the libav-devel mailing list and I'm now addressing the review comments. If someone wants to compile it with libav it's here; compiling it requires moving the flags
    I'll keep polishing asfinfo according to the libav developers' comments, but I've already started my main work: writing a new ASF demuxer from scratch. It seems I'll reuse some code from asfinfo, which helped me a lot to understand how to handle an ASF file, mainly how to parse packets.

    Thursday, 03 July

    Today, I learned how to use framecrc as a debug tool. Many Libav tests use framecrc to compare expected and actual decoding. While rewriting existing code, the output from the old and new versions of the code on the same sample can be checked; this makes a lot of mistakes clear quickly, including ones that can be quite difficult to debug otherwise.

    Checking framecrcs interactively is straightforward: ./avconv -i somefile -c:a copy -f framecrc -. The -c:a copy specifies that the original, rather than decoded, packet should be used. The - at the end makes the output go to stdout, rather than a named file.

    The output has several columns, for the stream index, dts, pts, duration, packet size, and crc:

    0, 0, 0, 192, 2304, 0xbf0a6b45
    0, 192, 192, 192, 2304, 0xdd016b78
    0, 384, 384, 192, 2304, 0x18da71d6
    0, 576, 576, 192, 2304, 0xcf5a6a07
    0, 768, 768, 192, 2304, 0x3a84620a

    It is also unusually simple to find out what the fields are, as libavformat/framecrcenc.c spells it out quite clearly:

    static int framecrc_write_packet(struct AVFormatContext *s, AVPacket *pkt)
    {
        uint32_t crc = av_adler32_update(0, pkt->data, pkt->size);
        char buf[256];

        snprintf(buf, sizeof(buf), "%d, %10"PRId64", %10"PRId64", %8d, %8d, 0x%08"PRIx32"\n",
                 pkt->stream_index, pkt->dts, pkt->pts, pkt->duration, pkt->size, crc);
        avio_write(s->pb, buf, strlen(buf));
        return 0;
    }

    Keiler, one of my Libav mentors, patiently explained the above; I hope documenting it helps other people who are starting with Libav development.
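
    For readers curious about the last column, here is a standalone sketch of an Adler-32 style checksum that can be seeded like the call above. It is not Libav's implementation (which is optimized), and adler32_sketch is a made-up name. Note that the framecrc code passes 0 as the starting value, whereas the usual Adler-32 definition starts from 1, so the values differ from what zlib would report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain Adler-32 update: the low 16 bits hold the running byte sum, the
 * high 16 bits hold the sum of those running sums, both modulo 65521.
 * "adler" is the previous checksum, or the seed for the first call. */
static uint32_t adler32_sketch(uint32_t adler, const uint8_t *buf, size_t len)
{
    uint32_t s1 = adler & 0xFFFF;
    uint32_t s2 = adler >> 16;

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % 65521;
        s2 = (s2 + s1) % 65521;
    }
    return (s2 << 16) | s1;
}
```

    Seeded with 1, the nine bytes of "Wikipedia" give 0x11E60398; seeded with 0, as in framecrc, the same bytes give 0x11DD0397.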

    Thursday, 12 June

    Most recently, I have been adding documentation to Libav. Today, my work included writing a demuxer howto. In the last couple of weeks, I have also reimplemented RealAudio 1.0 support (2.0 is in progress), and learned more about Coccinelle and undefined behavior in C. Blog posts on these topics are pending.

    Tuesday, 20 May

    My first patch for undefined behavior eliminates left shifts of negative numbers, replacing a << b (where a can be negative) with a * (1 << b). This change fixes bug686, at least for fate-idct8x8 and libavcodec/dct-test -i (compiled with ubsan and -fno-sanitize-recover). Due to Libav policy, the next step is to benchmark the change. I was also asked to write a simple benchmarking HowTo for the Libav wiki.
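
    A minimal sketch of the replacement, with a hypothetical helper name (the actual patch edits the expressions in place rather than adding a function): left-shifting a negative int is undefined behavior in C, while multiplying by a power of two is well defined as long as the result fits in an int.

```c
#include <assert.h>

/* Multiply a by 2^b without shifting a negative value.
 * "a << b" is undefined when a is negative; "a * (1 << b)" only shifts
 * the positive constant 1. Overflow of the product is still undefined,
 * so callers must keep the result within the range of int. */
static int scale_by_pow2(int a, int b)
{
    assert(b >= 0 && b < 31);
    return a * (1 << b);
}
```

    For example, scale_by_pow2(-14, 2) yields -56, the result the shift was meant to produce for the -14 that ubsan complained about.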

    First, I installed perf: sudo aptitude install linux-tools-generic
    I made two build directories, and built the code with defined behavior in one, and the code with undefined behavior in the other (with ../configure && make -j8 && make fate). Then, in each directory, I ran:

    perf stat --repeat 150 ./libavcodec/dct-test -i > /dev/null

    The results were somewhat more stable than with --repeat 30, but it still looks much more like noise than a meaningful result. I ran the command with --repeat 30 for both before the recorded 150 run, so both would start on equal footing. With defined behavior, the results were “0.121670022 seconds time elapsed ( +-  0.11% )”; with undefined behavior, “0.123038640 seconds time elapsed ( +-  0.15% )”. The best of a further three runs had the opposite result, shown below:

    % cat

    perf stat --repeat 150 ./libavcodec/dct-test -i > /dev/null

    Performance counter stats for ‘./libavcodec/dct-test -i’ (150 runs):

    120.427535 task-clock (msec) # 0.997 CPUs utilized ( +- 0.11% )
    21 context-switches # 0.178 K/sec ( +- 1.88% )
    0 cpu-migrations # 0.000 K/sec ( +-100.00% )
    226 page-faults # 0.002 M/sec ( +- 0.01% )
    455’393’772 cycles # 3.781 GHz ( +- 0.05% )
    <not supported> stalled-cycles-frontend
    <not supported> stalled-cycles-backend
    1’306’169’698 instructions # 2.87 insns per cycle ( +- 0.00% )
    89’674’090 branches # 744.631 M/sec ( +- 0.00% )
    1’144’351 branch-misses # 1.28% of all branches ( +- 0.18% )

    0.120741498 seconds time elapse

    % cat

    Performance counter stats for ‘./libavcodec/dct-test -i’ (150 runs):

    120.838976 task-clock (msec) # 0.997 CPUs utilized ( +- 0.11% )
    21 context-switches # 0.172 K/sec ( +- 1.98% )
    0 cpu-migrations # 0.000 K/sec
    226 page-faults # 0.002 M/sec ( +- 0.01% )
    457’077’626 cycles # 3.783 GHz ( +- 0.08% )
    <not supported> stalled-cycles-frontend
    <not supported> stalled-cycles-backend
    1’306’321’521 instructions # 2.86 insns per cycle ( +- 0.00% )
    89’673’780 branches # 742.093 M/sec ( +- 0.00% )
    1’148’393 branch-misses # 1.28% of all branches ( +- 0.11% )

    0.121162660 seconds time elapsed ( +- 0.11% )

    I also compared the disassembled code from jrevdct.o, before and after the changes to have defined behavior (using gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2 on x86_64).

    In the build directory for the code with defined behavior:
    objdump -d libavcodec/jrevdct.o > def.dis
    sed -e 's/^.*://' def.dis > noline.def.dis

    In the build directory for the code with undefined behavior:
    objdump -d libavcodec/jrevdct.o > undef.dis
    sed -e 's/^.*://' undef.dis > noline.undef.dis

    Leaving aside differences in jump locations (despite the fact that they can impact performance), there are two differences:

    diff -u build_benchmark_undef/noline.undef.dis build_benchmark_def/noline.def.dis

    -       0f bf 50 f0             movswl -0x10(%rax),%edx
    +       0f b7 58 f0             movzwl -0x10(%rax),%ebx

    It’s switched to using a zero-extension rather than sign-extension in one place.

    -       74 1c                   je     40 <ff_j_rev_dct+0x40>
    -       c1 e2 02                shl    $0x2,%edx
    -       0f bf d2                movswl %dx,%edx
    -       89 d1                   mov    %edx,%ecx
    -       0f b7 d2                movzwl %dx,%edx
    -       c1 e1 10                shl    $0x10,%ecx
    -       09 d1                   or     %edx,%ecx
    -       89 48 f0                mov    %ecx,-0x10(%rax)
    -       89 48 f4                mov    %ecx,-0xc(%rax)
    -       89 48 f8                mov    %ecx,-0x8(%rax)
    -       89 48 fc                mov    %ecx,-0x4(%rax)
    +       74 19                   je     3d <ff_j_rev_dct+0x3d>
    +       c1 e3 02                shl    $0x2,%ebx
    +       89 da                   mov    %ebx,%edx
    +       0f b7 db                movzwl %bx,%ebx
    +       c1 e2 10                shl    $0x10,%edx
    +       09 da                   or     %ebx,%edx
    +       89 50 f0                mov    %edx,-0x10(%rax)
    +       89 50 f4                mov    %edx,-0xc(%rax)
    +       89 50 f8                mov    %edx,-0x8(%rax)
    +       89 50 fc                mov    %edx,-0x4(%rax)

    Leaving aside differences in register use:

    -       0f bf d2                movswl %dx,%edx
    There is one extra movswl instruction in the version with undefined behavior, at least with the particular version of the particular compiler for the particular architecture checked.

    This is an example of a null result while benchmarking; neither version performs better, although any given benchmark has one or the other come out ahead, generally by less than the variance within the run. If this were a suggested performance change, it would not make sense to apply it. However, the point of this change was correctness; a performance increase is not expected, and the lack of a performance penalty is a bonus.

    Monday, 19 May

    One of my fantastic OPW mentors prepared a “Welcome task package”, of self-contained, approachable, useful tasks that can be done while getting used to the code, and with a much smaller scope than the core objective. This is awesome. To any mentors reading this: consider making a welcome package!

    Step one of it is to use ubsan with gdb. This turned out to be somewhat intricate, so I have decided to supplement the wiki’s documentation with a step-by-step guide for Ubuntu 14.04.

    1) Install clang-3.5 (sudo aptitude install clang-3.5), as Ubuntu 14.04 comes with gcc 4.8, which does not support -fsanitize=undefined.

    2) Under libav, mkdir build_ubsan && cd build_ubsan && ../configure --toolchain=clang-usan --extra-cflags=-fno-sanitize-recover (alternatively, --cc=clang --extra-cflags=-fsanitize=undefined --extra-ldflags=-fsanitize=undefined can be used instead of --toolchain=clang-usan).

    3) make -j8 && make fate

    4) Watch where the tests die (they only die if --extra-cflags=-fno-sanitize-recover is used). For me, they died on TEST idct8x8. Running make V=1 fate and asking my mentors pointed me towards libavcodec/dct-test -i, which is dying on jrevdct.c:310:47: with “runtime error: left shift of negative value -14”. If you really want to err on the side of caution, make a second build dir, and ./configure --cc=clang && make -j8 && make fate in it, making sure it does not fail… this confirms that the problem is related to configuring with --toolchain=clang-usan (and, it turns out, with -fsanitize=undefined).

    5) It’s time to use the information my mentor pointed out on the wiki about ubsan at  – specifically, the information about useful gdb breakpoints. I put a modified version of the b_u definitions into ~/.gdbinit. The wiki has been updated now, but was originally missing a few functions, including one that turns out to be relevant: __ubsan_handle_shift_out_of_bounds

    6) Run gdb ./libavcodec/dct-test, then at the gdb prompt, set args -i to set the arguments dct-test was being run with, and then b_u to load the ubsan breakpoints defined above. Then start the program: type run at the gdb prompt.

    7) It turns out that a problem can be found, and the program stops running. Get a backtrace with bt.

    680 in __ubsan_handle_shift_out_of_bounds ()
    #1  0x000000000048ac96 in __ubsan_handle_shift_out_of_bounds_abort ()
    #2  0x000000000042c074 in row_fdct_8 (data=<optimized out>) at /home/me/opw/libav/libavcodec/jfdctint_template.c:219
    #3  ff_jpeg_fdct_islow_8 (data=<optimized out>) at /home/me/opw/libav/libavcodec/jfdctint_template.c:273
    #4  0x0000000000425c46 in dct_error (dct=<optimized out>, test=<optimized out>, is_idct=<optimized out>, speed=<optimized out>) at /home/me/opw/libav/libavcodec/dct-test.c:246
    #5  main (argc=<optimized out>, argv=<optimized out>) at /home/me/opw/libav/libavcodec/dct-test.c:522

    It would be nice to see a bit more detail, so I wanted to compile the project so that less would be optimized out, and eventually settled on -O1 because compiling with ubsan and without optimizations failed (which I reported as bug 683). This led to a slightly better backtrace:

    #0  0x0000000000491a70 in __ubsan_handle_shift_out_of_bounds ()
    #1  0x0000000000492086 in __ubsan_handle_shift_out_of_bounds_abort ()
    #2  0x0000000000434dfb in ff_j_rev_dct (data=<optimized out>) at /home/me/opw/libav/libavcodec/jrevdct.c:275
    #3  0x00000000004258eb in dct_error (dct=0x4962b0 <idct_tab+64>, test=1, is_idct=1, speed=0) at /home/me/opw/libav/libavcodec/dct-test.c:246
    #4  0x00000000004251cc in main (argc=<optimized out>, argv=<optimized out>) at /home/me/opw/libav/libavcodec/dct-test.c:522

    It is possible to work around the problem by modifying the source code rather than the compiler flags: FFmpeg did so within hours of the bug report – the commit is at;a=commit;h=bebce653e5601ceafa004db0eb6b2c7d4d16f0c0 ! Both FFmpeg and Libav have also merged my patch to work around the problem (FFmpeg patch, Libav patch). The workaround of using -O1 was suggested by one of my mentors, lu_zero; --disable-optimizations does not actually disable all optimizations (in practice, it leaves in ones necessary for compilation), and it does not touch the -O1 that --toolchain=clang-usan now sets.

    Wanting a better backtrace leads to the next post: a detailed guide to narrowing down a bug in the C compiler, Clang. Yes, I know, the problem is never a bug in the C compiler – but this time, it was.

    Thursday, 15 May

    What’s the fun of only running code on platforms you physically have? Portability is important, and Libav actively targets several platforms. It can be useful to be able to try out the code, even if the hardware is totally unavailable.

    Here is how to run Libav’s tests under aarch64, on x86_64 hardware and Ubuntu 14.04. This guide is provided in the hopes that it saves someone else 20 hours or more: there is a lot of once-excellent information which has become misleading, because a lot of progress has been made in aarch64 support. I have tried three approaches: building under QEMU user emulation, cross-compiling with Linaro’s cross-compiler, and building under QEMU system emulation. Building with a cross-compiler is the fastest option. Building under user emulation is about ten times slower. Building under system emulation is about a hundred times slower. There is actually a fourth option, using ARM Foundation Model, but I have not tried it. Running under QEMU user emulation is the only approach I managed to make work entirely.

    For all three approaches, you will want a rootfs; I used Ubuntu Core. You can download Ubuntu Core for aarch64 (a minimal rootfs; see to learn more),  and untar it (as root) into a new directory. Then, set an environment variable that the rest of this guide/set of notes uses frequently, changing the path to match your system:

    export a64root=/path/to/your/aarch64/rootdir

    Approach 1 – build under QEMU’s user emulation.

    Step 1) Set up QEMU. The days when using SUSE branches were necessary are over, but it still needs to be statically linked, and not all QEMU packages are. Ubuntu has a static QEMU:

    sudo aptitude install qemu-user-static

    This package also sets up binfmt for you. You can delete broken or stale binfmt information by running:
    echo -1 > /proc/sys/fs/binfmt_misc/archnamehere – this can be useful, especially if you have previously installed QEMU by hand.

    Step 2) Copy your QEMU binary into the chroot, as root, with:

    cp `which qemu-aarch64-static` $a64root/usr/bin/

    Step 3) As root, set up the aarch64 image so it can do DNS resolution, so you can freely use apt-get:
    echo 'nameserver' > $a64root/etc/resolv.conf

    Step 4) Chroot into your new system. Run chroot $a64root /bin/bash as root.

    At this point, you should be able to run an aarch64 version of ls, and confirm with file /bin/ls that it is an aarch64 binary.

    Now you have a working, emulated, minimal aarch64 system.

    On x86, you would run aptitude build-dep libav, but there is no such package for aarch64 yet, so outside of the chroot, on the normal system, I installed apt-rdepends and ran:
    apt-rdepends --build-depends --follow=DEPENDS libav

    With version information stripped out, the following packages are considered dependencies:
    debhelper frei0r-plugins-dev libasound2-dev libbz2-dev libcdio-cdda-dev libcdio-dev libcdio-paranoia-dev libdc1394-22-dev libfreetype6-dev  libgnutls-dev libgsm1-dev libjack-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libopenjpeg-dev libopus-dev libpulse-dev libraw1394-dev librtmp-dev libschroedinger-dev libsdl1.2-dev libspeex-dev libtheora-dev libtiff-dev libtiff5-dev libva-dev libvdpau-dev libvo-aacenc-dev libvo-amrwbenc-dev libvorbis-dev libvpx-dev libx11-dev libx264-dev libxext-dev libxfixes-dev libxvidcore-dev libxvmc-dev texi2html yasm zlib1g-dev doxygen

    Many of the libraries do not have current aarch64 Ubuntu packages, and neither does frei0r-plugins-dev, but running aptitude install on the above list installs a lot of useful things – including build-essential. The full list is in the command below; the missing packages are non-essential.

    Step 5) Set it up: apt-get install aptitude

    aptitude install git debhelper frei0r-plugins-dev libasound2-dev libbz2-dev libcdio-cdda-dev libcdio-dev libcdio-paranoia-dev libdc1394-22-dev libfreetype6-dev  libgnutls-dev libgsm1-dev libjack-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libopenjpeg-dev libopus-dev libpulse-dev libraw1394-dev librtmp-dev libschroedinger-dev libsdl1.2-dev libspeex-dev libtheora-dev libtiff-dev libtiff5-dev libva-dev libvdpau-dev libvo-aacenc-dev libvo-amrwbenc-dev libvorbis-dev libvpx-dev libx11-dev libx264-dev libxext-dev libxfixes-dev libxvidcore-dev libxvmc-dev texi2html yasm zlib1g-dev doxygen

    Now it is time to actually build libav.

    Step 6) Create a user within your chroot: useradd -m auser, and switch to running as that user: sudo -u auser bash, and type cd to go to the home directory.

    Step 7) Run git clone git://, then ./configure --disable-pthreads && make -j8 (change the 8 to approximately the number of CPU cores you have).
    On my hardware, this takes 10-11 minutes, and ‘make fate’ takes about 16. Disabling pthreads is essential, as qemu-user does not handle threads well, and running the tests hangs randomly without it.

    Approach 2: cross-compile (warning: I do not have the tests working with this approach).

    1) Start by getting an aarch64 compiler. A good place to get one is; I am using . Untar it, and add it to your path:

    export PATH=$PATH:/path/to/your/linaro/tools/bin

    2) Make the cross-compiler work. Run aptitude install lsb lib32stdc++6. Without this, invoking the compiler will say “No such file or directory”. See

    3) Under the libav directory (run git clone git:// if you do not have one), type mkdir a64crossbuild; cd a64crossbuild. Make sure the libav directory is somewhere under $a64root (it should simplify running the tests, later).

    4) ./configure --arch=aarch64 --cpu=generic --cross-prefix=aarch64-linux-gnu- --cc=aarch64-linux-gnu-gcc --target-os=linux --sysroot=$a64root --target-exec="qemu-aarch64-static -L $a64root" --disable-pthreads

    This is a minimal variant of Jannau’s configuration – a developer who has recently done a lot of libav aarch64 work.

    5) Run make -j8. On my hardware, it takes just under a minute.

    6) Run make fate. Unfortunately, both versions of QEMU I tried hung on wait4 at this point (in fft-test, fate-fft-4), and used an extra couple of hundred megabytes of RAM per second until I stopped QEMU, even if I asked it to wait for a remote GDB. For anyone else trying this, has several useful tips for getting the tests to run after cross-compilation.

    Approach 3: Use QEMU’s system emulation. In theory, this should allow you to use pthreads; in practice, the tests hung for me. The following May 9th post describes what to do: In short: git clone git:// qemu.git && cd qemu.git && ./configure --target-list=aarch64-softmmu && make, then

    ./aarch64-softmmu/qemu-system-aarch64 -machine virt -cpu cortex-a57 -machine type=virt -nographic -smp 1 -m 2048 -kernel aarch64-linux-3.15rc2-buildroot.img  --append "console=ttyAMA0" -fsdev local,id=r,path=$a64root,security_model=none -device virtio-9p-device,fsdev=r,mount_tag=r

    Then, under the buildroot system, log in as root (no password), and type mkdir /mnt/core && mount -t 9p -o trans=virtio r /mnt/core. At this point, you can run chroot /mnt/core /bin/bash, and follow the approach 1 instructions from useradd onwards, except that ./configure without --disable-pthreads should theoretically work. On my system, ./configure takes a bit over 5 minutes with this approach. Running make is quite slow; time make took 113 minutes. Do not use -j – you are limited to a single core, so -j would slow compilation down slightly. However, make fate consistently hung on acodec-pcm-alaw, and I have not yet figured out why.


    Things not to do:

    • Use a rootfs from a year ago; I am yet to try one that is not broken, and some come with fun bonuses like infinite file system loops. These cost me well over a dozen hours.
    • Compile SUSE’s QEMU; qemu-system is bleeding-edge enough that you need to compile it from upstream, but SUSE’s patches have long been merged into the normal QEMU upstream. Unless you want qemu-system, you do not need to compile QEMU at all under Ubuntu 14.04.
    • Leave the environment variables in this tutorial unset in a new shell and wonder why things do not work.


    Wednesday, 23 April

    Applying to OPW requires an initial contribution. The Libav IRC channel suggested porting the asettb filter from FFmpeg, so I did (version 5 of the patch was merged upstream, in two parts: a rename patch and a content patch; the FFmpeg author was credited as author for the latter, while I did a signed-off-by). I also contributed a 3000+ line documentation patch, standardizing the libavfilter documentation and removing numerous English errors, and triaged a few bugs, git bisecting the one that was reproducible.

    Sunday, 13 April

    And how it nearly ruined another video coding standard.

    Everyone knows that interlacing was a trick in the '80s for pseudo motion compensation with analogue video. This more or less worked because it mimicked how television worked back then. This technique was preserved when flat panels for PCs and TVs were introduced, for a mix of backward compatibility and technical limitations, and video coding standards such as MPEG2 and H264 feature interlacing support.

    However as with black and white, TACS and Gopher, old technology has to be replaced with modern and efficient technology, as a trade off of users' interests and technology providers' market prospects. In case you are not familiar, interlacing is a mess to support, makes decoding slower and heavily degrades quality. People saying that interlacing saves bandwidth do not know much about video coding and bad marketing claiming that higher resolution is better than higher framerate has an effect too.

    So, when ITU and then MPEG set out to establish the mandates for a new video standard capable of superseding H264, it was decided that interlacing was old enough, did more harm than good and it was time for retirement: HEVC was going to be the first video codec to officially deprecate interlacing.

    Things went pretty swell during its development, until a few months before the completion of the standard. A group of US companies complained that the proposed tools were not sufficient (a set of SEI messages and treating fields like progressive frames) and heavily protested with both standardisation bodies. ITU firmly rejected the idea (with the video group chair threatening to step down) while MPEG set out to understand the needs of the industry and see if there was anything that could be done.

    An ad-hoc group was established to see if there was any evidence that interlaced coding tools would have improved the situation. Things looked really shady: the Requirements group even mentioned that it was the first time that an AhG was established to look for evidence, instead of establishing an AhG because there was evidence. Several liaisons from EBU and other DVB members tried to point out this absurdity while the threat of adding interlacing back in HEVC became real. Luckily the first version of the specifications got published in the meantime, so this decision didn't slow down the standardisation process.

    Why so much love towards interlacing? Well, in the "rebellious" group's defence, it is true that interlaced content in HEVC is less performant than in H264; however it is also true that such deinterlaced content in HEVC outperforms H264 in any configuration. Truth is that mass marketed deinterlacers (commonly found in televisions for example) bring a lot of royalty income, so it is normal that companies with vested interests would prefer to have interlacing in a soon-popular video standard like HEVC. Also in markets like the US, where the network operator (which has control over the encoding but not over the video source) might differ from the content provider, it could be politically difficult to act as a carrier only if you have to deinterlace a video.

    However these problems are actually not enough for forcing every encoder, decoder, analyser to support a deprecated technology like interlacing. Technical problems can be solved with good deinterlacers at the top of the distribution chain, while political ones can be solved amending contracts. Plus having progressive only video will definitely improve quality and let the industry concentrate on other delicate subjects, like bit depth, both properties going in favour of users' interests.

    At the last MPEG meeting, the "rebellious" group which had been working on reintroducing interlacing for a year provided no real evidence that interlaced coding tools would improve HEVC at all. The only sensible solution was to disband the group over this wasted effort and support progressive video only, which is what happened luckily. So now both ITU and MPEG support progressive video only and this has finally nailed it.

    Interlacing is dead, long live progressive.

    Written by Vittorio Giovara (
    Published under a CC-BY-SA 3.0 license.

    Tuesday, 25 March

    I am very glad to announce that Libav 10 has been released!

    This has a bunch of features that I contributed to, in particular regarding stereoscopic video and interlaced filtering, but more importantly this release has the work of an awesome group of people which has been carried out for a whole year. This is the magic of open source!

    I joined the group more or less one year ago, with some patches regarding an obscure H.264 specification which I then later reimplemented in HEVC and then I wrote a few filters I needed and then designed an API and then, wow! A whole year passed without me noticing, and I am still around, sending patches to the same group of people who welcomed someone who had problems with shifting values (sad but true story)!

    I met the team both at VDD and FOSDEM and they've been the most exciting conferences I ever went to (and I went to a lot of them). I couldn't believe I was with the devteam of my favourite multimedia opensource projects I've been following since I was a kid! Until a year ago, I saw the names from the commits and the blogposts from both VideoLAN and Libav projects and I had been thinking "Oh wouldn't it be so cool to be like one of them".

    The answer is yes, it definitely would, and it's something that can happen if one is really committed to it! The Libav Info page states "Being a committer is a duty, not a privilege", but it sure does feel like one.

    Thanks for this exciting year guys, I look forward to the next ones.

    Monday, 24 March

    ...using latest modern tools!

    X264 and VLC are two of the awesomest pieces of opensource software you can find on-line and of course they pose no problem when you compile them on a Unix environment. Too bad that sometimes you need to think of Windowze as well, so we need a way to crosscompile that software: in this blogpost, I'll describe how to achieve that, using modern tools on an Ubuntu 12.04 installation.

    [0] Sources
    It goes without saying that without the following guides, I'd have had a much harder time!
    So a big thanks to all the original authors!

    [1] Introduction
    When you crosscompile you just use the same tools and toolchains that you are used to, gcc, ld and so on, but configured (and compiled) so that they produce executable code for a different platform. This platform can vary both in software and in hardware and it is usually identified by a triplet: the processor architecture, the vendor and the operating system.

    What we are going to use here is i686-w64-mingw32, which identifies any x86 cpu since the Pentium Pro, the w64 vendor tag of the mingw-w64 toolchain used for modern Windows NT systems (if I'm not wrong), and the mingw32 operating system, that is the Windows gcc variant.
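
    To make the naming concrete, here is a small shell sketch that splits a triplet into its three dash-separated fields. The helper name split_triplet is only illustrative and not part of any toolchain; real build systems rely on scripts such as config.sub, which handle many more cases than this.

```shell
# Split a dash-separated target triplet into its three fields.
# split_triplet is a hypothetical helper, not part of any toolchain.
split_triplet() {
    first=${1%%-*}      # text before the first dash
    rest=${1#*-}        # text after the first dash
    middle=${rest%%-*}  # text between the first and second dash
    last=${rest#*-}     # everything after the second dash
    printf '%s %s %s\n' "$first" "$middle" "$last"
}

split_triplet i686-w64-mingw32   # prints: i686 w64 mingw32
```

    The same prefix, followed by a dash, is what names every cross tool used later on (i686-w64-mingw32-gcc, i686-w64-mingw32-ar and so on).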

    [2] Prerequisites
    Note that the name of the packages might be slightly different according to your distribution. We are going to need a quite recent mingw-runtime for VLC (>=3.00) which has not yet landed on Ubuntu, so we'll take it from our Debian cousins.

    Execute this command

    $ wget
    $ sudo dpkg -i mingw-w64-dev_3.0~svn4933-1_all.deb

    and then install stock dependencies

    $ sudo apt-get install gcc-mingw-w64 g++-mingw-w64
    $ sudo apt-get install pkg-config yasm subversion cvs git-core

    [3] x264 and libav 
    x264 has very few dependencies, just pthreads and zlib, but it reaches its full potential when all of them are satisfied (encapsulation, avisynth support and so on).

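    Loosely following Alex Jurkiewicz's work, we create a user-writable folder and then we prepare a script that sets some useful variables every time. The later steps call this script as ../../mingw from inside the src subfolders, so it should be saved as ~/win32-cross/mingw and made executable.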

    $ mkdir -p ~/win32-cross/{src,lib,include,share,bin}


    #!/bin/sh
    TRIPLET=i686-w64-mingw32

    export CC=$TRIPLET-gcc
    export CXX=$TRIPLET-g++
    export CPP=$TRIPLET-cpp
    export AR=$TRIPLET-ar
    export RANLIB=$TRIPLET-ranlib
    export ADDR2LINE=$TRIPLET-addr2line
    export AS=$TRIPLET-as
    export LD=$TRIPLET-ld
    export NM=$TRIPLET-nm
    export STRIP=$TRIPLET-strip

    export PATH="/usr/$TRIPLET/bin:$PATH"
    export PKG_CONFIG_PATH="$HOME/win32-cross/lib/pkgconfig/"

    export CFLAGS="-static -static-libgcc -static-libstdc++ -I$HOME/win32-cross/include -L$HOME/win32-cross/lib -I/usr/$TRIPLET/include -L/usr/$TRIPLET/lib"
    export CXXFLAGS="$CFLAGS"

    exec "$@"
    Please note the use of the CFLAGS variable: without all the static parameters, the executable will dynamically link gcc, so you'll need to bundle the equivalent dll. I prefer to have one single exe, so everything goes static, but I'm not really sure which flag is actually needed. If you have any idea, please drop me a line.
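
    One way to find out whether the static flags did their job is to look at the import table of the resulting executable. The sketch below filters the output of the cross objdump (part of binutils) for its "DLL Name" lines; list_dlls is a hypothetical helper name, and reading "one single exe" as "imports only Windows system DLLs" is my assumption.

```shell
# Print the DLL names found in `objdump -p` output read from stdin.
# list_dlls is a hypothetical helper, not part of binutils.
list_dlls() {
    sed -n 's/^[[:space:]]*DLL Name:[[:space:]]*//p'
}

# Intended use (needs the cross binutils and a built executable):
#   i686-w64-mingw32-objdump -p x264.exe | list_dlls
```

    If a libgcc or libstdc++ DLL shows up in the list, the corresponding static flag was not effective for that build.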

    Anyway, let's compile latest revision of pthreads (2.9.1 as of this writing)

    $ cd ~/win32-cross/src
    $ wget -qO - | tar xzvf -
    $ cd pthreads-w32-2-9-1-release
    $ make GC-static CROSS=i686-w64-mingw32-
    $ cp libpthreadGC2.a ../../lib
    $ cp *.h ../../include

    and zlib (1.2.7) - we need to remove the references to the libc library (which is implied anyway) otherwise we will get a linkage failure

    $ cd ~/win32-cross/src
    $ wget -qO - | tar xzvf -
    $ cd zlib-1.2.7
    $ ../../mingw ./configure
    $ sed -i"" -e 's/-lc//' Makefile
    $ make
    $ DESTDIR=../.. make install prefix=

    Now it's libav's turn, so that x264 can use different input chroma and other stuff. If you need libav executables, you might want to change the configure line so that it suits you

    $ cd ~/win32-cross/src
    $ git clone git://
    $ cd libav
    $ ./configure \
    --target-os=mingw32 --cross-prefix=i686-w64-mingw32- --arch=x86 --prefix=../.. \
    --enable-memalign-hack --enable-gpl --enable-avisynth --enable-runtime-cpudetect \
    --disable-encoders --disable-muxers --disable-network --disable-devices
    $ make
    $ make install

    and the nice tools that give more output options

    $ cd ~/win32-cross/src
    $ svn checkout ffms
    $ cd ffms
    $ ../../mingw ./configure --host=mingw32 --with-zlib=../.. --prefix=$HOME/win32-cross
    $ ../../mingw make
    $ make install

    $ cd ~/win32-cross/src
    # Create a CVS auth file on your machine
    $ cvs login
    $ cvs -z3 co -P gpac
    $ cd gpac
    $ chmod +rwx configure src/Makefile
    # Hardcode cross-prefix
    $ sed -i'' -e 's/cross_prefix=""/cross_prefix="i686-w64-mingw32-"/' configure
    $ ../../mingw ./configure --static --use-js=no --use-ft=no --use-jpeg=no \
          --use-png=no --use-faad=no --use-mad=no --use-xvid=no --use-ffmpeg=no \
          --use-ogg=no --use-vorbis=no --use-theora=no --use-openjpeg=no \
          --disable-ssl --disable-opengl --disable-wx --disable-oss-audio \
          --disable-x11-shm --disable-x11-xv --disable-fragments --use-a52=no \
          --disable-xmlrpc --disable-dvb --disable-alsa --static-mp4box \
          --extra-cflags="-I$HOME/win32-cross/include -I/usr/i686-w64-mingw32/include" \
          --extra-ldflags="-L$HOME/win32-cross/lib -L/usr/i686-w64-mingw32/lib"
    # Fix pthread lib name
    $ sed -i"" -e 's/pthread/pthreadGC2/' config.mak
    # Add extra libs that are required but not included
    $ sed -i"" -e 's/-lpthreadGC2/-lpthreadGC2 -lwinmm -lwsock32 -lopengl32 -lglu32/' config.mak
    $ make
    # Make will fail a few commands after building libgpac_static.a
    # (i686-w64-mingw32-ar cr ../bin/gcc/libgpac_static.a ...).
    # That's fine, we just need libgpac_static.a 
    $ i686-w64-mingw32-ranlib bin/gcc/libgpac_static.a
    $ cp bin/gcc/libgpac_static.a ../../lib/
    $ cp -r include/gpac ../../include/

    Finally we can compile x264 at full power! The configure script will provide a list of what features have been activated, make sure everything you need is there!

    $ cd ~/win32-cross/src
    $ git clone git://
    $ cd x264
    $ ./configure --cross-prefix=i686-w64-mingw32- --host=i686-w64-mingw32 \
          --extra-cflags="-static -static-libgcc -static-libstdc++ -I$HOME/win32-cross/include" \
          --extra-ldflags="-static -static-libgcc -static-libstdc++ -L$HOME/win32-cross/lib"
    $ make

    And you're done! Take that x264.exe file and use it wherever you want!
    Most of the work here has been outlined by Alex Jurkiewicz in this guide, so check out his blog for more nice guides!

    [4] VideoLAN
    On the other hand, VLC has a LOT of dependencies, but thankfully it also has a nice way to get them working quickly. If you read the wiki guide, you'll notice that it uses i586-mingw32msvc everywhere, but you should definitely avoid that! In fact that one offers a very old toolchain, under which VLC will fail to compile! Also the latest version produces much better code: x264 will weigh 46MB against 38MB in one case!

    So let's update every script to the more modern version i686-w64-mingw32! As usual, first of all get the sources

    $ git clone git:// vlc
    $ cd vlc

    And let's get the dependencies through the contrib scripts. qt4 needs to be compiled by hand, as the version in the Ubuntu repositories doesn't cope well with the rest of the process. I also had to remove some of the files because they were of the wrong architecture (mileage might vary here).

    $ mkdir -p contrib/win32
    $ cd contrib/win32
    $ ../bootstrap --host=
    $ make prebuilt
    $ make .qt4
    $ rm ../i686-w64-mingw32/bin/{moc,uic,rcc}
    $ cd -

    We now return to the main sources folder and launch the bootstrap and configure process; you need some standard automake/libtool dependencies for this.

    $ ./bootstrap
    $ mkdir win32 && cd win32
    $ ../extras/package/win32/ --host=i686-w64-mingw32
    $ ./compile
    $ make package-win-common

    Let's grab something to drink and celebrate when the compilation ends! You'll find all the necessary files in the vlc-x.x.x folder. A big thanks goes to the wiki authors and j-b who gave me pointers on #videolan irc.

    [5] Conclusions
    Whelp, that was a long run! As an additional benefit you are able to customize every single piece of software to your needs, e.g. you can modify the libav version that you are going to use for VLC as you wish! Also crosscompiling is often treated as black magic, but in reality it is a simple process that just needs more careful configuration. Errors are often related to wrong paths or missing dependencies and sometimes a combination of both; don't lose hope and keep going until you get what you want!
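
    Since wrong paths and missing tools are the usual culprits, a quick check before starting a long build can save some time. This is just a sketch: missing_tools is a hypothetical helper, and the tool list simply mirrors the variables exported by the wrapper script earlier in this post.

```shell
# Print every cross tool that cannot be found on PATH.
# missing_tools is a hypothetical helper; the prefix is the one used in this post.
missing_tools() {
    prefix=$1; shift
    for tool in "$@"; do
        command -v "$prefix$tool" >/dev/null 2>&1 || echo "$prefix$tool"
    done
}

missing_tools i686-w64-mingw32- gcc g++ ar ranlib ld nm strip
```

    An empty output means every tool was found; any name printed points at a package that still needs installing or a PATH entry that is wrong.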

    For future reference, all (or most of) the functions and structs in libav have a prefix that indicates the exposure of that symbol. Those are

    • av_ meaning a public function, present in the API;
    • ff_ meaning a private function, not present in the API;
    • avpriv_ meaning inter-library private function, used internally across libraries only.
    Source: #libav-devel
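
    The convention above is easy to apply mechanically, for instance when skimming the symbols of a library. The following sketch classifies a name by its prefix; symbol_scope is a hypothetical helper, and the ff_ and avpriv_ names in the calls are made up for illustration. Names with other prefixes (avcodec_ for example) are not covered by the list and are reported as unknown.

```shell
# Classify a libav symbol by its prefix, following the list above.
# symbol_scope is a hypothetical helper; ff_example and avpriv_example are made-up names.
symbol_scope() {
    case $1 in
        avpriv_*) echo "inter-library private" ;;
        av_*)     echo "public" ;;
        ff_*)     echo "private" ;;
        *)        echo "unknown" ;;
    esac
}

symbol_scope av_malloc        # public
symbol_scope ff_example       # private
symbol_scope avpriv_example   # inter-library private
```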

    Friday, 13 January

    Well, I've finished the new audio decoding API, which has been merged into Libav master. The new audio encoding API is basically done, pending a (hopefully final) round of review before committing.

    Next up is audio timestamp fixes/clean-up. This is a fairly undefined task. I've been collecting a list of various things that need to be fixed and ideas to try. Plus, the audio encoding API revealed quite a few bugs in some of the demuxers. Today I started a sort of TODO list for this stage of the project. I'll be editing it as the project continues to progress.

    Friday, 28 October

    For the past few weeks I've been working on a new project sponsored by FFMTech. The entire project involves reworking much of the existing audio framework in libavcodec.

    Part 1 is changing the audio decoding API to match the video decoding API. Currently the audio decoders take packet data from an AVPacket and decode it directly to a sample buffer supplied by the user. The video decoders take packet data from an AVPacket and decode it to an AVFrame structure with a buffer allocated by AVCodecContext.get_buffer(). My project will include modifying the audio decoding API to decode audio from an AVPacket to an AVFrame, as is done with video.

    AVCODEC_MAX_AUDIO_FRAME_SIZE puts an arbitrary limit on the amount of audio data returned by the decoder. For example, each FLAC frame can hold up to 65536 samples for 8 channels at 32-bit sample depth, which is 2097152 bytes of raw audio, but AVCODEC_MAX_AUDIO_FRAME_SIZE is only 192000. Using get/release_buffer() for audio decoding will solve this problem. It will, however, require changes to every audio decoder. Most of those changes are trivial since the frame size is known prior to decoding the frame or is easily parsed. Some of the changes are more intrusive due to having to determine the frame size prior to allocating and writing to the output buffer.
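
    The FLAC figure is simple arithmetic, spelled out here as a shell sketch using only the numbers quoted above (the variable names are mine):

```shell
# Worst-case size of one decoded FLAC frame, using the figures from the text.
samples=65536           # samples per channel in the largest frame
channels=8
bits_per_sample=32
max_frame_size=192000   # value of AVCODEC_MAX_AUDIO_FRAME_SIZE quoted above

frame_bytes=$((samples * channels * bits_per_sample / 8))
echo "decoded frame: $frame_bytes bytes"                        # 2097152
echo "times over the limit: $((frame_bytes / max_frame_size))"  # 10 (integer division)
```

    So a single worst-case frame is more than ten times larger than the fixed limit allows.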

    As part of the preparation for the new API, I have been cleaning up all the audio decoders, which has been quite tedious. I've found some pretty surprising bugs along the way. I'm getting close to finishing that part so I'll be able to move on to implementing the new API in each decoder.

    Wednesday, 27 July

    So, I've moved on from AHT now, and it's on to Spectral Extension (SPX).  I got the full syntax working yesterday, now I just need to figure out how to calculate all the parameters.  I have a feeling this will help quality quite a bit, especially when used in conjunction with variable bandwidth/coupling.  My vision for automatic bandwidth adjustment is starting to come together.

    SPX encoding/decoding is fairly straightforward, so I expect this won't take too long to implement.  Similar to channel coupling, the encoder writes coarsely banded scale factors for frequencies above the fully-encoded bandwidth, along with noise blending factors.  The decoder copies lower frequency coefficients to the upper bands, multiplies them by the scale factors, and blends them with noise (which has been scaled according to the band energy and the blending factors in the bitstream).  For the encoder, I just need to make the reconstructed coefficients match the original coefficients as closely as possible by calculating appropriate spectral extension coordinates and blending factors.  Also, like coupling coordinates, the encoder can choose how often to resend the parameters to balance accuracy vs. bitrate.

    Once SPX encoding is working properly, I'll revisit variable bandwidth.  However, instead of adjusting the upper cutoff frequency (which is somewhat complex to avoid very audible attack/decay), it will adjust the channel coupling and/or spectral extension ranges to keep the cutoff frequency constant while still adjusting to changes in signal complexity to keep a more stable quality level at a constant bitrate.  This could also be used in a VBR mode with constrained bitrate limits.

    If you want to follow the development, I have a separate branch at my Libav github repository.

    I finally got the complete AHT syntax working properly.  Unfortunately, the quality seems to be lower at all bitrates than with the normal AC-3 quantization.  I'm hoping that I just need to pick better gain values, but I have a suspicion that some of the difference is related to vector quantization, which the encoder has no control over (a basic 6-dimensional VQ minimum distance search is the best it can do).

    My first step is to find out for sure if choosing better gain values will help.  One problem is that the bit allocation model is saying we need X number of bits for each mantissa.  Using mode=0 (all zero gains) gives exactly X number of bits per mantissa (with no overhead for encoding the gain values), but the overall quality is lower than with normal AC-3 quantization or even GAQ with simplistic mode/gain decisions.  So I think that means there is some bias built-in to the AHT bit allocation table that assumes GAQ will appropriately fine-tune the final allocations.  Additionally, it could be that AHT should not always be turned on when the exponents are reused in blocks 1 through 5 (the condition required to use AHT).  This is probably the point where I need a more accurate bit allocation model...

    edit: After analyzing the bit allocation tables for AC-3 vs. E-AC-3, it seems there is no built-in bias in the GAQ range.  They are nearly identical.  So the difference is clearly in VQ.  Next step, try a direct comparison of quantized mantissas using VQ vs. linear quantization and consider that in the AHT mode decision.

    edit2: dct+VQ is nearly always worse than linear quantization...  I also tried turning AHT off for a channel if the quantization difference was over a certain threshold, but as the threshold approached zero, the quality approached that with AHT turned off.  I don't know what to do at this point... *sigh*

    note: analysis of a commercial E-AC-3 sample using AHT shows that AHT is always turned on when the exponent strategy allows it.

    edit3: It turns out that the majority of the quality difference was in the 6-point DCT.  If I turn it off in both the encoder and decoder (but leave the quantization the same) the quality is much better.  I hope it's a bug or too much inaccuracy (it's 25-bit fixed-point) in my implementation...  If not then I'm at another dead-end.

    edit4: I'm giving up on AHT for now.  The DCT is definitely correct and is very certainly causing the quality decrease.  If I can get my hands on a source + encoded E-AC-3 file from a commercial encoder that uses AHT then I will revisit this.  Until then, I have nothing to analyze to tell me how using AHT can possibly produce better quality.

    Friday, 17 June

    Well, I finally got a working E-AC-3 encoder committed to Libav.  The bitstream format does save a few bits here and there, but the overall quality difference is minimal.  However, it will be the starting point for adding more E-AC-3 features that will improve quality.

    The first feature I completed was support for higher bit rates.  This is done in E-AC-3 by using fewer blocks per frame.  A normal AC-3 frame has 6 blocks of 256 samples each, but E-AC-3 can reduce that to 1, 2, or 3 blocks.  This way a small range can be used for the per-frame bit rate, but it still allows for increasing the per-second bit rate.  For example, 5.1-channel E-AC-3 content on HD-DVDs was typically encoded at 1536 kbps using 1 block per frame.
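
    To see why fewer blocks keep the per-frame size small, here is a rough calculation of how many bits one frame gets at a given bit rate. The 48 kHz sample rate is my assumption (the text does not state one) and frame_bits is a hypothetical helper name.

```shell
# Bits available per frame at a given bit rate and block count.
# Assumes a 48 kHz sample rate, which the text does not state.
frame_bits() {
    kbps=$1; blocks=$2; rate_khz=48
    samples=$((blocks * 256))             # 256 samples per block
    echo $((kbps * samples / rate_khz))   # kbit/s * samples / (ksamples/s) = bits
}

frame_bits 1536 1   # 8192 bits (1024 bytes) per frame
frame_bits 1536 6   # 49152 bits (6144 bytes) per frame
```

    At the same 1536 kbps, a 1-block frame is six times smaller than a 6-block frame would have to be, because frames simply come six times as often.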

    Currently I am working on implementing AHT (adaptive hybrid transform).  The AHT process uses a 6-point DCT on each coefficient across the 6 blocks in the frame.  It basically uses the normal AC-3 bit allocation process to determine quantization of each DCT-processed "pre-mantissa" but it uses a finer resolution for quantization and different quantization methods.  I have the 6-point DCT working and one of the two quantization methods.  Now I just need to finish the other quantization method and implement mantissa bit counting and bitstream output.

