July 24, 2020
In this remote podcast Curtis Judd and Samuele Lilliu talk about low cost/budget solutions to improve the sound and video quality of your guest in a remote video podcast. We debunk the idea that iPhones or smart phones can be used in professional scenarios by providing a comparison between an iPhone 11 Pro and a Canon C200. We present a list of arguments to convince someone that mobile phones are not appropriate for professional video productions. Sound makes or breaks your videos! We compare a range of audio gear solutions starting from zero budget (your smart phone), to low cost (Zoom H1 and H4n) and mid-range (Sennheiser MKH-416 with MixPre-3). There are things you can do to unlock extra controls in your iPhone. For example, you can install an app called FiLMiC Pro. This allows you to control things like focus and white balance. However, you are still restricted to the sensor and lens capabilities of your smart phone. We illustrate our mid-range professional audio and video studio setups and audio post-processing techniques.
Samuele Lilliu (SL). The thing I wanted to discuss is how do you convince someone that doesn't know anything about audio/video setups that a professional audio-video setup is a great idea?
Curtis Judd (CJ). I think if you're doing any sort of presentation on a professional level you owe it to your audience to provide them with decent quality in terms of audio and visuals. It doesn't have to be through the roof. You don't have to spend a lot of money necessarily but you should do enough research so that you can provide them something that doesn't assault their ears or their eyes. Something that's pleasant to listen to and pleasant to watch, as much as possible. So again I'm not necessarily a proponent of saying you have to spend twenty thousand dollars to achieve this. In my perspective, you probably should at least invest a little bit of effort. Even just a hundred dollar investment in a microphone can make a massive difference.
SL. You know once we were shooting an interview and the guy was saying “well I can shoot that with an iPhone” and I was telling him, “Yes you can, but there are things you cannot do with an iPhone”. If you shoot with an iPhone, well with the same sort of content, forget about content, let's say that the content is there, but if you shoot with an iPhone it's gonna look cheap, there are things you cannot do with an iPhone.
CJ. There are things that you cannot do with an iPhone and I think there's an issue of credibility when you put together a presentation that has a little bit of polish. I think that adds to credibility. I think that again as I mentioned before it just leads to a more pleasant experience for your audience. I think in particular audio quality can make a massive difference. I think many people don't recognize that consciously, but if you go and watch… if you just pull up a video on YouTube for example, if it has very poor sound quality, even if the visuals are very good, there's something that in most of our minds says “eh this is kind of a homemade thing”.
SL. Yeah sounds cheap.
CJ. Sounds cheap. It sounds like they didn't put a lot of effort into it. There's the inverse, if you have very good sound quality, but the visuals are just so so, a lot of times you'll retain your audience longer, you'll be able to get your message across to your audience. This has been my experience.
SL. I was trying to explain this concept of the sound to a person I was working with recently. I told them, “Think about the difference between, what did I say, a Toyota Yaris and a Lamborghini, or whether you want to spend your holidays in a hostel or at the St. Regis in the Maldives. The car takes you from point A to point B, you're still on holiday, but it's not the same thing, it's a different experience”, and then that guy said no we're not…
CJ. One thing to consider too is that I think that's part of what scares a lot of people away. My contention is really you don't necessarily have to go to a Lamborghini level, but there is a difference between the hush hush motel and a Marriott Courtyard. You probably owe it to your audience to at least give them the Marriott Courtyard.
SL. Talking about the technical side, from the video perspective there are things you cannot do with an iPhone or with a smartphone. For example the first thing that comes to my mind are the lenses. I know the new iPhones have different lens options but of course you cannot replace lenses, you cannot achieve a shallow depth of field, you cannot pull focus, and then what else… maybe the colour grading, colour grading is a big problem.
CJ. I think so yeah because I think a lot of them, those cameras in the phones, their dynamic range is usually fairly limited. That in combination with the fact that they're automatically exposing often leaves you with an image that is uncorrectable. You can't make it look nice. I mean I'm not even going to go so far as to say you can't make it look cinematic. But with the default camera apps on most phones you're going to get an image that looks very much like an inexpensive video image. Usually for example what will happen is either the face will be underexposed or it will be greatly overexposed. Once it's overexposed there's nothing you can do in colour grading to fix that really, especially with the 8-bit images that they're providing. So there are just a lot of technical challenges.
With those types of things can you shoot, of course you can shoot with your phone, you can shoot with your laptop, you can shoot you know with all these different devices that we all have, and in fact I would say that it's better to shoot with those than to not shoot. But I would say at the very very least, you can do something to improve the audio quality of your program whatever it may be by just adding a simple microphone for $100 or less. So there's some USB microphones that are very inexpensive that can greatly improve the overall experience for your audience.
So yeah for the iPhones, you know there are people that do stunts, I would call them stunts, where they show how amazing an image they can get out of a phone. Yes if you're very skilled with your phone and you buy a third-party camera app you can probably achieve some pretty impressive things with a phone. But when it comes to producing content or some sort of program on a reliable basis, especially if you're operating on your own, it's very difficult to do that with a phone.
SL. I think the app you were mentioning is called FiLMiC Pro, something where you can control the…
CJ. Yeah, you can do your white balance, you can manually set white balance, you can do your focus, you can do the exposure, you can control a lot. Even then you're still constrained oftentimes by the dynamic range capability of the camera, the lightest, the brightest to the darkest areas within your image, that range is much more narrow on a mobile phone than it is on a camera with a much larger sensor.
SL. Yeah and you're restricted to 8-bit colour space which means that you don't have enough colours. Instead with a camera like the Canon C200 you would have 12-bit colour space, if you're recording in raw. Even the proxy is still 8-bit, but it's a good one.
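The jump from 8-bit to 12-bit can be put in numbers. The following is a minimal Python sketch, assuming the bit depth applies per colour channel and that there are three channels (red, green, blue); the function names are hypothetical and chosen only for illustration.

```python
def levels_per_channel(bits: int) -> int:
    """Number of distinct tonal steps a single channel can store."""
    return 2 ** bits


def total_colours(bits: int, channels: int = 3) -> int:
    """Number of distinct colour values across all channels (assumes RGB)."""
    return levels_per_channel(bits) ** channels


for bits in (8, 12):
    print(f"{bits}-bit: {levels_per_channel(bits)} levels per channel, "
          f"{total_colours(bits):,} colour values")
```

Under these assumptions a 12-bit recording has 16 times as many steps per channel as an 8-bit one, which is the extra headroom that colour grading relies on.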
CJ. Yeah it is. It is good. And you're starting with the advantage too of a much larger sensor that has larger photosites that can capture more light and so on and so forth. It becomes an issue of being up against the limits of physics. Also it becomes a practical issue too, where operating a phone as a camera just doesn't leave you a lot of options unless you really dig in.
Anyone could become an expert, they could download this FiLMiC Pro app, they could get to know that app, they could learn all the principles of it, all the nuances of the iPhone or whatever phone you're using, camera sensors so on and so forth. But the question is do you have time to do that? Do you have time to invest that much effort to learn that much? That's a question that everyone has to answer for themselves.
SL. We said dynamic range and then the other problem is the low light performance and then you get lots of compression artifacts in the phone, right? I mean if you look at the videos done with the phone with low light, not well exposed… in fact most of the comparisons are done with the well-lit situations, right?
CJ. Now, I wouldn't be quite as concerned for at least for talking head like what we're doing here if you're doing an interview. Compression artifacts are generally not as much of an issue if it's well lit. If it's not well lit then it becomes a problem. But there's not a lot moving in a frame when it's just a talking head or an interview. So I'm not as concerned about that, but it is a factor as well.
I think lighting is another big one too. One of the challenges I find is that with webcams that are built into laptops or computers or iPhones or whatever kind of phone, if you do light it in the way that you would typically light something for a TV news program, for cinema, for a documentary film, that's where the cameras just cannot handle that amount of contrast. They just get very confused. Their algorithms for estimating what the exposure should be and setting that exposure do all sorts of really odd things.
SL. Yeah like the one that I'm using now, that you're seeing now, that's rubbis…
CJ. That's the and the same with what you're seeing of me right now…
SL. No that's that's… that's good… that's a great what are you using. Do you have a Mac there? What do you have?
CJ. Yeah, it's an iMac.
SL. Now we see, uh well on YouTube and almost everywhere, people are moving to all these… instead of doing proper interviews they tend to do the Skype calls or Zoom calls, which is something that I don't really like, but I mean that's because of the pandemic. Even if the pandemic is over, kind of over somewhere, it's over I would say almost over. But people are still scared, so you need to do these Zoom calls or Skype calls. Basically you're going to be facing three situations, three scenarios. One is when your guest doesn't know anything about audio video setup. Then you have the situation like maybe someone has an average idea of audio-video set up. And then you have the case of the experienced people like the ones that you normally interview for example. What suggestions can you give to someone that doesn't know anything, so that they can deliver to me or to you something decent?
CJ. A couple of things. First of all on the visual front do something to help your lighting situation. Those webcams require a lot of light and they generally prefer if the entire room is evenly lit. So a lot of times if you're doing a shot, if you're doing an interview or discussion, during the day, what I recommend is if you can orient your laptop or your computer with its back to a window, preferably a large window with sunlight coming in, and probably not direct sunlight. But indirect sunlight would be best, just because you then get a very even nice soft light that comes in and fills your room, lights you up. That's probably the best thing you can do for a webcam. That's really what they're made for and that's what their exposure algorithms are optimized for. So that's probably the best way to do it. Do not put a window behind you, inasmuch as possible. That will just confuse the camera and it will end up looking very bad.
Then the next thing I would do is if you're not in a position where you can spend any money or invest in a microphone, I would just at least use your earbuds and this solves actually two problems. Number one, rather than having the remote participant playing back through speakers to you and then echoing back into your microphone and making a very ugly, unpleasant experience from an audio point of view…
SL. Yeah sometimes, Skype doesn't cancel that, sometimes it does, other times it doesn't…
CJ. Exactly right.
CJ. So that's one problem that using earbuds solves. The second problem is it gets the microphone much closer to the sound source, which is you. Even though it's not an amazing microphone it will sound a whole lot better than a microphone that is even just an arm’s reach away from you in a webcam or on your computer. So these can solve those two problems very quickly.
Aside from that there are some USB microphones you can get, which will you know pretty drastically improve the sound of your audio, especially if you're able to position it appropriately. So for example you and I here today we're both using microphones that are within probably 5-6 inches of our mouths. So that's another one of the kind of cardinal rules of audio engineering. You get the microphone relatively close to the sound source. That makes a huge difference. You will hear more of the kind of a richer rendition of the sound source, my voice or your voice or whomever we're talking about. You'll get less of the sound that reflects off of the walls or the floor or the ceiling of the room that you're in and get back into the microphone. So it just really improves things. So microphone placement is a very big deal.
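The benefit of a close microphone can be estimated with the inverse-square law. This is a rough Python sketch, assuming a point source in a free field (real rooms and real voices only approximate this); the distances of 60 cm for arm's length and 15 cm for roughly 6 inches are illustrative assumptions, not measurements from this session, and the function name is hypothetical.

```python
import math


def direct_level_change_db(old_distance_m: float, new_distance_m: float) -> float:
    """Change in direct-sound level (dB) when the mic moves from old to new distance.

    Assumes an idealised point source in a free field (inverse-square law).
    """
    return 20 * math.log10(old_distance_m / new_distance_m)


# Hypothetical distances: arm's length (~0.60 m) versus about 6 inches (~0.15 m).
gain = direct_level_change_db(0.60, 0.15)
print(f"Direct sound is about {gain:.1f} dB louder at the closer position")
```

To a first approximation the reflected sound in the room stays at a similar level wherever the microphone sits, so the voice gains roughly that much over the room reflections.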
With USB microphones you can buy a very good one, something like a Rode NT-USB Mini. That costs about $100. If you put it in a position like this, it sounds like a professional microphone and it's very simple to use. So even with very simple solutions like that you can make a big difference.
Again even with external microphones like this make sure you're using headphones, again to prevent that sound from playing out of your speakers and getting back into the microphone. It just makes it much more pleasant for whoever's on the other end of the meeting or the recording.
SL. So overall it's a better idea to just do the Skype call from a mobile phone, rather than using a computer, essentially, because sometimes you get funny noise from the computer like some buzz, because the preamps are probably bad in the sound card, often they are, it depends, at least in my computer they are rubbish…
CJ. You bring up another good point too. Another thing that can happen with computers, depending on your computer, is that laptops especially have fans in them, and if the computer is working hard it can have a tendency to kick that fan on and that will be another noise source in your room that gets picked up by the microphone you use. Best to avoid if you can.
SL. Also electrical noise like, what's the name of it, ground loops, buzz.
As I mentioned before, Skype calls are limited to I think 1.2Mbps for the video and 30kbps for voice calls. I see you very well, you look good, your sound is good, but sometimes it doesn't really sound good. I could see a difference in the podcast that you did, where you were talking to the Garcia brothers. You were very clear. So you were streaming the thing directly into YouTube using StreamYard and then you were getting their feed from … Skype, right?
CJ. No, actually the way StreamYard works… it’s a web-based platform, I connect to it and they connected to it. The problem we ran into with them is that they hadn't set their sound up appropriately on their side. Originally, on their side, they had set up three microphones, one for each of the guests. What we learned the hard way is that StreamYard will only recognize the first audio source on an audio interface, regardless of how many channels it can support. So what ended up happening there is we had to default to one microphone. Then on top of that our engineer hadn't set the input level or the gain level loud enough. So I was talking at the same level that we're talking at here, but their gain or input level was set much lower and so it was very difficult for the audience to hear. We'll talk about this a little bit later when we talk about mastering, or mixing. But I was very loud and they were very faint and so it made for a very unpleasant experience for the audience. So what we did is in real time during that live stream, we asked the engineer on the other side to increase the gain and eventually we got it to a point where it was much more even, much more pleasant to listen to.
SL. But still it looks like the bandwidth that you have when you upload to YouTube is much larger than the bandwidth that you were getting from their feed.
CJ. Part of it was their camera framing, you know they were much farther from the camera and they were framed wider. We were trying to fit three people into a frame. Here it's just one person in a frame, that's one factor, but that's probably a minor factor.
Lighting makes a huge difference and then I think bandwidth makes a big difference as well, so I don't know what the network speed from their location was like. But in my particular case here I have a 100Mbps fibre to the home. So that's usually plenty of bandwidth.
Normally on YouTube the optimal bandwidth from a YouTube perspective or the most that it can support really is somewhere if I understand correctly for HD 1920x1080 resolution is about 6Mbps. So it's basically three times what Skype is. Unless you're moving around a whole lot and stressing the codec, generally that's going to be enough to get a much clearer and cleaner look.
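For a rough sense of scale, the figures quoted in this conversation can be compared directly. The sketch below uses only the numbers mentioned by the speakers (about 1.2 Mbps for Skype video and about 6 Mbps for 1080p on YouTube), which are recollections rather than verified specifications; the variable and function names are hypothetical.

```python
# Bitrates as quoted in the conversation (recollections, not verified specs).
SKYPE_VIDEO_MBPS = 1.2
YOUTUBE_1080P_MBPS = 6.0


def bitrate_ratio(high_mbps: float, low_mbps: float) -> float:
    """How many times larger the first bitrate is than the second."""
    return high_mbps / low_mbps


def megabytes_per_minute(mbps: float) -> float:
    """Approximate data volume per minute of video, in megabytes (8 bits per byte)."""
    return mbps * 60 / 8


ratio = bitrate_ratio(YOUTUBE_1080P_MBPS, SKYPE_VIDEO_MBPS)
print(f"Ratio: {ratio:.1f}x")
print(f"Skype: {megabytes_per_minute(SKYPE_VIDEO_MBPS):.0f} MB/min, "
      f"YouTube: {megabytes_per_minute(YOUTUBE_1080P_MBPS):.0f} MB/min")
```

Taken at face value, these two figures put the ratio closer to five than to three, so the multiple mentioned above is best read as a loose estimate.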
SL. Is there any solution in terms of broadcasting solutions so that you can use a larger bandwidth?
CJ. There are. Many of them actually have just popped up over the last few months during the pandemic and perhaps they were in the works prior to that. I don't know, but one of them is called Riverside.fm. Riverside.fm is mainly focused on recording podcasts, but you can also live stream if you choose to. The thing that's unique about that is that it supports a much higher bandwidth image, so if you have more bandwidth with your internet connection you can take up more of that bandwidth for your image and for your audio. It uses a much higher quality codec for the audio. If you are recording instead of streaming, it will actually record each of the video feeds. So me and my remote guest, it will record both of us independently to separate files and it will record our audio separately as well. So what that means is if you decided to come back later and edit that in post, you would have these essentially isolated video and audio tracks that you could then edit very carefully like you normally would.
SL. I was thinking okay maybe there is someone that wants to… so I'm the host there is a guest, we want to record a podcast or some conversation and then I want to send this person some equipment, but let's say that this person lives in the USA or New Zealand, as I did recently, what would you suggest? What shall I send this person? Maybe I buy that on Amazon, something that is below £100 or $100, let's say, what would you send this person?
CJ. Most likely I would send them a USB microphone just like we talked about before. The Rode is a great choice. I think it's a very simple device, it's easy to use, as long as you get it positioned correctly, and you could talk the person through positioning it correctly once you got on for the sound check, right before recording. So you could talk them through it and it's pretty straightforward. That's going to make the biggest difference in terms of sound quality.
What I also note is something that you've done very well too, by my own observation, is that prior to this session you sent a document that had recommendations on how to set up and so on. In that document you had some very specific things about what I could do in terms of lighting. Fortunately for me it wasn't a stretch because I was already pretty well set up for it. But you even had some specific recommendations in terms of the angle you wanted the light coming from and so on and so forth and that actually was really helpful. I think having something like that for someone… you could simplify it a bit for someone who doesn't have any AV experience, but still end up with a pretty high quality image.
SL. I had different levels, the basic, the average, the advanced…
CJ. Did I get the intermediate or did I get the advanced?
SL. I mean you're pro… so… yeah, the other thing I want to talk about is something about me. So I started with these productions about four years ago, doing some promotional videos actually for myself and my projects and then I decided to expand on that. I come from science and so I just wanted to know a little bit about you. You also come from science right?
CJ. Yeah, so my undergraduate degree is in psychology. Then I started a doctoral program in clinical psychology and at that point I realized about two years into the program that it was not a good fit for me. It was not something I wanted to pursue long term as a career like I had originally thought. However what I did pick up in that were just some fascinating courses that I got to take and working with an advisor on the master's thesis that was really… there were a lot of things that I really learned, I think largely about myself. I think one of the things I learned was that statistical analysis and modelling was very fascinating to me. That was kind of the route I ended up getting more interested in than clinical work.
Then on the side I was also a photographer and my brother's a musician so I was always involved in visual imaging and recording sound as well. So what ended up happening is after that, as far as career, I ended up going back and getting an information systems degree, a master's degree after I bailed out of the doctoral program. Then ended up working for tech companies after that.
So that's what I do. Even today, I'm still a product manager at a software company and then I also have my side business, where I teach people how to do lighting and sound for video.
SL. You also have a course online on your website (https://school.learnlightandsound.com/), what's that about? Can you give me a brief summary?
CJ. Most of those are focused on how to produce better sound for video. So I have courses on how to use specific recorders, how to do overall just a more general kind of sound for film, and video course. I have one on post processing. So after you're done recording, there's a whole set of things you can do in post-production to optimize the sound. Then I also have, again, as I mentioned, a number of courses on how to use specific audio recorders or mixers. Those have been a lot of fun to put together and hopefully they've helped a lot of people too.
SL. You're selling these courses from your website, right? You don't use platforms like Udemy … is there any reason for that?
CJ. Yes, first of all Udemy is having a rough time, I think because Udemy takes a very large portion of the proceeds of the sales. What I'm using is a different platform called Teachable. With Teachable, at the volume that I'm selling it makes more sense from the standpoint that I get to keep a lot more of the earnings. I don't know as much about Udemy, because this is research I did years ago before I chose a platform. But I have a lot more control over the audience in terms of how I can contact them and stay in touch with them. Because one of the things that to me was very important about online courses is that there's a massive problem with online courses and that problem is that if your students get stuck, there's generally not a way to help them very well.
What I did as part of the course offering is I have weekly live streams, where we can do question and answer. You know that gave me enough control on the Teachable side that I could do something like that and offer that additional value to my audience, which was, you know, if you get stuck and you have a question, well you can always email me but we also have these question and answer sessions once a week to help get you unstuck and moving forward.
SL. I want to move to the sort of setups that we have here. I've got an Aputure LS Light Storm Video LED. Then I've got a Canon C200 there, and a Canon C200 behind me. Now the cool thing about this C200 is that it's mounted on a Jib, on a portable Jib. It's an Edelkrone JibOne with the HeadPlus (with Laser and Focus modules). I think it does a great job. I also have a Sound Devices MixPre-3 here and the Sennheiser MKH-416. And two wireless Sennheiser AV ME2. What do you have there?
CJ. Today we're being recorded also by a Canon C200, which is the camera that we're looking at here. I have an interesting… my setting’s a little bit different. In terms of lighting I have something similar: a panel LED light, this is a D&O lighting 180W. The reason I use that is twofold. Number one it has a soft box on it, so it softens up the light a little bit, which is good for when you're shooting a talking head, because it's a little bit more flattering on a person's face. Then secondly, the reason I like this light… and I think it's similar with your light as well… is that it doesn't have a fan in it, so it's not producing noise while we're recording, which is really helpful.
From here, and actually this is where things get a little more sophisticated, or perhaps complex, I'm not sure. But let me explain to you. I have right here an Earthworks SR314. It's a condenser microphone that is routed into an audio interface that's connected to my computer. It's a Universal Audio Apollo x6. This interface actually does some interesting things in terms of processing the audio. It's actually compressing the audio a little bit and bringing the loudness up a little bit and making it more consistent. But it's also applying a little bit of equalization. Then that sends the audio to my computer, which is what you're hearing on Skype, because we're talking here on Skype, but at the same time it's also sending the audio directly to my camera and recording this audio that's already been processed and ready to go into my camera. So when I give you, Sam, the video files you will actually already have the embedded audio of what we're recording here today.
So that's the setup I'm using here right now.
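The kind of processing described here (gentle compression followed by a loudness increase) can be illustrated with a static compressor curve. This is a generic textbook sketch in Python, not the algorithm used by the Universal Audio interface; the function name and the threshold, ratio, and make-up gain values are hypothetical.

```python
def compress_db(level_db: float,
                threshold_db: float = -20.0,
                ratio: float = 3.0,
                makeup_db: float = 6.0) -> float:
    """Output level (dB) of a simple hard-knee compressor with make-up gain.

    Levels at or below the threshold pass unchanged; above it, every `ratio` dB
    of input yields 1 dB of output. Make-up gain is then added to everything.
    All default values are illustrative assumptions.
    """
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db


for level in (-40.0, -20.0, -8.0):
    print(f"in {level:6.1f} dB -> out {compress_db(level):6.1f} dB")
```

With these example settings, inputs that start 32 dB apart (-40 dB and -8 dB) end up 24 dB apart and louder overall, which is one way to picture "more consistent" loudness.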
SL. Does it do any adaptive denoising?
CJ. It does not do any denoising. Fortunately in my room here I think we have enough control over things that we don't need to do a lot of denoising. But you can always do a little bit in post if you're concerned about anything you hear.
SL. How about the computer there? Do you have any fans there? Is it loud? Because my computer there is very loud, so I had to put it behind the curtains.
CJ. Yeah good question. This is actually an iMac Pro and the fan only ever comes on when I'm exporting a big video file, but otherwise it's very quiet.
SL. I saw your video about the Mix Minus with the Sound Devices MixPre-6 and -3. I think it was maybe 6. The MixPre-3 is great. It has great preamps, right?
CJ. Unprecedented. In fact the quality of the pre-amplifiers and the audio gear that we're seeing today at that kind of a price… so the MixPre-3 for example is a $650 field recorder and it has pre-amplifiers that normally in the past, probably five years ago or ten years ago, you would have to purchase a $3000-$5000 mixer recorder to get that same quality.
SL. I remember that for one of the first videos that we did, kind of professional videos, we hired a person (a sound recordist) who came with a Tascam, I think, and the two Rode microphones. I didn't like that. It was too noisy. Then I got the MixPre-3 with the MKH-416 and that changed my life, my experience.
CJ. Yes and it probably made it a lot easier for you, right? You don't have to do as much post processing. It's basically very close to where you need it right out of the recorder.
SL. Yeah, so “sound”. You said that “sound makes or breaks your video”, can you tell me more about that?
CJ. Yeah, well I think so. George Lucas is famous for saying that sound is 50% of the experience in relation to film. That's what he was talking about. I don't know if I was joking… perhaps it's a joke… perhaps it's not… but I actually think that sound, despite what people might actually think at a conscious level, actually we rely on our ability to hear much more than we really realize. In fact sometimes I say that sound is actually 60% of the experience. Actually if you look at auditory capabilities from an evolutionary standpoint, sound really helps us in a lot more ways than we realize. It helps us to make decisions about where things are in relation to ourselves. So as we're walking down a street, for example, our brain is making these split-second decisions about “okay this sound that I'm hearing is coming to this ear you know just a few milliseconds sooner than it's getting to this ear”. That tells me something about where that sound source is located in relation to where I am. These are judgments that we can make as human beings.
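The timing cue described here is known as the interaural time difference. Below is a simplified Python sketch, assuming a straight-line path between the ears, an ear spacing of about 0.2 m, and a speed of sound of 343 m/s (all illustrative assumptions; the function name is hypothetical).

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at room temperature
EAR_SPACING_M = 0.20        # assumed distance between the ears


def interaural_time_difference_ms(angle_deg: float) -> float:
    """Arrival-time difference between the ears, in milliseconds.

    angle_deg is the source direction: 0 = straight ahead, 90 = directly to one side.
    Uses the simple model ITD = d * sin(angle) / c, ignoring the curvature of the head.
    """
    path_difference_m = EAR_SPACING_M * math.sin(math.radians(angle_deg))
    return 1000.0 * path_difference_m / SPEED_OF_SOUND_M_S


for angle in (0, 30, 90):
    print(f"{angle:3d} degrees: {interaural_time_difference_ms(angle):.2f} ms")
```

Under these assumptions the difference stays below one millisecond even for a source directly to the side, so the delays the brain resolves are very small.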
I think a lot of us don't realize at a conscious level how important that is and how much a part of the human experience that is for us. So I think the kind of manifestation we see of that is if you watch a video, for example online somewhere, and the audio quality is poor, for us it feels like an amateurish job (amateur I'm sure is a bad word to use because amateur can actually mean a very good thing), but it's just like a very poor quality job. So the credibility of the piece or, if it's a fiction piece, the ability for that piece to suspend my disbelief, is diminished greatly. So it's just not as effective an experience for most people.
So that's my perspective. That's why sound is so important, is that humans whether we realize it consciously or not we rely on our sound or our ability to hear so much more than we realize.
So providing a quality sonic experience for your audience, I think, is critical. I mean not just from the standpoint of fiction pieces or narrative film, I mean that from sales presentations, from the perspective of business meetings, you need to respect your audience enough to not assault their ears, you need to provide them a decent experience as far as sound is concerned. So wear your headphones, use a decent mic, get it close.
SL. I hope this will convince my collaborators. So you have a great microphone there, I really like that, I really like the sound. How would you compare that microphone with this Sennheiser MKH-416?
CJ. The Sennheiser MKH-416 was actually made for a very specific purpose and it's been reused for different purposes. For example that was originally designed in the 1970s. It's gone through a few different iterations. But really it's what is called a shotgun microphone and the reason for that if you look up close on it you'll see it's a very long microphone first of all. It has this tube with slits along the side. As a physicist you could probably explain how that works better than I can. But the net effect of that particular microphone design is very directional. So it is very good at picking up the sound right in front of it, but it's very also very good at rejecting a lot of any other sound sources that come off to the sides. So that's one thing that makes that microphone very special. In addition to that it was just well engineered and so it just sounds really good. It has a very rich low end to it and it also is very articulate in the higher frequencies. So it sounds overall very nice in most circumstances.
The difference is this microphone (Earthworks SR314) was actually originally designed as a handheld performance microphone for music. This one's actually made by an American company, that's made by a German company. This is made by an American company called Earthworks. The engineer (David E. Blackmer) that started Earthworks he also started the company called DBX, which makes audio gear as well. He has a long history as a renowned audio engineer. What's kind of special about this microphone to me is that its overall frequency response, how it responds to different frequencies is generally pleasing on most voices. It works really nicely with almost every voice that I've had.
SL. You did a video about that, you were showing it has a cardioid polar pattern, you were rotating it and showing it…
CJ. Yeah so this one is a little different because it's actually more forgiving, so if I move off to the side here it's still going to pick me up. If you move off to the side there you're going to fall off pretty quickly in terms of the sound pickup. This one's a little bit more forgiving but it is still very good at rejecting sound from the back of the microphone. That's important in this case, for example, because I have a computer screen right here, I have a desk just below it, so what I do is I intentionally aim the microphone not only so that it's picking up the sound of my voice, but also so that the part of the microphone that picks up the least amount of sound is aimed at these flat surfaces, which will reflect the sound of my voice. So it's going to be better at rejecting any of those kind of reflected sounds. So overall it results in a better sounding experience.
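To put rough numbers on the rear rejection described above, an idealised first-order cardioid has a sensitivity of 0.5 × (1 + cos θ) at an angle θ off the front axis. The sketch below assumes that textbook pattern only; a real microphone such as the SR314 will deviate from it, particularly across frequency, and the function name is purely illustrative.

```python
import math

def cardioid_gain_db(angle_deg):
    """Relative pickup of an ideal cardioid, in dB, at angle_deg off the front axis."""
    sensitivity = 0.5 * (1.0 + math.cos(math.radians(angle_deg)))
    if sensitivity <= 1e-12:
        return float("-inf")  # theoretical null directly behind the capsule
    return 20.0 * math.log10(sensitivity)

for angle in (0, 45, 90, 135, 180):
    print(f"{angle:3d} deg: {cardioid_gain_db(angle):.1f} dB")
```

In this idealised model the side of the microphone is only about 6 dB down while the rear is a deep null, which is why aiming the back of the microphone at a screen or desk is a sensible strategy.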
SL. And of course microphones kind of cannot fix your speech defects or my accent, for example, they're not going to correct my Italian accent [Laughter]. So key components to achieve sound quality, I think it's microphone, hardware, room, and post processing. Would you add anything else?
CJ. I agree with that and I would just kind of clarify a few things in regards to that. I think room is a bigger thing than most people realize. The room in which you make a sound recording probably makes a bigger impact on the overall quality of the sound than anything else.
Now the other thing that makes a big difference is how close the microphone is to the sound source. So I would say that's even probably more important than the particular microphone that you're using is how close it is to the sound of your voice.
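As a rough illustration of why distance matters so much, the direct sound from a small source in free field follows the inverse-square law, so its level changes by 20 × log10(old distance / new distance) dB. This is a sketch under that free-field assumption, and the distances are made-up examples; in a real room the reflected sound does not fall off in the same way, which is why moving closer improves the balance of voice to room.

```python
import math

def direct_level_change_db(old_distance_m, new_distance_m):
    """Change in direct-sound level when the mic moves from old to new distance."""
    return 20.0 * math.log10(old_distance_m / new_distance_m)

# Hypothetical example: moving the microphone from 60 cm to 15 cm from the mouth.
print(f"{direct_level_change_db(0.60, 0.15):.1f} dB more direct sound")
```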
Then from there I think there is an art to choosing a microphone which complements your voice. So for example on my voice I have a lot of what's called sibilance. So when I say the letter “s” like “seashells she sells seashells by the seashore”, that “s” has a sizzling sort of sound to it, it kind of starts to distort. So finding a microphone that suits your particular voice is one thing. Some people have very deep voices without a lot of articulation, in which case they would probably benefit very much from a microphone like what you have, which incidentally I think complements your voice very nicely. You have what I would call a kind of a “darker voice”, where you have a bit more… well and it's a good thing… I think actually it's a very nice sound, a lot of voice over artists have a voice like yours. It has a little bit more kind of bass and not quite as much high frequency content to it. So it has this really warm kind of inviting sound to it.
The nice thing about the Sennheiser MKH-416, which you're using is that it is very sensitive in those higher frequencies. So while your voice doesn't produce a lot of those frequencies, it's very good at picking up what you have produced with your voice. It's very sensitive up there and so it complements your voice very nicely.
If you're just getting started in sound recording or you're not that invested, you're just going to do an interview with Sam every once in a while, then you're probably not up to spending several hundred dollars trying out different microphones to find the perfect microphone for your voice. But something again like that USB microphone we mentioned earlier, that's going to work well for almost every voice.
Again getting a decent room or treating your room in some fashion and then getting that microphone close those are the two biggest things I would say.
SL. Yeah, so in terms of room, as you can see here I've got some relatively cheap curtains, and I was actually surprised that I could find very cheap curtains that don't look bad, they look reasonably nice, and I used some curtain rails and that's it. Now the thing is that these ones are not particularly thick, but because of the shape they take, they tend to absorb the sound better than simply having bare walls. What would you suggest to improve this sort of setup? Those things that you have behind you, can you tell me what those things are?
CJ. Yeah, so right behind me here, these are called broadband traps. They're made out of a fairly dense fiberglass type of material. I have some in the corners as well as along the walls. The main idea with these is to absorb sounds, to keep the sound waves from bouncing around the room, bouncing off of walls, bouncing off of floors and ceilings. I have some up on the ceiling as well. Now, you know, if you're going to do professional audio work, then yes I would recommend these.
SL. Did you run a simulation of your room? Did you do a software simulation?
CJ. I actually did some testing. I had a test microphone and so we took some test measurements and then I also worked with an acoustician to help design the trapping plan for the room here. In fact we ended up with two, you can't see them here, but there's two very large sets of traps in the corner up at the front of the room here. Then we have several along the wall back here. We have two more up on the ceiling, which we call clouds. Then we have some on the back wall as well. The ones on the back wall are much thicker. So the idea with the trap, as I mentioned, is a lot of times people will call them bass traps, that's kind of a vernacular term, that isn't super accurate. I guess what they're emphasizing there is that it's important for the traps to be made with the technical specifications to absorb bass frequencies, because a lot of times what people will do is they'll put up a bunch of foam on the walls, like acoustic foam. The trick with acoustic foam is it's very good at eliminating or absorbing high frequency sound, but it is not good at absorbing low frequency sound. So what ends up happening is, especially in smaller rooms, like you and I are working in here, one of the biggest problems tends to be bass and the build-up of bass and the kind of resulting comb filtering that happens. When you have all these different frequencies they start to interact with each other and you start to get comb filtering effects. It can even start to sound like a warbling effect if it's bad enough. So you get these, you get flutters, and you get all these other things.
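One common source of the comb filtering mentioned above is a single reflection arriving at the microphone slightly later than the direct sound. If the reflection travels an extra distance d, it is delayed by τ = d / c, and cancellations (notches) appear at odd multiples of 1 / (2τ). The sketch below assumes one reflection, a speed of sound of about 343 m/s, and a made-up extra path of 0.5 m; real rooms have many reflections of varying strength, so the notches are shallower and messier.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value for dry air near 20 °C

def comb_notches_hz(extra_path_m, count=3):
    """First few notch frequencies caused by one reflection with the given extra path."""
    delay_s = extra_path_m / SPEED_OF_SOUND_M_S
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]

# Hypothetical example: a desk reflection that travels 0.5 m further than the direct sound.
print([round(f) for f in comb_notches_hz(0.5)])
```

Shortening the extra path pushes the notches higher in frequency, and absorbing the reflection makes them shallower, which is what the traps and blankets are for.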
I think if someone needs to do something on a budget what you can do is just use any sort of blanket or curtains. I think your solution is very good because it serves a visual purpose and an audio purpose, so it's a good fit that way.
I think there are different considerations for recording versus setting up a mixing room, where you're going to be listening back to pre-recorded material. So for recording the biggest thing is to get those early reflections or those first reflections. When the sound of your voice goes out, what you want to do is be able to trap it before it bounces off of a hard surface and comes back into the microphone. So any sort of blankets that you can hang up in the room will help, especially in front of the flat walls. If you have a hard floor, putting a blanket down on the floor for the course of the recording can help. A lot of people come to me and say “hey, I want to make my room sound better, but I can't permanently attach anything to the walls”, that's where I would say “Okay, a sound blanket or any sort of blanket you can put on a hard floor or hang up in the room off camera”. Those can only make things better and help manage that reflection.
SL. So you said the position of those bass traps… so I was wondering if there is software that tells you where to put those things?
CJ. It was an acoustics engineer that helped me with that. So the acoustics engineer looked at the dimensions of the room and I did some measurements of the room with the test microphone. What you do essentially is you send a frequency sweep out of your speakers. So it goes up through the whole frequency range and the measurement microphone measures the response of the room, when that sound is put into it. Based on that the acoustician then can make some educated, informed decisions about where it may be best to put some of these traps. So there's constraints. Obviously this room was not built for acoustics. It was a spare bedroom in my home and so I've got a window right behind me here. I've got another window and these are curtains here that are blocking a window behind me. So you have to work around those constraints, but there are things you can do and I think we'd probably send people to you with your physics background to help them sort through.
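A frequency sweep of the kind described above is often generated as an exponential (logarithmic) sine sweep, which spends equal time in each octave. The sketch below is only an illustration of that signal; the start and end frequencies, duration, and function names are assumptions, and it does not show the analysis step that measurement software performs on the recorded response.

```python
import math

def log_sweep(f_start, f_end, duration_s, sample_rate):
    """Exponential sine sweep from f_start to f_end (Hz) as a list of samples."""
    rate = math.log(f_end / f_start)
    scale = 2.0 * math.pi * f_start * duration_s / rate
    total = int(duration_s * sample_rate)
    return [math.sin(scale * (math.exp(rate * i / total) - 1.0)) for i in range(total)]

def sweep_frequency(f_start, f_end, duration_s, t):
    """Instantaneous frequency (Hz) of the sweep at time t seconds."""
    return f_start * (f_end / f_start) ** (t / duration_s)

sweep = log_sweep(20.0, 20000.0, 2.0, 48000)  # assumed 20 Hz to 20 kHz over 2 s
print(len(sweep), round(sweep_frequency(20.0, 20000.0, 2.0, 1.0)))
```

Halfway through the sweep the frequency is the geometric mean of the two end points, not the arithmetic mean, which is what gives low frequencies enough measurement time.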
SL. Yeah, I don't know anything about acoustics, I should probably research more. So when you go on location, when you're shooting something let's say in some company office, what do you do? Do you carry anything to help with the sound processing?
CJ. Absolutely, yes. Sound blankets are definitely my friend. There are companies that make blankets that are specifically engineered to help eliminate some of the reflected sound. You can use any sort of blanket. A lot of times people will use moving blankets. These are the blankets you wrap around furniture when you're moving. The problem with those is it depends on them. You can get some that will work effectively for acoustic treatment as well. Generally they're made to be very lightweight. So that's not going to do a whole lot in terms of absorbing sound. The acoustic blankets are much heavier, they're made out of cotton, they actually have been through laboratory testing and they've identified the noise reduction coefficient. So I'll carry those around and usually put them up on a c-stand or a sentry stand, which it's a stand that's made for lighting. I can put that up in the room off camera and that helps reduce some of that reflected sound.
SL. I don't remember if it was you, it was probably you… you probably did the video, where you showed these blankets and you said that they were smelling bad… that you had to wash it…
CJ. Yes, it was a serious problem. It seems like it had been treated with some heavy-duty chemicals of some sort, but the company contacted me after I made that video and they said “Hey we had a problem with some of our earlier shipments”. In the manufacturing [process], which was done in China, they had applied some sort of chemical. They said “We apologize for that here are some new blankets”. When they sent the new blankets out they didn't have that smell to them. So evidently they had changed their manufacturing process.
SL. Now coming to audio post processing, I'll just briefly tell you what I normally do if I'm treating sound. I normally do that in Adobe Audition. I remove noise with the noise processing, then I normalize to -3dB, then I basically go point by point and remove unwanted noises. Then I usually do a hard limit to -6dB and then I probably apply a noise gate just to remove breaks between words or sentences. What do you normally do? What's your typical workflow?
CJ. I'm glad you asked. I actually just documented this earlier today for another question that somebody asked. So typically I try to do as little processing as possible. If I have done the recording myself oftentimes that's possible. But if I'm on location that's where things get a little more challenging you don't always have control over the location. That's where you're going to pick up more noise.
Typically what I do is I apply a high pass filter. What a high pass filter does is that it removes some of the lowest frequency energy. So I just apply that at about 50 Hz. Generally voices, dialogue doesn't sit there. So it's just getting rid of any sort of rumble such as cars, air conditioning, airplanes, and things of that nature.
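As a minimal sketch of what a high pass filter does, the code below implements a simple first-order (6 dB per octave) filter with a 50 Hz cutoff and compares how much of a 10 Hz rumble and a 1 kHz tone survive. The filter order is an assumption made for simplicity: the slope used in the workflow above is not stated, and high pass filters in audio software are typically steeper.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sample_rate)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sine(freq_hz, sample_rate, seconds=1.0):
    count = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(count)]

RATE = 8000  # a low sample rate keeps the demo fast
for freq in (10.0, 1000.0):
    tone = sine(freq, RATE)
    kept = rms(high_pass(tone, 50.0, RATE)) / rms(tone)
    print(f"{freq:6.0f} Hz tone keeps {kept:.2f} of its level")
```

The rumble is reduced to a fraction of its level while the 1 kHz tone passes almost untouched.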
If I do need to do noise reduction I will then do it at that point. I generally use Izotope RX, which is an application that is used a lot in the film industry for cleaning audio and it has a very good noise reduction algorithm, which does a very good job. My goal there typically is not to eliminate the noise but to reduce it some, just so that it's not distracting. What happens [is that] if you press it too hard, even the best algorithms today will start to affect the quality of the dialogue sound. At some point you'll start to sound like you're underwater. The algorithm can't completely separate the noise from the voice, because a lot of noise is broadband, it's across the entire audio spectrum, and your voice has to live in there somewhere. So just applying that very lightly is the best bet there.
A lot of men's voices in particular, and you'll see this with some instruments as well, they create what are called asymmetric waveforms. So if you think of an audio wave that's going up and down, you have a centre line. The way it works is that the audio waveform goes up, this determines how loud the sound is, and then how quickly or slowly it goes up and down determines its frequency or whether it's a low noise or a high pitched noise. What happens in some cases is that a lot of times with men's voices in particular I find the waveform will be much taller, larger in amplitude on one side than it will be on the other. What that means in practical terms is that once you for example normalize, you're not getting all of the headroom available because one side is so much larger than the other. So I also have a tool within Izotope RX, which does what's called adaptive phase rotation, [which] will even up that asymmetry in the waveform. Usually I'll be able to get back a fair bit of headroom, so I can make the audio louder without affecting the timbre of the sound too much. So that's another thing I do.
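The following toy example illustrates why an asymmetric waveform costs headroom and why changing the phase relationship between its components can win some back. It builds a synthetic tone from a fundamental and a second harmonic, then shifts the harmonic by a fixed 90 degrees. This is an assumption-laden illustration only: it is not the adaptive algorithm in Izotope RX, and a real voice is far more complex than two sine components.

```python
import math

N = 3600  # samples across one cycle of the fundamental

# Fundamental plus an in-phase second harmonic: tall on top, shallow below.
asymmetric = [math.cos(2 * math.pi * i / N) + 0.5 * math.cos(4 * math.pi * i / N)
              for i in range(N)]
# Same two components at the same strengths, second harmonic shifted by 90 degrees.
rotated = [math.cos(2 * math.pi * i / N) + 0.5 * math.sin(4 * math.pi * i / N)
           for i in range(N)]

def abs_peak(wave):
    return max(abs(s) for s in wave)

gained_db = 20 * math.log10(abs_peak(asymmetric) / abs_peak(rotated))
print(f"before: max {max(asymmetric):.3f}, min {min(asymmetric):.3f}")
print(f"after:  max {max(rotated):.3f}, min {min(rotated):.3f}")
print(f"headroom recovered: {gained_db:.2f} dB")
```

The frequency content is unchanged, but the positive and negative peaks become equal and the largest peak is lower, so the whole signal can be raised further before it reaches full scale.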
I will normalize to a loudness standard or a perceived loudness standard as opposed to peak normalize. All peak normalization does is look at all the samples in the audio, find the highest amplitude sample, and say “okay I'm going to move that up however much you said, to -3dB, and I'm going to also move all of the other samples up that same amount”. The challenge with normalizing that way is that the result is going to change from recording to recording. So say I have a series of six different audio recordings and I normalize all of them to -3dB, perceptually they will sound different in terms of how loud they are. Some of them will sound louder than others. So more recently we've seen some measurement systems come out for what is called perceptual loudness. They're usually expressed in the form of “loudness units relative to full scale” (LUFS) or “loudness, K-weighted, relative to full scale” (LKFS). What these do is they measure the overall loudness of an audio file much more closely to how we as humans perceive loudness. If you have a waveform and, amplitude wise, it goes up to -3dB on a digital scale, but it only goes up once to that level, then when we hear it played back it doesn't actually sound that loud. It's only if it persists at that amplitude for a longer period of time that it starts to sound loud to us. So there's some very important distinctions there. So measuring and normalizing based on these loudness units full scale gets you more consistent results in terms of perceived loudness. So what I'll typically do for now is just loudness normalize to -26 LUFS and that just gets me to a starting point, where I can start processing the audio.
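The difference between peak level and perceived loudness can be sketched with two synthetic clips that share the same peak but not the same average energy. The code below peak normalizes both to -3 dBFS and then compares their RMS level. RMS is used here only as a crude stand-in for loudness: a real LUFS measurement also applies K-weighting and gating, which this sketch does not attempt, and the clips are made-up test signals.

```python
import math

def peak_normalize(samples, target_dbfs):
    """Scale so the largest absolute sample sits at target_dbfs."""
    target_linear = 10 ** (target_dbfs / 20.0)
    gain = target_linear / max(abs(s) for s in samples)
    return [s * gain for s in samples]

def peak_dbfs(samples):
    return 20.0 * math.log10(max(abs(s) for s in samples))

def rms_dbfs(samples):
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

N = 4800
steady = [math.sin(2.0 * math.pi * 50 * i / N) for i in range(N)]
# Same tone, but loud only for the first tenth of the clip and quiet afterwards.
sparse = [s if i < N // 10 else 0.05 * s for i, s in enumerate(steady)]

for name, clip in (("steady", steady), ("sparse", sparse)):
    normalized = peak_normalize(clip, -3.0)
    print(name, round(peak_dbfs(normalized), 2), round(rms_dbfs(normalized), 2))
```

Both clips end up with identical peaks, yet the one that is loud only briefly has a much lower average level, which is the inconsistency that loudness normalization is meant to remove.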
At that point I'll often apply some corrective EQ. So for example everyone's voice is different, every microphone is different, and every room is different. So when you record a voice into a microphone in a room, sometimes you get these resonances and these resonances don't always sound good. Sometimes they sound great other times they don't sound good at all. So what equalization does is it allows you to find those resonances and pull them down a little bit. For example when you hear a recording that sounds very nasally, that's because there's a resonance of some sort in that nasal range and if you can find that using an equalizer and pull that down it will sound much more balanced and more pleasant to listen to. So that's typically what I'll do next. Usually with a lot of voices and depending on the room you're recording in it's not unusual to find at least two maybe three different resonances in someone's voice for that particular recording that you can pull down. That will then improve the sound of the audio quite nicely.
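Pulling down a resonance is typically done with a peaking (bell) filter. The sketch below computes coefficients for a standard biquad peaking filter using the widely published Audio EQ Cookbook formulas and checks its gain at two frequencies. The centre frequency, Q, and cut depth are hypothetical values chosen for illustration; the settings used in the workflow above are not stated and would be found by ear for each recording.

```python
import cmath
import math

def peaking_eq(center_hz, gain_db, q, sample_rate):
    """Biquad peaking-filter coefficients (b, a) in Audio EQ Cookbook form."""
    amp = 10 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp)
    return b, a

def response_db(b, a, freq_hz, sample_rate):
    """Magnitude response of the biquad at freq_hz, in dB."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv * z_inv
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv * z_inv
    return 20.0 * math.log10(abs(numerator / denominator))

# Hypothetical cut: 4 dB down at 1 kHz with a fairly narrow bell.
b, a = peaking_eq(1000.0, -4.0, 4.0, 48000)
for freq in (100.0, 1000.0, 8000.0):
    print(f"{freq:6.0f} Hz: {response_db(b, a, freq, 48000):+.2f} dB")
```

The cut is applied only around the chosen frequency, leaving the rest of the voice essentially unchanged.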
From there then I will start to prepare for what we would call mastering stage. What I'm looking to do there is to get a consistent loudness so when I publish a video one week and then I go back the next week and I publish another video I want my audience to be able to listen to the first one and set their volume level and then play the second one and not have to adjust their volume level again. It should be very consistent in terms of sound. That's really what mastering does to a large extent. It allows you to do that.
So in the mastering stage I'll often do some compression on the audio. Those moments where the waveform goes up and comes down very quickly, which we call transients, aren't necessarily helping. They don't necessarily sound loud because they happen so quickly, so a lot of times I'll compress those down. What that allows me to do then is to take the overall audio file and increase its overall amplitude without clipping, because as soon as the audio gets up to 0 dB, which is the maximum on a digital scale, it clips and starts to sound distorted. So we're trying to avoid that as well.
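The idea of compressing transients to win back headroom can be sketched with a static compressor curve: levels above a threshold rise more slowly than the input, by a factor set by the ratio. The threshold, ratio, and example levels below are made-up values, and the sketch ignores attack and release timing, which real compressors use to decide how quickly they react.

```python
def compress_db(level_db, threshold_db=-12.0, ratio=4.0):
    """Static compressor curve: above the threshold, output rises 1/ratio as fast."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

loudest_peak_db = -1.0     # a fast transient (hypothetical)
typical_speech_db = -20.0  # ordinary dialogue level (hypothetical)

compressed_peak_db = compress_db(loudest_peak_db)
makeup_gain_db = loudest_peak_db - compressed_peak_db  # headroom won back
print(f"transient after compression: {compressed_peak_db} dB")
print(f"make-up gain available: {makeup_gain_db} dB")
print(f"speech after make-up gain: {compress_db(typical_speech_db) + makeup_gain_db} dB")
```

The transient ends up where it started once the make-up gain is applied, but the ordinary speech below the threshold is now several dB louder.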
Once I've done that compression then I might come back and do another pass of equalization. In this particular case though, when I come back to do the second round of equalization, I'm not so much trying to fix problems as maybe sweeten the overall sound of the voice. For example if it's a man's voice and I want it to have a very intimate sound I might boost the bass just a tiny bit just to make it sound a little warmer and more intimate. If I have a voice that seems to be a little bit lacking in terms of articulation I might bump up some of the mid to high frequencies a little bit so that they sound a little clearer. So there's some things you can do there as well.
If there are any mouth clicks… you know a lot of times people will go “click click click” and that kind of stuff can become very annoying… so you can actually cut some of that out and not so much by actually cutting it. There are plugins you can use to de-click that noise, take that click away.
A lot of times too I will do a de-breath. So instead of a noise gate I'll use a de-breath plug-in rather. A noise gate is just trying to detect when you're talking, when you're not talking. When you're not talking it will basically reduce those levels and that can be fine. But what a de-breath plug-in does is it specifically looks for breath. So it's looking for specific frequencies that are much quieter than the talking parts. So it's a little bit more sophisticated in its detection and then it pulls those breaths down a little bit, just so they're not so annoying. I find generally that you don't want to remove the breaths because otherwise it starts to sound unnatural. But you don't want them to be super prominent either. So you just pull them down just a little bit.
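For comparison, the simpler of the two tools, the noise gate, can be sketched in a few lines: when the recent signal level falls below a threshold, the samples are turned down by a fixed amount rather than removed. The threshold, reduction, and window length here are illustrative assumptions. A de-breath plug-in is more selective than this, and its detection logic is not reproduced here.

```python
def gate(samples, threshold, reduction_db=-12.0, window=3):
    """Turn samples down by reduction_db whenever the recent peak level is below threshold."""
    floor_gain = 10 ** (reduction_db / 20.0)
    out = []
    for i, sample in enumerate(samples):
        recent = samples[max(0, i - window + 1): i + 1]
        level = max(abs(x) for x in recent)
        out.append(sample if level >= threshold else sample * floor_gain)
    return out

# Hypothetical clip: three "speech" samples followed by low-level noise.
clip = [0.5, -0.6, 0.4, 0.01, -0.02, 0.01, 0.02, -0.01, 0.01]
print(gate(clip, threshold=0.1))
```

Reducing rather than muting the quiet passages follows the advice above to pull unwanted sounds down a little instead of removing them outright.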
Oftentimes, in my experience, post-processing is more about subtlety. Subtle changes make the biggest difference. If you get too hard-core and you start pushing too hard on the plugins and changing the sound too dramatically, it starts to sound very unnatural and very unpleasant.
Then after I've done those steps the final step is to loudness normalize. For example if I'm publishing to the web, somewhere on the web, normally what you're aiming for is a target of about -16 LUFS. That will ensure that your content holds up when someone is listening on a plane or in a subway, train, taxi, or car, which is typically a very poor listening environment. They're often listening with earbuds, which are not the best. They're listening on consumer grade devices like mobile phones, which don't have the best audio processing and amplification and so on and so forth. If you get the audio to a good consistent level and sounding good at -16 LUFS, it'll play back nicely in those environments as well. So that's usually the overview I guess of my post processing.
SL. -16 LUFS is super loud…
CJ. Well it's all relative. So the music services are normalizing to like -13 and -14 LUFS, even louder. -16 is a good middle ground if you're publishing specifically for something that will be consumed on a mobile device or on the web somewhere with a laptop or on a phone, mostly on a phone these days. So those are very poor playback systems to be entirely honest and so you do have to push. There's always a trade-off. The louder you go the less dynamic range your audio will have, and what I mean by that is if somebody whispers and then they shout, that's an illustration of the width of the dynamic range: you can capture the whispers and you can capture the shouts as well. If you do this loudness normalization you're getting rid of some of that dynamic range in the effort to make it louder. So there's kind of a balance, a trade-off, there.
Now if I'm going to produce something that's going to be viewed in a theatre, I want to keep a lot more of that dynamic range because we're going to be working with a playback system that's much more capable, it's much better, and the whole listening environment is set up to be much better as well. That's why movies always sound so much better than typically what you're going to watch on your phone. So in that case I might just normalize to -24 LUFS as opposed to -16.
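The arithmetic behind hitting a loudness target is simple once a loudness meter has reported the current value: the gain in dB is the target minus the measurement. The sketch below assumes the measured value comes from a proper LUFS meter (it does not measure loudness itself), uses the -26 LUFS working level mentioned earlier as the example input, and adds a hypothetical peak value to show when limiting would be needed.

```python
def gain_to_target(measured_lufs, target_lufs):
    """Gain needed to move a clip from its measured loudness to the target."""
    gain_db = target_lufs - measured_lufs
    return gain_db, 10 ** (gain_db / 20.0)

measured_lufs = -26.0  # value reported by a loudness meter (assumed)
peak_dbfs = -8.0       # hypothetical highest peak in the clip

for name, target in (("web", -16.0), ("theatre", -24.0)):
    gain_db, linear = gain_to_target(measured_lufs, target)
    clips = peak_dbfs + gain_db > 0.0
    print(f"{name}: {gain_db:+.1f} dB (x{linear:.2f}), needs limiting: {clips}")
```

With these example numbers the web target would push the peaks past full scale, so compression or limiting has to make room first, while the theatre target leaves the dynamics alone.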
SL. When it comes to mastering with music, have you done anything about that? That's another art, I guess…
CJ. That's another art. Yeah, very difficult. Well, I mean I've mixed some films and for me the approach to mastering is very simple. It's more about loudness and consistency. In music, mastering takes on an entirely different meaning, or an additional meaning as well, where they're doing more… not only are they optimizing the loudness of the music, but they're also… they're like a next level mixer, almost. They're making sure that the different instruments are managed a little bit differently so that they get the effect that they're looking for. It's a little bit ephemeral the description…
SL. So they try to match the frequencies of different instruments so that they sound good when they are played together…
CJ. I think so. I think sometimes the mixing engineer will do some of that. Sometimes the mastering engineer will do some of that, so it's a little bit of a shared responsibility depending on the situation, but yeah, that's the idea.
SL. What setup are you using for the live stream at the moment?
CJ. I have a switcher, I have an overhead camera. A lot of times my live streams are more educational in nature. What I'm trying to do is demonstrate products and how to operate them. So I'll have my main camera, my Canon C200, and then I'll have the overhead camera that's looking down on my desk so I can just show the product and how to operate it.
Then I have a switcher, which basically both of those cameras are connected to the switcher and then I can just push a button to switch between the two different cameras. In addition to that the switcher enables me to have basically a program monitor so I can see which camera is currently active, which one's not active. It also has a multi-view, which allows me to see some additional information. I can actually show all of the available cameras. What's currently happening on each of the available cameras. I can do a more sophisticated type of switching. I don't do this typically because I am the one that's operating my live stream. It's just me. If somebody else were helping me it has a mode that would be better suited… you've seen TV shows or movies for example, where you see a director sitting in a back room… they have all the different cameras available to them they say “okay switch to camera six now” and so there's someone operating the switcher. They'll punch camera number six into the preview and then the director will say “now” and then they'll hit the cut button and that will move over to that camera. So it's capable of doing that, but I don't, because I'm operating it all. I just put it in live cut mode so it just cuts as soon as I choose the other camera.
But it also allows me to do some audio processing. If I wanted to apply some compression, a limiter, some EQ, I can do all of that in the switcher as well. You can potentially connect your microphone or your audio signal chain to the mixer or to the switcher as well. Generally what I do is I actually run the audio directly to my camera and then send that to the switcher so there are different ways you can do it but that's my general setup.
I'm using this room right here, [which is] where I do my live streams and a lot of my educational videos.
SL. The switcher is a Blackmagic Design? What do you use?
CJ. It is a Blackmagic ATEM Mini Pro
SL. Hard to get…
CJ. It's hard to get during the pandemic, yes. It was released just at the start of the pandemic. They were incredibly popular and if you put an order in you might get yours in six weeks, that type of thing.
SL. It's also hard to get the Blackmagic Pocket Cinema Camera 4k…
CJ. Indeed very popular…
SL. You got one there, right?
CJ. I do, yes. I actually bought that shortly after it came out, although I will say that what a lot of people don't realize, I think, is that a Canon cinema camera like the C200, which you're using and I'm using, is a very different camera from the pocket cinema cameras. In terms of workflow the pocket cameras don't have a lot of things that a lot of people would assume they would have. They don't for example have autofocus.
SL. The Blackmagic doesn't have autofocus…
CJ. Yeah, exactly. The Canon actually has very good autofocus and so…
SL. The Blackmagic has “one shot autofocus” I think…
CJ. Exactly… which basically means you can press a button and it will autofocus on whatever you happen to have it aimed at, at that moment in time. Then as soon as you start rolling, if the person moves all bets are off, you have to manually refocus. Find them again. So it's a different thing, but it's an amazing camera for its price in terms of what it's capable of doing with an operator that knows how to operate it. But it's one of those cameras that's not super easy to operate. I mean there's some effort involved.
SL. I've used it several times with the gimbal and I would say that it matches very well with the Canon C200 when you use a ColorChecker Video. Do you use these things?
CJ. Sometimes… I did make a video about them. It depends on what I'm trying to do. So if I'm going to have to match cameras and they're different or they're two cameras that I don't know how well they match yet, then yes I will definitely use it then. That's kind of the one of the most common use cases for something like that. If I'm just shooting something where I'm just using one camera I typically won't. I'll just colour grade. I won’t worry too much about it from that point.
SL. You mentioned the focus, did you have problems with the C200 autofocus?
CJ. Yeah you know it's funny… so if I'm looking directly into the camera, it doesn't seem to have a problem, it does very well at face detection autofocus so it stays trained on my face or whomever I'm shooting. If I turn off axis just a little bit like we're doing here in a typical interview situation, where the person is not looking directly into the camera but off to one side a little bit, it seems to have a really hard time with that. It's funny because if you turn 90 degrees to the camera like this, it seems to do better than if you're turned just off camera. Is that the same thing you've experienced?
SL. The thing I've experienced is that we were doing… it happened twice… we had the autofocus on with a Canon 70-200 L series and the camera was trying to chase the focus in the eyes and it was going back and forth until I noticed that and I turned it off. There is also a nice video done by Rubidium, CrimsonEngine… he spoke about this. You get this problem when you are with low F-stop …
CJ. So you close down the lens a little bit more it tends to do better…
SL. So yeah, just keep it in manual.
CJ. That's interesting…
SL. Yeah unless I'm outside and I'm moving a lot then it doesn't really matter (use autofocus), but if there is a subject that is still, I just keep it manual.
Now there is also another nice thing. These lenses that I'm using… Samyang… I guess it's called Rokinon in the USA… they are kind of entry-level cinema lenses… this thing behind me from Edelkrone has got a kind of automatic focus puller (Focus Module). So I can switch between different directions and the thing will focus on different objects. There is a laser meter (Laser Module) that calculates the right focus. So as the jib moves now, you can see it's moving, it's going to focus on the screen where you are… so it's going to keep you in focus. So if you were here for real it would keep you in focus even though it's moving. I think this is great, but it's (the focus module) not a portable solution. I mean that doesn't work if it's not mounted on this system (the HeadPlus).
CJ. That's surprisingly quiet too…
SL. It's very quiet. The sliders now from these guys, Edelkrone, are very quiet. I think you mentioned time codes. I've used… I think twice or three times… the Tentacle Sync, which injects the time code into the audio input. But the problem I had with that is that their software only works with Mac and I only work with Windows. So it was basically useless. So I just use this (a £10 slate). Then there is also the Adobe Premiere Pro auto align, the thing that aligns the shots based on the audio signal. Basically, where do you find time code useful?
CJ. I think the place I've seen [timecodes] is when I'm working on a much more complex piece, usually a narrative film. So when there's a crew and there are multiple cameras and we come away from the day with 80 to 120 shots. That's when it becomes more useful and in most of the production teams I work on the post team is going to have access to a Mac. To be honest I think that with the Tentacle Sync… the biggest value of that package now is that… the time code generators are fine they're good, they're good hardware… but a huge part of the value of purchasing those is that it comes with the Tentacle Sync Studio software, which runs on Mac. That is a dream to work with. Because in that case at the end of the day I dropped my 120 video clips and my 120 audio clips in, I push a button… “boom”… they're all lined up. I push a button, export an XML file that goes into my video editing app and then I am ready to go. It's a great way to work when you're doing something like that.
Now if I'm working on a corporate piece where I walk away with five clips of video and five clips of audio, I just use the Final Cut Pro or the Premiere Pro or the DaVinci Resolve auto align feature and that's fine. At that point I think the amount of time it takes to set up and work with a time code workflow is just too much overhead. It only makes sense if you're going to come home with a lot of clips.
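For the timecode workflow described above, the underlying idea is that each clip carries a start timecode, so lining up audio and video reduces to subtracting frame counts. The sketch below assumes non-drop-frame timecode in HH:MM:SS:FF form at an integer frame rate, and the clip start times are invented examples; it does not represent how Tentacle Sync Studio or any editing application implements syncing.

```python
def timecode_to_frames(timecode, fps):
    """Convert non-drop-frame 'HH:MM:SS:FF' timecode to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

def sync_offset_seconds(video_start, audio_start, fps):
    """Seconds by which the audio clip starts after (+) or before (-) the video clip."""
    delta = timecode_to_frames(audio_start, fps) - timecode_to_frames(video_start, fps)
    return delta / fps

# Hypothetical clip pair recorded at 25 fps.
print(sync_offset_seconds("10:15:30:12", "10:15:28:00", 25))
```

Repeating that subtraction for every clip pair is trivial for software, which is why the approach pays off on shoots with a hundred or more clips and is overkill for five.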
SL. Okay I think we can close it here unless you want to add…
CJ. No… no… it's been a pleasure I'm so glad that you contacted me and that we were able to have our conversation. I apologize it didn't happen sooner but I'm really glad we got a chance to talk it's been a lot of fun.
SL. Thank you very much have a great day.
CJ. Okay you too. Take care.