Project 1
Fundamental frequency of vowels
Due date: Sunday, March 2
Use the corresponding invite link in this google doc (accessible with your EPFL account) to accept the project, and either join an existing team or create a new one.
You are required to hand in a PDF version of your report report.pdf (generated from the quarto file report.qmd, if applicable) and the quarto file itself. The report.qmd should contain all the code necessary to reproduce your results: you should not show the actual code in the PDF report, unless you want to point out something specific.
Your README.md should contain instructions on reproducing the PDF report from the quarto file. This can be useful if you have issues with the automatic generation of the PDF report right before the deadline.
An aternative to quarto (or Rmarkdown), is to use LaTeX to produce the report. In that case, you will also need to submit the source code where chunks should be well commented and references to the figures in the report should be clear.
Checklist:
Data: Acoustic analysis of vowels
- Data on acoustic measurements of vowels produced by American English speakers
- data from Hillebrand et al. (1995)
- measurements on a group of children, men, and women were recorded
- the fundamental frequency \(f_0\) of a vowel’s sound wave was measured (in Hz)
- Only binned data are available (and the grid is not equidistant)
- available data: the number of recorded vowels (variable
count) and the fraction (variablepercentage) of vowels’ fundamental frequency belonging to each bin (given bystart_pointandend_point)
- available data: the number of recorded vowels (variable
The Goal
Simulate the vowels’ frequency \(f_0\) from a distribution, which is as close as possible to the observed data, in order to study the different characteristics of these sounds in the time domain
- i.e., the goal is to do Monte Carlo: how to simulate vowels’ frequencies that are compatible with the data?
Expert knowledge: a mixture of two log-normal distributions is a good model for the lowest frequency \(f_0\)
Tasks for You
- Is the assumption viable, i.e., is bi-log-normal distribution a reasonable model for the data?
- simple exploration of the data
- Fit the bi-log-normal distribution in order to be able to simulate the data easily using
- jittering and EM algorithm OR direct optimization (e.g., local search starting from the jittered EM result), AND
- a Bayesian approach
- Test whether the fundamental frequencies come from a bi-log-normal distribution
- parametric bootstrap and goodness of fit
MATH-517 Content
- Week 1: Introduction & Software & Data Considerations
- Week 2: Graphics & Visualization
- Week 3: Kernel Density Estimation
- Week 4: Non-parametric Regression
- Week 5: Cross-validation
- Week 6: EM Algorithm
- Week 7: EM Algorithm
- Week 8: Monte Carlo
- Week 9: Bootstrap
- Week 10: Bootstrap
- Week 11: Bayesian Computations
- Week 12: Bayesian Computations
- Week 13: Decision Trees
- Week 14: \(\emptyset\)
- Weeks in bold are pertinent to Project 1
- Weeks 1-2 established the workflow needed for all the projects