You collect several thousand Drosophila melanogaster individuals from the UC Davis campus. You take 1000 flies, establish a population in the laboratory and maintain it at a size of 1000 each generation. You then establish from the remaining flies a series of replicated populations of size 10, 100, 200, and 500 and maintain each at the starting size (10, 100, 200, 500) for several generations. After some time you sequence each lab population.
a. If, for each lab population, one plotted the frequency of each allele against its true frequency in the UCD population (assuming you knew it) for all variable sites in the genome, how would the correlations differ across lab populations?
b. Which lab populations do you think would provide the best estimate of the true UCD frequencies? Why?
c. Now imagine that you carried out the same type of correlation analysis of allele frequencies, but instead of comparing each population to the true UCD frequencies, you compared the allele frequencies of the replicated populations to each other (e.g., the populations of size 10 are compared to one another, the populations of size 100 are compared to one another, etc.). How would the pairwise correlations of frequency vary from one population size to another?
d. What two aspects of the sampling of flies in this entire experiment would lead to allele frequency deviations from the true UCD frequencies for sites free of natural selection?
e. You measure sequence divergence between each lab population and the sibling species, Drosophila simulans. How will the expected divergence vary across replicated populations of different size? Why?
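
One way to explore parts a through d numerically is a small simulation. The sketch below is not part of the original problem and rests on several assumptions: sites are neutral and independent (no linkage), each lab population follows a simple Wright-Fisher model with 2N gene copies per generation, and the "true" UCD frequencies are drawn uniformly between 0.05 and 0.95 (real frequency spectra are skewed toward rare alleles). The function names (binomial_freq, drift, pearson, simulate), the number of sites, and the number of generations are all hypothetical choices made for illustration.

```python
import random

def binomial_freq(p, copies, rng):
    """Draw `copies` gene copies at allele frequency p; return the sampled frequency."""
    return sum(rng.random() < p for _ in range(copies)) / copies

def drift(source_freqs, n_diploid, generations, rng):
    """Found a population of n_diploid flies from the source, then let it drift."""
    copies = 2 * n_diploid  # assumption: diploid, so 2N gene copies per site
    # Founding sample drawn from the source frequencies.
    freqs = [binomial_freq(p, copies, rng) for p in source_freqs]
    # Wright-Fisher resampling at constant size in each later generation.
    for _ in range(generations):
        freqs = [binomial_freq(p, copies, rng) for p in freqs]
    return freqs

def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5

def simulate(sizes=(10, 100, 200, 500, 1000), n_sites=50, generations=10, seed=1):
    """For each size, return (r with source, r between two replicates)."""
    rng = random.Random(seed)
    # Assumption: stand-in for the unknown true UCD allele frequencies.
    source = [rng.uniform(0.05, 0.95) for _ in range(n_sites)]
    results = {}
    for n in sizes:
        rep_a = drift(source, n, generations, rng)
        rep_b = drift(source, n, generations, rng)
        results[n] = (pearson(source, rep_a), pearson(rep_a, rep_b))
    return results

if __name__ == "__main__":
    for n, (r_source, r_reps) in simulate().items():
        print(f"N={n:5d}  r(source, lab)={r_source:.3f}  r(rep A, rep B)={r_reps:.3f}")
```

Running the script prints, for each population size, the correlation between one lab population and the assumed source frequencies, and the correlation between two replicates of that size. Varying generations and n_sites shows how sensitive the pattern is to those arbitrary choices.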