Data dredging

10/20/2023

Note: In case you haven't see a data frame before, think of it like a spreadsheet where each row is an instance each data and each column is a vector of specific values. Store the p-value for this test in the variable p3.a and round your answer to two decimal places.

Use an appropriate hypothesis test and a significance level of a = 0.05. Part A) Use the entire dataset to determine whether Nefarian's layout is an improvement over the original layout. purchases read.csv ("purchases.csv") purchases = purchases names (purchases) = c("group", "num_purchases") head (purchases) A ame: 6x2 group num_purchases 1 a 36 2 a 42 a 41 a 40 5 a 36 6 a 42 He knows that the site has an average purchase rate of 0.8 and wants to see if his layout is an improvement. Nefarian wants to land a permanent position at the company after his internship is over, so he really wants to impress his supervisors with his new layout. This data is stored in the data frame purchases. The effectiveness of Nefarian's layout is measured by the number of customers that made a purchase. This test was then repeated on multiple days. To test his new layout, the company gathered four different groups of 50 customers and recorded how many of those ended up purchasing an item.

His project was the design and test a new website layout that would lead to more purchases. Nefarian just landed his first data science position as an intern at a new e-commerce company. In this problem, we will focus on the idea of using subsets of data to find a desired result. There's many methods for this, and they've got some cool names like p-hacking and data dredging. Usually it involves manipulating the data or the test in such a way to produce a desired result. Now that you've learned about hypothesis testing and p-values, you should also be aware that these methods can be used incorrectly.

0 Comments

Data dredging

Leave a Reply.

Author

Archives

Categories