I would appreciate feedback on identifying a specific pattern in longitudinal ratings data. I am working with a public Amazon dataset containing thousands of products and millions of corresponding review ratings. My initial exploration suggests that some products are authentic when they are first sold but are later substituted with fakes. One way to identify such products is to examine how their ratings change over time: the ratings start high and then gradually decline. Please consider the examples in the attached images below (I aggregated the data to the monthly level).
Currently, I examine the products and their ratings manually and handpick the suspicious ones. Given the volume of data, I am wondering whether there is a feasible way to do this automatically. I would appreciate your feedback. Below I provide a sample of the data that includes observations for the three example products (tag = 1) and six randomly picked products (tag = 0).
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str10 asin float(month rating tag)
"B0006HLBPU" 83 5 0
"B0006HLBPU" 84 5 0
"B0006HLBPU" 85 4.25 0
"B0006HLBPU" 86 5 0
"B0006HLBPU" 87 4.975 0
"B0006HLBPU" 88 3.727273 0
"B0006HLBPU" 89 3.4375 0
"B0006HLBPU" 90 3.9875 0
"B0006HLBPU" 91 4.4 0
"B0006HLBPU" 92 4.226316 0
"B0006HLBPU" 93 4.1041665 0
"B0006HLBPU" 94 4.110606 0
"B0006HLBPU" 95 4.123457 0
"B0006HLBPU" 96 3.9641025 0
"B0006HLBPU" 97 3.947826 0
"B0006HLBPU" 98 4.0333333 0
"B0006HLBPU" 99 3.9545455 0
"B0009U5NEY" 80 4.2222223 0
"B0009U5NEY" 81 4.3333335 0
"B0009U5NEY" 83 4.6666665 0
"B0009U5NEY" 84 4 0
"B0009U5NEY" 85 3.5 0
"B0009U5NEY" 86 4.6666665 0
"B0009U5NEY" 87 2.25 0
"B0009U5NEY" 88 3 0
"B0009U5NEY" 90 4.2 0
"B0009U5NEY" 91 4.5 0
"B0009U5NEY" 92 4.023148 0
"B0009U5NEY" 93 4.6136365 0
"B0009U5NEY" 94 4.293478 0
"B0009U5NEY" 95 3.598485 0
"B0009U5NEY" 96 3.7261906 0
"B0009U5NEY" 97 4.391667 0
"B0009U5NEY" 98 3.75 0
"B0009U5NEY" 99 4 0
"B000HLO8XM" 84 1 0
"B000HLO8XM" 88 1 0
"B000HLO8XM" 89 5 0
"B000HLO8XM" 90 5 0
"B000HLO8XM" 91 3 0
"B000HLO8XM" 92 4 0
"B000HLO8XM" 93 5 0
"B000HLO8XM" 94 3 0
"B000HLO8XM" 95 4.4814816 0
"B000HLO8XM" 96 4.381579 0
"B000HLO8XM" 97 4.111111 0
"B000HLO8XM" 98 3.9285715 0
"B000HLO8XM" 99 3.7 0
"B000YMQPWQ" 82 4.6666665 0
"B000YMQPWQ" 83 4 0
"B000YMQPWQ" 84 2.666667 0
"B000YMQPWQ" 85 4 0
"B000YMQPWQ" 86 5 0
"B000YMQPWQ" 87 4 0
"B000YMQPWQ" 88 2 0
"B000YMQPWQ" 89 2 0
"B000YMQPWQ" 90 4.571429 0
"B000YMQPWQ" 91 4.423077 0
"B000YMQPWQ" 92 3.9385965 0
"B000YMQPWQ" 93 3.892157 0
"B000YMQPWQ" 94 4.839966 0
"B000YMQPWQ" 95 4.484127 0
"B000YMQPWQ" 96 4.4166665 0
"B000YMQPWQ" 97 4.759259 0
"B000YMQPWQ" 98 5 0
"B000YMQPWQ" 99 4.625 0
"B0019HXG5O" 81 5 0
"B0019HXG5O" 83 5 0
"B0019HXG5O" 86 5 0
"B0019HXG5O" 88 4 0
"B0019HXG5O" 89 4.5 0
"B0019HXG5O" 90 4.0666666 0
"B0019HXG5O" 91 4.5454545 0
"B0019HXG5O" 92 4.1041665 0
"B0019HXG5O" 93 4.763158 0
"B0019HXG5O" 94 4.535714 0
"B0019HXG5O" 95 4.692 0
"B0019HXG5O" 96 4.2777777 0
"B0019HXG5O" 97 4.576923 0
"B0019HXG5O" 98 4.5921054 0
"B0019HXG5O" 99 3.9375 0
"B001D95AHU" 83 5 0
"B001D95AHU" 84 2.5 0
"B001D95AHU" 85 4 0
"B001D95AHU" 86 1 0
"B001D95AHU" 87 4.3333335 0
"B001D95AHU" 88 4.047619 0
"B001D95AHU" 89 3.857143 0
"B001D95AHU" 90 3.777778 0
"B001D95AHU" 91 4.777778 0
"B001D95AHU" 92 4.65 0
"B001D95AHU" 93 4.2222223 0
"B001D95AHU" 94 3.916667 0
"B001D95AHU" 95 3.9444444 0
"B001D95AHU" 96 4.8 0
"B001D95AHU" 97 4.4 0
"B001D95AHU" 98 4.5 0
"B001D95AHU" 99 3.333333 0
"B00EE18QX4" 85 3.833333 1
"B00EE18QX4" 86 2.857143 1
"B00EE18QX4" 87 3.3 1
"B00EE18QX4" 88 3.223958 1
"B00EE18QX4" 89 3 1
"B00EE18QX4" 90 2.95614 1
"B00EE18QX4" 91 3.570513 1
"B00EE18QX4" 92 3.302564 1
"B00EE18QX4" 93 3.079603 1
"B00EE18QX4" 94 3.254762 1
"B00EE18QX4" 95 3.138342 1
"B00EE18QX4" 96 2.6526015 1
"B00EE18QX4" 97 2.2407408 1
"B00EE18QX4" 98 2.441746 1
"B00EE18QX4" 99 2.1 1
"B00EE18QX4" 100 1 1
"B014P3B7TU" 77 5 1
"B014P3B7TU" 78 4.642857 1
"B014P3B7TU" 85 4.4761643 1
"B014P3B7TU" 86 4.509491 1
"B014P3B7TU" 87 4.519261 1
"B014P3B7TU" 88 4.4232097 1
"B014P3B7TU" 89 4.419365 1
"B014P3B7TU" 90 4.4009995 1
"B014P3B7TU" 91 4.3559737 1
"B014P3B7TU" 92 4.322343 1
"B014P3B7TU" 93 4.350097 1
"B014P3B7TU" 94 4.1546817 1
"B014P3B7TU" 95 4.0436482 1
"B014P3B7TU" 96 4.0874333 1
"B014P3B7TU" 97 3.890648 1
"B014P3B7TU" 98 3.4806874 1
"B014P3B7TU" 99 3.468327 1
"B014P3B7TU" 100 3.299692 1
"B01AHFPPK2" 85 5 1
"B01AHFPPK2" 86 5 1
"B01AHFPPK2" 87 5 1
"B01AHFPPK2" 89 5 1
"B01AHFPPK2" 90 4 1
"B01AHFPPK2" 91 5 1
"B01AHFPPK2" 92 3.574074 1
"B01AHFPPK2" 93 3.966667 1
"B01AHFPPK2" 94 4.177778 1
"B01AHFPPK2" 95 3.787879 1
"B01AHFPPK2" 96 4.5 1
"B01AHFPPK2" 97 2.833333 1
"B01AHFPPK2" 98 2.3333333 1
"B01AHFPPK2" 99 3.166667 1
end
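To make the "starts high, then gradually declines" pattern concrete, here is a minimal sketch of one possible automated screen. It is not a validated method. It fits a separate linear trend of rating on month for each product and flags products whose slope falls below a cutoff. The cutoff of -0.05 rating points per month and the 12-month minimum are arbitrary placeholders, not values taken from the data, and would need tuning against the hand-tagged products. The sketch assumes the -dataex- example above is in memory, and the variable names slope, n_months, and flagged are made up for illustration.

```stata
* Sketch only: flag products whose monthly mean rating trends downward.
* Assumes the -dataex- example is in memory (asin, month, rating, tag).
preserve

* One regression per product; keep the slope on month and the number of months used
statsby slope=_b[month] n_months=e(N), by(asin tag) clear: regress rating month

* Placeholder rule: a steep negative slope estimated from at least 12 months
generate byte flagged = slope < -0.05 & n_months >= 12

* Compare the automatic flag with the manual tag
list asin tag slope n_months flagged, sepby(tag) noobs

restore
```

A single linear slope treats every month equally, so a month with one review counts as much as a month with hundreds. The sample carries only monthly means, so weighting by the number of reviews per month would require adding that count to the data first.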