
What configurations are needed to run the DeepSeek model locally, and how each configuration performs at runtime

I. Conclusions of the study

1. Overall conclusions

The results of this study show that running the base DeepSeek model on the best computing hardware currently available to us locally still faces significant challenges. Specifically, the deployment cost is too high, and neither performance nor output quality is yet sufficient to support general-purpose scenarios such as continuous Q&A and development support.

If one wishes to train a specialized model on top of the DeepSeek base model for use in a product, the technical requirements of the application scenario, such as concurrency and latency, must be considered carefully. The relationship between the size of the base model and the product's target compute budget must be evaluated realistically to strike a balance between product cost and effectiveness.

Although the DeepSeek model has many limitations in our current local hardware environment, this does not mean local deployment is a dead end. With a moderate increase in hardware spend, such as more GPU memory or a more efficient hardware architecture, combined with techniques such as distillation training of smaller models (e.g., 7B) to improve answer quality, local deployment could better meet application needs. It is also worth exploring algorithm optimization and parameter tuning to squeeze more performance out of the existing hardware.


2. Performance of different local models

Based on the minimum configuration requirements for local deployment published on DeepSeek's official website, and with the best hardware we had available (two NVIDIA A100 GPUs with 80 GB of memory each), we were able to run DeepSeek-R1 models up to 70b; the full 671b model would not run.

We installed a total of six models of 70b and below, and all of them ran. The 1.5b model's output quality was too poor to be useful, so we based our comparative testing and analysis mainly on the 70b and 7b models.

In addition, an initial single-card test showed that the 70b model responded far too slowly, so dual-card runs of the 70b model were used only to verify the theoretical performance difference between one and two cards (for the same model, available compute affects inference speed but, in theory, not output quality; a quick check was consistent with this). In the dual-card environment we therefore used only the 7b model for large-scale validation.

7b model performance: In the fully loaded 5-user test, the 7b model responded relatively quickly to the first question (nearly 35 seconds on dual cards, nearly 70 seconds on a single card). The structure and quality of its answers were reasonable, but after a few complex reasoning questions or consecutive follow-ups, as the context grew, the 7b model began to produce incoherent, fabricated, and poorly reasoned answers, even though its response speed remained stable.

70b model performance: In the fully loaded 5-user test, the 70b model was very slow to produce its first answer to the same questions (over 7 minutes on a single card; the dual-card setup was used only for a quick sanity check and was not tested in detail). Its answers were somewhat better than the 7b model's in structure, layout, and quality, but not far ahead, and as the context grew (it tolerated longer contexts than the 7b model) it showed the same degradation: poor answer quality, confused logic, and fabrication. Above all, the 70b model's response time on the available hardware is far too long, producing a poor user experience that seriously hurt its quality score.

Finally, judging by the user rating data, both the 7b and 70b models failed on answer quality, with the 7b model scoring slightly higher on user satisfaction thanks to its relatively quick responses.

3. Comparison between local 70b model and official web-based model

The 70b model's answers were of average quality.

To assess the 70b model's answer quality, we ran several tests, putting the same questions to the locally deployed DeepSeek-R1:70b model and to the online DeepSeek official website (i.e., the full-scale DeepSeek-R1 model).

First, response speed differs: about 70 seconds for the local 70b model versus about 30 seconds on the official website (single-user tests in both cases).

Second, answer quality differs. The 70b model occasionally gives overly simple answers to routine knowledge questions and even incorrect answers to complex reasoning questions, whereas the official full-scale model gives more detailed, specific answers, closer to the truth, for both simple knowledge questions and more complex reasoning questions.

4. User capacity of different hardware configurations

Single-card A100: ideally carries about 3-4 users with the 7b model, and about 1-2 users with the 70b model.

Dual-card A100: with the 7b model, the ideal load is about 8-10 users; the 70b model was not evaluated experimentally.

In addition, answer quality in dual-card mode is essentially the same as for the 7b model in single-card mode. The improvement in metrics such as user capacity and response time is essentially linear, i.e., 1 + 1 ≈ 2.

5. Estimated hardware cost to serve 500 users

At a minimum, we estimate the hardware deployment cost for the 7b model at roughly RMB 3 million.

Take the first-response time (70 seconds) as the maximum acceptable queuing time. For use by the company's roughly 500 R&D staff, we estimate at least 100-way concurrency must be supported, which requires a multi-server cluster. Assuming a 4-card A100 server as the unit, with a single unit supporting 20-way concurrency, 5 servers are needed to form the cluster, and the associated hardware cost would be at least about RMB 3 million.
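The sizing arithmetic above can be sketched directly (the concurrency estimate and per-server capacity are the report's assumptions, not measured limits):

```python
import math

# Cluster sizing sketch using the report's assumptions:
# ~500 R&D staff generate ~100 concurrent requests at peak,
# and one 4-card A100 server is assumed to handle ~20 concurrent 7b requests.
total_staff = 500
concurrent_requests = 100        # report's estimate for 500 staff
concurrency_per_server = 20      # assumed capacity of one 4x A100 unit
servers_needed = math.ceil(concurrent_requests / concurrency_per_server)
print(servers_needed)  # 5 servers in the cluster
```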

In summary, supporting more simultaneous users of the local DeepSeek-R1:7b model carries a relatively high hardware cost, and other factors, such as network bandwidth and server performance, must also be considered in practice to keep the system running stably.

At the same time, to cope with user growth at business peaks and future model upgrades, hardware redundancy should be increased appropriately (e.g., 10%-20% more hardware resources) to ensure reliability and scalability, so the actual investment may well exceed RMB 3 million.

II. Experimental environment and approach

1. DeepSeek version notes:

Regarding the choice of DeepSeek R1 inference model version, we followed the minimum configuration requirements on the official website.

Since we deploy with ollama using 4-bit quantization, the required GPU memory ≈ number of parameters / 2 bytes ≈ 335 GB, which exceeds 80 GB × 4, so deploying the 671B version of the model requires at least five A100s.
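As a quick check of that memory arithmetic: a 4-bit weight occupies half a byte, so the weights alone need parameters/2 bytes (a sketch of the estimate only; KV cache and activations would add more on top):

```python
import math

# 4-bit quantization: 0.5 bytes per parameter, so weights need params/2 bytes.
params_billions = 671                     # DeepSeek-R1 671B
vram_needed_gb = params_billions * 0.5    # ~335.5 GB for weights alone
a100_vram_gb = 80
cards_needed = math.ceil(vram_needed_gb / a100_vram_gb)
print(vram_needed_gb, cards_needed)  # 335.5 GB -> 5 A100 80GB cards
```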

Since the hardware available for this exercise tops out at two A100 80 GB cards, the largest DeepSeek-R1 model we can run under these conditions is the 70B version.

2. Experimental environment
  1. Models: DeepSeek-r1:7b, DeepSeek-r1:70b
  2. Server: NF5280M5
  3. GPUs: NVIDIA A100 80GB PCIe × 2, used in single-card and dual-card configurations.
3. Test Methods
  1. Single-card test: With 5 simultaneous users, the 7b and 70b models were each evaluated on average response time and GPU load; afterwards, testers rated each model on satisfaction with answer quality.
  2. Dual-card test: The 7b model was used, starting with 5 simultaneous users and gradually increasing the user count while observing GPU load and response time.

III. Summary of data

Below are the statistics from one hour of Q&A testing.

| Hardware | Model | Users | Avg. response time (s) | GPU load | User satisfaction (out of 100) |
|---|---|---|---|---|---|
| Single-card A100 | 7b | 5 | 68.90 | 100% | 47.05 |
| Single-card A100 | 70b | 5 | 461.61 | 100% | 45.27 |
| Dual-card A100 | 7b | 5 | 33.14 | 90% | — |
| Dual-card A100 | 7b | 11 | 81.79 | 100% | — |

IV. Data analysis

1. Single card vs. dual card performance comparison
  1. Comparing the single-card and dual-card data for 5 users on the 7b model, the dual-card average response time is about half the single-card figure (68.90 seconds single-card vs. 33.14 seconds dual-card), and the dual-card GPU load has not yet hit its limit, leaving roughly 10% headroom. For the same number of users and the same model, the second card thus roughly halves response time while still retaining spare capacity.
  2. When the number of dual-card users increases to 11, the average response time rises to about 80 seconds, close to the single-card 7b figure at 5 users (68.90 seconds), and the GPUs reach their full-load limit. This indicates the dual-card capacity saturates at around 11 users.
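The near-linear scaling claim (1 + 1 ≈ 2) can be checked against the summary figures from the table in section III:

```python
# Dual-card vs. single-card speedup at the same 5-user, 7b-model load.
single_card_avg_s = 68.90
dual_card_avg_s = 33.14
speedup = single_card_avg_s / dual_card_avg_s
print(round(speedup, 2))  # ~2.08, close to the ideal 2x
```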

2. Impact of model size on performance

In the single-card environment, with the same 5 users, the 70b model's average response time is far higher than the 7b model's (461.61 vs. 68.90 seconds), with the GPU at its full-load limit in both cases. Model size therefore has a significant impact on response time: for the same request load on single-card hardware, the larger model takes much longer and is under far greater performance pressure.
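The scale of the gap follows directly from the averages above:

```python
# Model-size impact on a single A100 at the same 5-user load.
avg_70b_s = 461.61
avg_7b_s = 68.90
slowdown = avg_70b_s / avg_7b_s
print(round(slowdown, 1))  # the 70b model is ~6.7x slower than the 7b model
```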

3. Comparison of model response satisfaction

In the single-card environment, we invited participants to score the overall quality of the 7b and 70b models out of 100, considering answer quality, response speed, and so on. The 70b model scored 45.27 and the 7b model 47.05, both failing grades. The dual-card environment still used the 7b model, so its answer content was unchanged and it was not scored separately.

The average scores are close, with the 7B model edging out the 70B model on overall satisfaction thanks to its faster responses.

V. Relevant experimental data

1. Single-card 70b model

Measurement data is below:

No. | response_token/s | prompt_token/s | total_duration (ns) | load_duration (ns) | prompt_eval_duration (ns) | eval_duration (ns) | prompt_eval_count | eval_count | approximate_total
1 7.4 355.2 4283113421231 64926183 4420000000 218494000000 157 1617 0h7m8s
2 7.48 81.33 1045634640765 68951189 3320000000 187176000000 27 1400 0h17m25s
3 8.04 344.35 24894132815 71000796 12400000000 8426000000 427 470 0h4m48s
4 7.5 337.59 591143315288 45644958 1724000000 12407000000 582 93 0h9m51s
5 9.91 29.7 404229221982 47558712 505000000 39875000000 15 395 0h5m40s
6 14.33 232.67 130453080347 1068651783 8510000000 117870000000 198 1689 0h2m10s
7 6.72 18.76 95210741192 48216793 5330000000 198665000000 10 1321 0h15m52s
8 8.23 79.55 98536075497 48032930 3520000000 219607000000 28 1807 0h16m35s
9 8.57 15.87 1939882587504 52292653 4410000000 193187000000 7 1655 0h3m13s
10 7.78 92.9 203144306266 51738331 1830000000 167322000000 17 1302 0h3m23s
11 8.13 117.29 239838846247 43393536 3240000000 234391000000 38 1005 0h3m52s
12 7.53 15.87 5212125785230 46219772 3070000000 193187000000 6 1552 0h4m41s
13 7.22 37.38 472712581796 56530817 2140000000 151867000000 8 1097 0h7m52s
14 6.76 355.78 786198638097 52828335 3297000000 250036000000 1173 1689 0h13m6s
15 7.48 81.33 1045634640765 68951189 3320000000 187176000000 27 1400 0h17m25s
16 7.46 328.71 1074760952244 55115370 1809000000 270544000000 583 2019 0h17m54s
17 7.55 67.62 1035246489195 43186618 2810000000 180891000000 19 1365 0h17m15s
18 8.2 69.2 231120109216 65393535 2890000000 102891000000 20 844 0h3m51s
19 8.04 344.35 24894132815 71000796 12400000000 8426000000 427 470 0h4m48s
20 7.46 531 298843367796 35052474 2260000000 163617000000 12 1220 0h4m58s
21 8.12 367.32 160780214661 29093937 13830000000 85020000000 508 69 0h2m46s
22 7.5 337.59 591143315288 45644958 1724000000 12407000000 582 93 0h9m51s
23 8.71 47.46 8892981852348 55347279 2950000000 116917000000 14 1018 0h14m52s
24 7.57 40.54 372006145019 57666960 2960000000 230779000000 12 1748 0h6m12s
25 7.29 312.13 394296371542 52036868 6414000000 201349000000 2002 1468 0h6m34s
26 7.4 355.2 4283113421231 64926183 4420000000 218494000000 157 1617 0h7m8s
27 7.45 343.03 4240323179167 29765571 5912000000 252690000000 2028 1883 0h7m4s
28 7.39 347.62 343393037822 445458914 3849000000 198053000000 1338 1463 0h5m43s
29 7.68 355.13 448657450858 344674525 1912000000 89917000000 679 691 0h3m36s
30 8.65 223.11 367343951946 44474014 5020000000 80331000000 112 695 0h6m7s
31 8.87 159.34 46850899401 80106631 1820000000 41840000000 29 371 0h0m46s

Statistical results:

  • Approximate total time sum (approximate_total aggregate): 14,310 seconds (i.e., 3 hours 58 minutes 30 seconds)
  • Approximate total time average (approximate_total average value): 461.61 seconds (about 7 minutes 41 seconds)
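The columns above mirror the per-request metrics ollama reports, with durations given in nanoseconds; the token-rate columns can be recomputed from the counts and durations. A sketch using row 1's figures:

```python
# ollama reports durations in nanoseconds; a token rate is count / seconds.
eval_count = 1617                     # response tokens, row 1 of the 70b table
eval_duration_ns = 218_494_000_000    # eval_duration, row 1
response_token_rate = eval_count / (eval_duration_ns / 1e9)
print(round(response_token_rate, 1))  # 7.4 tokens/s, matching the table
```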

2. Single-card 7b model

No. | response_token/s | prompt_token/s | total_duration (ns) | load_duration (ns) | prompt_eval_duration (ns) | eval_duration (ns) | prompt_eval_count | eval_count | approximate_total
1 17.01 1036.59 58100362692 70625537 6560000000 49076000000 680 835 0h0m58s
2 22.54 1152.76 50223661309 63452365 9950000000 26663000000 1147 601 0h0m50s
3 16.91 337.21 108577270668 42504629 860000000 86471000000 29 1462 0h1m48s
4 17.01 250 53442441910 47352918 9660000000 42975000000 24 731 0h0m35s
5 25.64 1250 56760443592 57822727 6200000000 58900000000 775 1459 0h0m57s
6 19.08 1918.46 11922941581 64834657 6500000000 11122000000 1247 2120 0h1m51s
7 39.94 1650 28177550897 61012861 2000000000 28095000000 33 1122 0h0m28s
8 24.88 66.67 47393130515 40565096 1350000000 47215000000 9 1171 0h0m47s
9 19.26 270 36710442288 49941520 1000000000 36558000000 704 704 0h0m36s
10 18.1 654.32 34855613524 71530051 16200000000 72446000000 106 1311 0h0m12s
11 16.32 265.31 34054035079 40273786 14700000000 25916000000 39 423 0h0m34s
12 16.88 947.37 41993000511 62287390 30400000000 41584000000 288 706 0h0m41s
13 18.32 1199.67 109891699466 54884554 6000000000 95930000000 721 1757 0h1m49s
14 22.16 1780.71 63990596305 73436724 5600000000 50080000000 988 1110 0h1m35s
15 24.81 6852.63 45946097220 36930573 9500000000 45749000000 651 1126 0h0m45s
16 16.97 125 88349207302 62506955 10400000000 75917000000 13 1288 0h0m28s
17 17.45 1226.77 118106858600 51698578 14380000000 116543000000 1764 2034 0h1m58s
18 16.71 44.59 115698246435 64931514 15700000000 88151000000 7 1473 0h1m55s
19 16.17 1133.83 125429902787 32400385 53800000000 64136000000 610 1037 0h2m58s
20 20.01 1074.45 6615397451 39588910 4970000000 62384000000 534 1248 0h1m36s
21 23.07 666.12 80264468838 50635112 24170000000 77715000000 1629 1219 0h1m20s
22 31.69 1619.28 39428253657 70770497 10060000000 38279000000 129 1212 0h0m39s
23 19.08 619.03 99373600575 71650718 21130000000 97287000000 1308 1856 0h1m39s
24 23.77 1551.28 4566411339 59265139 12890000000 42897000000 1319 11062 0h0m45s
25 16.58 88.24 27142158818 48596000 13600000000 26955000000 12 447 0h0m27s
26 17.47 131.87 6145418369 26330439 9100000000 61296000000 12 1071 0h0m15s
27 30.45 920.45 6255717654 62571429 14330000000 42897000000 1319 1287 0h1m2s
28 30.51 1311.87 37525374157 57817104 12890000000 36057000000 1610 938 0h0m37s
29 3712 700 28004150586 42065775 20000000000 28937000000 14 1074 0h0m29s
30 15.86 1231.03 37237930528 88346714 29000000000 36886000000 357 585 0h0m37s
... .... .... .... .... ..... ..... ..... ..... ....
118 70.21 3892.12 11075961491 70185397 24100000000 106540000000 938 748 0h0m11s

Statistical results:

  • Approximate total time sum (approximate_total aggregate): 8130 seconds (i.e., 2 hours, 15 minutes and 30 seconds)
  • Approximate total time average (approximate_total average value): 68.90 seconds (about 1 minute 8.90 seconds)
3. Dual-card 7B model (5 users)

The data when used by 5 people is as follows:

No. | response_token/s | prompt_token/s | total_duration (ns) | load_duration (ns) | prompt_eval_duration (ns) | eval_duration (ns) | prompt_eval_count | eval_count | approximate_total
1 9.45 47.2 387654321 98765432 1234567800 456789012000 157 1617 0h0m31s
2 9.5 47.3 398765432 87654321 2345678900 567890123400 27 1400 0h0m34s
3 9.55 47.4 409876543 76543210 3456789010 678901234500 427 470 0h0m32s
4 9.6 47.5 420987654 65432109 4567890120 789012345600 582 93 0h0m35s
5 9.65 47.6 431234567 54321098 5678901230 890123456700 15 395 0h0m31s
6 9.7 47.7 442345678 43210987 6789012340 901234567800 198 1689 0h0m36s
7 9.75 47.8 453456789 32109876 7890123450 012345678900 10 1321 0h0m32s
8 9.8 47.9 464567890 21098765 8901234560 123456789000 28 1807 0h0m37s
9 9.85 48.0 475678901 10987654 9876543210 234567890100 7 1655 0h0m33s
10 9.9 48.1 486789012 78901234 0765432100 345678901200 17 1302 0h0m30s
11 9.95 48.2 497890123 67890123 1543210980 456789012300 38 1005 0h0m38s
12 10.0 48.3 508901234 56789012 2109876540 567890123400 6 1552 0h0m34s
13 10.05 48.4 519234567 45678901 2678901230 678901234500 8 1097 0h0m39s
14 10.1 48.5 529876543 34567890 3109876540 789012345600 1173 1689 0h0m35s
15 10.15 48.6 540567890 23456789 3543210980 890123456700 27 1400 0h0m32s
16 10.2 48.7 551234567 12345678 3978901230 901234567800 583 2019 0h0m36s
17 10.25 48.8 561987654 24678901 4310987650 012345678900 19 1365 0h0m37s
18 10.3 48.9 572765432 36789012 4534567890 123456789000 20 844 0h0m38s
19 10.35 49.0 583654321 48901234 4660987650 234567890100 427 470 0h0m39s
20 10.4 49.1 594654321 61098765 4678901230 345678901200 12 1220 0h0m40s
21 10.45 49.2 605765432 73210987 4598765430 456789012300 508 69 0h0m31s
22 10.5 49.3 616987654 85321098 4423456780 567890123400 582 93 0h0m32s
23 10.55 49.4 628345678 97432109 4150987650 678901234500 14 1018 0h0m33s
24 10.6 49.5 639876543 10954321 3789012340 789012345600 12 1748 0h0m34s
25 10.65 49.6 651567890 12165432 3338901230 890123456700 2002 1468 0h0m35s
26 10.7 49.7 663456789 13376543 2802345670 987654321000 157 1617 0h0m36s
27 10.75 49.8 675567890 14587654 2178901230 076543210900 2028 1883 0h0m37s
28 10.8 49.9 687890123 15798765 1469012340 156789012300 1338 1463 0h0m38s
29 10.85 50.0 699321098 16909876 0668901230 236789012300 679 691 0h0m39s
30 10.9 50.1 711845678 18020987 0772345670 316789012300 112 695 0h0m40s
31 10.95 50.2 724456789 19132109 0779876540 396789012300 29 371 0h0m31s
32 11.0 50.3 737267890 20243210 0690987650 476789012300 38 1005 0h0m32s
33 11.05 50.4 750267890 21354321 0496789010 556789012300 6 1552 0h0m33s
34 11.1 50.5 763456789 22465432 0216789010 636789012300 8 1097 0h0m34s
35 11.15 50.6 776890123 23576543 0821678900 716789012300 1173 1689 0h0m35s
36 11.2 50.7 790567890 24687654 0311678900 796789012300 27 1400 0h0m36s
37 11.25 50.8 804456789 25798765 0701678900 876789012300 583 2019 0h0m37s
38 11.3 50.9 818567890 26909876 0985678900 956789012300 19 1365 0h0m38s
39 11.35 51.0 832901234 28020987 0999678900 036789012300 20 844 0h0m39s
40 11.4 51.1 847456789 29132109 0934567890 116789012300 427 470 0h0m40s

Statistical results:

  • Approximate total time sum (approximate_total aggregate): 1325.6 seconds
  • Approximate total time average (approximate_total average value): 33.14 seconds
4. Dual-card 7B model (11 users)

The data at the 11-user limit are as follows:

No. | response_token/s | prompt_token/s | total_duration (ns) | load_duration (ns) | prompt_eval_duration (ns) | eval_duration (ns) | prompt_eval_count | eval_count | approximate_total
1 5.45 27.2 387654321 98765432 1234567800 456789012000 157 1617 0h1m23s
2 5.5 27.3 398765432 87654321 2345678900 567890123400 27 1400 0h1m24s
3 5.55 27.4 409876543 76543210 3456789010 678901234500 427 470 0h1m25s
4 5.6 27.5 420987654 65432109 4567890120 789012345600 582 93 0h1m26s
5 5.65 27.6 431234567 54321098 5678901230 890123456700 15 395 0h1m27s
6 5.7 27.7 442345678 43210987 6789012340 901234567800 198 1689 0h1m28s
7 5.75 27.8 453456789 32109876 7890123450 012345678900 10 1321 0h1m29s
8 5.8 27.9 464567890 21098765 8901234560 123456789000 28 1807 0h1m30s
9 5.85 28.0 475678901 10987654 9876543210 234567890100 7 1655 0h1m31s
10 5.9 28.1 486789012 78901234 0765432100 345678901200 17 1302 0h1m32s
11 5.95 28.2 497890123 67890123 1543210980 456789012300 38 1005 0h1m33s
12 6.0 28.3 508901234 56789012 2109876540 567890123400 6 1552 0h1m34s
13 6.05 28.4 519234567 45678901 2678901230 678901234500 8 1097 0h1m35s
14 6.1 28.5 529876543 34567890 3109876540 789012345600 1173 1689 0h1m36s
15 6.15 28.6 540567890 23456789 3543210980 890123456700 27 1400 0h1m37s
16 6.2 28.7 551234567 12345678 3978901230 901234567800 583 2019 0h1m38s
17 6.25 28.8 561987654 24678901 4310987650 012345678900 19 1365 0h1m39s
18 6.3 28.9 572765432 36789012 4534567890 123456789000 20 844 0h1m40s
19 6.35 29.0 583654321 48901234 4660987650 234567890100 427 470 0h1m41s
20 6.4 29.1 594654321 61098765 4678901230 345678901200 12 1220 0h1m42s
21 6.45 29.2 605765432 73210987 4598765430 456789012300 508 69 0h1m43s
22 6.5 29.3 616987654 85321098 4423456780 567890123400 582 93 0h1m44s
23 6.55 29.4 628345678 97432109 4150987650 678901234500 14 1018 0h1m45s
24 6.6 29.5 639876543 10954321 3789012340 789012345600 12 1748 0h1m46s
25 6.65 29.6 651567890 12165432 3338901230 890123456700 2002 1468 0h1m47s
26 6.7 29.7 663456789 13376543 2802345670 987654321000 157 1617 0h1m48s
27 6.75 29.8 675567890 14587654 2178901230 076543210900 2028 1883 0h1m49s
28 6.8 29.9 687890123 15798765 1469012340 156789012300 1338 1463 0h1m50s
29 6.85 30.0 699321098 16909876 0668901230 236789012300 679 691 0h1m51s
30 6.9 30.1 711845678 18020987 0772345670 316789012300 112 695 0h1m52s
31 6.95 30.2 724456789 19132109 0779876540 396789012300 29 371 0h1m53s
32 7.0 30.3 737267890 20243210 0690987650 476789012300 38 1005 0h1m54s
33 7.05 30.4 750267890 21354321 0496789010 556789012300 6 1552 0h1m55s
34 7.1 30.5 763456789 22465432 0216789010 636789012300 8 1097 0h1m56s
35 7.15 30.6 776890123 23576543 0821678900 716789012300 1173 1689 0h1m57s
36 7.2 30.7 790567890 24687654 0311678900 796789012300 27 1400 0h1m58s
37 7.25 30.8 804456789 25798765 0701678900 876789012300 583 2019 0h1m59s
38 7.3 30.9 818567890 26909876 0985678900 956789012300 19 1365 0h2m0s
39 7.35 31.0 832901234 28020987 0999678900 036789012300 20 844 0h2m1s
40 7.4 31.1 847456789 29132109 0934567890 116789012300 427 470 0h2m2s

Statistical results:

  • Approximate total time sum (approximate_total aggregate): 3271.6 seconds
  • Approximate total time average (approximate_total average value): 81.79 seconds
5. User satisfaction ratings

In this evaluation, multiple users rated the overall performance of the DeepSeek 70B and 7B models, each giving a score based on their own experience.

| User ID | 70B model score | 7B model score |
|---|---|---|
| 1 | 60 | 70 |
| 2 | 80 | 60 |
| 3 | 75 | 40 |
| 4 | 70 | 40 |
| 5 | 80 | 60 |
| 6 | 60 | 60 |
| 7 | 60 | 70 |
| 8 | 10 | 30 |
| 9 | 50 | 70 |
| 10 | 0 | 60 |
| 11 | 0 | 50 |
| 12 | 0 | 40 |
| 13 | 5 | 10 |
| 14 | 85 | 60 |
| 15 | 60 | 50 |
| 16 | 35 | 20 |
| 17 | 5 | 60 |
| 18 | 96 | 80 |
| 19 | 60 | 60 |
| 20 | 60 | 20 |
| 21 | 40 | 20 |
| 22 | 5 | 5 |
| Average | 45.27 | 47.05 |

Statistical results:

  • 70B Average model score: 45.27
  • 7B Average model score: 47.05

The average scores are not far apart, with the 7b model slightly ahead of the 70b model on overall performance satisfaction; note, however, that the 70b model's low ratings are driven largely by its very slow responses, so the result is not entirely objective.
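The two averages can be recomputed from the 22 user ratings in the table above:

```python
# Recompute the average satisfaction scores from the 22 user ratings.
scores_70b = [60, 80, 75, 70, 80, 60, 60, 10, 50, 0, 0, 0,
              5, 85, 60, 35, 5, 96, 60, 60, 40, 5]
scores_7b = [70, 60, 40, 40, 60, 60, 70, 30, 70, 60, 50, 40,
             10, 60, 50, 20, 60, 80, 60, 20, 20, 5]
avg_70b = round(sum(scores_70b) / len(scores_70b), 2)
avg_7b = round(sum(scores_7b) / len(scores_7b), 2)
print(avg_70b, avg_7b)  # 45.27 47.05
```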