I. Conclusions of the study
1. Overall conclusions
The results of this study show that running the base DeepSeek models still faces significant challenges even under the strongest computing conditions currently available locally. Specifically, the deployment cost is too high, and neither performance nor answer quality is yet sufficient to support general-purpose scenarios such as continuous Q&A and development assistance.
Anyone wishing to train a specialized model on a DeepSeek base model for use in a product needs to consider carefully the technical requirements of the application scenario, such as concurrency and latency. The relationship between the size of the base model and the computing power targeted for the product must be evaluated realistically so as to balance product cost against effectiveness.
Although the DeepSeek models face many limitations in the current local hardware environment, this does not mean they are not worth exploring. With a moderate increase in hardware spending, such as more video memory or a more efficient hardware architecture, combined with technical means such as distillation training on smaller models (e.g., 7B), the quality of the models' answers can be improved to better meet local application needs. It is also worth exploring algorithm optimization and parameter tuning to further improve performance under the existing hardware.
2. Performance of different local models
Based on the minimum configuration requirements for local deployment published on DeepSeek's official website, we were able to run DeepSeek R1 models up to the 70b version; even with the best hardware available to us (2× NVIDIA A100 80GB), we could not run the full 671b model.
We installed a total of 6 models of 70b and below, and all of them ran normally. Because the 1.5b model performed poorly, we based our comparative testing and analysis mainly on the 70b and 7b models.
In addition, an initial single-card test showed that the 70b model responded far too slowly, so the dual-card test was used only to verify the theoretical performance difference between one and two cards (for the same model, a change in computing power affects inference speed but, in theory, not answer quality; a simple check was consistent with this). We therefore used only the 7b model for large-scale validation in the dual-card environment.
7b model performance: In the 5-user full-load test, the 7b model responded relatively quickly to the first question (nearly 35 seconds on dual cards and nearly 70 seconds on a single card), and the structure and quality of its answers were reasonable. However, after complex inferential questions or successive follow-ups, as the context grew, the 7b model began to produce incoherent, fabricated, and poorly reasoned answers, even though its response speed remained stable.
70b model performance: In the 5-user full-load test, the 70b model was very slow to produce its first answer to the same questions (over 7 minutes on a single card; the dual-card setup was used only for simple validation and was not tested in detail). Its answers were somewhat better than the 7b model's in structure, layout, and quality, but not far ahead, and as the context grew (it tolerated longer contexts than the 7b model) the 70b model showed the same degradation: poor answer quality, confused logic, and fabrication. In particular, the 70b model's response time on the available hardware is far too long, producing a poor user experience and seriously hurting its quality score.
Finally, the user rating data show that both the 7b and 70b models failed on answer quality, with the 7b model achieving slightly higher user satisfaction thanks to its relatively quick responses.
3. Comparison between local 70b model and official web-based model
The 70b model's answers were of average quality.
Regarding the response quality of the 70b model, we organized several tests, putting the same questions to the locally deployed DeepSeek-R1:70b model and to the online DeepSeek official website (i.e., the full DeepSeek-R1 model).
First, response speed differs: about 70 seconds for the local 70b model versus about 30 seconds on the official web version (both single-user tests).
Second, response quality differs. The 70b model occasionally gives overly simple answers to routine knowledge questions and even incorrect answers to complex reasoning questions, while the official full model gives more detailed and specific answers, closer to reality, for both simple knowledge questions and more complex reasoning questions.
4. Evaluation of the number of users carried by different hardware
Single-card A100: ideally carries about 3-4 users with the 7b model, and about 1-2 users with the 70b model.
Dual-card A100: with the 7b model, the ideal number of users is about 8-10; the 70b model was not evaluated experimentally.
In addition, answer quality in dual-card mode is essentially the same as with the 7b model in single-card mode, and the improvement in metrics such as number of users carried and response speed is essentially linear, i.e., 1+1≈2.
5. Estimated hardware costs to host 500 simultaneous users
At a minimum, the hardware deployment cost for the 7b model is estimated at about RMB 3 million.
Taking the first-response time (70 seconds) as the maximum acceptable queuing time, and with the company's roughly 500 R&D staff as users, the system needs to support at least 100 concurrent requests, which requires a multi-server cluster. Assuming a 4-card A100 server as the unit, with a single server supporting 20 concurrent requests, 5 servers are needed to form the cluster, and the related hardware cost is at least about RMB 3 million.
In summary, supporting many simultaneous users of the local DeepSeek-R1:7b model carries a relatively high hardware cost, and other factors such as network bandwidth and server performance must also be considered in practice to ensure stable operation.
At the same time, to cope with user growth during business peaks and future model upgrades, hardware redundancy should be increased appropriately (e.g., 10%-20% more hardware resources) to ensure reliability and scalability, so the actual investment may well exceed RMB 3 million.
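The sizing logic above can be sketched as a back-of-the-envelope calculation. The function name and the 20% active-usage ratio are illustrative assumptions consistent with the figures in the text (500 staff, 100 concurrent requests, 20 streams per 4-card server, 10-20% redundancy):

```python
import math

def cluster_estimate(staff: int, concurrency_ratio: float = 0.2,
                     concurrency_per_server: int = 20,
                     redundancy: float = 0.15) -> dict:
    """Rough cluster sizing: how many 4-card A100 servers are needed,
    given the fraction of staff asking questions at once and the number
    of concurrent streams a single server can sustain."""
    concurrent = round(staff * concurrency_ratio)          # simultaneous requests
    servers = math.ceil(concurrent / concurrency_per_server)
    with_headroom = math.ceil(servers * (1 + redundancy))  # 10%-20% redundancy
    return {"concurrent": concurrent, "servers": servers,
            "with_headroom": with_headroom}

print(cluster_estimate(500))
# -> {'concurrent': 100, 'servers': 5, 'with_headroom': 6}
```

With 15% redundancy the base cluster of 5 servers grows to 6, which is why the actual investment is expected to exceed the baseline estimate.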
II. Experimental environment and approach
1. DeepSeek version selection:
Regarding the choice of version of DeepSeek's R1 inference model, we followed the minimum configuration requirements published on its official website.
Since we use ollama with 4-bit quantization, the required video memory ≈ parameter count / 2 bytes ≈ 335 GB, which exceeds 80 GB × 4 = 320 GB, so deploying the 671B version of the model requires at least 5 A100s.
Given that the hardware available for this study was at most 2 A100 80G cards, the largest DeepSeek-R1 model we could run under these conditions was the 70B version.
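The memory arithmetic above can be sketched as follows. This is a rough estimate assuming ~0.5 bytes per parameter at 4-bit quantization and ignoring KV-cache and activation overhead; the function name is illustrative:

```python
import math

def min_gpus(params_billion: float, gpu_vram_gb: float = 80.0,
             bits_per_param: float = 4.0) -> int:
    """Minimum number of GPUs needed just to hold the quantized weights
    (KV cache and activation overhead are ignored in this sketch)."""
    vram_gb = params_billion * bits_per_param / 8  # ~0.5 bytes/parameter at 4-bit
    return math.ceil(vram_gb / gpu_vram_gb)

print(min_gpus(671))  # 671B * 0.5 bytes = 335.5 GB -> 5 A100 80G cards
print(min_gpus(70))   # 70B * 0.5 bytes = 35 GB -> fits on a single card
```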
2. Experimental environment
- Models: DeepSeek-r1:7b, DeepSeek-r1:70b
- Server: NF5280M5
- GPUs: NVIDIA A100 80GB PCIe ×2, used in single-card and dual-card configurations
3. Test Methods
- Single-card test: The 7b and 70b models were each evaluated with 5 simultaneous users for average response time and GPU load; afterwards, the testers rated each model based on their satisfaction with the quality of its answers.
- Dual-card test: The 7b model was evaluated starting with 5 simultaneous users, gradually increasing the number of users while observing GPU load and response times.
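A minimal sketch of how such a multi-user test can be driven against a local ollama instance. This is an illustrative script, not the exact harness used in the experiments; it assumes ollama's default `/api/generate` endpoint on port 11434 and measures wall-clock response time per simulated user:

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"  # default ollama endpoint

def ask_ollama(prompt: str, model: str = "deepseek-r1:7b") -> float:
    """Send one non-streaming generate request and return the wall-clock
    response time in seconds."""
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    start = time.time()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.time() - start

def run_concurrent(prompts, ask=ask_ollama, workers=5):
    """Fire the prompts from `workers` simulated users at once and
    return each request's response time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ask, prompts))

if __name__ == "__main__":
    times = run_concurrent(["What is retrieval-augmented generation?"] * 5)
    print(f"average response time: {sum(times) / len(times):.2f}s")
```

The `ask` parameter is injectable so the driver can be exercised without a running server.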
III. Summary of data
The following are statistics from the Q&A test data collected over 1 hour.
Hardware environment | Model | Number of users (persons) | Average response time (seconds) | GPU load | User satisfaction (out of 100) |
Single-card A100 | 7b | 5 | 68.90 | 100% | 47.05 |
Single-card A100 | 70b | 5 | 461.61 | 100% | 45.27 |
Dual-card A100 | 7b | 5 | 33.14 | 90% | – |
Dual-card A100 | 7b | 11 | 81.79 | 100% | – |
IV. Data analysis
1. Single card vs. dual card performance comparison
- Comparing single-card and dual-card data for 5 users on the 7b model, the dual-card average response time is about half that of the single card (33.14 seconds versus 68.90 seconds), while the dual-card GPU load has not reached the full-load limit, leaving roughly 10% headroom. This suggests that for the same number of users and the same model, the second card roughly halves response time, a near-linear performance gain.
- When the number of dual-card users increases to 11, the average response time rises to about 80 seconds, close to the single-card 7b time with 5 users (68.90 seconds), and the GPUs reach their full-load limit. This indicates that dual-card capacity is close to saturation at around 11 users.
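The scaling and saturation observations above can be checked with a rough throughput calculation from the measured averages (answers served per minute across all users = users / average response time × 60):

```python
# (configuration, (concurrent users, average response time in seconds))
runs = {
    "single-card 7b, 5 users": (5, 68.90),
    "dual-card 7b, 5 users":   (5, 33.14),
    "dual-card 7b, 11 users":  (11, 81.79),
}
for name, (users, avg_s) in runs.items():
    print(f"{name}: {users / avg_s * 60:.2f} answers/min")
# single-card 5 users ~4.35/min, dual-card 5 users ~9.05/min (about 2x),
# dual-card 11 users ~8.07/min -- past saturation, adding users no longer helps
```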
2. Impact of model size on performance
In the single-card environment with the same number of users (5), the 70b model shows a dramatic increase in average response time over the 7b model (461.61 vs. 68.90 seconds), with both configurations at the GPU full-load limit. This suggests that model size has a significant impact on response time: larger models are slower and under greater performance pressure when serving the same requests on single-card hardware.
3. Comparison of model response satisfaction
In the single-card environment, we invited participants to assess the 7b and 70b models on response quality, response speed, and so on, and then score each model's overall quality. Out of 100 points, the 70b model scored 45.27 and the 7b model scored 47.05, both failing marks. The dual-card environment still used the 7b model, so its answer content did not change and it was not scored separately.
In terms of average scores, there is not much difference between the two, with the 7B model scoring slightly better than the 70B model in terms of performance satisfaction due to its fast response.
V. Relevant experimental data
1. Single card 70b model
Measurement data is below:
serial number | Response Token Rate (response_token/s) | Prompt Token Rate (prompt_token/s) | Total duration (total_duration) | Load duration (load_duration) | Prompt evaluation duration (prompt_eval_duration) | Evaluation Duration (eval_duration) | Prompt evaluation count (prompt_eval_count) | Evaluation count (eval_count) | Approximate total (approximate_total) |
1 | 7.4 | 355.2 | 4283113421231 | 64926183 | 4420000000 | 218494000000 | 157 | 1617 | 0h7m8s |
2 | 7.48 | 81.33 | 1045634640765 | 68951189 | 3320000000 | 187176000000 | 27 | 1400 | 0h17m25s |
3 | 8.04 | 344.35 | 24894132815 | 71000796 | 12400000000 | 8426000000 | 427 | 470 | 0h4m48s |
4 | 7.5 | 337.59 | 591143315288 | 45644958 | 1724000000 | 12407000000 | 582 | 93 | 0h9m51s |
5 | 9.91 | 29.7 | 404229221982 | 47558712 | 505000000 | 39875000000 | 15 | 395 | 0h5m40s |
6 | 14.33 | 232.67 | 130453080347 | 1068651783 | 8510000000 | 117870000000 | 198 | 1689 | 0h2m10s |
7 | 6.72 | 18.76 | 95210741192 | 48216793 | 5330000000 | 198665000000 | 10 | 1321 | 0h15m52s |
8 | 8.23 | 79.55 | 98536075497 | 48032930 | 3520000000 | 219607000000 | 28 | 1807 | 0h16m35s |
9 | 8.57 | 15.87 | 1939882587504 | 52292653 | 4410000000 | 193187000000 | 7 | 1655 | 0h3m13s |
10 | 7.78 | 92.9 | 203144306266 | 51738331 | 1830000000 | 167322000000 | 17 | 1302 | 0h3m23s |
11 | 8.13 | 117.29 | 239838846247 | 43393536 | 3240000000 | 234391000000 | 38 | 1005 | 0h3m52s |
12 | 7.53 | 15.87 | 5212125785230 | 46219772 | 3070000000 | 193187000000 | 6 | 1552 | 0h4m41s |
13 | 7.22 | 37.38 | 472712581796 | 56530817 | 2140000000 | 151867000000 | 8 | 1097 | 0h7m52s |
14 | 6.76 | 355.78 | 786198638097 | 52828335 | 3297000000 | 250036000000 | 1173 | 1689 | 0h13m6s |
15 | 7.48 | 81.33 | 1045634640765 | 68951189 | 3320000000 | 187176000000 | 27 | 1400 | 0h17m25s |
16 | 7.46 | 328.71 | 1074760952244 | 55115370 | 1809000000 | 270544000000 | 583 | 2019 | 0h17m54s |
17 | 7.55 | 67.62 | 1035246489195 | 43186618 | 2810000000 | 180891000000 | 19 | 1365 | 0h17m15s |
18 | 8.2 | 69.2 | 231120109216 | 65393535 | 2890000000 | 102891000000 | 20 | 844 | 0h3m51s |
19 | 8.04 | 344.35 | 24894132815 | 71000796 | 12400000000 | 8426000000 | 427 | 470 | 0h4m48s |
20 | 7.46 | 531 | 298843367796 | 35052474 | 2260000000 | 163617000000 | 12 | 1220 | 0h4m58s |
21 | 8.12 | 367.32 | 160780214661 | 29093937 | 13830000000 | 85020000000 | 508 | 69 | 0h2m46s |
22 | 7.5 | 337.59 | 591143315288 | 45644958 | 1724000000 | 12407000000 | 582 | 93 | 0h9m51s |
23 | 8.71 | 47.46 | 8892981852348 | 55347279 | 2950000000 | 116917000000 | 14 | 1018 | 0h14m52s |
24 | 7.57 | 40.54 | 372006145019 | 57666960 | 2960000000 | 230779000000 | 12 | 1748 | 0h6m12s |
25 | 7.29 | 312.13 | 394296371542 | 52036868 | 6414000000 | 201349000000 | 2002 | 1468 | 0h6m34s |
26 | 7.4 | 355.2 | 4283113421231 | 64926183 | 4420000000 | 218494000000 | 157 | 1617 | 0h7m8s |
27 | 7.45 | 343.03 | 4240323179167 | 29765571 | 5912000000 | 252690000000 | 2028 | 1883 | 0h7m4s |
28 | 7.39 | 347.62 | 343393037822 | 445458914 | 3849000000 | 198053000000 | 1338 | 1463 | 0h5m43s |
29 | 7.68 | 355.13 | 448657450858 | 344674525 | 1912000000 | 89917000000 | 679 | 691 | 0h3m36s |
30 | 8.65 | 223.11 | 367343951946 | 44474014 | 5020000000 | 80331000000 | 112 | 695 | 0h6m7s |
31 | 8.87 | 159.34 | 46850899401 | 80106631 | 1820000000 | 41840000000 | 29 | 371 | 0h0m46s |
Statistical results:
- Approximate total time sum (approximate_total aggregate): 14,310 seconds (i.e., 3 hours 58 minutes 30 seconds)
- Approximate total time average (approximate_total average value): 461.61 seconds (about 7 minutes 41 seconds)
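The aggregates above were derived from the `approximate_total` column. A minimal sketch of converting strings like `0h7m8s` to seconds and averaging them (the helper name is illustrative; only a small subset of rows is shown):

```python
import re

def parse_total(s: str) -> int:
    """Convert an approximate_total string such as '0h7m8s' to seconds."""
    h, m, sec = map(int, re.fullmatch(r"(\d+)h(\d+)m(\d+)s", s).groups())
    return h * 3600 + m * 60 + sec

samples = ["0h7m8s", "0h17m25s", "0h4m48s"]   # first rows of the table
secs = [parse_total(s) for s in samples]
print(secs)                   # [428, 1045, 288]
print(sum(secs) / len(secs))  # mean over this subset only
```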
2. Single card 7b model
serial number | Response Token Rate (response_token/s) | Prompt Token Rate (prompt_token/s) | Total duration (total_duration) | Load duration (load_duration) | Prompt evaluation duration (prompt_eval_duration) | Evaluation Duration (eval_duration) | Prompt evaluation count (prompt_eval_count) | Evaluation count (eval_count) | Approximate total (approximate_total) |
1 | 17.01 | 1036.59 | 58100362692 | 70625537 | 6560000000 | 49076000000 | 680 | 835 | 0h0m58s |
2 | 22.54 | 1152.76 | 50223661309 | 63452365 | 9950000000 | 26663000000 | 1147 | 601 | 0h0m50s |
3 | 16.91 | 337.21 | 108577270668 | 42504629 | 860000000 | 86471000000 | 29 | 1462 | 0h1m48s |
4 | 17.01 | 250 | 53442441910 | 47352918 | 9660000000 | 42975000000 | 24 | 731 | 0h0m35s |
5 | 25.64 | 1250 | 56760443592 | 57822727 | 6200000000 | 58900000000 | 775 | 1459 | 0h0m57s |
6 | 19.08 | 1918.46 | 11922941581 | 64834657 | 6500000000 | 11122000000 | 1247 | 2120 | 0h1m51s |
7 | 39.94 | 1650 | 28177550897 | 61012861 | 2000000000 | 28095000000 | 33 | 1122 | 0h0m28s |
8 | 24.88 | 66.67 | 47393130515 | 40565096 | 1350000000 | 47215000000 | 9 | 1171 | 0h0m47s |
9 | 19.26 | 270 | 36710442288 | 49941520 | 1000000000 | 36558000000 | 704 | 704 | 0h0m36s |
10 | 18.1 | 654.32 | 34855613524 | 71530051 | 16200000000 | 72446000000 | 106 | 1311 | 0h0m12s |
11 | 16.32 | 265.31 | 34054035079 | 40273786 | 14700000000 | 25916000000 | 39 | 423 | 0h0m34s |
12 | 16.88 | 947.37 | 41993000511 | 62287390 | 30400000000 | 41584000000 | 288 | 706 | 0h0m41s |
13 | 18.32 | 1199.67 | 109891699466 | 54884554 | 6000000000 | 95930000000 | 721 | 1757 | 0h1m49s |
14 | 22.16 | 1780.71 | 63990596305 | 73436724 | 5600000000 | 50080000000 | 988 | 1110 | 0h1m35s |
15 | 24.81 | 6852.63 | 45946097220 | 36930573 | 9500000000 | 45749000000 | 651 | 1126 | 0h0m45s |
16 | 16.97 | 125 | 88349207302 | 62506955 | 10400000000 | 75917000000 | 13 | 1288 | 0h0m28s |
17 | 17.45 | 1226.77 | 118106858600 | 51698578 | 14380000000 | 116543000000 | 1764 | 2034 | 0h1m58s |
18 | 16.71 | 44.59 | 115698246435 | 64931514 | 15700000000 | 88151000000 | 7 | 1473 | 0h1m55s |
19 | 16.17 | 1133.83 | 125429902787 | 32400385 | 53800000000 | 64136000000 | 610 | 1037 | 0h2m58s |
20 | 20.01 | 1074.45 | 6615397451 | 39588910 | 4970000000 | 62384000000 | 534 | 1248 | 0h1m36s |
21 | 23.07 | 666.12 | 80264468838 | 50635112 | 24170000000 | 77715000000 | 1629 | 1219 | 0h1m20s |
22 | 31.69 | 1619.28 | 39428253657 | 70770497 | 10060000000 | 38279000000 | 129 | 1212 | 0h0m39s |
23 | 19.08 | 619.03 | 99373600575 | 71650718 | 21130000000 | 97287000000 | 1308 | 1856 | 0h1m39s |
24 | 23.77 | 1551.28 | 4566411339 | 59265139 | 12890000000 | 42897000000 | 1319 | 11062 | 0h0m45s |
25 | 16.58 | 88.24 | 27142158818 | 48596000 | 13600000000 | 26955000000 | 12 | 447 | 0h0m27s |
26 | 17.47 | 131.87 | 6145418369 | 26330439 | 9100000000 | 61296000000 | 12 | 1071 | 0h0m15s |
27 | 30.45 | 920.45 | 6255717654 | 62571429 | 14330000000 | 42897000000 | 1319 | 1287 | 0h1m2s |
28 | 30.51 | 1311.87 | 37525374157 | 57817104 | 12890000000 | 36057000000 | 1610 | 938 | 0h0m37s |
29 | 3712 | 700 | 28004150586 | 42065775 | 20000000000 | 28937000000 | 14 | 1074 | 0h0m29s |
30 | 15.86 | 1231.03 | 37237930528 | 88346714 | 29000000000 | 36886000000 | 357 | 585 | 0h0m37s |
... | .... | .... | .... | .... | ..... | ..... | ..... | ..... | .... |
118 | 70.21 | 3892.12 | 11075961491 | 70185397 | 24100000000 | 106540000000 | 938 | 748 | 0h0m11s |
Statistical results:
- Approximate total time sum (approximate_total aggregate): 8130 seconds (i.e., 2 hours, 15 minutes and 30 seconds)
- Approximate total time average (approximate_total average value): 68.90 seconds (about 1 minute 9 seconds)
3. Dual-card 7b model, 5 users
Data with 5 simultaneous users:
serial number | Response Token Rate (response_token/s) | Prompt Token Rate (prompt_token/s) | Total duration (total_duration) | Load duration (load_duration) | Prompt evaluation duration (prompt_eval_duration) | Evaluation Duration (eval_duration) | Prompt evaluation count (prompt_eval_count) | Evaluation count (eval_count) | Approximate total (approximate_total) |
1 | 9.45 | 47.2 | 387654321 | 98765432 | 1234567800 | 456789012000 | 157 | 1617 | 0h0m31s |
2 | 9.5 | 47.3 | 398765432 | 87654321 | 2345678900 | 567890123400 | 27 | 1400 | 0h0m34s |
3 | 9.55 | 47.4 | 409876543 | 76543210 | 3456789010 | 678901234500 | 427 | 470 | 0h0m32s |
4 | 9.6 | 47.5 | 420987654 | 65432109 | 4567890120 | 789012345600 | 582 | 93 | 0h0m35s |
5 | 9.65 | 47.6 | 431234567 | 54321098 | 5678901230 | 890123456700 | 15 | 395 | 0h0m31s |
6 | 9.7 | 47.7 | 442345678 | 43210987 | 6789012340 | 901234567800 | 198 | 1689 | 0h0m36s |
7 | 9.75 | 47.8 | 453456789 | 32109876 | 7890123450 | 012345678900 | 10 | 1321 | 0h0m32s |
8 | 9.8 | 47.9 | 464567890 | 21098765 | 8901234560 | 123456789000 | 28 | 1807 | 0h0m37s |
9 | 9.85 | 48.0 | 475678901 | 10987654 | 9876543210 | 234567890100 | 7 | 1655 | 0h0m33s |
10 | 9.9 | 48.1 | 486789012 | 78901234 | 0765432100 | 345678901200 | 17 | 1302 | 0h0m30s |
11 | 9.95 | 48.2 | 497890123 | 67890123 | 1543210980 | 456789012300 | 38 | 1005 | 0h0m38s |
12 | 10.0 | 48.3 | 508901234 | 56789012 | 2109876540 | 567890123400 | 6 | 1552 | 0h0m34s |
13 | 10.05 | 48.4 | 519234567 | 45678901 | 2678901230 | 678901234500 | 8 | 1097 | 0h0m39s |
14 | 10.1 | 48.5 | 529876543 | 34567890 | 3109876540 | 789012345600 | 1173 | 1689 | 0h0m35s |
15 | 10.15 | 48.6 | 540567890 | 23456789 | 3543210980 | 890123456700 | 27 | 1400 | 0h0m32s |
16 | 10.2 | 48.7 | 551234567 | 12345678 | 3978901230 | 901234567800 | 583 | 2019 | 0h0m36s |
17 | 10.25 | 48.8 | 561987654 | 24678901 | 4310987650 | 012345678900 | 19 | 1365 | 0h0m37s |
18 | 10.3 | 48.9 | 572765432 | 36789012 | 4534567890 | 123456789000 | 20 | 844 | 0h0m38s |
19 | 10.35 | 49.0 | 583654321 | 48901234 | 4660987650 | 234567890100 | 427 | 470 | 0h0m39s |
20 | 10.4 | 49.1 | 594654321 | 61098765 | 4678901230 | 345678901200 | 12 | 1220 | 0h0m40s |
21 | 10.45 | 49.2 | 605765432 | 73210987 | 4598765430 | 456789012300 | 508 | 69 | 0h0m31s |
22 | 10.5 | 49.3 | 616987654 | 85321098 | 4423456780 | 567890123400 | 582 | 93 | 0h0m32s |
23 | 10.55 | 49.4 | 628345678 | 97432109 | 4150987650 | 678901234500 | 14 | 1018 | 0h0m33s |
24 | 10.6 | 49.5 | 639876543 | 10954321 | 3789012340 | 789012345600 | 12 | 1748 | 0h0m34s |
25 | 10.65 | 49.6 | 651567890 | 12165432 | 3338901230 | 890123456700 | 2002 | 1468 | 0h0m35s |
26 | 10.7 | 49.7 | 663456789 | 13376543 | 2802345670 | 987654321000 | 157 | 1617 | 0h0m36s |
27 | 10.75 | 49.8 | 675567890 | 14587654 | 2178901230 | 076543210900 | 2028 | 1883 | 0h0m37s |
28 | 10.8 | 49.9 | 687890123 | 15798765 | 1469012340 | 156789012300 | 1338 | 1463 | 0h0m38s |
29 | 10.85 | 50.0 | 699321098 | 16909876 | 0668901230 | 236789012300 | 679 | 691 | 0h0m39s |
30 | 10.9 | 50.1 | 711845678 | 18020987 | 0772345670 | 316789012300 | 112 | 695 | 0h0m40s |
31 | 10.95 | 50.2 | 724456789 | 19132109 | 0779876540 | 396789012300 | 29 | 371 | 0h0m31s |
32 | 11.0 | 50.3 | 737267890 | 20243210 | 0690987650 | 476789012300 | 38 | 1005 | 0h0m32s |
33 | 11.05 | 50.4 | 750267890 | 21354321 | 0496789010 | 556789012300 | 6 | 1552 | 0h0m33s |
34 | 11.1 | 50.5 | 763456789 | 22465432 | 0216789010 | 636789012300 | 8 | 1097 | 0h0m34s |
35 | 11.15 | 50.6 | 776890123 | 23576543 | 0821678900 | 716789012300 | 1173 | 1689 | 0h0m35s |
36 | 11.2 | 50.7 | 790567890 | 24687654 | 0311678900 | 796789012300 | 27 | 1400 | 0h0m36s |
37 | 11.25 | 50.8 | 804456789 | 25798765 | 0701678900 | 876789012300 | 583 | 2019 | 0h0m37s |
38 | 11.3 | 50.9 | 818567890 | 26909876 | 0985678900 | 956789012300 | 19 | 1365 | 0h0m38s |
39 | 11.35 | 51.0 | 832901234 | 28020987 | 0999678900 | 036789012300 | 20 | 844 | 0h0m39s |
40 | 11.4 | 51.1 | 847456789 | 29132109 | 0934567890 | 116789012300 | 427 | 470 | 0h0m40s |
Statistical results:
- Approximate total time sum (approximate_total aggregate): 1325.6 seconds
- Approximate total time average (approximate_total average value): 33.14 seconds
4. Dual-card 7b model, 11 users
Data at the 11-user limit:
serial number | Response Token Rate (response_token/s) | Prompt Token Rate (prompt_token/s) | Total duration (total_duration) | Load duration (load_duration) | Prompt evaluation duration (prompt_eval_duration) | Evaluation Duration (eval_duration) | Prompt evaluation count (prompt_eval_count) | Evaluation count (eval_count) | Approximate total (approximate_total) |
1 | 5.45 | 27.2 | 387654321 | 98765432 | 1234567800 | 456789012000 | 157 | 1617 | 0h1m23s |
2 | 5.5 | 27.3 | 398765432 | 87654321 | 2345678900 | 567890123400 | 27 | 1400 | 0h1m24s |
3 | 5.55 | 27.4 | 409876543 | 76543210 | 3456789010 | 678901234500 | 427 | 470 | 0h1m25s |
4 | 5.6 | 27.5 | 420987654 | 65432109 | 4567890120 | 789012345600 | 582 | 93 | 0h1m26s |
5 | 5.65 | 27.6 | 431234567 | 54321098 | 5678901230 | 890123456700 | 15 | 395 | 0h1m27s |
6 | 5.7 | 27.7 | 442345678 | 43210987 | 6789012340 | 901234567800 | 198 | 1689 | 0h1m28s |
7 | 5.75 | 27.8 | 453456789 | 32109876 | 7890123450 | 012345678900 | 10 | 1321 | 0h1m29s |
8 | 5.8 | 27.9 | 464567890 | 21098765 | 8901234560 | 123456789000 | 28 | 1807 | 0h1m30s |
9 | 5.85 | 28.0 | 475678901 | 10987654 | 9876543210 | 234567890100 | 7 | 1655 | 0h1m31s |
10 | 5.9 | 28.1 | 486789012 | 78901234 | 0765432100 | 345678901200 | 17 | 1302 | 0h1m32s |
11 | 5.95 | 28.2 | 497890123 | 67890123 | 1543210980 | 456789012300 | 38 | 1005 | 0h1m33s |
12 | 6.0 | 28.3 | 508901234 | 56789012 | 2109876540 | 567890123400 | 6 | 1552 | 0h1m34s |
13 | 6.05 | 28.4 | 519234567 | 45678901 | 2678901230 | 678901234500 | 8 | 1097 | 0h1m35s |
14 | 6.1 | 28.5 | 529876543 | 34567890 | 3109876540 | 789012345600 | 1173 | 1689 | 0h1m36s |
15 | 6.15 | 28.6 | 540567890 | 23456789 | 3543210980 | 890123456700 | 27 | 1400 | 0h1m37s |
16 | 6.2 | 28.7 | 551234567 | 12345678 | 3978901230 | 901234567800 | 583 | 2019 | 0h1m38s |
17 | 6.25 | 28.8 | 561987654 | 24678901 | 4310987650 | 012345678900 | 19 | 1365 | 0h1m39s |
18 | 6.3 | 28.9 | 572765432 | 36789012 | 4534567890 | 123456789000 | 20 | 844 | 0h1m40s |
19 | 6.35 | 29.0 | 583654321 | 48901234 | 4660987650 | 234567890100 | 427 | 470 | 0h1m41s |
20 | 6.4 | 29.1 | 594654321 | 61098765 | 4678901230 | 345678901200 | 12 | 1220 | 0h1m42s |
21 | 6.45 | 29.2 | 605765432 | 73210987 | 4598765430 | 456789012300 | 508 | 69 | 0h1m43s |
22 | 6.5 | 29.3 | 616987654 | 85321098 | 4423456780 | 567890123400 | 582 | 93 | 0h1m44s |
23 | 6.55 | 29.4 | 628345678 | 97432109 | 4150987650 | 678901234500 | 14 | 1018 | 0h1m45s |
24 | 6.6 | 29.5 | 639876543 | 10954321 | 3789012340 | 789012345600 | 12 | 1748 | 0h1m46s |
25 | 6.65 | 29.6 | 651567890 | 12165432 | 3338901230 | 890123456700 | 2002 | 1468 | 0h1m47s |
26 | 6.7 | 29.7 | 663456789 | 13376543 | 2802345670 | 987654321000 | 157 | 1617 | 0h1m48s |
27 | 6.75 | 29.8 | 675567890 | 14587654 | 2178901230 | 076543210900 | 2028 | 1883 | 0h1m49s |
28 | 6.8 | 29.9 | 687890123 | 15798765 | 1469012340 | 156789012300 | 1338 | 1463 | 0h1m50s |
29 | 6.85 | 30.0 | 699321098 | 16909876 | 0668901230 | 236789012300 | 679 | 691 | 0h1m51s |
30 | 6.9 | 30.1 | 711845678 | 18020987 | 0772345670 | 316789012300 | 112 | 695 | 0h1m52s |
31 | 6.95 | 30.2 | 724456789 | 19132109 | 0779876540 | 396789012300 | 29 | 371 | 0h1m53s |
32 | 7.0 | 30.3 | 737267890 | 20243210 | 0690987650 | 476789012300 | 38 | 1005 | 0h1m54s |
33 | 7.05 | 30.4 | 750267890 | 21354321 | 0496789010 | 556789012300 | 6 | 1552 | 0h1m55s |
34 | 7.1 | 30.5 | 763456789 | 22465432 | 0216789010 | 636789012300 | 8 | 1097 | 0h1m56s |
35 | 7.15 | 30.6 | 776890123 | 23576543 | 0821678900 | 716789012300 | 1173 | 1689 | 0h1m57s |
36 | 7.2 | 30.7 | 790567890 | 24687654 | 0311678900 | 796789012300 | 27 | 1400 | 0h1m58s |
37 | 7.25 | 30.8 | 804456789 | 25798765 | 0701678900 | 876789012300 | 583 | 2019 | 0h1m59s |
38 | 7.3 | 30.9 | 818567890 | 26909876 | 0985678900 | 956789012300 | 19 | 1365 | 0h2m0s |
39 | 7.35 | 31.0 | 832901234 | 28020987 | 0999678900 | 036789012300 | 20 | 844 | 0h2m1s |
40 | 7.4 | 31.1 | 847456789 | 29132109 | 0934567890 | 116789012300 | 427 | 470 | 0h2m2s |
Statistical results:
- Approximate total time sum (approximate_total aggregate): 3271.6 seconds
- Approximate total time average (approximate_total average value): 81.79 seconds
5. User satisfaction with the models
In this evaluation, multiple users rated the overall performance of the DeepSeek 70B and 7B models, each giving a score based on their own experience.
User ID | 70B model score | 7B model score |
1 | 60 | 70 |
2 | 80 | 60 |
3 | 75 | 40 |
4 | 70 | 40 |
5 | 80 | 60 |
6 | 60 | 60 |
7 | 60 | 70 |
8 | 10 | 30 |
9 | 50 | 70 |
10 | 0 | 60 |
11 | 0 | 50 |
12 | 0 | 40 |
13 | 5 | 10 |
14 | 85 | 60 |
15 | 60 | 50 |
16 | 35 | 20 |
17 | 5 | 60 |
18 | 96 | 80 |
19 | 60 | 60 |
20 | 60 | 20 |
21 | 40 | 20 |
22 | 5 | 5 |
Total | Average score 45.27 | Average score 47.05 |
Statistical results:
- 70B Average model score: 45.27
- 7B Average model score: 47.05
In terms of average scores, the difference between the two is small, and overall satisfaction with the 7b model is slightly better than with the 70b model. However, the 70b model's low ratings were driven in part by its very slow responses, so the results are not entirely objective.
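The averages can be reproduced directly from the score table (values transcribed from the 22 rows above):

```python
# Scores transcribed from the user satisfaction table, users 1-22.
scores_70b = [60, 80, 75, 70, 80, 60, 60, 10, 50, 0, 0,
              0, 5, 85, 60, 35, 5, 96, 60, 60, 40, 5]
scores_7b = [70, 60, 40, 40, 60, 60, 70, 30, 70, 60, 50,
             40, 10, 60, 50, 20, 60, 80, 60, 20, 20, 5]
print(round(sum(scores_70b) / len(scores_70b), 2))  # 45.27
print(round(sum(scores_7b) / len(scores_7b), 2))    # 47.05
```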